Troubleshooting firmware upgrade timeout issues on HA800 G2/G3 series server models

Unified Compute Platform (UCP) Advisor Administration Guide

Version
4.6.x
Part Number
MK-92UCP119-15
Last Edition
2024-09-24

Use the following guidelines and suggested steps to resolve firmware upgrade timeout issues on HA800 G2/G3 series server models.

Condition:
On HA800 G2/G3 series server models, the firmware upgrade times out before it completes and returns the following error:
Unable to upgrade the firmware on server 172.25.92.106 because of upgrade_status (workFlowId: 102) is still NOT in ERROR/COMPLETED state after long waiting time. Check configurations.
What it Means:
The firmware upgrade did not reach the ERROR or COMPLETED state before the configured wait time limit was reached, so the operation is reported as timed out.
Corrective Action:
Increase the wait time on the UCP Advisor VM.
  1. Log on to the UCP Advisor VM as ucpadmin.
  2. Open the infrastructure-operator-config ConfigMap for editing:
    kubectl edit cm infrastructure-operator-config -n ucp
  3. Under serverfirmware.properties, locate the retryNumberToWaitCompleteStateAfterUpgradeIOCard parameter.
    The default values are shown below:
    serverfirmware.properties: |
        retryIntervalToCheckPowerStateAfterSendPowerOffReq=60 # seconds
        retryNumbeToCheckPowerStateAfterSendPowerOffReq=2 # times
        waitTimeAfterSendPowerOnReq=6 # minutes
        retryIntervalToCheckPowerStateAfterSendPowerOnReq=30 # seconds
        retryNumbeToCheckPowerStateAfterSendPowerOnReq=20 # times
        retryIntervalToWaitCompleteStateAfterUpgradeIOCard=3 # minutes
        retryNumberToWaitCompleteStateAfterUpgradeIOCard=20 # times
        waitTimeAfterUpgradeIOCard=5 # minutes
  4. Change the retryNumberToWaitCompleteStateAfterUpgradeIOCard value from 20 to 25. Because retryIntervalToWaitCompleteStateAfterUpgradeIOCard is 3 minutes, this increases the total wait time from 60 minutes (20 x 3) to 75 minutes (25 x 3).
  5. Save the changes and close the editor. kubectl applies the updated infrastructure-operator-config ConfigMap when the editor exits.
  6. Retry the firmware upgrade operation.
    Note: Restarting the pod is not required.
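The wait time in step 4 is the retry interval multiplied by the retry count. The following is a minimal sketch for checking that arithmetic against a set of property values. The function name total_wait_minutes is hypothetical and is not part of UCP Advisor. The commented kubectl line assumes the properties are stored under the serverfirmware.properties key in the data section of the ConfigMap, which matches the listing in step 3 but should be verified on your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads serverfirmware.properties text on stdin and
# prints the total wait time in minutes (retry interval x retry count).
total_wait_minutes() {
  awk -F'[=#]' '
    # Field 1 is the key; remove the leading indentation before comparing.
    { key = $1; gsub(/[ \t]/, "", key) }
    # Field 2 is the value; adding 0 converts "3 " to the number 3.
    key == "retryIntervalToWaitCompleteStateAfterUpgradeIOCard" { interval = $2 + 0 }
    key == "retryNumberToWaitCompleteStateAfterUpgradeIOCard"   { count = $2 + 0 }
    END { print interval * count }
  '
}

# Example with the values from step 4 (interval 3 minutes, 25 retries).
printf '%s\n' \
  'retryIntervalToWaitCompleteStateAfterUpgradeIOCard=3 # minutes' \
  'retryNumberToWaitCompleteStateAfterUpgradeIOCard=25 # times' \
  | total_wait_minutes   # prints 75

# Assumed usage against the live ConfigMap (requires kubectl access as ucpadmin):
# kubectl get cm infrastructure-operator-config -n ucp \
#   -o jsonpath='{.data.serverfirmware\.properties}' | total_wait_minutes
```

If the upgrade still times out at 75 minutes, the same calculation shows the effect of a further increase before you edit the ConfigMap again.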