Troubleshooting Kubernetes container restart issues caused by deployment resource limitations

Unified Compute Platform (UCP) Advisor Administration Guide

Version: 4.5.x
Part Number: MK-92UCP119-13
Last edition: 2024-03-11

Use the following guidelines and suggested steps to help resolve Kubernetes container restart issues.

Condition:
The Kubernetes container enters a restart loop.
What it Means:
The container is exceeding the resource limits set on its deployment, so the limits must be increased.
Corrective Action:
Increase the deployment resource limits by performing the following steps on the Master Node, or from any host by using a kubeconfig that has deployment privileges.
  1. View the existing resource limits of the deployment and the container. Enter:
    kubectl get deployment <deployment name> -n ucp -o jsonpath='{.spec.template.spec.containers[?(@.name=="<container name>")].resources.limits}'
    For example:
    kubectl get deployment common-operator -n ucp -o jsonpath='{.spec.template.spec.containers[?(@.name=="manager")].resources.limits}'

    Sample output:

    {"cpu":"500m","ephemeral-storage":"10Gi","memory":"1Gi"}
  2. Patch the resource limits of the deployment. The limit values shown are examples; any limit that you omit from the patch keeps its current value. Enter:
    kubectl patch deployment <deployment name> -n ucp -p '{"spec": {"template":{"spec":{"containers": [{"name":"<container name>","resources":{"limits":{"cpu":"550m","ephemeral-storage":"11Gi","memory":"2Gi"}}}]}}}}'
    For example, to raise only the CPU and memory limits:
    kubectl patch deployment common-operator -n ucp -p '{"spec": {"template":{"spec":{"containers": [{"name":"manager","resources":{"limits":{"cpu":"550m","memory":"2Gi"}}}]}}}}'
  3. Verify that the resource limits were successfully patched. Enter:
    kubectl get deployment <deployment name> -n ucp -o jsonpath='{.spec.template.spec.containers[?(@.name=="<container name>")].resources.limits}'
    For example:
    kubectl get deployment common-operator -n ucp -o jsonpath='{.spec.template.spec.containers[?(@.name=="manager")].resources.limits}'
    Sample output:
    {"cpu":"550m","ephemeral-storage":"10Gi","memory":"2Gi"}
  4. Verify that the deployment replaced the old pods with new pods that use the patched limits. Enter:
    kubectl get pods -n ucp -w | grep <deployment name>
    For example:
    kubectl get pods -n ucp -w | grep common-operator
    Sample output:
    [root@c79-20-157 helm-chart]# kubectl get pods -n ucp -w | grep common-operator
    common-operator-559848c45b-zgq2s  2/2 Running             0 96m
    common-operator-6b65ddc58b-6tz7p  0/2 Pending             0 0s
    common-operator-6b65ddc58b-6tz7p  0/2 Pending             0 0s
    common-operator-6b65ddc58b-6tz7p  0/2 ContainerCreating   0 1s
    common-operator-6b65ddc58b-6tz7p  1/2 Running             0 3s
    common-operator-6b65ddc58b-6tz7p  2/2 Running             0 11s
    common-operator-559848c45b-zgq2s  2/2 Terminating         0 96m
    common-operator-559848c45b-zgq2s  0/2 Terminating         0 96m
    common-operator-559848c45b-zgq2s  0/2 Terminating         0 96m
    common-operator-559848c45b-zgq2s  0/2 Terminating         0 96m
    Note: The new pods must reach the Running state, and the old pods must terminate (they show Terminating until they are removed). Press Ctrl+C to stop the watch.

    The following table lists the deployment names and their corresponding container names:

    Deployment Name                  Container Name
    common-operator                  manager
    storage-operator                 storage-operator
    server-operator                  server-operator
    network-operator                 network-operator
    converged-operator               converged-operator
    hypervisor-operator              hypervisor-operator
    day0                             discover
    day0                             pcicardupgrade
    filemanager                      filemanager
    idm                              idm-container
    porcelain                        porcelain-container
    scp-server                       scp-server-container
    nfs-server                       nfs-server-container
    tasks                            tasks-container
    vcenter-remote-plugin            rpcontainer
    ucpadvisor-kong (UCP Advisor)    proxy
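
The four steps above can also be run as one sequence. The following is a minimal sketch, not part of UCP Advisor: the function names (build_limits_patch, show_limits, raise_limits) and the 300-second timeout are illustrative assumptions. It uses kubectl rollout status in place of the watch in step 4, because that command returns once the new pods are ready instead of running until interrupted. Only the CPU and memory limits are patched, so ephemeral-storage keeps its current value.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the timeout are illustrative assumptions.

# Build the patch body used in step 2.
# Arguments: <container name> <cpu limit> <memory limit>
build_limits_patch() {
  local container="$1" cpu="$2" memory="$3"
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"%s","resources":{"limits":{"cpu":"%s","memory":"%s"}}}]}}}}' \
    "$container" "$cpu" "$memory"
}

# Print the current limits of one container (steps 1 and 3).
# Arguments: <deployment name> <container name>
show_limits() {
  local deployment="$1" container="$2"
  kubectl get deployment "$deployment" -n ucp \
    -o jsonpath="{.spec.template.spec.containers[?(@.name==\"$container\")].resources.limits}"
  echo
}

# Run steps 1 to 4 in order and stop at the first command that fails.
# Arguments: <deployment name> <container name> <cpu limit> <memory limit>
raise_limits() {
  local deployment="$1" container="$2" cpu="$3" memory="$4"
  show_limits "$deployment" "$container" || return 1
  kubectl patch deployment "$deployment" -n ucp \
    -p "$(build_limits_patch "$container" "$cpu" "$memory")" || return 1
  show_limits "$deployment" "$container" || return 1
  # Waits until the new pods are ready or the timeout expires.
  kubectl rollout status "deployment/$deployment" -n ucp --timeout=300s
}
```

For example, after sourcing the file, raise_limits common-operator manager 550m 2Gi applies the same values as the example in step 2. Look up the container name for each deployment in the table above.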