Use the following guidelines and suggested steps to help resolve Kubernetes container restart issues.
- Condition:
- The Kubernetes container enters a restart loop.
- What it Means:
- The container is overutilizing the deployment's resource limits, and the limits must be increased.
- Corrective Action:
- Increase the deployment resource limits by performing the following steps on the master node, or from any host that has a kubeconfig with deployment privileges.
- View the existing resource limits of the deployment and the container. Enter:
kubectl get deployment <deployment name> -n ucp -o jsonpath='{.spec.template.spec.containers[?(@.name=="<container name>")].resources.limits}'
For example:
kubectl get deployment common-operator -n ucp -o jsonpath='{.spec.template.spec.containers[?(@.name=="manager")].resources.limits}'
Sample output:
{"cpu":"500m","ephemeral-storage":"10Gi","memory":"1Gi"}
- Patch the resource limits of the deployment. Enter:
kubectl patch deployment <deployment name> -n ucp -p '{"spec": {"template":{"spec":{"containers": [{"name":"<container name>","resources":{"limits":{"cpu":"550m","ephemeral-storage":"11Gi","memory":"2Gi"}}}]}}}}'
For example:
kubectl patch deployment common-operator -n ucp -p '{"spec": {"template":{"spec":{"containers": [{"name":"manager","resources":{"limits":{"cpu":"550m","memory":"2Gi"}}}]}}}}'
- Verify that the resource limits were successfully patched. Enter:
kubectl get deployment <deployment name> -n ucp -o jsonpath='{.spec.template.spec.containers[?(@.name=="<container name>")].resources.limits}'
For example:
kubectl get deployment common-operator -n ucp -o jsonpath='{.spec.template.spec.containers[?(@.name=="manager")].resources.limits}'
Sample output:
{"cpu":"550m","ephemeral-storage":"10Gi","memory":"2Gi"}
- Verify that the deployment rolled out new pods with the patch applied. Enter:
kubectl get pods -n ucp -w | grep <deployment name>
For example:
kubectl get pods -n ucp -w | grep common-operator
Sample output:
[root@c79-20-157 helm-chart]# kubectl get pods -n ucp -w | grep common-operator
common-operator-559848c45b-zgq2s   2/2   Running             0   96m
common-operator-6b65ddc58b-6tz7p   0/2   Pending             0   0s
common-operator-6b65ddc58b-6tz7p   0/2   Pending             0   0s
common-operator-6b65ddc58b-6tz7p   0/2   ContainerCreating   0   1s
common-operator-6b65ddc58b-6tz7p   1/2   Running             0   3s
common-operator-6b65ddc58b-6tz7p   2/2   Running             0   11s
common-operator-559848c45b-zgq2s   2/2   Terminating         0   96m
common-operator-559848c45b-zgq2s   0/2   Terminating         0   96m
common-operator-559848c45b-zgq2s   0/2   Terminating         0   96m
common-operator-559848c45b-zgq2s   0/2   Terminating         0   96m
Note: The newly created pods must reach the Running state, and the old pods must terminate (they are shown as Terminating in the output).
The following table lists the deployment names and their corresponding container names:
| Deployment Name | Container Name |
| --- | --- |
| common-operator | manager |
| storage-operator | storage-operator |
| server-operator | server-operator |
| network-operator | network-operator |
| converged-operator | converged-operator |
| hypervisor-operator | hypervisor-operator |
| day0 | discover |
| day0 | pcicardupgrade |
| filemanager | filemanager |
| idm | idm-container |
| porcelain | porcelain-container |
| scp-server | scp-server-container |
| nfs-server | nfs-server-container |
| tasks | tasks-container |
| vcenter-remote-plugin | rpcontainer |
| ucpadvisor-kong (UCP Advisor) | proxy |
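The corrective steps above could be wrapped in a small shell helper along the following lines. This is a minimal sketch, not part of the product: the function names `container_for_deployment`, `build_limits_patch`, and `patch_limits` are hypothetical, the lookup assumes `ucpadvisor-kong` is the literal deployment name, and `kubectl rollout status` is used here in place of watching the pod list with `kubectl get pods -w`. The CPU and memory values are supplied by the caller; choose them based on the limits reported by the first step.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and not shipped with the product.

# Print the container name(s) for a deployment, following the table above.
# Assumption: "ucpadvisor-kong" is the literal deployment name.
container_for_deployment() {
  case "$1" in
    common-operator)       echo "manager" ;;
    storage-operator|server-operator|network-operator|converged-operator|hypervisor-operator|filemanager)
                           echo "$1" ;;
    day0)                  echo "discover pcicardupgrade" ;;
    idm|porcelain|scp-server|nfs-server|tasks)
                           echo "$1-container" ;;
    vcenter-remote-plugin) echo "rpcontainer" ;;
    ucpadvisor-kong)       echo "proxy" ;;
    *)                     echo "unknown deployment: $1" >&2; return 1 ;;
  esac
}

# Build the JSON patch body that sets the CPU and memory limits of one container.
# Arguments: <container name> <cpu limit> <memory limit>
build_limits_patch() {
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"%s","resources":{"limits":{"cpu":"%s","memory":"%s"}}}]}}}}' \
    "$1" "$2" "$3"
}

# Patch the limits in the ucp namespace, then wait for the rollout to finish.
# Arguments: <deployment name> <container name> <cpu limit> <memory limit>
patch_limits() {
  kubectl patch deployment "$1" -n ucp -p "$(build_limits_patch "$2" "$3" "$4")" &&
    kubectl rollout status deployment "$1" -n ucp --timeout=300s
}
```

For example, `patch_limits common-operator manager 550m 2Gi` would send the same limits as the `common-operator` example above and then block until the new pods are ready or the timeout expires. Because `day0` has two containers, `container_for_deployment day0` prints both names, and each would need its own `patch_limits` call.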