X-Git-Url: https://gerrit.akraino.org/r/gitweb?a=blobdiff_plain;f=doc%2Ftroubleshooting.md;h=c440e8ca512516d668d4523b7519a8deca172ad5;hb=HEAD;hp=60c131e7a70ef8152d54c81e050f792cb11ce0cf;hpb=dd7088efd6a1cbdc3071dfd48944d15ccd4a3dac;p=icn.git
diff --git a/doc/troubleshooting.md b/doc/troubleshooting.md
index 60c131e..c440e8c 100644
--- a/doc/troubleshooting.md
+++ b/doc/troubleshooting.md
@@ -27,6 +27,10 @@ Examining the BareMetalHost resource of the failing machine and the
 logs of Bare Metal Operator and Ironic Pods may also provide a
 description of why the provisioning is failing.
 
+A description of the BareMetalHost states can be found in the [Bare
+Metal Operator
+documentation](https://github.com/metal3-io/baremetal-operator/blob/main/docs/baremetalhost-states.md).
+
 ### openstack baremetal
 
 In rare cases, the Ironic and Bare Metal Operator information may get
@@ -82,6 +86,22 @@ The general procedure (shown on the jump server) is:
     | uuid                  | 93366f0a-aa12-4815-b524-b95839bfa05d |
     +-----------------------+--------------------------------------+
 
+## Pod deployment fails due to Docker rate limits
+
+If a Pod fails to start and the Pod status (`kubectl describe pod
+...`) shows that the Docker pull rate limit has been reached, it is
+possible to point ICN to a [Docker registry
+mirror](https://docs.docker.com/registry/recipes/mirror/).
+
+To enable the mirror on the jump server, set `DOCKER_REGISTRY_MIRROR`
+in `user_config.sh` before installing the jump server, or follow
+Docker's
+[instructions](https://docs.docker.com/registry/recipes/mirror/#configure-the-docker-daemon)
+to configure the daemon.
+
+To enable the mirror in the provisioned cluster, set the
+`dockerRegistryMirrors` value of the cluster chart.
+
 ## Helm release stuck in 'pending-install'
 
 If the HelmRelease status for a chart in the workload cluster shows
@@ -99,3 +119,32 @@ this, Flux will complete reconcilation succesfully.
 Provisioning can take a fair amount of time, refer to [Monitoring
 progress](installation-guide.md#monitoring-progress) to see where the
 process is.
+
+A description of the BareMetalHost states can be found in the [Bare
+Metal Operator
+documentation](https://github.com/metal3-io/baremetal-operator/blob/main/docs/baremetalhost-states.md).
+
+## BareMetalHost never transitions from Available to Provisioned
+
+If the BareMetalHost has an owner but does not transition from
+Available to Provisioned, the chart values may be misconfigured.
+Examine the capm3-controller-manager logs for error messages (lines
+beginning with `E`):
+
+    # kubectl -n capm3-system logs capm3-controller-manager-7db896996c-7dls7 | grep ^E
+    ...
+    E0512 18:00:24.781426 1 controller.go:304] controller/metal3data "msg"="Reconciler error" "error"="Failed to create secrets: Nic name not found ens5" "name"="icn-nodepool-0" "namespace"="metal3" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="Metal3Data"
+
+In the above instance, the NIC name in the chart values (`ens5`) was
+incorrect; setting the correct name resolved the issue.
+
+## Vagrant destroy fails with `cannot undefine domain with nvram`
+
+The fix is to destroy each machine individually. For the default ICN
+virtual machine deployment:
+
+    vagrant destroy -f jump
+    virsh -c qemu:///system destroy vm-machine-1
+    virsh -c qemu:///system undefine --nvram --remove-all-storage vm-machine-1
+    virsh -c qemu:///system destroy vm-machine-2
+    virsh -c qemu:///system undefine --nvram --remove-all-storage vm-machine-2
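The capm3-controller-manager check in the patch above filters the log with `grep ^E`. This works because klog-formatted lines start with a severity letter, and `E` marks errors. The sketch below goes one step further and pulls out only the `"error"` field of each error line, using standard `grep` and `sed`. The function name `reconciler_errors` is hypothetical and is not part of ICN.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ICN): read klog-formatted log lines on
# stdin, keep only error lines (those starting with "E"), and print the
# value of their "error"="..." field, one per line. The match stops at the
# first double quote, so values containing escaped quotes are truncated.
reconciler_errors() {
    grep '^E' | sed -n 's/.*"error"="\([^"]*\)".*/\1/p'
}
```

You could combine it with the log command from the patch, for example `kubectl -n capm3-system logs capm3-controller-manager-7db896996c-7dls7 | reconciler_errors`. For the log line shown above, this would print `Failed to create secrets: Nic name not found ens5`. The generated Pod name changes between deployments, so passing `deployment/capm3-controller-manager` to `kubectl logs` may be more convenient. That deployment name is an assumption inferred from the Pod name.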