X-Git-Url: https://gerrit.akraino.org/r/gitweb?a=blobdiff_plain;f=doc%2Ftroubleshooting.md;h=c440e8ca512516d668d4523b7519a8deca172ad5;hb=HEAD;hp=6dc0011d4c694cbc0a2f3f1eb59af587b873b12f;hpb=641f56a07791b0a3eabd23c0a0696b7aa0cb675c;p=icn.git

diff --git a/doc/troubleshooting.md b/doc/troubleshooting.md
index 6dc0011..c440e8c 100644
--- a/doc/troubleshooting.md
+++ b/doc/troubleshooting.md
@@ -27,6 +27,10 @@ Examining the BareMetalHost resource of the failing machine and the
 logs of Bare Metal Operator and Ironic Pods may also provide a
 description of why the provisioning is failing.
 
+A description of the BareMetalHost states can be found in the [Bare
+Metal Operator
+documentation](https://github.com/metal3-io/baremetal-operator/blob/main/docs/baremetalhost-states.md).
+
 ### openstack baremetal
 
 In rare cases, the Ironic and Bare Metal Operator information may get
@@ -115,3 +119,32 @@ this, Flux will complete reconciliation successfully.
 Provisioning can take a fair amount of time; refer to [Monitoring
 progress](installation-guide.md#monitoring-progress) to see where the
 process is.
+
+A description of the BareMetalHost states can be found in the [Bare
+Metal Operator
+documentation](https://github.com/metal3-io/baremetal-operator/blob/main/docs/baremetalhost-states.md).
+
+## BareMetalHost never transitions from Available to Provisioned
+
+If the BareMetalHost has an owner but is not transitioning from
+Available to Provisioned, it is possible that the chart values are
+misconfigured. Examine the capm3-controller-manager logs for error
+messages:
+
+    # kubectl -n capm3-system logs capm3-controller-manager-7db896996c-7dls7 | grep ^E
+    ...
+    E0512 18:00:24.781426 1 controller.go:304] controller/metal3data "msg"="Reconciler error" "error"="Failed to create secrets: Nic name not found ens5" "name"="icn-nodepool-0" "namespace"="metal3" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="Metal3Data"
+
+In the above instance, the NIC name in the chart values (`ens5`) was
+incorrect; setting the correct name resolved the issue.
+
+## Vagrant destroy fails with `cannot undefine domain with nvram`
+
+The fix is to destroy and undefine each machine individually. For the
+default ICN virtual machine deployment:
+
+    vagrant destroy -f jump
+    virsh -c qemu:///system destroy vm-machine-1
+    virsh -c qemu:///system undefine --nvram --remove-all-storage vm-machine-1
+    virsh -c qemu:///system destroy vm-machine-2
+    virsh -c qemu:///system undefine --nvram --remove-all-storage vm-machine-2
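+
+For deployments with more or differently named machines, the same
+pair of `virsh` commands can be applied to each remaining libvirt
+domain in turn. A sketch, assuming the domains follow the default
+`vm-machine-N` naming used above:
+
+    # Hypothetical loop over the default domain names; adjust the
+    # list to match the machines defined in your deployment.
+    for vm in vm-machine-1 vm-machine-2; do
+        virsh -c qemu:///system destroy "$vm"
+        virsh -c qemu:///system undefine --nvram --remove-all-storage "$vm"
+    done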