Update documentation for Cluster-API and Flux 28/4628/1
authorTodd Malsbary <todd.malsbary@intel.com>
Wed, 15 Dec 2021 23:33:39 +0000 (15:33 -0800)
committerTodd Malsbary <todd.malsbary@intel.com>
Sat, 22 Jan 2022 00:07:45 +0000 (16:07 -0800)
Signed-off-by: Todd Malsbary <todd.malsbary@intel.com>
Change-Id: I66eb020f8c91091bc240aa5eb01002280d1c0e6c

13 files changed:
README.md
doc/figure-3.png [moved from figure-3.png with 100% similarity]
doc/installation-guide.md [new file with mode: 0644]
doc/pod11-topology.odg [new file with mode: 0644]
doc/pod11-topology.png [new file with mode: 0644]
doc/quick-start.md [new file with mode: 0644]
doc/sw-diagram.odg [new file with mode: 0644]
doc/sw-diagram.png [new file with mode: 0644]
doc/troubleshooting.md [new file with mode: 0644]
figure-1.odg [deleted file]
figure-1.png [deleted file]
figure-2.odg [deleted file]
figure-2.png [deleted file]

index 4681e1d..62b4137 100644 (file)
--- a/README.md
+++ b/README.md
 # Introduction
-ICN strives to automate the process of installing the local cluster
-controller to the greatest degree possible – "zero touch
-installation". Once the jump server (Local Controller) is booted and
-the compute cluster-specific values are provided, the controller
-begins to inspect and provision the bare metal servers until the
-cluster is entirely configured. This document shows step-by-step how
-to configure the network and deployment architecture for the ICN
-blueprint.
 
-# License
-Apache license v2.0
-
-# Deployment Architecture
-The Local Controller is provisioned with the Cluster API controllers
-and the Metal3 infrastructure provider, which enable provisioning of
-bare metal servers. The controller has three network connections to
-the bare metal servers: network A connects bare metal servers, network
-B is a private network used for provisioning the bare metal servers
-and network C is the IPMI network, used for control during
-provisioning. In addition, the bare metal servers connect to the
-network D, the SRIOV network.
-
-![Figure 1](figure-1.png)*Figure 1: Deployment Architecture*
-
-- Net A -- Bare metal network, lab networking for ssh. It is used as
-  the control plane for K8s, used by OVN and Flannel for the overlay
-  networking.
-- Net B (internal network) -- Provisioning network used by Ironic to
-  do inspection.
-- Net C (internal network) -- IPMI LAN to do IPMI protocol for the OS
-  provisioning. The NICs support IPMI. The IP address should be
-  statically assigned via the IPMI tool or other means.
-- Net D (internal network) -- Data plane network for the Akraino
-  application. Using the SR-IOV networking and fiber cables.  Intel
-  25GB and 40GB FLV NICs.
-
-In some deployment models, you can combine Net C and Net A to be the
-same networks, but the developer should take care of IP address
-management between Net A and IPMI address of the server.
-
-Also note that the IPMI NIC may share the same RJ-45 jack with another
-one of the NICs.
-
-# Pre-installation Requirements
-There are two main components in ICN Infra Local Controller - Local
-Controller and K8s compute cluster.
-
-### Local Controller
-The Local Controller will reside in the jump server to run the Cluster
-API controllers with the Kubeadm bootstrap provider and Metal3
-infrastructure provider.
-
-### K8s Compute Cluster
-The K8s compute cluster will actually run the workloads and is
-installed on bare metal servers.
-
-## Hardware Requirements
-
-### Minimum Hardware Requirement
-All-in-one VM based deployment requires servers with at least 32 GB
-RAM and 32 CPUs.
-
-### Recommended Hardware Requirements
-Recommended hardware requirements are servers with 64GB Memory, 32
-CPUs and SRIOV network cards.
-
-## Software Prerequisites
-The jump server is required to be pre-installed with Ubuntu 18.04.
-
-## Database Prerequisites
-No prerequisites for ICN blueprint.
-
-## Other Installation Requirements
-
-### Jump Server Requirements
-
-#### Jump Server Hardware Requirements
-- Local Controller: at least three network interfaces.
-- Bare metal servers: four network interfaces, including one IPMI interface.
-- Four or more hubs, with cabling, to connect four networks.
-
-(Tested as below)
-Hostname | CPU Model | Memory | Storage | 1GbE: NIC#, VLAN, (Connected extreme 480 switch) | 10GbE: NIC# VLAN, Network (Connected with IZ1 switch)
----------|-----------|--------|---------|--------------------------------------------------|------------------------------------------------------
-jump0 | Intel 2xE5-2699 | 64GB | 3TB (Sata)<br/>180 (SSD) | eth0: VLAN 110<br/>eno1: VLAN 110<br/>eno2: VLAN 111 |
-
-#### Jump Server Software Requirements
-ICN supports Ubuntu 18.04. The ICN blueprint installs all required
-software during `make jump_server`.
-
-### Network Requirements
-Please refer to figure 1 for all the network requirements of the ICN
-blueprint.
-
-Please make sure you have 3 distinguished networks - Net A, Net B and
-Net C as mentioned in figure 1. Local Controller uses the Net B and
-Net C to provision the bare metal servers to do the OS provisioning.
-
-### Bare Metal Server Requirements
-
-### K8s Compute Cluster
-
-#### Compute Server Hardware Requirements
-(Tested as below)
-Hostname | CPU Model | Memory | Storage | 1GbE: NIC#, VLAN, (Connected extreme 480 switch) | 10GbE: NIC# VLAN, Network (Connected with IZ1 switch)
----------|-----------|--------|---------|--------------------------------------------------|------------------------------------------------------
-node1 | Intel 2xE5-2699 | 64GB | 3TB (Sata)<br/>180 (SSD) | eth0: VLAN 110<br/>eno1: VLAN 110<br/>eno2: VLAN 111 | eno3: VLAN 113
-node2 | Intel 2xE5-2699 | 64GB | 3TB (Sata)<br/>180 (SSD) | eth0: VLAN 110<br/>eno1: VLAN 110<br/>eno2: VLAN 111 | eno3: VLAN 113
-node3 | Intel 2xE5-2699 | 64GB | 3TB (Sata)<br/>180 (SSD) | eth0: VLAN 110<br/>eno1: VLAN 110<br/>eno2: VLAN 111 | eno3: VLAN 113
+ICN addresses the infrastructure orchestration needed to bring up a
+site using baremetal servers. It strives to automate the process of
+installing a jump server (Local Controller) to the greatest degree
+possible – "zero touch installation". Once the jump server is booted
+and the compute cluster-specific values are provided, the controller
+begins to inspect and provision the baremetal servers until the
+cluster is entirely configured.
 
-#### Compute Server Software Requirements
-The Local Controller will install all the software in compute servers
-from the OS to the software required to bring up the K8s cluster.
+# Table of Contents
+1. [Quick start](doc/quick-start.md)
+2. [Installation guide](doc/installation-guide.md)
+3. [Troubleshooting](doc/troubleshooting.md)
 
-### Execution Requirements (Bare Metal Only)
-The ICN blueprint checks all the precondition and execution
-requirements for bare metal.
+# Reporting a Bug
 
-# Installation High-Level Overview
-Installation is two-step process:
-- Installation of the Local Controller.
-- Installation of a compute cluster.
-
-## Bare Metal Deployment Guide
-
-### Install Bare Metal Jump Server
-
-#### Creating the Settings Files
-
-##### Local Controller Network Configuration Reference
-The user will find the network configuration file named as
-"user_config.sh" in the ICN parent directory.
-
-`user_config.sh`
-``` shell
-#!/bin/bash
-
-#Ironic Metal3 settings for provisioning network (Net B)
-export IRONIC_INTERFACE="eno2"
-```
-
-#### Running
-After configuring the network configuration file, please run `make
-jump_server` from the ICN parent directory as shown below:
-
-``` shell
-root@jump0:# git clone "https://gerrit.akraino.org/r/icn"
-Cloning into 'icn'...
-remote: Counting objects: 69, done
-remote: Finding sources: 100% (69/69)
-remote: Total 4248 (delta 13), reused 4221 (delta 13)
-Receiving objects: 100% (4248/4248), 7.74 MiB | 21.84 MiB/s, done.
-Resolving deltas: 100% (1078/1078), done.
-root@jump0:# cd icn/
-root@jump0:# make jump_server
-```
-
-The following steps occurs once the `make jump_server` command is
-given.
-1. All the software required to run the bootstrap cluster is
-   downloaded and installed.
-2. K8s cluster to maintain the bootstrap cluster and all the servers
-   in the edge location is installed.
-3. Metal3 specific network configuration such as local DHCP server
-   networking for each edge location, Ironic networking for both
-   provisioning network and IPMI LAN network are identified and
-   created.
-4. The Cluster API controllers, bootstrap, and infrastructure
-   providers and configured and installed.
-5. The Flux controllers are installed.
-
-#### Creating a compute cluster
-A compute cluster is composed of installations of two types of Helm
-charts: machine and cluster. The specific installations of these Helm
-charts are defined in HelmRelease resources consumed by the Flux
-controllers in the jump server. The user is required to provide the
-machine and cluster specific values in the HelmRelease resources.
-
-##### Preconfiguration for the compute cluster in Jump Server
-The user is required to provide the IPMI information of the servers
-and the values of the compute cluster they connect to the Local
-Controller.
-
-If the baremetal network provides a DHCP server with gateway and DNS
-server information, and each server has identical hardware then a
-cluster template can be used. Otherwise these values must also be
-provided with the values for each server. Refer to the machine chart
-in icn/deploy/machine for more details. In the example below, no DHCP
-server is present in the baremetal network.
-
-> *NOTE:* To assist in the migration of R5 and earlier release's use
-> from `nodes.json` and the Provisioning resource to a site YAML, a
-> helper script is provided at `tools/migration/to_r6.sh`.
-
-`site.yaml`
-``` yaml
-apiVersion: v1
-kind: Namespace
-metadata:
-    name: metal3
----
-apiVersion: source.toolkit.fluxcd.io/v1beta1
-kind: GitRepository
-metadata:
-    name: icn
-    namespace: metal3
-spec:
-    gitImplementation: go-git
-    interval: 1m0s
-    ref:
-        branch: master
-    timeout: 20s
-    url: https://gerrit.akraino.org/r/icn
----
-apiVersion: helm.toolkit.fluxcd.io/v2beta1
-kind: HelmRelease
-metadata:
-    name: machine-node1
-    namespace: metal3
-spec:
-    interval: 5m
-    chart:
-        spec:
-            chart: deploy/machine
-            sourceRef:
-                kind: GitRepository
-                name: icn
-            interval: 1m
-    values:
-        machineName: node1
-        machineLabels:
-            machine: node1
-        bmcAddress: ipmi://10.10.110.11
-        bmcUsername: admin
-        bmcPassword: password
-        networks:
-            baremetal:
-                macAddress: 00:1e:67:fe:f4:19
-                type: ipv4
-                ipAddress: 10.10.110.21/24
-                gateway: 10.10.110.1
-                nameservers: ["8.8.8.8"]
-            provisioning:
-                macAddress: 00:1e:67:fe:f4:1a
-                type: ipv4_dhcp
-            sriov:
-                macAddress: 00:1e:67:f8:6a:41
-                type: ipv4
-                ipAddress: 10.10.113.3/24
----
-apiVersion: helm.toolkit.fluxcd.io/v2beta1
-kind: HelmRelease
-metadata:
-    name: machine-node2
-    namespace: metal3
-spec:
-    interval: 5m
-    chart:
-        spec:
-            chart: deploy/machine
-            sourceRef:
-                kind: GitRepository
-                name: icn
-            interval: 1m
-    values:
-        machineName: node2
-        machineLabels:
-            machine: node2
-        bmcAddress: ipmi://10.10.110.12
-        bmcUsername: admin
-        bmcPassword: password
-        networks:
-            baremetal:
-                macAddress: 00:1e:67:f1:5b:90
-                type: ipv4
-                ipAddress: 10.10.110.22/24
-                gateway: 10.10.110.1
-                nameservers: ["8.8.8.8"]
-            provisioning:
-                macAddress: 00:1e:67:f1:5b:91
-                type: ipv4_dhcp
-            sriov:
-                macAddress: 00:1e:67:f8:69:81
-                type: ipv4
-                ipAddress: 10.10.113.4/24
----
-apiVersion: helm.toolkit.fluxcd.io/v2beta1
-kind: HelmRelease
-metadata:
-    name: cluster-compute
-    namespace: metal3
-spec:
-    interval: 5m
-    chart:
-        spec:
-            chart: deploy/cluster
-            sourceRef:
-                kind: GitRepository
-                name: icn
-            interval: 1m
-    values:
-        clusterName: compute
-        controlPlaneEndpoint: 10.10.110.21
-        controlPlaneHostSelector:
-            matchLabels:
-                machine: node1
-        workersHostSelector:
-            matchLabels:
-                machine: node2
-        userData:
-            hashedPassword: $6$rounds=10000$PJLOBdyTv23pNp$9RpaAOcibbXUMvgJScKK2JRQioXW4XAVFMRKqgCB5jC4QmtAdbA70DU2jTcpAd6pRdEZIaWFjLCNQMBmiiL40.
-            sshAuthorizedKey: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCrxu+fSrU51vgAO5zP5xWcTU8uLv4MkUZptE2m1BJE88JdQ80kz9DmUmq2AniMkVTy4pNeUW5PsmGJa+anN3MPM99CR9I37zRqy5i6rUDQgKjz8W12RauyeRMIBrbdy7AX1xasoTRnd6Ta47bP0egiFb+vUGnlTFhgfrbYfjbkJhVfVLCTgRw8Yj0NSK16YEyhYLbLXpix5udRpXSiFYIyAEWRCCsWJWljACr99P7EF82vCGI0UDGCCd/1upbUwZeTouD/FJBw9qppe6/1eaqRp7D36UYe3KzLpfHQNgm9AzwgYYZrD4tNN6QBMq/VUIuam0G1aLgG8IYRLs41HYkJ root@jump0
-        flux:
-            url: https://gerrit.akraino.org/r/icn
-            branch: master
-            path: ./deploy/site/cluster-icn
-```
-
-A brief overview of the values is below. Refer to the machine and
-cluster charts in deploy/machine and deploy/cluster respectively for
-more details.
-
-- *machineName*: This will be the hostname for the machine, once it is
-  provisioned by Metal3.
-- *bmcUsername*: BMC username required to be provided for Ironic.
-- *bmcPassword*: BMC password required to be provided for Ironic.
-- *bmcAddress*: BMC server IPMI LAN IP address.
-- *networks*: A dictionary of the networks used by ICN.  For more
-  information, refer to the *networkData* field of the BareMetalHost
-  resource definition.
-  - *macAddress*: The MAC address of the interface.
-  - *type*: The type of network, either dynamic ("ipv4_dhcp") or
-    static ("ipv4").
-  - *ipAddress*: Only valid for type "ipv4"; the IP address of the
-    interface.
-  - *gateway*: Only valid for type "ipv4"; the gateway of this
-    network.
-  - *nameservers*: Only valid for type "ipv4"; an array of DNS
-     servers.
-- *clusterName*: The name of the cluster.
-- *controlPlaneEndpoint*: The K8s control plane endpoint. This works
-  in cooperation with the *controlPlaneHostSelector* to ensure that it
-  addresses the control plane node.
-- *controlPlaneHostSelector*: A K8s match expression against labels on
-  the *BareMetalHost* machine resource (from the *machineLabels* value
-  of the machine Helm chart).  This will be used by Cluster API to
-  select machines for the control plane.
-- *workersHostSelector*: A K8s match expression selecting worker
-  machines.
-- *userData*: User data values to be provisioned into each machine in
-  the cluster.
-  - *hashedPassword*: The hashed password of the default user on each
-    machine.
-  - *sshAuthorizedKey*: An authorized public key of the *root* user on
-    each machine.
-- *flux*: An optional repository to continuously reconcile the created
-  K8s cluster against.
-
-#### Running
-After configuring the machine and cluster site values, the next steps
-are to encrypt the secrets contained in the file, commit the file to
-source control, and create the Flux resources on the jump server
-pointing to the committed files.
-
-1. Create a key protect the secrets in the values if one does not
-   already exist. The key created below will be named "site-secrets".
-
-``` shell
-root@jump0:# ./deploy/site/site.sh create-gpg-key site-secrets
-```
-
-2. Encrypt the secrets in the site values.
-
-``` shell
-root@jump0:# ./deploy/site/site.sh sops-encrypt-site site.yaml site-secrets
-```
-
-3. Commit the site.yaml and additional files (sops.pub.asc,
-   .sops.yaml) created by sops-encrypt-site to a Git repository. For
-   the purposes of the next step, site.yaml will be committed to a Git
-   repository hosted at URL, on the specified BRANCH, and at location
-   PATH inside the source tree.
-
-4. Create the Flux resources to deploy the resources described by the
-   repository in step 3. This creates a GitRepository resource
-   containing the URL and BRANCH to synchronize, a Secret resource
-   containing the private key used to decrypt the secrets in the site
-   values, and a Kustomization resource with the PATH to the site.yaml
-   file at the GitRepository.
-
-```shell
-root@jump0:# ./deploy/site/site.sh flux-create-site URL BRANCH PATH site-secrets
-```
-
-The progress of the deployment may be monitored in a number of ways:
-
-``` shell
-root@jump0:# kubectl -n metal3 get baremetalhost
-root@jump0:# kubectl -n metal3 get cluster compute
-root@jump0:# clusterctl -n metal3 describe cluster compute
-```
-
-When the control plane is ready, the kubeconfig can be obtained with
-clusterctl and used to access the compute cluster:
-
-``` shell
-root@jump0:# clusterctl -n metal3 get kubeconfig compute >compute-admin.conf
-root@jump0:# kubectl --kubeconfig=compute-admin.conf cluster-info
-```
-
-## Virtual Deployment Guide
-
-### Standard Deployment Overview
-![Figure 2](figure-2.png)*Figure 2: Virtual Deployment Architecture*
-
-Virtual deployment is used for the development environment using
-Vagrant to create VMs with PXE boot. No setting is required from the
-user to deploy the virtual deployment.
-
-### Snapshot Deployment Overview
-No snapshot is implemented in ICN R6.
-
-### Special Requirements for Virtual Deployment
-
-#### Install Jump Server
-Jump server is required to be installed with Ubuntu 18.04. This will
-install all the VMs and install the K8s clusters.
-
-#### Verifying the Setup - VMs
-To verify the virtual deployment, execute the following commands:
-``` shell
-$ vagrant up --no-parallel
-$ vagrant ssh jump
-vagrant@jump:~$ sudo su
-root@jump:/home/vagrant# cd /icn
-root@jump:/icn# make jump_server
-root@jump:/icn# make vm_cluster
-```
-`vagrant up --no-parallel` creates three VMs: vm-jump, vm-machine-1,
-and vm-machine-2, each with 16GB RAM and 8 vCPUs. `make jump_server`
-installs the jump server components into vm-jump, and `make
-vm_cluster` installs a K8s cluster on the vm-machine VMs using Cluster
-API. The cluster is configured to use Flux to bring up the cluster
-with all addons and plugins.
-
-# Verifying the Setup
-ICN blueprint checks all the setup in both bare metal and VM
-deployment. Verify script will first confirm that the cluster control
-plane is ready then run self tests of all addons and plugins.
-
-**Bare Metal Verifier**: Run the `make bm_verifer`, it will verify the
-bare-metal deployment.
-
-**Verifier**: Run the `make vm_verifier`, it will verify the virtual
-deployment.
-
-# Developer Guide and Troubleshooting
-For development uses the virtual deployment, it take up to 10 mins to
-bring up the virtual BMC VMs with PXE boot.
-
-## Utilization of Images
-No images provided in this ICN release.
-
-## Post-deployment Configuration
-No post-deployment configuration required in this ICN release.
-
-## Debugging Failures
-* For first time installation enable KVM console in the trial or lab
-  servers using Raritan console or use Intel web BMC console.
-
-  ![Figure 3](figure-3.png)
-* Deprovision state will result in Ironic agent sleeping before next
-  heartbeat - it is not an error. It results in bare metal server
-  without OS and installed with ramdisk.
-* Deprovision in Metal3 is not straight forward - Metal3 follows
-  various stages from provisioned, deprovisioning and ready. ICN
-  blueprint take care navigating the deprovisioning states and
-  removing the BareMetalHost (BMH) custom resouce in case of cleaning.
-* Manual BMH cleaning of BMH or force cleaning of BMH resource result
-  in hang state - use `make bmh_clean` to remove the BMH state.
-* Logs of Ironic, openstack baremetal command to see the state of the
-  server.
-* Logs of baremetal operator gives failure related to images or images
-  md5sum errors.
-* It is not possible to change the state from provision to deprovision
-  or deprovision to provision without completing that state. All the
-  issues are handled in ICN scripts.
-
-## Reporting a Bug
-Required Linux Foundation ID to launch bug in ICN:
-https://jira.akraino.org/projects/ICN/issues
-
-# Uninstall Guide
-
-## Bare Metal deployment
-The command `make clean_all` uninstalls all the components installed by
-`make install`
-* It de-provision all the servers provisioned and removes them from
-  Ironic database.
-* Baremetal operator is deleted followed by Ironic database and
-  container.
-* Network configuration such internal DHCP server, provisioning
-  interfaces and IPMI LAN interfaces are deleted.
-* It will reset the bootstrap cluster - K8s cluster is torn down in
-  the jump server and all the associated docker images are removed.
-* All software packages installed by `make jump_server` are removed,
-  such as Ironic, openstack utility tool, docker packages and basic
-  prerequisite packages.
-
-## Virtual deployment
-The command `vagrant destroy -f` uninstalls all the components for the
-virtual deployments.
-
-# Troubleshooting
-
-## Error Message Guide
-The error message is explicit, all messages are captured in log
-directory.
-
-# Maintenance
-
-## Blueprint Package Maintenance
-No packages are maintained in ICN.
-
-## Software maintenance
-Not applicable.
-
-## Hardware maintenance
-Not applicable.
-
-## BluePrint Deployment Maintenance
-Not applicable.
-
-# Frequently Asked Questions
-**How to setup IPMI?**
-
-First, make sure the IPMI tool is installed in your servers, if not
-install them using `apt install ipmitool`. Then, check for the
-ipmitool information of each servers using the command `ipmitool lan
-print 1`. If the above command doesn't show the IPMI information, then
-setup the IPMI static IP address using the following instructions:
-- Mostl easy way to set up IPMI topology in your lab setup is by
-  using IPMI tool.
-- Using IPMI tool -
-  https://www.thomas-krenn.com/en/wiki/Configuring_IPMI_under_Linux_using_ipmitool
-- IPMI information can be considered during the BIOS setting as well.
-
-**BMC web console URL is not working?**
-
-It is hard to find issues or reason. Check the ipmitool bmc info to
-find the issues, if the URL is not available.
-
-**No change in BMH state - provisioning state is for more than 40min?**
-
-Generally, Metal3 provision for bare metal takes 20 - 30 mins. Look at
-the Ironic logs and baremetal operator to look at the state of
-servers. Openstack baremetal node shows all state of the server right
-from power, storage.
-
-**Why provider network (baremetal network configuration) is required?**
-
-Generally, provider network DHCP servers in a lab provide the router
-and DNS server details. In some labs, there is no DHCP server or the
-DHCP server does not provide this information.
+Please report any issues found in the [ICN
+JIRA](https://jira.akraino.org/projects/ICN/issues).  A Linux
+Foundation ID must be created first.
 
 # License
+Apache license v2.0
 
 ```
 /*
-* Copyright 2019 Intel Corporation, Inc
+* Copyright 2019-2022 Intel Corporation, Inc
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
@@ -583,7 +39,3 @@ DHCP server does not provide this information.
 * limitations under the License.
 */
 ```
-
-# References
-
-# Definitions, acronyms and abbreviations
similarity index 100%
rename from figure-3.png
rename to doc/figure-3.png
diff --git a/doc/installation-guide.md b/doc/installation-guide.md
new file mode 100644 (file)
index 0000000..f823b3d
--- /dev/null
@@ -0,0 +1,778 @@
+# Installation guide
+
+
+## Hardware
+
+
+### Overview
+
+Due to the almost limitless number of possible hardware
+configurations, this installation guide has chosen a concrete
+configuration to use in the examples that follow.
+
+The configuration contains the following three machines.
+
+<table id="orgf44d94a" border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
+
+
+<colgroup>
+<col  class="org-left" />
+
+<col  class="org-right" />
+
+<col  class="org-left" />
+
+<col  class="org-left" />
+
+<col  class="org-left" />
+
+<col  class="org-left" />
+
+<col  class="org-left" />
+</colgroup>
+<thead>
+<tr>
+<th scope="col" class="org-left">Hostname</th>
+<th scope="col" class="org-right">CPU Model</th>
+<th scope="col" class="org-left">Memory</th>
+<th scope="col" class="org-left">Storage</th>
+<th scope="col" class="org-left">IPMI: IP/MAC, U/P</th>
+<th scope="col" class="org-left">1GbE: NIC#, IP, MAC, VLAN, Network</th>
+<th scope="col" class="org-left">10GbE: NIC#, IP, MAC, VLAN, Network</th>
+</tr>
+</thead>
+
+<tbody>
+<tr>
+<td class="org-left">pod11-node5</td>
+<td class="org-right">2xE5-2699</td>
+<td class="org-left">64GB</td>
+<td class="org-left">3TB (Sata)&lt;br/&gt;180 (SSD)</td>
+<td class="org-left">IF0: 10.10.110.15 00:1e:67:fc:ff:18&lt;br/&gt;U/P: root/root</td>
+<td class="org-left">IF0: 10.10.110.25 00:1e:67:fc:ff:16 VLAN 110&lt;br/&gt;IF1: 172.22.0.1 00:1e:67:fc:ff:17 VLAN 111</td>
+<td class="org-left">&#xa0;</td>
+</tr>
+
+
+<tr>
+<td class="org-left">pod11-node3</td>
+<td class="org-right">2xE5-2699</td>
+<td class="org-left">64GB</td>
+<td class="org-left">3TB (Sata)&lt;br/&gt;180 (SSD)</td>
+<td class="org-left">IF0: 10.10.110.13 00:1e:67:f1:5b:92&lt;br/&gt;U/P: root/root</td>
+<td class="org-left">IF0: 10.10.110.23 00:1e:67:f1:5b:90 VLAN 110&lt;br/&gt;IF1: 172.22.0.0/24 00:1e:67:f1:5b:91 VLAN 111</td>
+<td class="org-left">IF3: 10.10.113.4 00:1e:67:f8:69:81 VLAN 113</td>
+</tr>
+
+
+<tr>
+<td class="org-left">pod11-node2</td>
+<td class="org-right">2xE5-2699</td>
+<td class="org-left">64GB</td>
+<td class="org-left">3TB (Sata)&lt;br/&gt;180 (SSD)</td>
+<td class="org-left">IF0: 10.10.110.12 00:1e:67:fe:f4:1b&lt;br/&gt;U/P: root/root</td>
+<td class="org-left">IF0: 10.10.110.22 00:1e:67:fe:f4:19 VLAN 110&lt;br/&gt;IF1: 172.22.0.0/14 00:1e:67:fe:f4:1a VLAN 111</td>
+<td class="org-left">IF3: 10.10.113.3 00:1e:67:f8:6a:41 VLAN 113</td>
+</tr>
+</tbody>
+</table>
+
+`pod11-node5` will be the Local Controller or *jump server*. The other
+two machines will form a two-node K8s cluster.
+
+Recommended hardware requirements are servers with 64GB Memory, 32
+CPUs and SR-IOV network cards.
+
+The machines are connected in the following topology.
+
+![img](./pod11-topology.png "Topology")
+
+There are three networks required by ICN:
+
+-   The `baremetal` network, used as the control plane for K8s and for
+    overlay networking.
+-   The `provisioning` network, used during the infrastructure
+    provisioning (OS installation) phase.
+-   The `IPMI` network, also used during the infrastructure provisioning
+    phase.
+
+In this configuration, the IPMI and baremetal interfaces share the
+same port and network. Care has been taken to ensure that the IP
+addresses do not conflict between the two interfaces.
+
+There is an additional network connected to a high-speed switch:
+
+-   The `sriov` network, available for the application data plane.
+
+
+### Configuration
+
+#### Baseboard Management Controller (BMC) configuration
+
+The BMC IP address should be statically assigned using the machine's
+BMC tool or application.
+    
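+For example, with the pod11 addresses from the table above, the address
+of `pod11-node3` could be assigned in-band with ipmitool. This is only a
+sketch: the channel number is assumed to be 1 and the exact procedure
+varies by vendor.
+
+    # ipmitool lan set 1 ipsrc static
+    # ipmitool lan set 1 ipaddr 10.10.110.13
+    # ipmitool lan set 1 netmask 255.255.255.0
+    # ipmitool lan set 1 defgw ipaddr 10.10.110.1
+    # ipmitool lan print 1
+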
+To verify IPMI is configured correctly for each cluster machine, use
+ipmitool:
+
+    # ipmitool -I lanplus -H 10.10.110.13 -L ADMINISTRATOR -U root -R 7 -N 5 -P root power status
+    Chassis Power is on
+
+If the ipmitool output looks like the following, enable the *RMCP+
+Cipher Suite3 Configuration* using the machine's BMC tool or application.
+
+    # ipmitool -I lanplus -H 10.10.110.13 -L ADMINISTRATOR -U root -R 7 -N 5 -P root power status
+    Error in open session response message : insufficient resources for session
+    Error: Unable to establish IPMI v2 / RMCP+ session
+
+If the ipmitool output looks like the following, enable *IPMI over LAN*
+using the machine's BMC tool or application.
+
+    # ipmitool -I lan -H 10.10.110.13 -L ADMINISTRATOR -U root -R 7 -N 5 -P root power status
+    Error: Unable to establish LAN session
+
+Additional information on ipmitool may be found at [Configuring IPMI
+under Linux using
+ipmitool](https://www.thomas-krenn.com/en/wiki/Configuring_IPMI_under_Linux_using_ipmitool).
+
+#### PXE Boot configuration
+
+Each cluster machine must be configured to PXE boot from the interface
+attached to the `provisioning` network.
+
+One method of verifying PXE boot is configured correctly is to access
+the remote console of the machine and observe the boot process. If
+the machine is not attempting PXE boot or it is attempting to PXE boot
+on the wrong interface, reboot the machine into the BIOS and select
+the correct interface in the boot options.
+
+Additional verification can be done on the jump server using the
+tcpdump tool. The following command looks for DHCP or TFTP traffic
+arriving on any interface. Replace `any` with the interface attached to
+the provisioning network to verify end-to-end connectivity between the
+jump server and cluster machine.
+
+    # tcpdump -i any port 67 or port 68 or port 69
+
+If tcpdump does not show any traffic, verify that the switches are
+configured properly to forward PXE boot requests (i.e. VLAN
+configuration).
+
+
+## Jump server
+
+
+### Configure the jump server
+
+The jump server is required to be pre-installed with an OS. ICN
+supports Ubuntu 20.04.
+
+Before provisioning the jump server, first edit `user_config.sh` to
+provide the name of the interface connected to the provisioning
+network.
+
+    # ip --brief link show
+    ...
+    enp4s0f3         UP             00:1e:67:fc:ff:17 <BROADCAST,MULTICAST,UP,LOWER_UP>
+    ...
+    # cat user_config.sh
+    #!/usr/bin/env bash
+    export IRONIC_INTERFACE="enp4s0f3"
+
+
+### Install the jump server components
+
+    make jump_server
+
+
+### Uninstallation
+
+    make clean_jump_server
+
+
+## Compute clusters
+
+
+### Overview
+
+Before proceeding with the configuration, a basic understanding of the
+essential components used in ICN is required.
+
+![img](./sw-diagram.png "Software Overview")
+
+#### Flux
+
+[Flux](https://fluxcd.io/) is a tool for implementing GitOps workflows where infrastructure
+and application configuration is committed to source control and
+continuously deployed in a K8s cluster.
+    
+The important Flux resources ICN uses are:
+    
+-   GitRepository, which describes where configuration data is committed
+-   HelmRelease, which describes an installation of a Helm chart
+-   Kustomization, which describes the application of K8s resources
+    customized with a kustomization file (a minimal example follows
+    this list)
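+
+For illustration, a minimal Kustomization resembling the one created
+later in this guide by the `site.sh` script is sketched below. The field
+values shown are assumptions (the API version, prune setting, and
+decryption Secret name may differ on an actual jump server); the
+resource names match the pod11 example used later in this guide.
+
+    apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
+    kind: Kustomization
+    metadata:
+      name: icn-master-site-pod11
+      namespace: flux-system
+    spec:
+      interval: 5m
+      path: ./deploy/site/pod11
+      prune: true
+      sourceRef:
+        kind: GitRepository
+        name: icn-master
+      decryption:
+        provider: sops
+        secretRef:
+          name: site-secrets-key   # assumed name of the decryption key Secret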
+
+#### Cluster API (CAPI)
+
+[Cluster API](https://cluster-api.sigs.k8s.io/) provides declarative APIs and tooling for provisioning,
+upgrading, and operating K8s clusters.
+    
+There are a number of important CAPI resources that ICN uses. To ease
+deployment, ICN captures the resources into a Helm chart.
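+
+Once the jump server components are installed, one quick way to see
+which CAPI resource kinds are available is to query the relevant API
+groups; the output is omitted here since it depends on the installed
+provider versions.
+
+    # kubectl api-resources --api-group=cluster.x-k8s.io
+    # kubectl api-resources --api-group=controlplane.cluster.x-k8s.io
+    # kubectl api-resources --api-group=bootstrap.cluster.x-k8s.io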
+
+#### Bare Metal Operator (BMO)
+
+Central to CAPI are the infrastructure and bootstrap providers. These
+are pluggable components for configuring the OS and K8s installation
+respectively.
+    
+ICN uses the [Cluster API Provider Metal3 for Managed Bare Metal
+Hardware](https://github.com/metal3-io/cluster-api-provider-metal3) for infrastructure provisioning, which in turn depends on the
+[Metal3 Bare Metal Operator](https://github.com/metal3-io/baremetal-operator) to do the actual work. The Bare Metal
+Operator uses [Ironic](https://ironicbaremetal.org/) to execute the low-level provisioning tasks.
+    
+Similar to the CAPI resources that ICN uses, ICN captures the Bare
+Metal Operator resources it uses into a Helm chart.
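+
+The Metal3 resource kinds can be listed the same way; `BareMetalHost` is
+the one inspected most often in this guide.
+
+    # kubectl api-resources --api-group=metal3.io
+    # kubectl -n metal3 get baremetalhosts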
+
+
+### Configuration
+
+> *NOTE:* To assist in migrating from the `nodes.json` file and
+> Provisioning resource used by R5 and earlier releases to the site YAML
+> described below, a helper script is provided at
+> `tools/migration/to_r6.sh`.
+
+#### Define the compute cluster
+
+The first step in provisioning a site with ICN is to define the
+desired day-0 configuration of the workload clusters.
+    
+A [configuration](https://gerrit.akraino.org/r/gitweb?p=icn.git;a=tree;f=deploy/site/cluster-icn) containing all supported ICN components is available
+in the ICN repository. End-users may use this as a base and add or
+remove components as desired. Each YAML file in this configuration is
+one of the Flux resources described in the overview: GitRepository,
+HelmRelease, or Kustomization.
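+
+To browse or customize this configuration, clone the ICN repository and
+inspect the directory locally, for example:
+
+    # git clone https://gerrit.akraino.org/r/icn
+    # ls icn/deploy/site/cluster-icn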
+
+#### Define the site
+
+A site definition is composed of BMO and CAPI resources, describing
+machines and clusters respectively. These resources are captured into
+the ICN machine and cluster Helm charts. Defining the site is
+therefore a matter of specifying the values needed by the charts.
+    
+##### Site-specific Considerations
+    
+Documentation for the machine chart may be found in its [values.yaml](https://gerrit.akraino.org/r/gitweb?p=icn.git;a=blob;f=deploy/machine/values.yaml),
+and documentation for the cluster chart may be found in its
+[values.yaml](https://gerrit.akraino.org/r/gitweb?p=icn.git;a=blob;f=deploy/cluster/values.yaml). Please review those for more information; what follows
+is some site-specific considerations to be aware of.
+        
+Note that there are a large number of ways to configure machines and
+especially clusters; for large scale deployments it may make sense to
+create custom charts to eliminate duplication in the values
+specification.
+        
+###### Control plane endpoint
+        
+The K8s control plane endpoint address must be provided to the cluster
+chart.
+            
+For a highly-available control plane, this would typically be a
+load-balanced virtual IP address. Configuration of an external load
+balancer is out of scope for this document. The chart also provides
+another mechanism to accomplish this using the VRRP protocol to assign
+the control plane endpoint among the selected control plane nodes; see
+the `keepalived` dictionary in the cluster chart values.
+            
+For a single control plane node with a static IP address, some care
+must be taken to ensure that CAPI chooses the correct machine to
+provision as the control plane node. To do this, add a label to the
+`machineLabels` dictionary in the machine chart and specify a K8s match
+expression in the `controlPlaneHostSelector` dictionary of the cluster
+chart. Once done, the IP address of the labeled and selected machine
+can be used as the control plane endpoint address.
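+
+For example, with the pod11 machines used throughout this guide, the
+pairing of machine label and cluster selector looks like the following
+values fragments (shown in full in the site YAML later in this guide):
+
+    # machine chart values for the intended control plane machine
+    machineLabels:
+      machine: pod11-node3
+
+    # cluster chart values selecting that machine and using its static
+    # IP address as the control plane endpoint
+    controlPlaneEndpoint: 10.10.110.23
+    controlPlaneHostSelector:
+      matchLabels:
+        machine: pod11-node3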
+        
+###### Static or dynamic baremetal network IPAM
+        
+The cluster and machine charts support either static or dynamic IPAM
+in the baremetal network.
+            
+Dynamic IPAM is configured by specifying the `networks` dictionary in
+the cluster chart. At least two entries must be included, the
+`baremetal` and `provisioning` networks. Under each entry, provide the
+predictable network interface name as the value of the `interface` key.
+            
+Note that this is in the cluster chart and therefore is in the form of
+a template for each machine used in the cluster. If the machines are
+sufficiently different such that the same interface name is not used
+on each machine, then the static approach below must be used instead.
+            
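+A sketch of the dynamic form, assuming every machine in the cluster
+names its interfaces identically (the interface names here are
+placeholders):
+
+    networks:
+      baremetal:
+        interface: ens785f0
+      provisioning:
+        interface: ens785f1
+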
+Static IPAM is configured by specifying the `networks` dictionary in the
+machine chart. At least two entries must be included, the `baremetal`
+and `provisioning` networks. From the chart example values:
+            
+    networks:
+      baremetal:
+        macAddress: 00:1e:67:fe:f4:19
+        # type is either ipv4 or ipv4_dhcp
+        type: ipv4
+        # ipAddress is only valid for type ipv4
+        ipAddress: 10.10.110.21/24
+        # gateway is only valid for type ipv4
+        gateway: 10.10.110.1
+        # nameservers is an array of DNS servers; only valid for type ipv4
+        nameservers: ["8.8.8.8"]
+      provisioning:
+        macAddress: 00:1e:67:fe:f4:1a
+        type: ipv4_dhcp
+            
+The provisioning network must always be type `ipv4_dhcp`.
+            
+In either the static or dynamic case, additional networks may be
+included; however, static assignment for an individual network is only
+possible when the machine chart approach is used.
+    
+##### Prerequisites
+    
+The first step is to create a `site.yaml` file containing a
+Namespace to hold the site resources and a GitRepository pointing to
+the ICN repository where the machine and cluster Helm charts are
+located.
+        
+Note that when defining multiple sites it is only necessary to apply
+the Namespace and GitRepository once on the jump server managing the
+sites.
+        
+    ---
+    apiVersion: v1
+    kind: Namespace
+    metadata:
+      name: metal3
+    ---
+    apiVersion: source.toolkit.fluxcd.io/v1beta1
+    kind: GitRepository
+    metadata:
+      name: icn
+      namespace: metal3
+    spec:
+      gitImplementation: go-git
+      interval: 1m0s
+      ref:
+        branch: master
+      timeout: 20s
+      url: https://gerrit.akraino.org/r/icn
+    
+##### Define a machine
+    
+Important values in the machine definition include:
+        
+-   **machineName:** the host name of the machine
+-   **bmcAddress, bmcUsername, bmcPassword:** the baseboard management
+    controller (e.g. IPMI) access values
+        
+Capture each machine's values into a HelmRelease in the site YAML:
+        
+    ---
+    apiVersion: helm.toolkit.fluxcd.io/v2beta1
+    kind: HelmRelease
+    metadata:
+      name: pod11-node2
+      namespace: metal3
+    spec:
+      interval: 5m
+      chart:
+        spec:
+          chart: deploy/machine
+          sourceRef:
+            kind: GitRepository
+            name: icn
+          interval: 1m
+      values:
+        machineName: pod11-node2
+        machineLabels:
+          machine: pod11-node2
+        bmcAddress: ipmi://10.10.110.12
+        bmcUsername: root
+        bmcPassword: root
+        networks:
+          baremetal:
+            macAddress: 00:1e:67:fe:f4:19
+            type: ipv4
+            ipAddress: 10.10.110.22/24
+            gateway: 10.10.110.1
+            nameservers:
+              - 8.8.8.8
+          provisioning:
+            macAddress: 00:1e:67:fe:f4:1a
+            type: ipv4_dhcp
+          private:
+            macAddress: 00:1e:67:f8:6a:40
+            type: ipv4
+            ipAddress: 10.10.112.3/24
+          storage:
+            macAddress: 00:1e:67:f8:6a:41
+            type: ipv4
+            ipAddress: 10.10.113.3/24
+    
+##### Define a cluster
+    
+Important values in the cluster definition include:
+        
+-   **clusterName:** the name of the cluster
+-   **numControlPlaneMachines:** the number of control plane nodes
+-   **numWorkerMachines:** the number of worker nodes
+-   **controlPlaneEndpoint:** see [Site-specific Considerations](#site-specific-considerations) above
+-   **userData:** dictionary containing default username, password, and
+    authorized SSH key
+-   **flux:** dictionary containing location of day-0 configuration of
+    cluster; see [Define the compute cluster](#define-the-compute-cluster) above
+        
+Capture each cluster's values into a HelmRelease in the site YAML:
+        
+    ---
+    apiVersion: helm.toolkit.fluxcd.io/v2beta1
+    kind: HelmRelease
+    metadata:
+      name: cluster-icn
+      namespace: metal3
+    spec:
+      interval: 5m
+      chart:
+        spec:
+          chart: deploy/cluster
+          sourceRef:
+            kind: GitRepository
+            name: icn
+          interval: 1m
+      values:
+        clusterName: icn
+        clusterLabels:
+          site: pod11
+        controlPlaneEndpoint: 10.10.110.23
+        controlPlaneHostSelector:
+          matchLabels:
+            machine: pod11-node3
+        workersHostSelector:
+          matchLabels:
+            machine: pod11-node2
+        userData:
+          hashedPassword: $6$rounds=10000$PJLOBdyTv23pNp$9RpaAOcibbXUMvgJScKK2JRQioXW4XAVFMRKqgCB5jC4QmtAdbA70DU2jTcpAd6pRdEZIaWFjLCNQMBmiiL40.
+          sshAuthorizedKey: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCwLj/ekRDjp354W8kcGLagjudjTBZO8qBffJ4mNb01EJueUbLvM8EwCv2zu9lFKHD+nGkc1fkB3RyCn5OqzQDTAIpp82nOHXtrbKAZPg2ob8BlfVAz34h5r1bG78lnMH1xk7HKNbf73h9yzUEKiyrd8DlhJcJrsOZTPuTdRrIm7jxScDJpHFjy8tGISNMcnBGrNS9ukaRLK+PiEfDpuRtw/gOEf58NXgu38BcNm4tYfacHYuZFUbNCqj9gKi3btZawgybICcqrNqF36E/XXMfCS1qxZ7j9xfKjxWFgD9gW/HkRtV6K11NZFEvaYBFBA9S/GhLtk9aY+EsztABthE0J root@pod11-node5
+        flux:
+          url: https://gerrit.akraino.org/r/icn
+          branch: master
+          path: ./deploy/site/cluster-icn
+    
+##### Encrypt secrets in site definition
+    
+This step is optional, but recommended to protect sensitive
+information stored in the site definition. The site script is
+configured to protect the `bmcPassword` and `hashedPassword` values.
+
+Use an existing GPG key pair or create a new one, then encrypt the
+secrets contained in the site YAML using site.sh. The public key and
+SOPS configuration are created in the site YAML directory; these may be
+used to encrypt (but not decrypt) future secrets.
+        
+    # ./deploy/site/site.sh create-gpg-key site-secrets-key
+    # ./deploy/site/site.sh sops-encrypt-site site.yaml site-secrets-key
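+
+To confirm that only the intended fields were transformed, and that the
+file can still be decrypted locally with the generated private key, the
+encrypted file can be inspected and decrypted. This assumes the `sops`
+binary is available on the jump server.
+
+    # grep -c 'ENC\[' site.yaml
+    # sops --decrypt site.yaml | grep bmcPassword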
+    
+##### Example site definitions
+    
+Refer to the [pod11 site.yaml](https://gerrit.akraino.org/r/gitweb?p=icn.git;a=blob;f=deploy/site/pod11/site.yaml) and the [vm site.yaml](https://gerrit.akraino.org/r/gitweb?p=icn.git;a=blob;f=deploy/site/vm/site.yaml) for complete
+examples of site definitions for a static and dynamic baremetal
+network respectively. These site definitions are for simple two
+machine clusters used in ICN testing.
+
+#### Inform the Flux controllers of the site definition
+
+The final step is to inform the jump server Flux controllers of the site
+definition by creating three resources:
+    
+-   a GitRepository containing the location where the site definition is
+    committed
+-   a Secret holding the GPG private key used to encrypt the secrets in
+    the site definition
+-   a Kustomization referencing the GitRepository, Secret, and path in
+    the repository where the site definition is located
+    
+This may be done with the help of the `site.sh` script:
+    
+    # ./deploy/site/site.sh flux-create-site URL BRANCH PATH KEY_NAME
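+
+As a concrete illustration, if the site definition were committed to the
+master branch of the ICN repository itself under `deploy/site/pod11`,
+and the GPG key created earlier was named `site-secrets-key`, the
+invocation might look like:
+
+    # ./deploy/site/site.sh flux-create-site https://gerrit.akraino.org/r/icn master deploy/site/pod11 site-secrets-key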
+
+
+<a id="org6324e82"></a>
+
+### Deployment
+
+#### Monitoring progress
+
+The overall status of the cluster deployment can be monitored with
+`clusterctl`.
+    
+    # clusterctl -n metal3 describe cluster icn
+    NAME                                                                READY  SEVERITY  REASON                           SINCE  MESSAGE
+    /icn                                                                False  Warning   ScalingUp                        4m14s  Scaling up control plane to 1 replicas (actual 0)
+    ├─ClusterInfrastructure - Metal3Cluster/icn
+    ├─ControlPlane - KubeadmControlPlane/icn                            False  Warning   ScalingUp                        4m14s  Scaling up control plane to 1 replicas (actual 0)
+    │ └─Machine/icn-9sp7z                                               False  Info      WaitingForInfrastructure         4m17s  1 of 2 completed
+    │   └─MachineInfrastructure - Metal3Machine/icn-controlplane-khtsk
+    └─Workers
+      └─MachineDeployment/icn                                           False  Warning   WaitingForAvailableMachines      4m49s  Minimum availability requires 1 replicas, current 0 available
+        └─Machine/icn-6b8dfc7f6f-tmgv7                                  False  Info      WaitingForInfrastructure         4m49s  0 of 2 completed
+          ├─BootstrapConfig - KubeadmConfig/icn-workers-79pl9           False  Info      WaitingForControlPlaneAvailable  4m19s
+          └─MachineInfrastructure - Metal3Machine/icn-workers-m7vb8
+    
+The status of OS provisioning can be monitored by inspecting the
+`BareMetalHost` resources.
+    
+    # kubectl -n metal3 get bmh
+    NAME          STATE        CONSUMER   ONLINE   ERROR   AGE
+    pod11-node2   inspecting              true             5m15s
+    pod11-node3   inspecting              true             5m14s
+    
+Once the OS is installed, the status of K8s provisioning can be
+monitored by logging into the machine using the credentials from the
+`userData` section of the site values and inspecting the cloud-init
+logs.
+    
+    root@pod11-node2:~# tail -f /var/log/cloud-init-output.log
+    ...
+    Cloud-init v. 21.4-0ubuntu1~20.04.1 running 'modules:final' at Wed, 05 Jan 2022 01:34:41 +0000. Up 131.66 seconds.
+    Cloud-init v. 21.4-0ubuntu1~20.04.1 finished at Wed, 05 Jan 2022 01:34:41 +0000. Datasource DataSourceConfigDrive [net,ver=2][source=/dev/sda2].  Up 132.02 seconds
+    
+Once the cluster's control plane is ready, its kubeconfig can be
+obtained with `clusterctl` and the status of the cluster can be
+monitored with `kubectl`.
+    
+    # clusterctl -n metal3 get kubeconfig icn >icn-admin.conf
+    # kubectl --kubeconfig=icn-admin.conf get pods -A
+    NAMESPACE     NAME                                                 READY   STATUS    RESTARTS   AGE
+    emco          db-emco-mongo-0                                      1/1     Running   0          15h
+    emco          emco-etcd-0                                          1/1     Running   0          15h
+    ...
+
+#### Examining the deployment process
+
+The deployment resources can be examined with the kubectl and helm
+tools. The examples below show the corresponding resources on the
+jump server.
+    
+    # kubectl -n flux-system get GitRepository
+    NAME         URL                                READY   STATUS                                                              AGE
+    icn-master   https://gerrit.akraino.org/r/icn   True    Fetched revision: master/0e93643e74f26bfc062a81c2f05ad947550f8d50   16h
+        
+    # kubectl -n flux-system get Kustomization
+    NAME                    READY   STATUS                                                              AGE
+    icn-master-site-pod11   True    Applied revision: master/0e93643e74f26bfc062a81c2f05ad947550f8d50   7m4s
+        
+    # kubectl -n metal3 get GitRepository
+    NAME   URL                                READY   STATUS                                                              AGE
+    icn    https://gerrit.akraino.org/r/icn   True    Fetched revision: master/0e93643e74f26bfc062a81c2f05ad947550f8d50   7m22s
+        
+    # kubectl -n metal3 get HelmRelease
+    NAME          READY   STATUS                             AGE
+    cluster-icn   True    Release reconciliation succeeded   7m54s
+    pod11-node2   True    Release reconciliation succeeded   7m54s
+    pod11-node3   True    Release reconciliation succeeded   7m54s
+        
+    # kubectl -n metal3 get HelmChart
+    NAME                 CHART            VERSION   SOURCE KIND     SOURCE NAME   READY   STATUS                                 AGE
+    metal3-cluster-icn   deploy/cluster   *         GitRepository   icn           True    Fetched and packaged revision: 0.1.0   8m9s
+    metal3-pod11-node2   deploy/machine   *         GitRepository   icn           True    Fetched and packaged revision: 0.1.0   8m9s
+    metal3-pod11-node3   deploy/machine   *         GitRepository   icn           True    Fetched and packaged revision: 0.1.0   8m9s
+    
+    # helm -n metal3 ls
+    NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART           APP VERSION
+    cluster-icn     metal3          2               2022-01-05 01:03:51.075860871 +0000 UTC deployed        cluster-0.1.0
+    pod11-node2     metal3          2               2022-01-05 01:03:49.365432 +0000 UTC    deployed        machine-0.1.0
+    pod11-node3     metal3          2               2022-01-05 01:03:49.463726617 +0000 UTC deployed        machine-0.1.0
+    
+    # helm -n metal3 get values --all cluster-icn
+    COMPUTED VALUES:
+    clusterLabels:
+      provider: icn
+      site: pod11
+    clusterName: icn
+    cni: flannel
+    containerRuntime: containerd
+    containerdVersion: 1.4.11-1
+    controlPlaneEndpoint: 10.10.110.23
+    controlPlaneHostSelector:
+      matchLabels:
+        machine: pod11-node3
+    controlPlanePrefix: 24
+    dockerVersion: 5:20.10.10~3-0~ubuntu-focal
+    flux:
+      branch: master
+      path: ./deploy/site/cluster-icn
+      repositoryName: icn
+      url: https://gerrit.akraino.org/r/icn
+    imageName: focal-server-cloudimg-amd64.img
+    k8sVersion: v1.21.6
+    kubeVersion: 1.21.6-00
+    numControlPlaneMachines: 1
+    numWorkerMachines: 1
+    podCidr: 10.244.64.0/18
+    userData:
+      hashedPassword: $6$rounds=10000$bhRsNADLl$BzCcBaQ7Tle9AizUHcMKN2fygyPMqBebOuvhApI8B.pELWyFUaAWRasPOz.5Gf9bvCihakRnBTwsi217n2qQs1
+      name: ubuntu
+      sshAuthorizedKey: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCwLj/ekRDjp354W8kcGLagjudjTBZO8qBffJ4mNb01EJueUbLvM8EwCv2zu9lFKHD+nGkc1fkB3RyCn5OqzQDTAIpp82nOHXtrbKAZPg2ob8BlfVAz34h5r1bG78lnMH1xk7HKNbf73h9yzUEKiyrd8DlhJcJrsOZTPuTdRrIm7jxScDJpHFjy8tGISNMcnBGrNS9ukaRLK+PiEfDpuRtw/gOEf58NXgu38BcNm4tYfacHYuZFUbNCqj9gKi3btZawgybICcqrNqF36E/XXMfCS1qxZ7j9xfKjxWFgD9gW/HkRtV6K11NZFEvaYBFBA9S/GhLtk9aY+EsztABthE0J root@pod11-node5
+    workersHostSelector:
+      matchLabels:
+        machine: pod11-node2
+        
+    # helm -n metal3 get values --all pod11-node2
+    COMPUTED VALUES:
+    bmcAddress: ipmi://10.10.110.12
+    bmcPassword: root
+    bmcUsername: root
+    machineLabels:
+      machine: pod11-node2
+    machineName: pod11-node2
+    networks:
+      baremetal:
+        gateway: 10.10.110.1
+        ipAddress: 10.10.110.22/24
+        macAddress: 00:1e:67:fe:f4:19
+        nameservers:
+        - 8.8.8.8
+        type: ipv4
+      provisioning:
+        macAddress: 00:1e:67:fe:f4:1a
+        type: ipv4_dhcp
+      sriov:
+        ipAddress: 10.10.113.3/24
+        macAddress: 00:1e:67:f8:6a:41
+        type: ipv4
+    
+    # helm -n metal3 get values --all pod11-node3
+    COMPUTED VALUES:
+    bmcAddress: ipmi://10.10.110.13
+    bmcPassword: root
+    bmcUsername: root
+    machineLabels:
+      machine: pod11-node3
+    machineName: pod11-node3
+    networks:
+      baremetal:
+        gateway: 10.10.110.1
+        ipAddress: 10.10.110.23/24
+        macAddress: 00:1e:67:f1:5b:90
+        nameservers:
+        - 8.8.8.8
+        type: ipv4
+      provisioning:
+        macAddress: 00:1e:67:f1:5b:91
+        type: ipv4_dhcp
+      sriov:
+        ipAddress: 10.10.113.4/24
+        macAddress: 00:1e:67:f8:69:81
+        type: ipv4
+
+Once the workload cluster is ready, the deployment resources may be
+examined similarly.
+
+    root@pod11-node5:# clusterctl -n metal3 get kubeconfig icn >icn-admin.conf
+    root@pod11-node5:# kubectl --kubeconfig=icn-admin.conf get GitRepository -A
+    NAMESPACE     NAME   URL                                        READY   STATUS                                                                         AGE
+    emco          emco   https://github.com/open-ness/EMCO          True    Fetched revision: openness-21.03.06/18ec480f755119d54aa42c1bc3bd248dfd477165   16h
+    flux-system   icn    https://gerrit.akraino.org/r/icn           True    Fetched revision: master/0e93643e74f26bfc062a81c2f05ad947550f8d50              16h
+    kud           kud    https://gerrit.onap.org/r/multicloud/k8s   True    Fetched revision: master/8157bf63753839ce4e9006978816fad3f63ca2de              16h
+    
+    root@pod11-node5:# kubectl --kubeconfig=icn-admin.conf get Kustomization -A
+    NAMESPACE     NAME            READY   STATUS                                                              AGE
+    flux-system   icn-flux-sync   True    Applied revision: master/0e93643e74f26bfc062a81c2f05ad947550f8d50   16h
+    flux-system   kata            True    Applied revision: master/0e93643e74f26bfc062a81c2f05ad947550f8d50   16h
+    
+    root@pod11-node5:# kubectl --kubeconfig=icn-admin.conf get HelmRelease -A
+    NAMESPACE   NAME                     READY   STATUS                                                                             AGE
+    emco        db                       True    Release reconciliation succeeded                                                   16h
+    emco        monitor                  True    Release reconciliation succeeded                                                   16h
+    emco        podsecurity              True    Release reconciliation succeeded                                                   16h
+    emco        services                 True    Release reconciliation succeeded                                                   16h
+    emco        tools                    True    Release reconciliation succeeded                                                   16h
+    kud         cdi                      True    Release reconciliation succeeded                                                   16h
+    kud         cdi-operator             True    Release reconciliation succeeded                                                   16h
+    kud         cpu-manager              True    Release reconciliation succeeded                                                   16h
+    kud         kubevirt                 True    Release reconciliation succeeded                                                   16h
+    kud         kubevirt-operator        True    Release reconciliation succeeded                                                   16h
+    kud         multus-cni               True    Release reconciliation succeeded                                                   16h
+    kud         node-feature-discovery   True    Release reconciliation succeeded                                                   16h
+    kud         ovn4nfv                  True    Release reconciliation succeeded                                                   16h
+    kud         ovn4nfv-network          True    Release reconciliation succeeded                                                   16h
+    kud         podsecurity              True    Release reconciliation succeeded                                                   16h
+    kud         qat-device-plugin        True    Release reconciliation succeeded                                                   16h
+    kud         sriov-network            True    Release reconciliation succeeded                                                   16h
+    kud         sriov-network-operator   True    Release reconciliation succeeded                                                   16h
+    
+    root@pod11-node5:# kubectl --kubeconfig=icn-admin.conf get HelmChart -A
+    NAMESPACE     NAME                         CHART                                              VERSION   SOURCE KIND     SOURCE NAME   READY   STATUS                                 AGE
+    emco          emco-db                      deployments/helm/emcoOpenNESS/emco-db              *         GitRepository   emco          True    Fetched and packaged revision: 0.1.0   16h
+    emco          emco-monitor                 deployments/helm/monitor                           *         GitRepository   emco          True    Fetched and packaged revision: 0.1.0   16h
+    emco          emco-services                deployments/helm/emcoOpenNESS/emco-services        *         GitRepository   emco          True    Fetched and packaged revision: 0.1.0   16h
+    emco          emco-tools                   deployments/helm/emcoOpenNESS/emco-tools           *         GitRepository   emco          True    Fetched and packaged revision: 0.1.0   16h
+    flux-system   emco-podsecurity             deploy/podsecurity                                 *         GitRepository   icn           True    Fetched and packaged revision: 0.1.0   16h
+    flux-system   kud-podsecurity              deploy/podsecurity                                 *         GitRepository   icn           True    Fetched and packaged revision: 0.1.0   16h
+    kud           kud-cdi                      kud/deployment_infra/helm/cdi                      *         GitRepository   kud           True    Fetched and packaged revision: 0.1.0   16h
+    kud           kud-cdi-operator             kud/deployment_infra/helm/cdi-operator             *         GitRepository   kud           True    Fetched and packaged revision: 0.1.1   16h
+    kud           kud-cpu-manager              kud/deployment_infra/helm/cpu-manager              *         GitRepository   kud           True    Fetched and packaged revision: 0.1.0   16h
+    kud           kud-kubevirt                 kud/deployment_infra/helm/kubevirt                 *         GitRepository   kud           True    Fetched and packaged revision: 0.1.0   16h
+    kud           kud-kubevirt-operator        kud/deployment_infra/helm/kubevirt-operator        *         GitRepository   kud           True    Fetched and packaged revision: 0.1.0   16h
+    kud           kud-multus-cni               kud/deployment_infra/helm/multus-cni               *         GitRepository   kud           True    Fetched and packaged revision: 0.1.0   16h
+    kud           kud-node-feature-discovery   kud/deployment_infra/helm/node-feature-discovery   *         GitRepository   kud           True    Fetched and packaged revision: 0.1.0   16h
+    kud           kud-ovn4nfv                  kud/deployment_infra/helm/ovn4nfv                  *         GitRepository   kud           True    Fetched and packaged revision: 0.1.0   16h
+    kud           kud-ovn4nfv-network          kud/deployment_infra/helm/ovn4nfv-network          *         GitRepository   kud           True    Fetched and packaged revision: 0.1.0   16h
+    kud           kud-qat-device-plugin        kud/deployment_infra/helm/qat-device-plugin        *         GitRepository   kud           True    Fetched and packaged revision: 0.1.0   16h
+    kud           kud-sriov-network            kud/deployment_infra/helm/sriov-network            *         GitRepository   kud           True    Fetched and packaged revision: 0.1.0   16h
+    kud           kud-sriov-network-operator   kud/deployment_infra/helm/sriov-network-operator   *         GitRepository   kud           True    Fetched and packaged revision: 0.1.0   16h
+    
+    root@pod11-node5:# helm --kubeconfig=icn-admin.conf ls -A
+    NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                                                                                                                       APP VERSION
+    cdi                     kud             2               2022-01-05 01:54:28.39195226 +0000 UTC  deployed        cdi-0.1.0                                                                                                                   v1.34.1
+    cdi-operator            kud             2               2022-01-05 01:54:04.904465491 +0000 UTC deployed        cdi-operator-0.1.1                                                                                                          v1.34.1
+    cpu-manager             kud             2               2022-01-05 01:54:01.911819055 +0000 UTC deployed        cpu-manager-0.1.0                                                                                                           v1.4.1-no-taint
+    db                      emco            2               2022-01-05 01:53:36.096690949 +0000 UTC deployed        emco-db-0.1.0                                                                                                                
+    kubevirt                kud             2               2022-01-05 01:54:12.563840437 +0000 UTC deployed        kubevirt-0.1.0                                                                                                              v0.41.0
+    kubevirt-operator       kud             2               2022-01-05 01:53:59.190388299 +0000 UTC deployed        kubevirt-operator-0.1.0                                                                                                     v0.41.0
+    monitor                 emco            2               2022-01-05 01:53:36.085180458 +0000 UTC deployed        monitor-0.1.0                                                                                                               1.16.0
+    multus-cni              kud             2               2022-01-05 01:54:03.494462704 +0000 UTC deployed        multus-cni-0.1.0                                                                                                            v3.7
+    node-feature-discovery  kud             2               2022-01-05 01:53:58.489616047 +0000 UTC deployed        node-feature-discovery-0.1.0                                                                                                v0.7.0
+    ovn4nfv                 kud             2               2022-01-05 01:54:07.488105774 +0000 UTC deployed        ovn4nfv-0.1.0                                                                                                               v3.0.0
+    ovn4nfv-network         kud             2               2022-01-05 01:54:31.79127155 +0000 UTC  deployed        ovn4nfv-network-0.1.0                                                                                                       v2.2.0
+    podsecurity             kud             2               2022-01-05 01:53:37.400019369 +0000 UTC deployed        podsecurity-0.1.0                                                                                                            
+    podsecurity             emco            2               2022-01-05 01:53:35.993351972 +0000 UTC deployed        podsecurity-0.1.0                                                                                                            
+    qat-device-plugin       kud             2               2022-01-05 01:54:03.598022943 +0000 UTC deployed        qat-device-plugin-0.1.0                                                                                                     0.19.0-kerneldrv
+    sriov-network           kud             2               2022-01-05 01:54:31.695963579 +0000 UTC deployed        sriov-network-0.1.0                                                                                                         4.8.0
+    sriov-network-operator  kud             2               2022-01-05 01:54:07.787596951 +0000 UTC deployed        sriov-network-operator-0.1.0                                                                                                4.8.0
+    tools                   emco            2               2022-01-05 01:53:58.317119097 +0000 UTC deployed        emco-tools-0.1.0
+        
+    root@pod11-node5:# kubectl --kubeconfig=icn-admin.conf get pods -A -o wide
+    NAMESPACE     NAME                                                 READY   STATUS    RESTARTS   AGE   IP             NODE          NOMINATED NODE   READINESS GATES
+    emco          db-emco-mongo-0                                      1/1     Running   0          16h   10.244.65.53   pod11-node2   <none>           <none>
+    emco          emco-etcd-0                                          1/1     Running   0          16h   10.244.65.57   pod11-node2   <none>           <none>
+    emco          monitor-monitor-74649c5c64-dxhfn                     1/1     Running   0          16h   10.244.65.65   pod11-node2   <none>           <none>
+    emco          services-clm-7ff876dfc-vgncs                         1/1     Running   3          16h   10.244.65.58   pod11-node2   <none>           <none>
+    ...
+
+
+### Verification
+
+Basic self-tests of Kata, EMCO, and the other addons may be performed
+with the `kata.sh` and `addons.sh` test scripts once the workload cluster
+is ready.
+
+    root@pod11-node5:# CLUSTER_NAME=icn ./deploy/kata/kata.sh test
+    root@pod11-node5:# CLUSTER_NAME=icn ./deploy/addons/addons.sh test
+
+
+### Uninstallation
+
+To destroy the workload cluster and deprovision its machines, it is
+only necessary to delete the site Kustomization.  Uninstallation
+progress can be monitored in the same way as deployment: with
+`clusterctl`, by examining the `BareMetalHost` resources, and so on.
+
+    root@pod11-node5:# kubectl -n flux-system delete Kustomization icn-master-site-pod11
+
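+For example, teardown progress might be followed with commands like
+these (a sketch reusing the monitoring commands from the deployment
+section and assuming the cluster name `icn` used earlier; output is
+omitted):
+
+    root@pod11-node5:# clusterctl -n metal3 describe cluster icn
+    root@pod11-node5:# kubectl -n metal3 get baremetalhosts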
+
diff --git a/doc/pod11-topology.odg b/doc/pod11-topology.odg
new file mode 100644 (file)
index 0000000..4bdb7f0
Binary files /dev/null and b/doc/pod11-topology.odg differ
diff --git a/doc/pod11-topology.png b/doc/pod11-topology.png
new file mode 100644 (file)
index 0000000..f3eb435
Binary files /dev/null and b/doc/pod11-topology.png differ
diff --git a/doc/quick-start.md b/doc/quick-start.md
new file mode 100644 (file)
index 0000000..1d46f7d
--- /dev/null
@@ -0,0 +1,259 @@
+# Quick start
+
+To get a taste of ICN, this guide will walk through creating a simple
+two-machine cluster using virtual machines.
+
+A total of three virtual machines will be used, each with 8 CPUs, 24 GB of
+RAM, and a 30 GB disk. So grab a host machine, [install Vagrant with the
+libvirt provider](https://github.com/vagrant-libvirt/vagrant-libvirt#installation), and let's get started.
+
+TL;DR
+
+    $ vagrant up --no-parallel
+    $ vagrant ssh jump
+    vagrant@jump:~$ sudo su
+    root@jump:/home/vagrant# cd /icn
+    root@jump:/icn# make jump_server
+    root@jump:/icn# make vm_cluster
+
+
+## Create the virtual environment
+
+    $ vagrant up --no-parallel
+
+Now let's take a closer look at what was created.
+
+    $ virsh -c qemu:///system list
+     Id    Name                           State
+    ----------------------------------------------------
+     1207  vm-jump                        running
+     1208  vm-machine-1                   running
+     1209  vm-machine-2                   running
+
+    $ virsh -c qemu:///system net-list
+     Name                 State      Autostart     Persistent
+    ----------------------------------------------------------
+     vm-baremetal         active     yes           yes
+     vm-provisioning      active     no            yes
+
+    $ vbmc list
+    +--------------+---------+---------+------+
+    | Domain name  | Status  | Address | Port |
+    +--------------+---------+---------+------+
+    | vm-machine-1 | running | ::      | 6230 |
+    | vm-machine-2 | running | ::      | 6231 |
+    +--------------+---------+---------+------+
+
+We've created a jump server and the two machines that will form the
+cluster. The jump server will be responsible for creating the
+cluster.
+
+We also created two networks, baremetal and provisioning, and a third
+network (the IPMI network) overlaid upon the baremetal network using
+[VirtualBMC](https://opendev.org/openstack/virtualbmc) for issuing
+IPMI commands to the virtual machines.
+
+It's worth looking at these networks in more detail as they will be
+important during configuration of the jump server and cluster.
+
+    $ virsh -c qemu:///system net-dumpxml vm-baremetal
+    <network connections='3' ipv6='yes'>
+      <name>vm-baremetal</name>
+      <uuid>216db810-de49-4122-a284-13fd2e44da4b</uuid>
+      <forward mode='nat'>
+        <nat>
+          <port start='1024' end='65535'/>
+        </nat>
+      </forward>
+      <bridge name='virbr3' stp='on' delay='0'/>
+      <mac address='52:54:00:a3:e7:09'/>
+      <ip address='192.168.151.1' netmask='255.255.255.0'>
+        <dhcp>
+          <range start='192.168.151.1' end='192.168.151.254'/>
+        </dhcp>
+      </ip>
+    </network>
+
+The baremetal network provides outbound network access through the
+host and also assigns DHCP addresses in the range `192.168.151.2` to
+`192.168.151.254` to the virtual machines (the host itself is
+`192.168.151.1`).
+
+    $ virsh -c qemu:///system net-dumpxml vm-provisioning
+    <network connections='3'>
+      <name>vm-provisioning</name>
+      <uuid>d06de3cc-b7ca-4b09-a49d-a1458c45e072</uuid>
+      <bridge name='vm0' stp='on' delay='0'/>
+      <mac address='52:54:00:3e:38:a5'/>
+    </network>
+
+The provisioning network is a private network; only the virtual
+machines may communicate over it. Importantly, no DHCP server is
+present on this network. The `ironic` component of the jump server will
+be managing DHCP requests.
+
+The virtual baseboard management controllers (BMCs) provided by
+VirtualBMC are listening on the host at the addresses and ports listed
+above. To issue an IPMI command to `vm-machine-1`, for example, the
+command is sent to `192.168.151.1:6230`, and VirtualBMC translates the
+IPMI command into libvirt calls.
+
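+For example, with `ipmitool` installed on the host, a power status
+query might look like the following (a sketch; the username and
+password are placeholders for whatever credentials VirtualBMC was
+configured with):
+
+    $ ipmitool -I lanplus -H 192.168.151.1 -p 6230 -U admin -P password power status
+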
+Now let's look at the networks from inside the virtual machines.
+
+    $ virsh -c qemu:///system dumpxml vm-jump
+    ...
+        <interface type='network'>
+          <mac address='52:54:00:a8:97:6d'/>
+          <source network='vm-baremetal' bridge='virbr3'/>
+          <target dev='vnet0'/>
+          <model type='virtio'/>
+          <alias name='ua-net-0'/>
+          <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
+        </interface>
+        <interface type='network'>
+          <mac address='52:54:00:80:3d:4c'/>
+          <source network='vm-provisioning' bridge='vm0'/>
+          <target dev='vnet1'/>
+          <model type='virtio'/>
+          <alias name='ua-net-1'/>
+          <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
+        </interface>
+    ...
+
+The baremetal network NIC in the jump server is the first NIC present
+in the machine and, depending on the device naming scheme in place,
+will be called `ens5` or `eth0`. Similarly, the provisioning network
+NIC will be `ens6` or `eth1`.
+
+    $ virsh -c qemu:///system dumpxml vm-machine-1
+    ...
+        <interface type='network'>
+          <mac address='52:54:00:c6:75:40'/>
+          <source network='vm-provisioning' bridge='vm0'/>
+          <target dev='vnet2'/>
+          <model type='virtio'/>
+          <alias name='ua-net-0'/>
+          <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
+        </interface>
+        <interface type='network'>
+          <mac address='52:54:00:20:a3:0a'/>
+          <source network='vm-baremetal' bridge='virbr3'/>
+          <target dev='vnet4'/>
+          <model type='virtio'/>
+          <alias name='ua-net-1'/>
+          <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
+        </interface>
+    ...
+
+In contrast to the jump server, the provisioning network NIC is the
+first NIC present in the machine and will be named `ens5` or `eth0`,
+while the baremetal network NIC will be `ens6` or `eth1`.
+
+The order of NICs is crucial here: the provisioning network NIC must
+be the NIC that the machine PXE boots from, and the BIOS used in this
+virtual machine is configured to use the first NIC in the machine. A
+physical machine will typically provide this as a configuration option
+in the BIOS settings.
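+
+One way to sanity-check this in the virtual environment is to look for
+the boot-related elements in the domain definition (a quick sketch; the
+exact XML depends on how the domain was defined):
+
+    $ virsh -c qemu:///system dumpxml vm-machine-1 | grep -i boot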
+
+
+## Install the jump server components
+
+    $ vagrant ssh jump
+    vagrant@jump:~$ sudo su
+    root@jump:/home/vagrant# cd /icn
+
+Before ICN starts installing the components, it must be told which NIC
+is on the IPMI network and which is on the provisioning network.
+Recall that in the jump server the IPMI network is overlaid onto the
+baremetal network, that the baremetal network NIC is `eth0`, and that
+the provisioning network NIC is `eth1`.
+
+Edit `user_config.sh` as shown below.
+
+    #!/usr/bin/env bash
+    export IRONIC_INTERFACE="eth1"
+
+Now install the jump server components.
+
+    root@jump:/icn# make jump_server
+
+Let's walk quickly through some of the components installed. The
+first, and most fundamental, is that the jump server is now a
+single-node Kubernetes cluster.
+
+    root@jump:/icn# kubectl cluster-info
+    Kubernetes control plane is running at https://192.168.151.45:6443
+    
+    To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
+
+The next is that [Cluster API](https://cluster-api.sigs.k8s.io/) is installed, with the [Metal3](https://github.com/metal3-io/cluster-api-provider-metal3)
+infrastructure provider and Kubeadm bootstrap provider. These
+components provide the base for creating clusters with ICN.
+
+    root@jump:/icn# kubectl get deployments -A
+    NAMESPACE                           NAME                                            READY   UP-TO-DATE   AVAILABLE   AGE
+    baremetal-operator-system           baremetal-operator-controller-manager           1/1     1            1           96m
+    capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager       1/1     1            1           96m
+    capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager   1/1     1            1           96m
+    capi-system                         capi-controller-manager                         1/1     1            1           96m
+    capm3-system                        capm3-controller-manager                        1/1     1            1           96m
+    capm3-system                        capm3-ironic                                    1/1     1            1           98m
+    capm3-system                        ipam-controller-manager                         1/1     1            1           96m
+    ...
+
+A closer look at the above deployments shows two others of interest:
+`baremetal-operator-controller-manager` and `capm3-ironic`. These
+components are from the [Metal3](https://metal3.io/) project and are dependencies of the
+Metal3 infrastructure provider.
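+
+Their Pods can be inspected directly if you want to dig a little
+deeper (output omitted here):
+
+    root@jump:/icn# kubectl -n baremetal-operator-system get pods
+    root@jump:/icn# kubectl -n capm3-system get pods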
+
+Before moving on to the next step, let's take one last look at the
+provisioning NIC we set in `user_config.sh`.
+
+    root@jump:/icn# ip link show dev eth1
+    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master provisioning state UP mode DEFAULT group default qlen 1000
+        link/ether 52:54:00:80:3d:4c brd ff:ff:ff:ff:ff:ff
+
+The `master provisioning` portion indicates that this interface is now
+attached to the `provisioning` bridge. The `provisioning` bridge was
+created during installation and is how the `capm3-ironic` deployment
+will communicate with the machines to be provisioned when it is time
+to install an operating system.
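+
+To list every interface attached to that bridge (a quick check,
+assuming the bridge name shown above):
+
+    root@jump:/icn# ip link show master provisioning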
+
+
+## Create a cluster
+
+    root@jump:/icn# make vm_cluster
+
+Once complete, we'll have a K8s cluster up and running on the machines
+created earlier with all of the ICN addons configured and validated.
+
+    root@jump:/icn# clusterctl -n metal3 describe cluster icn
+    NAME                                                                READY  SEVERITY  REASON  SINCE  MESSAGE
+    /icn                                                                True                     81m
+    ├─ClusterInfrastructure - Metal3Cluster/icn
+    ├─ControlPlane - KubeadmControlPlane/icn                            True                     81m
+    │ └─Machine/icn-qhg4r                                               True                     81m
+    │   └─MachineInfrastructure - Metal3Machine/icn-controlplane-r8g2f
+    └─Workers
+      └─MachineDeployment/icn                                           True                     73m
+        └─Machine/icn-6b8dfc7f6f-qvrqv                                  True                     76m
+          └─MachineInfrastructure - Metal3Machine/icn-workers-bxf52
+
+    root@jump:/icn# clusterctl -n metal3 get kubeconfig icn >icn-admin.conf
+    root@jump:/icn# kubectl --kubeconfig=icn-admin.conf cluster-info
+    Kubernetes control plane is running at https://192.168.151.254:6443
+    CoreDNS is running at https://192.168.151.254:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
+    
+    To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
+
+
+## Next steps
+
+At this point you may proceed with the [Installation
+guide](installation-guide.md) to learn more about the hardware and
+software configuration in a physical environment or jump directly to
+the [Deployment](installation-guide.md#deployment) sub-section to
+examine the cluster creation process in more detail.
+
+
diff --git a/doc/sw-diagram.odg b/doc/sw-diagram.odg
new file mode 100644 (file)
index 0000000..ea9e139
Binary files /dev/null and b/doc/sw-diagram.odg differ
diff --git a/doc/sw-diagram.png b/doc/sw-diagram.png
new file mode 100644 (file)
index 0000000..a00eac4
Binary files /dev/null and b/doc/sw-diagram.png differ
diff --git a/doc/troubleshooting.md b/doc/troubleshooting.md
new file mode 100644 (file)
index 0000000..60c131e
--- /dev/null
@@ -0,0 +1,101 @@
+# Troubleshooting
+
+## Where are the logs?
+
+In addition to the monitoring and examining instructions in the
+[Deployment](installation-guide.md#deployment) section of the
+installation guide, ICN records its execution in various log files.
+These logs can be found in the `logs` subdirectory of each component,
+for example `deploy/ironic/logs`.
+
+The logs of the Bare Metal Operator, Cluster API, and Flux controllers
+can be examined using standard K8s tools.
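+
+For example, the controller logs can be pulled from their Deployments
+(a sketch; the Bare Metal Operator and Cluster API names match the
+quick start output, `helm-controller` is the standard Flux controller
+name, and `--all-containers` avoids needing to know any sidecar
+container names):
+
+    # kubectl -n baremetal-operator-system logs deployment/baremetal-operator-controller-manager --all-containers
+    # kubectl -n capi-system logs deployment/capi-controller-manager --all-containers
+    # kubectl -n flux-system logs deployment/helm-controller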
+
+## Early provisioning fails
+
+First confirm that the BMC and PXE Boot configuration are correct as
+described in the [Configuration](installation-guide.md#configuration)
+section of the installation guide.
+
+It is also recommended to enable the KVM console of the machine, using
+the Raritan console or the Intel web BMC console, to observe early
+boot output during provisioning.
+
+  ![BMC console](figure-3.png)
+
+Examining the BareMetalHost resource of the failing machine and the
+logs of Bare Metal Operator and Ironic Pods may also provide a
+description of why the provisioning is failing.
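+
+For example (a sketch; the host name is taken from the example output
+below, and `--all-containers` avoids having to know the container
+names inside the Ironic Pod):
+
+    # kubectl -n metal3 describe bmh pod11-node3
+    # kubectl -n capm3-system logs deployment/capm3-ironic --all-containers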
+
+### openstack baremetal
+
+In rare cases, the Ironic and Bare Metal Operator information may get
+out of sync. In this case, the `openstack baremetal` tool can be used
+to delete the stale information.
+
+The general procedure (shown on the jump server) is:
+
+- Locate the UUID of the active node.
+
+      # kubectl -n metal3 get bmh -o json | jq '.items[]|.status.provisioning.ID'
+      "99f64101-04f3-47bf-89bd-ef374097fcdc"
+
+- Examine the Ironic information for stale node and port values.
+
+      # OS_TOKEN=fake-token OS_URL=http://localhost:6385/ openstack baremetal node list
+      +--------------------------------------+-------------+--------------------------------------+-------------+--------------------+-------------+
+      | UUID                                 | Name        | Instance UUID                        | Power State | Provisioning State | Maintenance |
+      +--------------------------------------+-------------+--------------------------------------+-------------+--------------------+-------------+
+      | 0ec36f3b-80d1-41e6-949a-9ba40a87f625 | None        | None                                 | None        | enroll             | False       |
+      | 99f64101-04f3-47bf-89bd-ef374097fcdc | pod11-node3 | 6e16529d-a1a4-450c-8052-46c82c87ca7b | power on    | manageable         | False       |
+      +--------------------------------------+-------------+--------------------------------------+-------------+--------------------+-------------+
+      # OS_TOKEN=fake-token OS_URL=http://localhost:6385/ openstack baremetal port list
+      +--------------------------------------+-------------------+
+      | UUID                                 | Address           |
+      +--------------------------------------+-------------------+
+      | c65b1324-2cdd-44d0-8d25-9372068add02 | 00:1e:67:f1:5b:91 |
+      +--------------------------------------+-------------------+
+
+- Delete the stale node and port.
+
+      # OS_TOKEN=fake-token OS_URL=http://localhost:6385/ openstack baremetal node delete 0ec36f3b-80d1-41e6-949a-9ba40a87f625
+      Deleted node 0ec36f3b-80d1-41e6-949a-9ba40a87f625
+      # OS_TOKEN=fake-token OS_URL=http://localhost:6385/ openstack baremetal port delete c65b1324-2cdd-44d0-8d25-9372068add02
+      Deleted port c65b1324-2cdd-44d0-8d25-9372068add02
+
+- Create a new port.
+
+      # OS_TOKEN=fake-token OS_URL=http://localhost:6385/ openstack baremetal port create --node 99f64101-04f3-47bf-89bd-ef374097fcdc 00:1e:67:f1:5b:91
+      +-----------------------+--------------------------------------+
+      | Field                 | Value                                |
+      +-----------------------+--------------------------------------+
+      | address               | 00:1e:67:f1:5b:91                    |
+      | created_at            | 2021-04-27T22:24:08+00:00            |
+      | extra                 | {}                                   |
+      | internal_info         | {}                                   |
+      | local_link_connection | {}                                   |
+      | node_uuid             | 99f64101-04f3-47bf-89bd-ef374097fcdc |
+      | physical_network      | None                                 |
+      | portgroup_uuid        | None                                 |
+      | pxe_enabled           | True                                 |
+      | updated_at            | None                                 |
+      | uuid                  | 93366f0a-aa12-4815-b524-b95839bfa05d |
+      +-----------------------+--------------------------------------+
+
+## Helm release stuck in 'pending-install'
+
+If the HelmRelease status for a chart in the workload cluster shows
+that an install or upgrade is pending and, for example, no Pods are
+being created, it is possible that the Helm controller was restarted
+during the install of the HelmRelease.
+
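+The affected release can be found by listing the HelmRelease resources
+in the workload cluster (the same query style used in the deployment
+section):
+
+    kubectl --kubeconfig=icn-admin.conf get HelmRelease -A
+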
+The fix is to remove the Helm Secret of the failing release.  After
+this, Flux will complete the reconciliation successfully.
+
+    kubectl --kubeconfig=icn-admin.conf -n emco delete secret sh.helm.release.v1.db.v1
+
+## No change in BareMetalHost state
+
+Provisioning can take a fair amount of time; refer to [Monitoring
+progress](installation-guide.md#monitoring-progress) to see where the
+process is.
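+
+From the jump server, the BareMetalHost state transitions can also be
+watched directly (a quick sketch):
+
+    # kubectl -n metal3 get bmh --watch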
diff --git a/figure-1.odg b/figure-1.odg
deleted file mode 100644 (file)
index 33e3d7a..0000000
Binary files a/figure-1.odg and /dev/null differ
diff --git a/figure-1.png b/figure-1.png
deleted file mode 100644 (file)
index 04232e1..0000000
Binary files a/figure-1.png and /dev/null differ
diff --git a/figure-2.odg b/figure-2.odg
deleted file mode 100644 (file)
index ac4b9b6..0000000
Binary files a/figure-2.odg and /dev/null differ
diff --git a/figure-2.png b/figure-2.png
deleted file mode 100644 (file)
index 6952583..0000000
Binary files a/figure-2.png and /dev/null differ