ICN strives to automate the process of installing the local cluster
controller to the greatest degree possible – "zero touch
installation". Once the jump server (Local Controller) is booted and
the compute cluster-specific values are provided, the controller
begins to inspect and provision the bare metal servers until the
cluster is entirely configured. This document shows step-by-step how
to configure the network and deployment architecture for the ICN
blueprint.
# Deployment Architecture
The Local Controller is provisioned with the Cluster API controllers
and the Metal3 infrastructure provider, which enable provisioning of
bare metal servers. The controller has three network connections to
the bare metal servers: network A connects the bare metal servers,
network B is a private network used for provisioning the bare metal
servers, and network C is the IPMI network, used for control during
provisioning. In addition, the bare metal servers connect to network
D, the SR-IOV network.
![Figure 1](figure-1.png)*Figure 1: Deployment Architecture*
- Net A -- Bare metal network, lab networking for ssh. It is used as
  the control plane for K8s and used by OVN and Flannel for the
  overlay networking.
- Net B (internal network) -- Provisioning network used by Ironic to
  provision the bare metal servers.
- Net C (internal network) -- IPMI LAN used for the IPMI protocol
  during OS provisioning. The NICs support IPMI. The IP address should
  be statically assigned via the IPMI tool or other means.
- Net D (internal network) -- Data plane network for the Akraino
  application, using SR-IOV networking over fiber cables with Intel
  25Gb and 40Gb FLV NICs.

In some deployment models, you can combine Net C and Net A into the
same network, but the developer should take care of IP address
management between Net A and the IPMI addresses of the servers.

Also note that the IPMI NIC may share the same RJ-45 jack with another
NIC on the server.
# Pre-installation Requirements
There are two main components in the ICN Infra Local Controller - the
Local Controller and the K8s compute cluster.

### Local Controller
The Local Controller will reside in the jump server to run the Cluster
API controllers with the Kubeadm bootstrap provider and Metal3
infrastructure provider.
### K8s Compute Cluster
The K8s compute cluster will actually run the workloads and is
installed on bare metal servers.
## Hardware Requirements

### Minimum Hardware Requirement
All-in-one VM based deployment requires servers with at least 32 GB
of memory.

### Recommended Hardware Requirements
Recommended hardware requirements are servers with 64 GB memory, 32
CPUs and SR-IOV network cards.
## Software Prerequisites
The jump server is required to be pre-installed with Ubuntu 18.04.

## Database Prerequisites
No prerequisites for ICN blueprint.

## Other Installation Requirements

### Jump Server Requirements

#### Jump Server Hardware Requirements
- Local Controller: at least three network interfaces.
- Bare metal servers: four network interfaces, including one IPMI interface.
- Four or more hubs, with cabling, to connect four networks.
Hostname | CPU Model | Memory | Storage | 1GbE: NIC#, VLAN (connected to extreme 480 switch) | 10GbE: NIC#, VLAN, Network (connected to IZ1 switch)
---------|-----------|--------|---------|----------------------------------------------------|-----------------------------------------------------
jump0 | Intel 2xE5-2699 | 64GB | 3TB (SATA)<br/>180GB (SSD) | eth0: VLAN 110<br/>eno1: VLAN 110<br/>eno2: VLAN 111 |
#### Jump Server Software Requirements
ICN supports Ubuntu 18.04. The ICN blueprint installs all required
software during `make jump_server`.

### Network Requirements
Please refer to figure 1 for all the network requirements of the ICN
blueprint.

Please make sure you have three distinct networks - Net A, Net B and
Net C as mentioned in figure 1. The Local Controller uses Net B and
Net C to provision the bare metal servers and perform OS provisioning.
### Bare Metal Server Requirements

### K8s Compute Cluster

#### Compute Server Hardware Requirements
Hostname | CPU Model | Memory | Storage | 1GbE: NIC#, VLAN (connected to extreme 480 switch) | 10GbE: NIC#, VLAN, Network (connected to IZ1 switch)
---------|-----------|--------|---------|----------------------------------------------------|-----------------------------------------------------
node1 | Intel 2xE5-2699 | 64GB | 3TB (SATA)<br/>180GB (SSD) | eth0: VLAN 110<br/>eno1: VLAN 110<br/>eno2: VLAN 111 | eno3: VLAN 113
node2 | Intel 2xE5-2699 | 64GB | 3TB (SATA)<br/>180GB (SSD) | eth0: VLAN 110<br/>eno1: VLAN 110<br/>eno2: VLAN 111 | eno3: VLAN 113
node3 | Intel 2xE5-2699 | 64GB | 3TB (SATA)<br/>180GB (SSD) | eth0: VLAN 110<br/>eno1: VLAN 110<br/>eno2: VLAN 111 | eno3: VLAN 113
#### Compute Server Software Requirements
The Local Controller will install all the software on the compute
servers, from the OS to the software required to bring up the K8s
cluster.

### Execution Requirements (Bare Metal Only)
The ICN blueprint checks all the preconditions and execution
requirements for bare metal deployment.
# Installation High-Level Overview
Installation is a two-step process:
- Installation of the Local Controller.
- Installation of a compute cluster.
## Bare Metal Deployment Guide

### Install Bare Metal Jump Server

#### Creating the Settings Files

##### Local Controller Network Configuration Reference
The user will find the network configuration file named
`user_config.sh` in the ICN parent directory.
```
#Ironic Metal3 settings for provisioning network (Net B)
export IRONIC_INTERFACE="eno2"
```
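
Before editing `user_config.sh`, it can help to confirm which NIC is
cabled to the provisioning network (Net B). The following is a minimal
sketch using standard Linux tools; interface names such as `eno2` will
differ per site:

```
root@jump0:# ip -br link                            # list interface names and link state
root@jump0:# ethtool eno2 | grep "Link detected"    # confirm the provisioning NIC has link
```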
After configuring the network configuration file, please run `make
jump_server` from the ICN parent directory as shown below:

```
root@jump0:# git clone "https://gerrit.akraino.org/r/icn"
Cloning into 'icn'...
remote: Counting objects: 69, done
remote: Finding sources: 100% (69/69)
remote: Total 4248 (delta 13), reused 4221 (delta 13)
Receiving objects: 100% (4248/4248), 7.74 MiB | 21.84 MiB/s, done.
Resolving deltas: 100% (1078/1078), done.
root@jump0:# cd icn
root@jump0:# make jump_server
```
The following steps occur once the `make jump_server` command is
executed:
1. All the software required to run the bootstrap cluster is
   downloaded and installed.
2. A K8s cluster to maintain the bootstrap cluster and all the servers
   in the edge location is installed.
3. Metal3-specific network configuration, such as the local DHCP server
   networking for each edge location and the Ironic networking for both
   the provisioning network and the IPMI LAN network, is identified and
   configured.
4. The Cluster API controllers and the bootstrap and infrastructure
   providers are configured and installed.
5. The Flux controllers are installed.
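
As a quick sanity check (a sketch only; namespace names such as
`flux-system` are common defaults for these components and may differ
in a given ICN release), the controllers on the bootstrap cluster can
be inspected from the jump server:

```
root@jump0:# kubectl get pods -A                 # every controller pod should reach Running
root@jump0:# kubectl -n metal3 get pods          # Metal3/Ironic components (namespace used later in this guide)
root@jump0:# kubectl -n flux-system get pods     # Flux controllers (assumed default namespace)
```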
#### Creating a compute cluster
A compute cluster is composed of installations of two types of Helm
charts: machine and cluster. The specific installations of these Helm
charts are defined in HelmRelease resources consumed by the Flux
controllers in the jump server. The user is required to provide the
machine and cluster specific values in the HelmRelease resources.

##### Preconfiguration for the compute cluster in Jump Server
The user is required to provide the IPMI information of the servers
and the values of the compute cluster they connect to the Local
Controller.

If the baremetal network provides a DHCP server with gateway and DNS
server information, and each server has identical hardware, then a
cluster template can be used. Otherwise these values must also be
provided with the values for each server. Refer to the machine chart
in icn/deploy/machine for more details. In the example below, no DHCP
server is present in the baremetal network.

> *NOTE:* To assist in migrating from the `nodes.json` file and the
> Provisioning resource used in R5 and earlier releases to a site YAML,
> a helper script is provided at `tools/migration/to_r6.sh`.
```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta1
gitImplementation: go-git
url: https://gerrit.akraino.org/r/icn
apiVersion: helm.toolkit.fluxcd.io/v2beta1
chart: deploy/machine
bmcAddress: ipmi://10.10.110.11
bmcPassword: password
macAddress: 00:1e:67:fe:f4:19
ipAddress: 10.10.110.21/24
nameservers: ["8.8.8.8"]
macAddress: 00:1e:67:fe:f4:1a
macAddress: 00:1e:67:f8:6a:41
ipAddress: 10.10.113.3/24
apiVersion: helm.toolkit.fluxcd.io/v2beta1
chart: deploy/machine
bmcAddress: ipmi://10.10.110.12
bmcPassword: password
macAddress: 00:1e:67:f1:5b:90
ipAddress: 10.10.110.22/24
nameservers: ["8.8.8.8"]
macAddress: 00:1e:67:f1:5b:91
macAddress: 00:1e:67:f8:69:81
ipAddress: 10.10.113.4/24
apiVersion: helm.toolkit.fluxcd.io/v2beta1
name: cluster-compute
chart: deploy/cluster
controlPlaneEndpoint: 10.10.110.21
controlPlaneHostSelector:
hashedPassword: $6$rounds=10000$PJLOBdyTv23pNp$9RpaAOcibbXUMvgJScKK2JRQioXW4XAVFMRKqgCB5jC4QmtAdbA70DU2jTcpAd6pRdEZIaWFjLCNQMBmiiL40.
sshAuthorizedKey: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCrxu+fSrU51vgAO5zP5xWcTU8uLv4MkUZptE2m1BJE88JdQ80kz9DmUmq2AniMkVTy4pNeUW5PsmGJa+anN3MPM99CR9I37zRqy5i6rUDQgKjz8W12RauyeRMIBrbdy7AX1xasoTRnd6Ta47bP0egiFb+vUGnlTFhgfrbYfjbkJhVfVLCTgRw8Yj0NSK16YEyhYLbLXpix5udRpXSiFYIyAEWRCCsWJWljACr99P7EF82vCGI0UDGCCd/1upbUwZeTouD/FJBw9qppe6/1eaqRp7D36UYe3KzLpfHQNgm9AzwgYYZrD4tNN6QBMq/VUIuam0G1aLgG8IYRLs41HYkJ root@jump0
url: https://gerrit.akraino.org/r/icn
path: ./deploy/site/cluster-icn
```
A brief overview of the values is below. Refer to the machine and
cluster charts in deploy/machine and deploy/cluster respectively for
more details.

- *machineName*: This will be the hostname for the machine, once it is
  provisioned by Metal3.
- *bmcUsername*: BMC username required to be provided for Ironic.
- *bmcPassword*: BMC password required to be provided for Ironic.
- *bmcAddress*: BMC server IPMI LAN IP address.
- *networks*: A dictionary of the networks used by ICN. For more
  information, refer to the *networkData* field of the BareMetalHost
  resource.
- *macAddress*: The MAC address of the interface.
- *type*: The type of network, either dynamic ("ipv4_dhcp") or
  static ("ipv4"). See the example after this list.
- *ipAddress*: Only valid for type "ipv4"; the IP address of the
  interface.
- *gateway*: Only valid for type "ipv4"; the gateway of this
  network.
- *nameservers*: Only valid for type "ipv4"; an array of DNS
  servers.
- *clusterName*: The name of the cluster.
- *controlPlaneEndpoint*: The K8s control plane endpoint. This works
  in cooperation with the *controlPlaneHostSelector* to ensure that it
  addresses the control plane node.
- *controlPlaneHostSelector*: A K8s match expression against labels on
  the *BareMetalHost* machine resource (from the *machineLabels* value
  of the machine Helm chart). This will be used by Cluster API to
  select machines for the control plane.
- *workersHostSelector*: A K8s match expression selecting worker
  machines.
- *userData*: User data values to be provisioned into each machine in
  the cluster.
- *hashedPassword*: The hashed password of the default user on each
  machine.
- *sshAuthorizedKey*: An authorized public key of the *root* user on
  each machine.
- *flux*: An optional repository to continuously reconcile the created
  cluster against.
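
For illustration, a single dynamic entry under *networks* could look
roughly like the following (the network key name `provisioning` and the
exact nesting are assumptions; check the machine chart in
deploy/machine for the authoritative structure):

```yaml
networks:
  provisioning:                # hypothetical network name
    macAddress: 00:1e:67:fe:f4:1a
    type: ipv4_dhcp            # dynamic; ipAddress, gateway and nameservers are not needed
```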
After configuring the machine and cluster site values, the next steps
are to encrypt the secrets contained in the file, commit the file to
source control, and create the Flux resources on the jump server
pointing to the committed files.

1. Create a key to protect the secrets in the values if one does not
   already exist. The key created below will be named "site-secrets".

```
root@jump0:# ./deploy/site/site.sh create-gpg-key site-secrets
```

2. Encrypt the secrets in the site values.

```
root@jump0:# ./deploy/site/site.sh sops-encrypt-site site.yaml site-secrets
```
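
To confirm the encryption worked (a sketch, assuming `sops-encrypt-site`
rewrites site.yaml in place), the secret fields should now contain sops
`ENC[...]` blobs instead of plain text:

```
root@jump0:# grep bmcPassword site.yaml     # expect values of the form ENC[AES256_GCM,data:...]
```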
3. Commit the site.yaml and additional files (sops.pub.asc,
   .sops.yaml) created by sops-encrypt-site to a Git repository. For
   the purposes of the next step, site.yaml will be committed to a Git
   repository hosted at URL, on the specified BRANCH, and at location
   PATH inside the source tree.
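
For example, assuming the repository is already cloned and a remote is
configured, the commit could look like this (URL and BRANCH are the
placeholders from the text above):

```
root@jump0:# git add site.yaml sops.pub.asc .sops.yaml
root@jump0:# git commit -m "Add encrypted site definition"
root@jump0:# git push origin BRANCH
```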
4. Create the Flux resources to deploy the resources described by the
   repository in step 3. This creates a GitRepository resource
   containing the URL and BRANCH to synchronize, a Secret resource
   containing the private key used to decrypt the secrets in the site
   values, and a Kustomization resource with the PATH to the site.yaml
   file at the GitRepository.
```
root@jump0:# ./deploy/site/site.sh flux-create-site URL BRANCH PATH site-secrets
```

The progress of the deployment may be monitored in a number of ways:

```
root@jump0:# kubectl -n metal3 get baremetalhost
root@jump0:# kubectl -n metal3 get cluster compute
root@jump0:# clusterctl -n metal3 describe cluster compute
```

When the control plane is ready, the kubeconfig can be obtained with
clusterctl and used to access the compute cluster:

```
root@jump0:# clusterctl -n metal3 get kubeconfig compute >compute-admin.conf
root@jump0:# kubectl --kubeconfig=compute-admin.conf cluster-info
```
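
Once the kubeconfig is available, a quick check that the worker machines
joined and that the addons deployed by Flux are coming up might look
like this (standard kubectl commands; output depends on the site):

```
root@jump0:# kubectl --kubeconfig=compute-admin.conf get nodes -o wide
root@jump0:# kubectl --kubeconfig=compute-admin.conf get pods -A
```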
## Virtual Deployment Guide

### Standard Deployment Overview
![Figure 2](figure-2.png)*Figure 2: Virtual Deployment Architecture*

Virtual deployment is used for the development environment, using
Vagrant to create VMs with PXE boot. No settings are required from the
user to deploy the virtual environment.
### Snapshot Deployment Overview
No snapshot is implemented in ICN R6.

### Special Requirements for Virtual Deployment

#### Install Jump Server
The jump server is required to be installed with Ubuntu 18.04. It will
host all the VMs and install the K8s clusters.

#### Verifying the Setup - VMs
To verify the virtual deployment, execute the following commands:
```
$ vagrant up --no-parallel
$ vagrant ssh jump
vagrant@jump:~$ sudo su
root@jump:/home/vagrant# cd /icn
root@jump:/icn# make jump_server
root@jump:/icn# make vm_cluster
```

`vagrant up --no-parallel` creates three VMs: vm-jump, vm-machine-1,
and vm-machine-2, each with 16 GB RAM and 8 vCPUs. `make jump_server`
installs the jump server components into vm-jump, and `make
vm_cluster` installs a K8s cluster on the vm-machine VMs using Cluster
API. The cluster is configured to use Flux to bring up the cluster
with all addons and plugins.
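
Once `make vm_cluster` completes, the compute cluster can be reached
from inside vm-jump in the same way as in the bare metal flow; a
sketch, assuming the cluster is also named `compute` in the `metal3`
namespace:

```
root@jump:/icn# clusterctl -n metal3 get kubeconfig compute >compute-admin.conf
root@jump:/icn# kubectl --kubeconfig=compute-admin.conf get nodes
```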
# Verifying the Setup
The ICN blueprint checks the entire setup in both bare metal and VM
deployments. The verify script will first confirm that the cluster
control plane is ready and then run self tests of all addons and
plugins.

**Bare Metal Verifier**: Run `make bm_verifer` to verify the bare
metal deployment.

**Verifier**: Run `make vm_verifier` to verify the virtual deployment.
# Developer Guide and Troubleshooting
Development uses the virtual deployment; it takes up to 10 minutes to
bring up the virtual BMC VMs with PXE boot.

## Utilization of Images
No images are provided in this ICN release.

## Post-deployment Configuration
No post-deployment configuration is required in this ICN release.

## Debugging Failures
* For a first-time installation, enable the KVM console in the trial or
  lab servers using a Raritan console or the Intel web BMC console.

![Figure 3](figure-3.png)
* The deprovision state results in the Ironic agent sleeping before the
  next heartbeat - this is not an error. It leaves the bare metal
  server without an OS, running only the ramdisk.
* Deprovisioning in Metal3 is not straightforward - Metal3 moves
  through several stages: provisioned, deprovisioning and ready. The
  ICN blueprint takes care of navigating the deprovisioning states and
  removing the BareMetalHost (BMH) custom resource in case of cleaning.
* Manual cleaning or force cleaning of the BMH resource results in a
  hung state - use `make bmh_clean` to remove the BMH state.
* The Ironic logs and the `openstack baremetal` command show the state
  of the servers (see the example after this list).
* The logs of the baremetal operator report failures related to images
  or image downloads.
* It is not possible to change the state from provision to deprovision
  or deprovision to provision without completing the current state. All
  these issues are handled in the ICN scripts.
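
A sketch of how to pull those states and logs from the jump server (pod
and container names vary by release, so look them up first):

```
root@jump0:# kubectl -n metal3 get baremetalhost -o wide        # provisioning state of each BMH
root@jump0:# kubectl -n metal3 get pods                         # find the Ironic and baremetal-operator pods
root@jump0:# kubectl -n metal3 logs <ironic-pod> -c ironic      # container name is typically "ironic"
root@jump0:# kubectl -n metal3 logs <baremetal-operator-pod>
```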
## Reporting a Bug
A Linux Foundation ID is required to file a bug against ICN:
https://jira.akraino.org/projects/ICN/issues
# Uninstall Guide

## Bare Metal deployment
The command `make clean_all` uninstalls all the components installed by
`make jump_server`:
* It de-provisions all the provisioned servers and removes them from
  the Ironic database.
* The baremetal operator is deleted, followed by the Ironic database
  and containers.
* Network configuration such as the internal DHCP server, provisioning
  interfaces and IPMI LAN interfaces are deleted.
* It will reset the bootstrap cluster - the K8s cluster is torn down in
  the jump server and all the associated Docker images are removed.
* All software packages installed by `make jump_server` are removed,
  such as Ironic, the OpenStack utility tool, Docker packages and basic
  prerequisite packages.
## Virtual deployment
The command `vagrant destroy -f` uninstalls all the components for the
virtual deployment.

# Troubleshooting

## Error Message Guide
The error messages are explicit; all messages are captured in the log
directory.
# Maintenance

## Blueprint Package Maintenance
No packages are maintained in ICN.
## Software maintenance

## Hardware maintenance

## BluePrint Deployment Maintenance
# Frequently Asked Questions
**How to set up IPMI?**

First, make sure the IPMI tool is installed on your servers; if not,
install it using `apt install ipmitool`. Then, check the IPMI
information of each server using the command `ipmitool lan print 1`. If
the above command doesn't show the IPMI information, then set up a
static IPMI IP address using the following instructions:
- The easiest way to set up the IPMI topology in your lab is with the
  IPMI tool, as described in
  https://www.thomas-krenn.com/en/wiki/Configuring_IPMI_under_Linux_using_ipmitool
  (see the sketch after this list).
- IPMI information can also be configured from the BIOS settings.
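
As a sketch of the ipmitool approach (the addresses below are
placeholders; adjust the LAN channel number if `ipmitool lan print`
shows a different one):

```
ipmitool lan set 1 ipsrc static
ipmitool lan set 1 ipaddr 10.10.110.11
ipmitool lan set 1 netmask 255.255.255.0
ipmitool lan set 1 defgw ipaddr 10.10.110.1
ipmitool lan print 1        # verify the new settings
```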
**BMC web console URL is not working?**

It is hard to pinpoint the exact reason. If the URL is not available,
check the output of `ipmitool bmc info` to narrow down the issue.

**No change in BMH state - provisioning state persists for more than 40 min?**

Generally, Metal3 provisioning of bare metal takes 20 - 30 minutes.
Look at the Ironic logs and the baremetal operator to check the state
of the servers. The `openstack baremetal node list` command shows the
current state of each server right away.
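
For example, from the jump server (assuming the OpenStack baremetal
client installed by ICN is available and configured in the shell
environment):

```
root@jump0:# openstack baremetal node list      # shows power state and provisioning state per node
```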
**Why is the provider network (baremetal network configuration) required?**

Generally, provider network DHCP servers in a lab provide the router
and DNS server details. In some labs, there is no DHCP server or the
DHCP server does not provide this information.
# License

* Copyright 2019 Intel Corporation, Inc
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
* http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
# Definitions, acronyms and abbreviations