In this article we are going to install and configure IBM Spectrum Scale (GPFS) 5.0.4.2 for dynamic PV provisioning. In my OpenShift cluster I have 5 Workers, each with a 300 GB disk, which will be united into a storage quorum. As the GPFS GUI and API server I am going to use another machine which is neither in the OpenShift cluster nor in the storage quorum.
Steps to run on EACH OF the 5 Workers
Create a partition for the installed block device (300 GiB /dev/sdb in my case). You can list all partitions using the fdisk -l command.
fdisk /dev/sdb
Welcome to fdisk (util-linux 2.23.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Device does not contain a recognized partition table
Building a new DOS disklabel with disk identifier 0xc3525989.
Command (m for help): n
Partition type:
p primary (0 primary, 0 extended, 4 free)
e extended
Select (default p): p
Partition number (1-4, default 1):
First sector (2048-629145599, default 2048):
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-629145599, default 629145599):
Using default value 629145599
Partition 1 of type Linux and of size 300 GiB is set
Command (m for help): t
Selected partition 1
Hex code (type L to list all codes): 0
Changed type of partition 'Linux' to 'Empty'
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
Please validate your steps with the following link from the official documentation:
https://www.ibm.com/support/knowledgecenter/SSCJDQ/com.ibm.swg.im.dashdb.doc/admin/GPFS-FPO_setup.html
Please ensure that communication between your Workers and the GPFS GUI machine is possible using passwordless SSH. Follow step 2 in the link above to configure this.
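If passwordless SSH is not configured yet, here is a minimal sketch, assuming root access and my worker1 to worker6 hostnames (adjust for your environment); run it on every node that will issue administrative commands, in particular the head node and the GUI machine:
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
for i in {1..6}; do ssh-copy-id root@worker$i.ocp01.marukhno.com; done
Verify with a quick ssh worker1 hostname from each node before continuing.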
Partition the data across the Network Shared Disks (NSDs)
partprobe -s
parted /dev/sdb mklabel gpt
parted /dev/sdb -s -a optimal mkpart primary 0% 100%
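To double-check the result before continuing, you can print the new partition layout (standard tools, /dev/sdb is just my device name):
lsblk /dev/sdb
parted /dev/sdb print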
Install the IBM Spectrum Scale FPO binary file
yum -y install unzip ksh perl libaio.x86_64 net-tools m4 kernel-devel gcc-c++ psmisc.x86_64 kernel-devel.x86_64
mkdir /root/spectrum_scale
cd /root/spectrum_scale
Download Spectrum_Scale_Advanced-5.0.4.2-x86_64-Linux-install. This manual assumes you have access and rights to download and use the required software.
chmod +x Spectrum_Scale_Advanced-5.0.4.2-x86_64-Linux-install
./Spectrum_Scale_Advanced-5.0.4.2-x86_64-Linux-install
cd /usr/lpp/mmfs/5.0.4.2/gpfs_rpms
rpm -ivh gpfs.base*.rpm gpfs.gpl*rpm gpfs.license.adv*.rpm gpfs.gskit*rpm gpfs.msg*rpm gpfs.adv*rpm gpfs.crypto*rpm
sed -i 's/)/) Red Hat Enterprise Linux/g' /etc/redhat-release
/usr/lpp/mmfs/bin/mmbuildgpl
On the node reserved for the Spectrum Scale GUI and API (a separate machine! In my case it is called worker6), run the following commands:
yum install -y lsof postgresql-contrib postgresql-server boost-regex nc bzip2
mkdir /root/spectrum_scale && cd /root/spectrum_scale
chmod +x Spectrum_Scale_Advanced-5.0.4.2-x86_64-Linux-install
./Spectrum_Scale_Advanced-5.0.4.2-x86_64-Linux-install
yum -y install unzip ksh perl libaio.x86_64 net-tools m4 kernel-devel gcc-c++ psmisc.x86_64 kernel-devel.x86_64
cd /usr/lpp/mmfs/5.0.4.2/gpfs_rpms
rpm -ivh gpfs.base*.rpm gpfs.gpl*rpm gpfs.license.adv*.rpm gpfs.gskit*rpm gpfs.msg*rpm gpfs.adv*rpm gpfs.crypto*rpm
sed -i 's/)/) Red Hat Enterprise Linux/g' /etc/redhat-release
/usr/lpp/mmfs/bin/mmbuildgpl
Troubleshooting
I noticed that the last command can fail if the kernel version is 3.10.0-1127, in which case you will see a kernel-related error in the mmbuildgpl output.
First of all, check installed kernel packages:
yum list installed | grep kernel
If the kernel version is 3.10.0-1127.el7, downgrade it to 3.10.0-1062.12.1.el7.x86_64.
This article describes how to do this:
https://access.redhat.com/solutions/3089
A reboot is required afterwards: shutdown -r now
If you get errors like "The required package kernel-devel (kernel-devel-3.10.0-1127.el7.x86_64) is already installed but headers are missing in the expected location", reinstall kernel-devel with the correct version.
yum -y remove kernel-devel && yum -y install "kernel-devel-uname-r == $(uname -r)"
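To confirm that the running kernel and the installed headers now match, compare the two versions; both should report the same release before you re-run mmbuildgpl:
uname -r
rpm -q kernel-devel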
If you need to remove unused kernel packages, you can use rpm -qa | grep kernel to find them and yum remove <package_name> to remove them.
These versions worked for me:
Steps to run on the head node ONLY
I chose Worker1 as the head node. Configure IBM Spectrum Scale FPO by performing the following substeps:
export PATH=$PATH:/usr/lpp/mmfs/bin
mkdir /root/gpfs && cd /root/gpfs
Create a nodes file with the following content. Worker6 is for the GPFS GUI and, despite its name, it is not in the OpenShift cluster; it is just a standalone machine that happens to have the worker6 hostname.
vi gpfs-fpo-nodefile
worker1.ocp01.marukhno.com:quorum-manager:
worker2.ocp01.marukhno.com:quorum-manager:
worker3.ocp01.marukhno.com:quorum-manager:
worker4.ocp01.marukhno.com:quorum-manager:
worker5.ocp01.marukhno.com:quorum-manager:
worker6.ocp01.marukhno.com:manager-nonquorum:
Create the cluster by issuing the mmcrcluster command with the -N, -p, -s, -r, and -R parameters. Please check again that passwordless SSH is configured between your machines before running the next commands!
For more details follow this link:
https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_mmcrcluster.htm
mmcrcluster -N gpfs-fpo-nodefile -p worker1.ocp01.marukhno.com -s worker2.ocp01.marukhno.com -C gpfs-fpo-cluster -A -r /usr/bin/ssh -R /usr/bin/scp
Set the license mode for each node by issuing the mmchlicense command.
mmchlicense server --accept -N worker1.ocp01.marukhno.com
mmchlicense server --accept -N worker2.ocp01.marukhno.com
mmchlicense server --accept -N worker3.ocp01.marukhno.com
mmchlicense server --accept -N worker4.ocp01.marukhno.com
mmchlicense server --accept -N worker5.ocp01.marukhno.com
mmchlicense server --accept -N worker6.ocp01.marukhno.com
Start the cluster
mmstartup -a
Issue the following command. Ensure that the output shows that all nodes are active:
mmgetstate -a -L

Create fpo-poolfile
vi fpo-poolfile
%pool:
pool=system
blocksize=1024K
usage=dataAndMetadata
layoutMap=cluster
allowWriteAffinity=yes
writeAffinityDepth=1
blockGroupFactor=10
%nsd: nsd=worker1_hdd_1 device=/dev/sdb servers=worker1.ocp01.marukhno.com failureGroup=1,0,1 pool=system
%nsd: nsd=worker2_hdd_1 device=/dev/sdb servers=worker2.ocp01.marukhno.com failureGroup=2,0,1 pool=system
%nsd: nsd=worker3_hdd_1 device=/dev/sdb servers=worker3.ocp01.marukhno.com failureGroup=3,0,1 pool=system
%nsd: nsd=worker4_hdd_1 device=/dev/sdb servers=worker4.ocp01.marukhno.com failureGroup=4,0,1 pool=system
%nsd: nsd=worker5_hdd_1 device=/dev/sdb servers=worker5.ocp01.marukhno.com failureGroup=5,0,1 pool=system
Configure the disks for the cluster by issuing the mmcrnsd command
mmcrnsd -F fpo-poolfile
Verify that the disks were added by issuing the following command
mmlsnsd -m

Create the cluster file system by issuing the mmcrfs command. An example follows. Please ensure that the parameters are appropriate for your setup.
mmcrfs clusterfs -F fpo-poolfile -j scatter -B 1048576 -L 16M -A yes -i 4096 -m 3 -M 3 -n 32 -r 3 -R 3 -S relatime -E no -T /root/clusterfs
Mount the cluster file system by issuing the following command
mmmount clusterfs /root/clusterfs -N all
Verify that the cluster file system is mounted on all nodes
for i in {1..5}; do ssh worker$i -- 'mount | grep gpfs'; done
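Alternatively, Spectrum Scale can report the mounts itself; a quick check, assuming the file system name clusterfs used above:
mmlsmount clusterfs -L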

Install Spectrum Scale GUI and API
Now we will install the Spectrum Scale GUI and API on a separate machine which is neither in the OCP cluster nor in the GPFS quorum. In my case this machine has the hostname worker6.
More information about this step is in the official documentation
https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_manualinstallofgui.htm
Spectrum Scale must already be installed on this machine. Be sure that you completed the steps described in the block above, "On the node reserved for the Spectrum Scale GUI and API".
Install and start GPFS GUI
cd /usr/lpp/mmfs/5.0.4.2/zimon_rpms/rhel7/
rpm -ivh gpfs.gss.pmcollector-5.0.4-2.el7.x86_64.rpm
rpm -ivh gpfs.gss.pmsensors-5.0.4-2.el7.x86_64.rpm
cd /usr/lpp/mmfs/5.0.4.2/gpfs_rpms
rpm -ivh gpfs.java-5.0.4-2.x86_64.rpm
rpm -ivh gpfs.base-5.0.4-2.x86_64.rpm
rpm -ivh gpfs.gskit-8.0.50-86.x86_64.rpm
rpm -ivh gpfs.gui-5.0.4-2.noarch.rpm
systemctl start gpfsgui && systemctl enable gpfsgui
Create the two required users. You will be prompted for a password after each command.
/usr/lpp/mmfs/gui/cli/mkuser gpfsadmin -g SecurityAdmin
/usr/lpp/mmfs/gui/cli/mkuser csiadmin -g CsiAdmin
Check that the GUI is available by opening the following link in a browser: https://<server_ip>/
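If you prefer the command line, the management REST API behind the GUI should answer as well; a rough check with curl, assuming the default HTTPS port and the csiadmin user created above (-k skips verification of the self-signed certificate):
curl -k -u 'csiadmin:<password>' https://<server_ip>:443/scalemgmt/v2/cluster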

Optionally, you can configure the performance metrics collector for the GPFS GUI. To do this, run the following commands:
cd /usr/lpp/mmfs/5.0.4.2/zimon_rpms/rhel7/
rpm -ivh gpfs.gss.pmcollector-5.0.4-2.el7.x86_64.rpm
rpm -ivh gpfs.gss.pmsensors-5.0.4-2.el7.x86_64.rpm
systemctl start pmsensors && systemctl enable pmsensors
systemctl start pmcollector && systemctl enable pmcollector
export PATH=$PATH:/usr/lpp/mmfs/bin
mmperfmon config generate --collectors worker1.ocp01.marukhno.com
mmchnode --perfmon -N worker1.ocp01.marukhno.com
Check the status of all services
systemctl status pmsensors
systemctl status pmcollector
systemctl status gpfsgui
Install IBM Spectrum Scale CSI Plugin Operator
More details can be found here: https://developer.ibm.com/storage/2020/01/20/how-to-use-ibm-spectrum-scale-with-csi-operator-1-0-on-openshift-4-2-sample-usage-scenario-with-tensorflow-deployment/
Make sure the file systems you want to use in OpenShift are mounted with the quota option enabled
mmchfs clusterfs -Q yes
Also make sure the filesetdf flag is active, so that df output is correct inside pods with dynamically provisioned storage.
mmchfs clusterfs --filesetdf
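To confirm both settings took effect, you can inspect the file system attributes; a sketch assuming the clusterfs name from above (I believe mmlsfs accepts the same attribute flags, and plain mmlsfs clusterfs lists everything if it does not):
mmlsfs clusterfs -Q --filesetdf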
Create a namespace in OpenShift and switch to it
oc new-project ibm-spectrum-scale-csi-driver
Log in to the OpenShift console and install the IBM Spectrum Scale CSI operator

Create a secret named csisecret-local. For username and password, use base64-encoded credentials for the csiadmin user you created earlier.
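To produce the base64 values, for example (the -n matters, otherwise a trailing newline gets encoded as well):
echo -n 'csiadmin' | base64
echo -n '<csiadmin password>' | base64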
vi gpfs-local-secret.yaml
apiVersion: v1
data:
  username: Y3NpYWRtaW4=
  password: cGFzc3cwcmQ=
kind: Secret
type: Opaque
metadata:
  name: csisecret-local
  namespace: ibm-spectrum-scale-csi-driver
  labels:
    app.kubernetes.io/name: ibm-spectrum-scale-csi-operator # Used by the operator to detect changes; set on load of CR change if the secret matches the name and namespace in the CR.
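Apply the file you just created:
oc apply -f gpfs-local-secret.yaml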
Check that the secret was created
oc get secret -n ibm-spectrum-scale-csi-driver | grep csisecret-local
Create an instance of IBM Spectrum Scale Driver

You will be prompted to configure a YAML file for the driver. Set the required values there. Use my YAML below as a reference.
apiVersion: csi.ibm.com/v1
kind: CSIScaleOperator
metadata:
  labels:
    app.kubernetes.io/instance: ibm-spectrum-scale-csi-operator
    app.kubernetes.io/managed-by: ibm-spectrum-scale-csi-operator
    app.kubernetes.io/name: ibm-spectrum-scale-csi-operator
  name: ibm-spectrum-scale-csi
  namespace: ibm-spectrum-scale-csi-driver
spec:
  clusters:
    - id: '9622861446451689873'
      primary:
        primaryFs: clusterfs
        primaryFset: csifset
      restApi:
        - guiHost: 192.168.28.59
      secrets: csisecret-local
      secureSslMode: false
  scaleHostpath: /root/clusterfs
status: {}
To get the cluster id run this command: mmlscluster
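On the head node you can pull just the relevant line, assuming the standard mmlscluster output labels:
mmlscluster | grep 'GPFS cluster id'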
If you don't enforce quota on the root user, it can use all available GPFS storage on a file system (thus ignoring the storage request value stated in a PVC). To enforce quota for the root user, execute the command
mmchconfig enforceFilesetQuotaOnRoot=yes -i
Test that our dynamic provisioning is working
Create a storage class
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ibm-spectrum-scale-csi-lt
provisioner: spectrumscale.csi.ibm.com
parameters:
  volBackendFs: "clusterfs"
  volDirBasePath: "csifset"
reclaimPolicy: Delete
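Save the manifest and apply it (the file name here is my own choice):
oc apply -f scale-storageclass.yaml
oc get storageclass ibm-spectrum-scale-csi-lt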
Create a test PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scale-test-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: ibm-spectrum-scale-csi-lt
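Apply the claim and watch it get bound (again, the file name is arbitrary):
oc apply -f scale-test-pvc.yaml
oc get pvc scale-test-pvc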
Check that a corresponding PV was created dynamically and bound to the PVC
oc get pv
If everything is working as expected, delete the test PVC. As the reclaim policy is Delete, this will also delete the corresponding PV together with its directory.
oc delete pvc scale-test-pvc