Using cephfs with kubernetes

Sat, Feb 11, 2023 5-minute read

Introduction

When you have a kubernetes cluster running, like I do, it is also nice to have a scalable and resilient filesystem where your nodes can share configuration files or other read-only data.

Previously my cluster pods mounted an NFS share directly, but since that is a single point of failure I wanted to move to a distributed filesystem. Since I recently installed a ceph cluster for VM storage, I decided to leverage the cephfs filesystem instead of NFS.

This will allow me to directly mount the cephfs filesystem on the kubernetes nodes and mount parts of the filesystem into the pods, which ensures that the pods will always have access to their filesystem unless my entire ceph cluster dies.

So this guide shows how to enable ceph on your kubernetes nodes, mount the filesystem and have it prepared for your pods.

Assumptions

For the purpose of this blog article we assume that the following values are being used:

Client name: kubedata

Cephfs file system name: kubedata

Ceph monitor IP: 192.168.210.10

Ceph version used is Quincy (17.x)

Kubernetes node is a RHEL-variant linux, e.g. RHEL, CentOS or Rocky Linux.

Prerequisites

Packages

To help with installing ceph you can download a python script (cephadm) that helps you automate many of the installation steps.

This script requires python3 - if that is not installed you need to install it by entering the following command in a shell

sudo dnf -y install python3

To install the ceph helper script enter the following commands in a shell

curl --silent --remote-name --location https://github.com/ceph/ceph/raw/quincy/src/cephadm/cephadm
chmod +x ./cephadm
sudo ./cephadm add-repo --release quincy

These three lines download the cephadm script, make it executable and finally add the ceph repository.
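
If you want to confirm the repository was added, you can for example list the enabled repositories and look for ceph:

sudo dnf repolist | grep -i ceph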

To install the minimum required packages for cephfs enter the following command in a shell

sudo dnf -y install ceph-common
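
To confirm the client tools are in place, you can for example check the installed version:

ceph --version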

Now the prerequisite software is installed and we can move on to gathering the data we need to successfully connect to the ceph cluster.

Data

To be able to mount a cephfs on your kubernetes nodes you require the following information:

  • The cluster id
  • The cephfs filesystem name (you can look this up on the cluster, as shown below)
  • A client credential

All this information can be retrieved automatically by simply connecting to the ceph cluster from the kubernetes node
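
For example, if you do not know the cephfs filesystem name, you can list the filesystems over ssh in the same way as the commands in the next section (user is a placeholder for an account on the monitor host that can run ceph via sudo):

ssh user@192.168.210.10 "sudo ceph fs ls"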

Actions

Grab required cluster information

To retrieve the information you need to connect successfully to the cluster, run the following commands

To copy a ceph config you can do the following:

ssh user@192.168.210.10 "sudo ceph config generate-minimal-conf" | sudo tee /etc/ceph/ceph.conf

This will connect to the monitor host (replace user with an account on the monitor that can run ceph via sudo), generate a minimal configuration file and write it to the local machine in /etc/ceph/ceph.conf

This configuration file contains the list of monitor hosts for the ceph cluster and the cluster id. An example could look like the below

# minimal ceph.conf for da5cbdc2-5c9b-48ab-908a-f03d6b2e6024
[global]
        fsid = da5cbdc2-5c9b-48ab-908a-f03d6b2e6024
        mon_host = [v2:192.168.210.10:3300/0,v1:192.168.210.10:6789/0] [v2:192.168.210.11:3300/0,v1:192.168.210.11:6789/0] [v2:192.168.210.12:3300/0,v1:192.168.210.12:6789/0]

The last bit we require is to set up authorization, so we need a client secret - this is done by connecting to the ceph monitor again, authorizing the user kubedata and writing the secret to local disk.

ssh user@192.168.210.10 "sudo ceph fs authorize kubedata client.kubedata / rw" | sudo tee /etc/ceph/ceph.client.kubedata.keyring

This command connects to the monitor and authorizes the kubedata client with read/write permissions on the root of the kubedata filesystem. It then writes the secret to /etc/ceph/ceph.client.kubedata.keyring

The contents of the file might look like the below

[client.kubedata]
        key = ACDCeedjqx9JMxAABrEmXNxQkWaKfyEAO/AqcQ==
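
If you want to double-check which capabilities were granted to the client, you can for example ask the cluster (user is again a placeholder for an account with sudo rights on the monitor host):

ssh user@192.168.210.10 "sudo ceph auth get client.kubedata"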

Now all the data required to connect to the cluster is on the local machine, so it's simply a matter of grabbing the required information from either /etc/ceph/ceph.conf or /etc/ceph/ceph.client.kubedata.keyring
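
For example, to pull out just the cluster id:

grep fsid /etc/ceph/ceph.conf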

Finishing up

Verification

To verify that you have done everything correctly you can mount the cephfs filesystem.

First set up a mount point, e.g. /mnt/kubedata: sudo mkdir /mnt/kubedata.

Then you mount the filesystem at the mount point

sudo mount.ceph kubedata@.kubedata=/ /mnt/kubedata

If everything was done correctly, you should see your cephfs filesystem in /mnt/kubedata and be able to read/write data.
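
A quick check could for example look like this (test.txt is just a throwaway file name):

df -h /mnt/kubedata
echo "hello cephfs" | sudo tee /mnt/kubedata/test.txt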

If everything works, you can move on to auto mounting the filesystem when the machine boots - and if it's not working, you need to find out if you skipped a step - or if your cluster requires extra setup not covered by this guide.

Auto mount

Auto mounting is easiest to set up in /etc/fstab

So open up the file in your favorite editor and add a line similar to the one below

kubedata@da5cbdc2-5c9b-48ab-908a-f03d6b2e6024.kubedata=/     /mnt/kubedata    ceph    mon_addr=192.168.210.10:6789,rw,noatime,_netdev    0       0

The important bits to notice here are the mount point /mnt/kubedata, which should match the directory where you want to mount the filesystem.

The client@clusterid.fsname - which is the name of the client, the cluster id and the name of the ceph filesystem, i.e. kubedata@da5cbdc2-5c9b-48ab-908a-f03d6b2e6024.kubedata in this example.

You do not need to add the cluster id and can just leave it as kubedata@.kubedata - but sometimes it is good to be explicit.

The mon_addr=192.168.210.10:6789 is the ip/port of the ceph monitor - if you have multiple monitors, you separate them with a /, i.e. mon_addr=192.168.210.10:6789/192.168.210.11:6789/192.168.210.12:6789
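
As an example, a line listing all three monitors from the ceph.conf above could look like this:

kubedata@da5cbdc2-5c9b-48ab-908a-f03d6b2e6024.kubedata=/     /mnt/kubedata    ceph    mon_addr=192.168.210.10:6789/192.168.210.11:6789/192.168.210.12:6789,rw,noatime,_netdev    0       0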

When the line has been added to the /etc/fstab file it's time to test that it works - this is done by simply calling sudo mount -a and if you get no errors, your filesystem should be mounted at the mount point and should automatically mount when the server boots.
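
For example, if the verification mount from earlier is still active on /mnt/kubedata, the test could look like this:

sudo umount /mnt/kubedata
sudo mount -a
findmnt /mnt/kubedata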

Last words

All that needs to happen now is that the pods inside my kubernetes cluster need to be changed, so that instead of mounting their configuration and static data from an NFS server, they simply mount /mnt/kubedata from the kubernetes node.

Which happens in the PersistentVolume/PersistentVolumeClaim definitions, i.e. something similar to the below:

#
# PersistentVolume
#
apiVersion: v1
kind: PersistentVolume
metadata:
  name: kubedata
  labels:
    type: local
spec:
  storageClassName: hostpath
  capacity:
    storage: 256Mi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /mnt/kubedata
  persistentVolumeReclaimPolicy: Retain
---
#
# PersistentVolumeClaim
#
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kubedata
spec:
  storageClassName: hostpath
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 256Mi
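
To create these objects you can save the manifest to a file and apply it with kubectl - the file name below is just an example:

kubectl apply -f kubedata-pv.yaml
kubectl get pv,pvc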

I hope you enjoyed this post and if you spot errors, please let me know in the comments below or on email directly.