Resizing an OSD
Prerequisites
- You need a shell inside the Ceph toolbox. You can open such a shell with:

      $ kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- bash

- The Ceph cluster must be healthy.
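The health prerequisite can also be checked mechanically. The following is a minimal sketch, assuming it runs inside the toolbox shell; require_healthy is a hypothetical helper name, not something Ceph or Rook provides.

```shell
# Succeeds only if `ceph health` reports HEALTH_OK.
# require_healthy is a hypothetical helper, meant for the toolbox shell.
require_healthy() {
    health="$(ceph health)" || return 1
    case "$health" in
        HEALTH_OK*) return 0 ;;
        *) echo "cluster not healthy: $health" >&2; return 1 ;;
    esac
}

# Usage:
#   require_healthy && echo "safe to start"
```

This is deliberately strict: any warning makes it fail, so you decide by hand whether a given warning is acceptable.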
Change the size of a single OSD
In theory, there are two ways to go about this. In general, the plan is to increase the OSD size in the cluster configuration and then substitute an existing OSD. This can be done by either first removing the existing OSD or by first adding a new OSD.
It is important to remember the restrictions around removal of OSDs: only the OSD on the highest numbered volume can be permanently removed from the cluster; any other OSD would be recreated by the operator, despite a reduced OSD count.
This makes it impossible to first add a larger OSD and then (permanently) remove a smaller OSD: the newly added larger OSD would have the highest numbered volume, and would thus be the only OSD you can permanently remove from the cluster.
Thus, the only feasible way is to first remove any OSD and have the operator replace it with a larger one. Note that this replacement is exactly the effect which prevents us from permanently removing any OSD except the one on the highest numbered volume.
If you do not have enough space in the cluster to buffer away the data of an OSD, you can temporarily increase the OSD count by one and later remove that newly created OSD.
Note that all of this applies in the same way to resizing an OSD to a smaller volume size and to resizing many OSDs (though you probably shouldn’t resize many at once, for your own cognitive good).
The steps are in many places very similar to removing an OSD, but since we don’t want to change the overall number of OSDs, we’ll never decrease the OSD count.
Caution
This process will create and attach Cinder volumes. Due to a known kernel bug, this may crash the Ceph workers. Ceph will recover from this, but data may be temporarily unavailable.
- Ensure that the cluster is healthy and pick a victim OSD to remove:

      toolbox# ceph osd df
      ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP   META     AVAIL   %USE VAR  PGS STATUS
       0   hdd 0.08690  1.00000  89 GiB 7.5 GiB 6.5 GiB    0 B    1 GiB  81 GiB 8.47 1.19  18     up
       2   hdd 0.08690  1.00000  89 GiB 6.1 GiB 5.1 GiB 20 KiB 1024 MiB  83 GiB 6.80 0.95  18     up
       1   hdd 0.08690  1.00000  89 GiB 5.5 GiB 4.5 GiB 24 KiB 1024 MiB  84 GiB 6.16 0.86  12     up
                          TOTAL 267 GiB  19 GiB  16 GiB 44 KiB  3.0 GiB 248 GiB 7.14
      MIN/MAX VAR: 0.86/1.19  STDDEV: 0.98
      toolbox# ceph -s
        cluster:
          id:     9c61da6b-67e9-4acd-a25c-929db5cbb3f2
          health: HEALTH_WARN
                  3 slow ops, oldest one blocked for 512 sec, mon.c has slow ops

        services:
          mon: 3 daemons, quorum a,b,c (age 8s)
          mgr: a(active, since 32m)
          mds: ceph-fs:1 {0=ceph-fs-a=up:active} 1 up:standby-replay
          osd: 3 osds: 3 up (since 29m), 3 in (since 44m)

        data:
          pools:   3 pools, 48 pgs
          objects: 4.21k objects, 16 GiB
          usage:   19 GiB used, 248 GiB / 267 GiB avail
          pgs:     48 active+clean

        io:
          client:   1.2 KiB/s rd, 2 op/s rd, 0 op/s wr

  Pick one of the OSDs. In this example, we’ll “resize” the OSD with ID 2.

  We need to find the volume which the OSD uses, which we do with:

      $ kubectl -n rook-ceph describe pod -l ceph-osd-id=2
      […]
        cinder-1-ceph-data-8hd8s:
          Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
          ClaimName:  cinder-1-ceph-data-8hd8s
          ReadOnly:   false
      […]

  $id shall now refer to 2, $name shall refer to osd.2, and $volume shall refer to cinder-1-ceph-data-8hd8s.

  (This is also an excellent example of OSD IDs not mapping 1:1 to volume indices.)
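The volume lookup above can be scripted instead of read off the describe output. This is a sketch under assumptions: osd_pvc is a hypothetical helper name, and the JSONPath expression assumes the OSD pod references its data volume as an ordinary PersistentVolumeClaim, as in the example.

```shell
# Print the PVC claim name(s) used by the pod of the OSD with the given ID.
# osd_pvc is a hypothetical helper; run it where kubectl is configured.
osd_pvc() {
    kubectl -n rook-ceph get pod -l "ceph-osd-id=$1" \
        -o jsonpath='{.items[0].spec.volumes[*].persistentVolumeClaim.claimName}'
}

# Usage (values from the example above):
#   id=2
#   name="osd.$id"
#   volume="$(osd_pvc "$id")"
```

If the pod mounts more than one claim, all names are printed separated by spaces, so check the result before using it as $volume.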
- Set the new target volume size. Update the config of the Kubernetes cluster by setting k8s-service-layer.rook.osd_volume_size to the new desired size.

- Run apply-k8s-supplements.sh (possibly with AFLAGS="-t rook --diff" to only apply rook changes).

- Evict all data from the victim OSD.

      toolbox# ceph osd crush reweight $name 0
- Wait for the migration to finish.

  You can run watch ceph osd df as well as watch ceph -s to observe the migration status; the former shows how the number of placement groups (PGS column) for that OSD decreases, while the latter shows the status of the cluster overall.

  The migration is over when:

  - The number of placement groups for your victim OSD is 0
  - All placement groups show as active+clean in ceph -s

  Note

  The RAW USE column of the ceph osd df output does not decrease for some reason. The column to look at is DATA, which should reduce to something in the order of 10 MiB.

      toolbox# ceph osd df
      ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP   META     AVAIL   %USE  VAR  PGS STATUS
       0   hdd 0.08690  1.00000  89 GiB  11 GiB 9.7 GiB    0 B    1 GiB  78 GiB 11.97 1.46  29     up
       2   hdd       0  1.00000  89 GiB 3.0 GiB  24 MiB 20 KiB 1024 MiB  86 GiB  3.39 0.41   0     up
       1   hdd 0.08690  1.00000  89 GiB 8.3 GiB 7.3 GiB 24 KiB 1024 MiB  81 GiB  9.27 1.13  19     up
                          TOTAL 267 GiB  22 GiB  17 GiB 44 KiB  3.0 GiB 245 GiB  8.21
      MIN/MAX VAR: 0.41/1.46  STDDEV: 3.58
      toolbox# ceph -s
        cluster:
          id:     9c61da6b-67e9-4acd-a25c-929db5cbb3f2
          health: HEALTH_OK

        services:
          mon: 3 daemons, quorum a,b,c (age 9m)
          mgr: a(active, since 41m)
          mds: ceph-fs:1 {0=ceph-fs-a=up:active} 1 up:standby-replay
          osd: 3 osds: 3 up (since 38m), 3 in (since 2m)

        data:
          pools:   3 pools, 48 pgs
          objects: 4.21k objects, 16 GiB
          usage:   21 GiB used, 246 GiB / 267 GiB avail
          pgs:     48 active+clean

        io:
          client:   852 B/s rd, 1 op/s rd, 0 op/s wr
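Instead of watching the PGS column by eye, the first completion criterion can be polled. The sketch below parses the plain-text ceph osd df output and assumes PGS is the second-to-last column, as in the listings above; osd_pg_count and wait_for_drain are hypothetical helper names.

```shell
# Print the PGS value for one OSD from plain `ceph osd df` output.
# Assumes PGS is the second-to-last column, as in the listings above.
osd_pg_count() {
    ceph osd df | awk -v id="$1" '$1 == id { print $(NF - 1) }'
}

# Poll every 30 seconds until the OSD holds no placement groups.
# Loops forever if the OSD ID does not appear in the output.
wait_for_drain() {
    until [ "$(osd_pg_count "$1")" = "0" ]; do
        sleep 30
    done
}

# Usage (inside the toolbox):
#   wait_for_drain "$id"
```

This covers only the placement group count of the victim OSD. The second criterion, all placement groups being active+clean in ceph -s, still has to be checked separately.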
- Mark the OSD as out.

      toolbox# ceph osd out $name

  ceph osd df should now show it with all zeros, and ceph -s should still be HEALTH_OK with all placement groups being active+clean, since the data has been moved elsewhere. If this is not the case, abort now and seek help immediately!

      toolbox# ceph -s
      […]
        services:
          mon: 3 daemons, quorum a,b,c (age 10m)
          mgr: a(active, since 42m)
          mds: ceph-fs:1 {0=ceph-fs-a=up:active} 1 up:standby-replay
          osd: 3 osds: 3 up (since 39m), 2 in (since 22s)

        data:
          pools:   3 pools, 48 pgs
          objects: 4.21k objects, 16 GiB
          usage:   21 GiB used, 246 GiB / 267 GiB avail
          pgs:     48 active+clean
      […]
- Restart the operator to trigger the removal of the evicted OSD:

      $ kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas 0
      $ sleep 5
      $ kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas 1
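Since the operator restart is needed twice in this procedure, it may be convenient to wrap it in a function. This is a sketch only: restart_operator is a hypothetical helper name, and the final rollout status call with its 120 second timeout is an addition that is not part of the original steps.

```shell
# Scale the Rook operator down and up again, then wait for it to be ready.
# restart_operator is a hypothetical helper; the rollout wait and its
# timeout value are assumptions added on top of the documented commands.
restart_operator() {
    kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas 0 &&
        sleep 5 &&
        kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas 1 &&
        kubectl -n rook-ceph rollout status deployment rook-ceph-operator --timeout=120s
}

# Usage:
#   restart_operator
```

The function stops at the first failing command, so a non-zero exit status means the operator may still be scaled to zero and should be checked by hand.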
- Wait until the operator has deleted the OSD pod (should be at most 10 minutes).

      $ watch kubectl -n rook-ceph get pod -l ceph-osd-id=$id

  Once the pod is gone, this command prints “No resources found in rook-ceph namespace.”

  (Rook will auto-delete OSDs which are marked as out and have no placement groups.)
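The wait can also be expressed as a loop that returns once no pod carries the OSD label any more. This is a minimal sketch; wait_for_osd_pod_gone is a hypothetical helper name and the 10 second poll interval is an arbitrary choice.

```shell
# Block until no pod with the label ceph-osd-id=<id> exists any more.
# wait_for_osd_pod_gone is a hypothetical helper. `-o name` prints one
# line per matching pod and nothing on stdout when there is none.
wait_for_osd_pod_gone() {
    while [ -n "$(kubectl -n rook-ceph get pod -l "ceph-osd-id=$1" -o name)" ]; do
        sleep 10
    done
}

# Usage:
#   wait_for_osd_pod_gone "$id"
```

The loop has no timeout of its own. If it runs much longer than the roughly 10 minutes mentioned above, interrupt it and investigate.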
- Purge the OSD. If the data has not been moved, data loss will occur here!

      toolbox# ceph osd purge $name

  Note

  You do not need --yes-i-really-mean-it since all data was moved to another device. If ceph asks you for --yes-i-really-mean-it, something is wrong!

  ceph osd df should not list the OSD anymore, and ceph -s should say that there are now only 2 OSDs (if you started out with 3), all of which should be up and in.
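As an extra guard in front of the purge above, Ceph can be asked whether it considers the OSD safe to destroy. This is a sketch, not part of the original procedure: purge_if_safe is a hypothetical helper name, and it relies on ceph osd safe-to-destroy returning a non-zero exit status when the OSD still holds data.

```shell
# Purge an OSD only if Ceph itself reports it as safe to destroy.
# purge_if_safe is a hypothetical helper, meant for the toolbox shell.
purge_if_safe() {
    if ceph osd safe-to-destroy "$1"; then
        ceph osd purge "$1"
    else
        echo "refusing to purge $1: not reported as safe to destroy" >&2
        return 1
    fi
}

# Usage:
#   purge_if_safe "$name"
```

The check does not replace the manual verification of ceph osd df and ceph -s in the previous steps; it only adds a second opinion.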
- Delete the preparation job and the PVC.

  Caution

  Deleting the wrong job and PVC will inevitably lead to loss of data! Double-check that you’re killing the correct volume by first running:

      $ kubectl -n rook-ceph describe pvc $volume

  The MountedBy: line should only list a single user, which is rook-ceph-osd-prepare-$volume-$suffix (where $suffix is a random string).

  Once you’ve verified that, you can delete the job and the PVC:

      $ kubectl -n rook-ceph delete job rook-ceph-osd-prepare-$volume
      $ kubectl -n rook-ceph delete pvc $volume

  Verify that the volume is gone with kubectl get pvc -n rook-ceph and in OpenStack.
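The check that only the prepare pod still uses the claim can be cross-checked without relying on the exact wording of the describe output. The sketch below lists every pod in the namespace that references a given claim; pvc_users is a hypothetical helper name.

```shell
# Print the names of all pods in rook-ceph that reference the given PVC.
# pvc_users is a hypothetical helper. Each jsonpath output line is
# "<pod name> <claim names...>"; awk keeps pods whose claims include $1.
pvc_users() {
    kubectl -n rook-ceph get pods \
        -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}{end}' |
        awk -v pvc="$1" '{ for (i = 2; i <= NF; i++) if ($i == pvc) print $1 }'
}

# Usage:
#   pvc_users "$volume"
```

Before deleting anything, the output should consist of exactly one pod whose name starts with rook-ceph-osd-prepare-$volume. Any other pod in the list means the claim is still in use.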
- Restart the operator to trigger re-creation of the OSD.

      $ kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas 0
      $ sleep 5
      $ kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas 1

  You can observe progress by watching the pod list and ceph osd df. The newly created volume (kubectl -n rook-ceph get pvc) should have the new size.

  The process is done when you see:

  - all OSDs in ceph -s as up and in, and
  - all placement groups as active+clean

      toolbox# ceph osd df
      ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP   META     AVAIL   %USE VAR  PGS STATUS
       2   hdd 0.18459  1.00000 189 GiB 9.9 GiB 8.9 GiB    0 B    1 GiB 179 GiB 5.25 0.91  27     up
       0   hdd 0.08690  1.00000  89 GiB 6.2 GiB 5.2 GiB    0 B    1 GiB  83 GiB 6.92 1.20  14     up
       1   hdd 0.08690  1.00000  89 GiB 5.0 GiB 2.0 GiB 44 KiB 1024 MiB  84 GiB 5.66 0.98   7     up
                          TOTAL 367 GiB  21 GiB  16 GiB 44 KiB  3.0 GiB 346 GiB 5.75
      MIN/MAX VAR: 0.91/1.20  STDDEV: 0.73
      toolbox# ceph -s
        cluster:
          id:     9c61da6b-67e9-4acd-a25c-929db5cbb3f2
          health: HEALTH_OK

        services:
          mon: 3 daemons, quorum a,b,c (age 27s)
          mgr: a(active, since 12m)
          mds: ceph-fs:1 {0=ceph-fs-a=up:active} 1 up:standby-replay
          osd: 3 osds: 3 up (since 12m), 3 in (since 12m)

        data:
          pools:   3 pools, 48 pgs
          objects: 4.21k objects, 16 GiB
          usage:   21 GiB used, 346 GiB / 367 GiB avail
          pgs:     48 active+clean

        io:
          client:   853 B/s rd, 1 op/s rd, 0 op/s wr