Changes for page Ceph
Last modified by Jonas Jelten on 2024/09/13 15:05
From version 2.1
edited by Jonas Jelten
on 2024/08/23 13:48
Change comment:
There is no comment for this version
To version 4.1
edited by Jonas Jelten
on 2024/08/23 14:11
Change comment:
There is no comment for this version
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
Page properties
Content
@@ -20,7 +20,7 @@

The name of an RBD is `ORG-name/namespacename/rbdname`.

To request the creation (or extension) of an RBD, write to [support@ito.cit.tum.de](mailto:support@ito.cit.tum.de), specifying **name**, **size**, **namespace** and **HDD/SSD**.

You will get back a secret **keyring** to access the namespace.

@@ -40,9 +40,138 @@

* Permissions: 700
* Owner: root
* Content: the client identifier and the 28-byte key in base64 encoding:

```
[client.ORG.rbd.namespacename]
key = ASD+OdlsdoTQJxFFljfCDEf/ASDFlYIbEbZatg==
```

* `systemctl enable --now rbdmap.service` so the RBD device is created on system start.
* You should now have a `/dev/rbd0` device.
* You can list the current mapping status with `rbd device list`.
* You can manually map/unmap with `rbd device map $rbdname` and `rbd device unmap $rbdname`.

Now you have a raw storage device, but you can't store files on it yet, since it is missing a filesystem.

## RBD formatting

Now that you have mapped your RBD, we can create filesystem structures on it.

This is as simple as running:

```
mkfs.ext4 -E nodiscard,stride=1024,stripe_width=1024 /dev/rbdxxx
```

Get the newly created filesystem's UUID:

```
sudo blkid /dev/rbdxxx
```

Now we create an entry in `/etc/fstab` with `noauto`, so the mount is triggered by the helper script below and not attempted too early during boot.

`/etc/fstab`:

```
UUID=your-new-fs-uuid /your/mount/point ext4 defaults,_netdev,acl,noauto,nodev,nosuid,noatime,stripe=1024 0 0
```

In order to mount this filesystem on your server, we need a mount helper script (otherwise the RBD is not yet mapped when `/etc/fstab` tries to mount it directly during boot).

`/etc/ceph/rbd.d/ORG-rbd/namespacename/rbdname`:

```bash
#!/bin/bash

# lvm may disable vgs when not all blocks were available during scan
pvscan
vgchange -ay

# mount all the filesystems
mountpoint -q /your/mount/point || mount /your/mount/point
```

Mark this script _executable_ so `rbdmap` can run it as a post-mapping hook!

To test, either restart `rbdmap.service` or manually call `umount` and `mount` for `/your/mount/point`.

## LVM on RBD

You can create LVM `pvs` and `lvs` on your RBD, for example to use them for read/write caching (see below). This works as usual: just run `pvcreate` etc. on the mapped device.

## RBD tuning

To get more performance, there are some useful tweaks.

### CPU Bugs

When your server is sufficiently shielded behind firewalls and isn't susceptible to attacks, you can disable the CPU bug mitigations for a performance boost with a kernel command line parameter:

`/etc/default/grub`:

```
GRUB_CMDLINE_LINUX_DEFAULT="mitigations=off"
```

### Read-Ahead

We read ahead 1MiB, since Ceph stores the objects in 4MiB blocks anyway. We also allow more parallel requests and use no IO scheduler (since Ceph is distributed with equal latency anyway).
`/etc/udev/rules.d/90-ceph-rbd.rules`:

```
KERNEL=="rbd[0-9]*", ENV{DEVTYPE}=="disk", ACTION=="add|change", ATTR{bdi/read_ahead_kb}="1024", ATTR{queue/scheduler}="none", ATTR{queue/wbt_lat_usec}="0", ATTR{queue/nr_requests}="2048"
```

### LVM-Cache

See `man 7 lvmcache`. We can cache the RBD on a local NVMe for more performance.

* `/dev/fastdevice` is the name of the local NVMe device.
* `/dev/datavg/datalv` is the name of your existing logical volume containing all the data stored on Ceph.
* We recommend read and write caching, and a local fastdevice size of at least 50GiB. The more the better :)

```bash
## setup
# prepare the cache device
pvcreate /dev/fastdevice

# add the cache device to the vg we want to cache
vgextend datavg /dev/fastdevice

# create the cache pool (metadata + data combined)
lvcreate -n cache --type cache-pool -l '100%FREE' datavg /dev/fastdevice

# enable caching
#
# --type cache (recommended): use dm-cache for read and write caching
# --cachemode: do we cache writes?
#              buffer writes: writeback
#              no write buffering: writethrough
#
# --type writecache: only ever cache writes, not reads
#
# --chunksize: data block management size
lvconvert --type cache --cachepool cache --cachemode writeback --chunksize 1024KiB /dev/datavg/datalv

## status
# check status
lvs -a -o +devices

## resizing
lvconvert --splitcache /dev/datavg/datalv
lvextend -l +100%FREE /dev/datavg/datalv
lvconvert ...  # to enable caching again, as above

## disabling
# deactivate, but keep the cache lv
lvconvert --splitcache /dev/datavg/datalv

# disable and delete the cache lv -> the cache pv is still part of the vg!
# watch out when resizing the lv -> the cache pv will then get parts of the lv; use pvmove to move them off again.
lvconvert --uncache /dev/datavg/datalv

# remove the pv from the vg
vgreduce datavg /dev/fastdevice
```
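
To see whether the cache actually helps, lvm exposes dm-cache usage and hit/miss counters as reporting fields (a sketch, assuming a reasonably recent lvm2 and the `datavg`/`datalv` names from above):

```bash
# cache segment type, policy and which devices back the LVs (incl. hidden cache sub-LVs)
lvs -a -o lv_name,segtype,cache_policy,devices datavg

# block usage and hit/miss counters of the cached LV
lvs -o cache_total_blocks,cache_used_blocks,cache_dirty_blocks,cache_read_hits,cache_read_misses,cache_write_hits,cache_write_misses datavg/datalv
```

If the hit counters stay near zero, the cache is not helping this workload and can be split off again with `lvconvert --splitcache` as shown above.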