Changes for page Ceph
Last modified by Jonas Jelten on 2024/09/13 15:05
From version 3.1
edited by Jonas Jelten
on 2024/08/23 14:09
Change comment:
There is no comment for this version
To version 2.1
edited by Jonas Jelten
on 2024/08/23 13:48
Change comment:
There is no comment for this version
Summary

Page properties (1 modified, 0 added, 0 removed)

Details

Page properties: Content
...

The name of an RBD is `ORG-name/namespacename/rbdname`.

To request the creation (or extension) of an RBD, write to [support@ito.cit.tum.de](mailto:support@ito.cit.tum.de), specifying **name**, **size**, **namespace** and **HDD/SSD**.

You will get back a secret **keyring** to access the namespace.

...

* Permissions: 700
* Owner: root
* Content: the client identifier and the 28-byte key in base64 encoding:

```
[client.ORG.rbd.namespacename]
key = ASD+OdlsdoTQJxFFljfCDEf/ASDFlYIbEbZatg==
```

* `systemctl enable --now rbdmap.service` so the RBD device is created and mapped on system start.
* You should now have a `/dev/rbd0` device.
* You can list the current mapping status with `rbd device list`.
* You can manually map/unmap with `rbd device map $rbdname` and `rbd device unmap $rbdname`.
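The mapping done by `rbdmap.service` is driven by `/etc/ceph/rbdmap`. As a rough sketch only (the image spec, client id and keyring path below are placeholders following the naming scheme above; the exact entry for your namespace comes with the keyring you receive, see `man 8 rbdmap`), an entry might look like:

```
# /etc/ceph/rbdmap -- sketch, all names are placeholders
# RbdDevice                        Parameters
ORG-name/namespacename/rbdname     id=ORG.rbd.namespacename,keyring=/etc/ceph/ceph.client.ORG.rbd.namespacename.keyring
```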
Now you have a raw storage device, but you can't store files on it yet, since it is missing a filesystem.


## RBD formatting

Now that you have mapped your RBD, we can create filesystem structures on it.

This is as simple as running:

```
mkfs.ext4 -E nodiscard,stride=1024,stripe_width=1024 /dev/rbdxxx
```

Get the UUID of the newly created filesystem:
```
sudo blkid /dev/rbdxxx
```

Now we create an entry in `/etc/fstab` with `noauto`, so the script below triggers the mount and the mount is not attempted too early during boot.

`/etc/fstab`:
```
UUID=your-new-fs-uuid /your/mount/point ext4 defaults,_netdev,acl,noauto,nodev,nosuid,noatime,stripe=1024 0 0
```

In order to mount this filesystem on your server, we need a mount helper script (otherwise the RBD is not yet mapped on system start when `/etc/fstab` tries to mount it directly during boot).

`/etc/ceph/rbd.d/ORG-rbd/namespacename/rbdname`:
```bash
#!/bin/bash

# lvm may disable vgs when not all blocks were available during scan
pvscan
vgchange -ay

# mount all the filesystems
mountpoint -q /your/mount/point || mount /your/mount/point
```

Mark this script *executable* so `rbdmap` can run it as a post-mapping hook!

To test, either restart `rbdmap.service` or manually call `umount` and `mount` for `/your/mount/point`.


## LVM on RBD

You can create LVM `pvs` and `lvs` on your RBD. You can use this for read/write caching, for example (see below).
This works as usual, just run `pvcreate` etc.


## RBD tuning

To get more performance, there are some useful tweaks.

### CPU Bugs

When your server is sufficiently shielded behind firewalls and isn't susceptible to attacks, disable the CPU bug mitigations for a performance boost via a kernel command line parameter:

`/etc/default/grub`:
```
GRUB_CMDLINE_LINUX_DEFAULT="mitigations=off"
```

### Read-Ahead

We read ahead 1MiB, since Ceph stores the objects in 4MiB blocks anyway.

`/etc/udev/rules.d/90-ceph-rbd.rules`:
```
KERNEL=="rbd[0-9]*", ENV{DEVTYPE}=="disk", ACTION=="add|change", ATTR{bdi/read_ahead_kb}="1024", ATTR{queue/scheduler}="none", ATTR{queue/wbt_lat_usec}="0", ATTR{queue/nr_requests}="2048"
```

### LVM-Cache

See `man 7 lvmcache`.
We can cache the RBD on a local NVMe for more performance.

* `/dev/fastdevice` is the name of the local NVMe.
* `/dev/datavg/datalv` is the name of your existing logical volume containing all the data stored on Ceph.
* We recommend writeback caching.

```bash
## setup
# register the cache device as a physical volume
pvcreate /dev/fastdevice

# add the cache device to the vg that should be cached
vgextend datavg /dev/fastdevice

# create the cache pool (metadata + data combined)
lvcreate -n cache --type cache-pool -l '100%FREE' datavg /dev/fastdevice

# enable caching
#
# --type cache (recommended): use dm-cache for read and write caching
# --cachemode: do we buffer writes?
#     writeback:    buffer writes
#     writethrough: no write buffering
#
# --type writecache: only ever cache writes, not reads
#
# --chunksize: data block management size
lvconvert --type cache --cachepool cache --cachemode writeback --chunksize 1024KiB /dev/datavg/datalv

## status
# check status
lvs -ao+devices

## resizing
lvconvert --splitcache /dev/datavg/datalv
lvextend -l +100%FREE /dev/datavg/datalv
lvconvert ... # to enable caching again

## disabling
# detach but keep the cache lv
lvconvert --splitcache /dev/datavg/datalv

# detach and delete the cache lv -> the cache pv is still part of the vg!
# watch out when resizing the lv -> the cache pv will then get parts of the lv; use pvmove to move them off again.
lvconvert --uncache /dev/datavg/datalv

# remove the pv from the vg
vgreduce datavg /dev/fastdevice
```
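To verify that the cache is actually attached and being used, you can inspect the cached LV afterwards. A minimal check, assuming the `datavg`/`datalv` names from above (the device-mapper name `datavg-datalv` is derived from the vg and lv names):

```bash
# show the cache pool attached to the LV and which PVs back it
lvs -a -o lv_name,vg_name,lv_attr,pool_lv,devices datavg

# raw dm-cache statistics (cache usage, read/write hits and misses, dirty blocks)
dmsetup status datavg-datalv
```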