Changes for page Ceph

Last modified by Jonas Jelten on 2024/09/13 15:05

From version 3.1
edited by Jonas Jelten
on 2024/08/23 14:09
Change comment: There is no comment for this version
To version 2.1
edited by Jonas Jelten
on 2024/08/23 13:48
Change comment: There is no comment for this version

Details

Page properties
Content
... ... @@ -20,7 +20,7 @@
20 20  
21 21  The name of an RBD is `ORG-name/namespacename/rbdname`.
22 22  
23 -To request the creation (or extension) of an RBD, write to [support@ito.cit.tum.de](support@ito.cit.tum.de) specifying **name**, **size**, **namespace** and **HDD/SSD**.
23 +To request the creation (or extension) of an RBD, write to [[support@ito.cit.tum.de|mailto:support@ito.cit.tum.de]] specifying **name**, **size**, **namespace** and **HDD/SSD**.
24 24  
25 25  You will get back a secret **keyring** to access the namespace.
26 26  
... ... @@ -40,137 +40,9 @@
40 40   * Permissions: 700
41 41   * Owner: root
42 42   * Content: the client identifier and the 28-byte key in base64 encoding.
43 + * [client.ORG.rbd.namespacename]
44 + key = ASD+OdlsdoTQJxFFljfCDEf/ASDFlYIbEbZatg==
43 43  
44 -```
45 -[client.ORG.rbd.namespacename]
46 -key = ASD+OdlsdoTQJxFFljfCDEf/ASDFlYIbEbZatg==
47 -```
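
A minimal sketch of creating that keyring file with the required owner and permissions; the path follows the usual `/etc/ceph/ceph.client.<name>.keyring` convention and is an assumption here, so adjust it to the path used elsewhere on this page:

```bash
# assumed path, derived from the client name client.ORG.rbd.namespacename
touch /etc/ceph/ceph.client.ORG.rbd.namespacename.keyring
chown root:root /etc/ceph/ceph.client.ORG.rbd.namespacename.keyring
chmod 700 /etc/ceph/ceph.client.ORG.rbd.namespacename.keyring
# then paste the keyring content you received into this file
```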
48 -
49 -* `systemctl enable --now rbdmap.service` so the RBD device is created automatically on system start (an example `/etc/ceph/rbdmap` entry is sketched after this list).
50 -* You should now have a `/dev/rbd0` device
51 -* You can list current mapping status with `rbd device list`
52 -* You can manually map/unmap with `rbd device map $rbdname` and `rbd device unmap $rbdname`
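
A sketch of what the corresponding `/etc/ceph/rbdmap` entry could look like; the image spec and keyring path are assumptions based on the naming scheme above, so adapt them to your actual pool, namespace and RBD names:

```
# /etc/ceph/rbdmap  (format: image-spec  map-options)
ORG-name/namespacename/rbdname   id=ORG.rbd.namespacename,keyring=/etc/ceph/ceph.client.ORG.rbd.namespacename.keyring
```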
53 -
54 -Now you have a raw storage device, but you can't yet store files on it, since you are missing a filesystem.
55 -
56 -
57 57  ## RBD formatting
58 58  
59 59  Now that you have mapped your RBD, we can create file system structures on it.
60 -
61 -This is as simple as running:
62 -
63 -```
64 -mkfs.ext4 -E nodiscard,stride=1024,stripe_width=1024 /dev/rbdxxx
65 -```
66 -
67 -Get the newly created filesystem UUID:
68 -```
69 -sudo blkid /dev/rbdxxx
70 -```
71 -
72 -Now we create an entry in `/etc/fstab` with `noauto`, so the mount is not attempted too early during boot; the helper script below triggers it instead.
73 -
74 -`/etc/fstab`:
75 -```
76 -UUID=your-new-fs-uuid /your/mount/point ext4 defaults,_netdev,acl,noauto,nodev,nosuid,noatime,stripe=1024 0 0
77 -```
78 -
79 -To mount this filesystem on your server, we need a mount helper script (otherwise `/etc/fstab` would try to mount it during boot before the RBD is mapped).
80 -
81 -`/etc/ceph/rbd.d/ORG-rbd/namespacename/rbdname`:
82 -```bash
83 -#!/bin/bash
84 -
85 -# lvm may disable vgs when not all blocks were available during scan
86 -pvscan
87 -vgchange -ay
88 -
89 -# mount all the filesystems
90 -mountpoint -q /your/mount/point || mount /your/mount/point
91 -```
92 -Mark this script *executable* (`chmod +x`) so `rbdmap` can run it as a post-mapping hook!
93 -
94 -To test, either restart `rbdmap.service` or manually call `umount` and `mount` for `/your/mount/point`.
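
For example (with the placeholder mount point used above):

```bash
systemctl restart rbdmap.service
# or, by hand:
umount /your/mount/point
mount /your/mount/point
```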
95 -
96 -
97 -## LVM on RBD
98 -
99 -You can create LVM `pvs` and `lvs` on your RBD. You can use this for read/write caching, for example (see below).
100 -This works as usual: just run `pvcreate` etc.; a minimal sketch follows.
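
A minimal sketch, assuming the mapped RBD is `/dev/rbd0` and using the `datavg`/`datalv` names that the caching example below also refers to; the filesystem then goes onto the logical volume instead of the raw RBD:

```bash
# create a physical volume on the mapped RBD and a volume group on top
pvcreate /dev/rbd0
vgcreate datavg /dev/rbd0

# create a logical volume spanning the whole volume group
lvcreate -n datalv -l 100%FREE datavg

# then format /dev/datavg/datalv instead of the raw /dev/rbdxxx (see "RBD formatting" above)
```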
101 -
102 -
103 -## RBD tuning
104 -
105 -To get more performance, there are some useful tweaks:
106 -
107 -### CPU Bugs
108 -
109 -If your server is sufficiently shielded behind firewalls and not exposed to attacks, you can disable the CPU bug mitigations for a performance boost via a kernel command line parameter:
110 -
111 -`/etc/default/grub`:
112 -```
113 -GRUB_CMDLINE_LINUX_DEFAULT="mitigations=off"
114 -```
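
After changing this, regenerate the GRUB configuration and reboot; `update-grub` is the Debian/Ubuntu helper, other distributions use `grub2-mkconfig -o ...` instead:

```bash
update-grub
reboot
```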
115 -
116 -### Read-Ahead
117 -
118 -We read ahead 1MiB, since Ceph stores the objects in 4MiB blocks anyway.
119 -
120 -`/etc/udev/rules.d/90-ceph-rbd.rules`:
121 -```
122 -KERNEL=="rbd[0-9]*", ENV{DEVTYPE}=="disk", ACTION=="add|change", ATTR{bdi/read_ahead_kb}="1024", ATTR{queue/scheduler}="none", ATTR{queue/wbt_lat_usec}="0", ATTR{queue/nr_requests}="2048"
123 -```
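
To apply the rule without rebooting, reload the udev rules and re-trigger the block devices:

```bash
udevadm control --reload-rules
udevadm trigger --subsystem-match=block
```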
124 -
125 -### LVM-Cache
126 -
127 -We can cache the RBD on a local NVMe for more performance.
128 -See `man 7 lvmcache` for details.
129 -
130 -* `/dev/fastdevice` is the name of the local NVMe.
131 -* `/dev/datavg/datalv` is the name of your existing logical volume containing the data stored on Ceph.
132 -* We recommend writeback caching.
133 -
134 -```bash
135 -## setup
136 -# cache device
137 -pvcreate /dev/fastdevice
138 -
139 -# add cache device to vg to cache
140 -vgextend datavg /dev/fastdevice
141 -
142 -# create cache pool (meta+data combined):
143 -lvcreate -n cache --type cache-pool -l '100%FREE' datavg /dev/fastdevice
144 -
145 -# enable caching
146 -#
147 -# --type cache (recommended): use dm-cache for read and write caching
148 -# --cachemode: do we cache writes?
149 -# buffer writes: writeback
150 -# no write buffering: writethrough
151 -#
152 -# --type writecache: only ever cache writes, not reads
153 -#
154 -# --chunksize data block management size
155 -lvconvert --type cache --cachepool cache --cachemode writeback --chunksize 1024KiB /dev/datavg/datalv
156 -
157 -## status
158 -# check status
159 -lvs -ao+devices
160 -
161 -## resizing
162 -lvconvert --splitcache /dev/datavg/datalv
163 -lvextend -l +100%FREE /dev/datavg/datalv
164 -lvconvert ... # to enable caching again
165 -
166 -## disabling
167 -# deactivate and keep cache lv
168 -lvconvert --splitcache /dev/datavg/datalv
169 -
170 -# disable and delete cache lv -> cache-pv still part of vg!
171 -# watch out when resizing the lv -> the cache-pv may then receive extents of the lv; use pvmove to move them off again.
172 -lvconvert --uncache /dev/datavg/datalv
173 -
174 -# remove pv from vg
175 -vgreduce datavg /dev/fastdevice
176 -```