How to Expand BlueStore block.wal and block.db Partitions
iliul@地平线 · Ceph Open Source Community
Introduction
This article describes how to expand the BlueStore block.wal and block.db partitions, and how to migrate metadata back after it has spilled over to the slow device (supported since Ceph v14.1.0). The examples below operate on raw disk partitions; the procedure is similar for devices managed through LVM.
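Before starting, it is worth confirming whether the OSD's wal/db devices really are raw partitions rather than LVM logical volumes. A minimal sketch of such a check, assuming the osd.0 paths used throughout this article (raw partitions resolve to /dev/sdXN, LVM devices to /dev/mapper/... or /dev/<vg>/<lv>):

# readlink -f /var/lib/ceph/osd/ceph-0/block.db /var/lib/ceph/osd/ceph-0/block.wal
# lsblk -o NAME,TYPE,SIZE /dev/sdd

For LVM-managed OSDs, ceph-volume lvm list reports the backing logical volumes.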
Replacing the wal and db partitions

Query the current osd.0 layout to determine which disk partitions back block.db and block.wal:
# tree -a /var/lib/ceph/osd/ceph-0
ceph-0
├── activate.monmap
├── block -> /dev/ceph-b17ef1f2-8c1a-4be1-97a7-c35b77d79e78/osd-block-45693cad-5a91-4dbc-8180-b25f4d864f33
├── block.db -> /dev/sdd2
├── block.wal -> /dev/sdd1
├── bluefs
├── ceph_fsid
├── fsid
├── keyring
├── kv_backend
├── magic
├── mkfs_done
├── osd_key
├── ready
├── require_osd_release
├── type
└── whoami

# fdisk -l /dev/sdd
Disk /dev/sdd: 931 GiB, 999653638144 bytes, 1952448512 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 0389A1C8-5306-AC45-8300-0B012020B755

Device     Start    End       Sectors   Size Type
/dev/sdd1  2048     2099199   2097152     1G Linux filesystem
/dev/sdd2  2099200  35653631  33554432   16G Linux filesystem

# sgdisk -p /dev/sdd
Disk /dev/sdd: 1952448512 sectors, 931.0 GiB
Model: PERC H730P Mini
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 0389A1C8-5306-AC45-8300-0B012020B755
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 2048, last usable sector is 1952448478
Partitions will be aligned on 2048-sector boundaries
Total free space is 1698691039 sectors (810.0 GiB)

Number  Start (sector)    End (sector)  Size        Code  Name
   1            2048         2099199    1024.0 MiB  8300
   2         2099200        35653631    16.0 GiB    8300

Inspect partition 1 and record its Partition unique GUID; it will be needed when creating the replacement partition below:

# sgdisk -i 1 /dev/sdd
Partition GUID code: 0FC63DAF-8483-4772-8E79-3D69D8477DE4 (Linux filesystem)
Partition unique GUID: 98D073A1-925D-944D-9505-7FE489848305
First sector: 2048 (at 1024.0 KiB)
Last sector: 2099199 (at 1025.0 MiB)
Partition size: 2097152 sectors (1024.0 MiB)
Attribute flags: 0000000000000000
Partition name: ''

Do the same for partition 2 and record its Partition unique GUID as well:

# sgdisk -i 2 /dev/sdd
Partition GUID code: 0FC63DAF-8483-4772-8E79-3D69D8477DE4 (Linux filesystem)
Partition unique GUID: 4BBA676A-A628-904A-8C93-1B691161C5A1
First sector: 2099200 (at 1.0 GiB)
Last sector: 35653631 (at 17.0 GiB)
Partition size: 33554432 sectors (16.0 GiB)
Attribute flags: 0000000000000000
Partition name: ''

Stop osd.0:
# systemctl stop ceph-osd@0
# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0
{ "/var/lib/ceph/osd/ceph-0/block": { "osd_uuid": "45693cad-5a91-4dbc-8180-b25f4d864f33", "size": 998579896320, "btime": "2019-06-13T11:42:12.194273+0800", "description": "main", "bluefs": "1", "ceph_fsid": "45b2df47-f946-43e1-9a06-4832ff3e5c24", "kv_backend": "rocksdb", "magic": "ceph osd volume v026", "mkfs_done": "yes", "osd_key": "AQASxgFdn465LBAA7x6kMuoS9UiqwqVjDmHD9A==", "ready": "ready", "require_osd_release": "15", "whoami": "0" }, "/var/lib/ceph/osd/ceph-0/block.wal": { "osd_uuid": "45693cad-5a91-4dbc-8180-b25f4d864f33", "size": 1073741824, // 扩容前大小,1G "btime": "2019-06-13T11:42:12.196216+0800", "description": "bluefs wal" }, "/var/lib/ceph/osd/ceph-0/block.db": { "osd_uuid": "45693cad-5a91-4dbc-8180-b25f4d864f33", "size": 17179869184, // 扩容前大小,16G "btime": "2019-06-13T11:42:12.195025+0800", "description": "bluefs db" } }create new wal, db 准备要替换目标分区 /dev/sdd3 和 /dev/sdd4 , 这里注意 35653632 下一个扇区的起始位置,参考 sgdisk -p 输出结果,而 typecode 后面加要替换的分区 GUID,
The relevant sgdisk options are:
-c, --change-name=partnum:name          change partition's name
-n, --new=partnum:start:end             create new partition
-t, --typecode=partnum:{hexcode|GUID}   change partition type code
-g, --mbrtogpt                          convert MBR to GPT
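The start sector of the first new partition (35653632 above) can also be queried instead of being read off the sgdisk -p output. A minimal sketch, assuming /dev/sdd as in this example: sgdisk's -F (--first-in-largest) option prints the first sector of the largest block of free space, which here should be 35653632, the sector immediately after /dev/sdd2:

# sgdisk -F /dev/sdd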
Create the new partitions sdd3 and sdd4:
# sgdisk --new=3:35653632:+4GiB --change-name="3:ceph block.wal" --typecode="3:98D073A1-925D-944D-9505-7FE489848305" --mbrtogpt /dev/sdd
# sgdisk --new=4:44042240:+100GiB --change-name="4:ceph block.db" --typecode="4:4BBA676A-A628-904A-8C93-1B691161C5A1" --mbrtogpt /dev/sdd
Partition layout after creation:
# sgdisk -p /dev/sdd
Number  Start (sector)    End (sector)  Size        Code  Name
   1            2048         2099199    1024.0 MiB  8300
   2         2099200        35653631    16.0 GiB    8300
   3        35653632        44042239    4.0 GiB     FFFF  ceph block.wal
   4        44042240       253757439    100.0 GiB   FFFF  ceph block.db

Copy the wal partition:
# dd status=progress if=/dev/sdd1 of=/dev/sdd3
1055113728 bytes (1.1 GB, 1006 MiB) copied, 50 s, 21.1 MB/s
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 51.1283 s, 21.0 MB/s

Copy the db partition:
# dd status=progress if=/dev/sdd2 of=/dev/sdd4
17179038208 bytes (17 GB, 16 GiB) copied, 997 s, 17.2 MB/s
33554432+0 records in
33554432+0 records out
17179869184 bytes (17 GB, 16 GiB) copied, 998.58 s, 17.2 MB/s
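Optionally, the copies can be verified before the old partitions are deleted. A minimal sketch using cmp, limited to the length of each source partition since the new targets are larger (cmp prints nothing when the contents match):

# cmp -n $(blockdev --getsize64 /dev/sdd1) /dev/sdd1 /dev/sdd3
# cmp -n $(blockdev --getsize64 /dev/sdd2) /dev/sdd2 /dev/sdd4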
Delete the old partitions and assign their original Partition unique GUIDs (recorded earlier) to the new partitions:

# sgdisk --delete=1 --delete=2 --partition-guid="3:98D073A1-925D-944D-9505-7FE489848305" --partition-guid="4:4BBA676A-A628-904A-8C93-1B691161C5A1" /dev/sdd
The operation has completed successfully.

Run partprobe to update the kernel partition table:
# partprobe

Use the new partitions. Remove the old symlinks:
# cd /var/lib/ceph/osd/ceph-0/
# rm block.wal
# rm block.db
Create new symlinks to block.wal and block.db:
# ln -s /dev/sdd3 block.wal
# ln -s /dev/sdd4 block.db
Set the ownership of the new partitions:
# chown -R ceph:ceph /dev/sd*
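Note that ownership set with chown on device nodes does not survive a reboot. One way to make it persistent is a udev rule; a minimal sketch (the rule file name is arbitrary, and matching on the kernel name assumes /dev/sdd keeps its name across reboots; matching on ENV{ID_PART_ENTRY_UUID} would be more robust):

# cat /etc/udev/rules.d/99-ceph-waldb.rules
KERNEL=="sdd3", OWNER="ceph", GROUP="ceph", MODE="0660"
KERNEL=="sdd4", OWNER="ceph", GROUP="ceph", MODE="0660"

# udevadm control --reload-rules && udevadm trigger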
Set the partition type to Linux filesystem (type 20 in fdisk):
# fdisk /dev/sdd  ->  p  ->  t (3, 4)  ->  20  ->  w

Confirm the partition information:

# sgdisk -p /dev/sdd
Disk /dev/sdd: 1952448512 sectors, 931.0 GiB
Model: PERC H730P Mini
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 0FC63DAF-8483-4772-8E79-3D69D8477DE4
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 2048, last usable sector is 1952448478
Partitions will be aligned on 2048-sector boundaries
Total free space is 1734342623 sectors (827.0 GiB)

Number  Start (sector)    End (sector)  Size        Code  Name
   3        35653632        44042239    4.0 GiB     8300  ceph block.wal
   4        44042240       253757439    100.0 GiB   8300  ceph block.db

Expand the WAL and DB devices:

# ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-0
inferring bluefs devices from bluestore path
0 : device size 0x100000000 : own 0x[1000~3ffff000] = 0x3ffff000 : using 0x4ff000(5.0 MiB)
1 : device size 0x1900000000 : own 0x[2000~3ffffe000] = 0x3ffffe000 : using 0xb7bfe000(2.9 GiB)
2 : device size 0xe880000000 : own 0x[6f99900000~94cd00000] = 0x94cd00000 : using 0x9a200000(2.4 GiB)
Expanding...
0 : expanding from 0x40000000 to 0x100000000
0 : size label updated to 4294967296       // 4 GiB
1 : expanding from 0x400000000 to 0x1900000000
1 : size label updated to 107374182400     // 100 GiB

Verify the wal and db labels of the OSD after expansion:

# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0
inferring bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-0/block": {
        "osd_uuid": "45693cad-5a91-4dbc-8180-b25f4d864f33",
        "size": 998579896320,
        "btime": "2019-06-13T11:42:12.194273+0800",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "45b2df47-f946-43e1-9a06-4832ff3e5c24",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": "AQASxgFdn465LBAA7x6kMuoS9UiqwqVjDmHD9A==",
        "ready": "ready",
        "require_osd_release": "15",
        "whoami": "0"
    },
    "/var/lib/ceph/osd/ceph-0/block.wal": {
        "osd_uuid": "45693cad-5a91-4dbc-8180-b25f4d864f33",
        "size": 4294967296,          // WAL size, 4 GiB
        "btime": "2019-06-13T11:42:12.196216+0800",
        "description": "bluefs wal"
    },
    "/var/lib/ceph/osd/ceph-0/block.db": {
        "osd_uuid": "45693cad-5a91-4dbc-8180-b25f4d864f33",
        "size": 107374182400,        // DB size, 100 GiB
        "btime": "2019-06-13T11:42:12.195025+0800",
        "description": "bluefs db"
    }
}

Start osd.0, watch the OSD log for errors, and check the cluster status:
# systemctl start ceph-osd@0

Current cluster status: a warning is still raised, but note that the db device is now reported as 100 GiB:
BLUEFS_SPILLOVER BlueFS spillover detected on 1 OSD(s)
    osd.0 spilled over 1.1 GiB metadata from 'db' device (4.8 GiB used of 100 GiB) to slow device

Check the BlueFS statistics. The output below shows the WAL and DB partition sizes and that the slow device is still in use, so a migrate operation is needed:
# ceph daemon osd.0 perf dump | jq .bluefs
{
  "gift_bytes": 0,
  "reclaim_bytes": 0,
  "db_total_bytes": 107374174208,    // 100 GiB DB partition
  "db_used_bytes": 5129625600,
  "wal_total_bytes": 4294963200,     // 4 GiB WAL partition
  "wal_used_bytes": 521138176,
  "slow_total_bytes": 39943405568,
  "slow_used_bytes": 1176502272,     // bytes used on the slow device; these need to be migrated
  "num_files": 104,
  "log_bytes": 1908736,
  "log_compactions": 2,
  "logged_bytes": 278183936,
  "files_written_wal": 2,
  "files_written_sst": 129,
  "bytes_written_wal": 6509639753,
  "bytes_written_sst": 7286006904,
  "bytes_written_slow": 0,
  "max_bytes_wal": 549449728,
  "max_bytes_db": 6008332288,
  "max_bytes_slow": 0,
  "read_random_count": 57820,
  "read_random_bytes": 7151101853,
  "read_random_disk_count": 19252,
  "read_random_disk_bytes": 6999861094,
  "read_random_buffer_count": 38650,
  "read_random_buffer_bytes": 151240759,
  "read_count": 15253,
  "read_bytes": 675829317,
  "read_prefetch_count": 4902,
  "read_prefetch_bytes": 337983129
}
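The same counter can be checked for every OSD on the host to see whether other OSDs have spilled over as well. A minimal sketch, assuming the OSD admin sockets live under /var/run/ceph:

# for sock in /var/run/ceph/ceph-osd.*.asok; do echo -n "$sock "; ceph daemon $sock perf dump | jq .bluefs.slow_used_bytes; done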
Migrating spilled-over metadata
Move the metadata that has spilled over back onto the block.db partition.
Stop osd.0:

# systemctl stop ceph-osd@0

Run the migration:
# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-0 --devs-source /var/lib/ceph/osd/ceph-0/block --dev-target /var/lib/ceph/osd/ceph-0/block.db --command bluefs-bdev-migrate
inferring bluefs devices from bluestore path

Set the partition permissions:
# chown -R ceph:ceph /dev/sdd*
# chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/block.db

Start osd.0:
# systemctl start ceph-osd@0

Check the bluefs statistics:
# ceph daemon osd.0 perf dump | jq .bluefs | grep slow_used_bytes
  "slow_used_bytes": 0,
As shown above, the data has been migrated off the slow device and the cluster warning disappears.
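To double-check at the cluster level, the health warning shown earlier can be re-queried; assuming it came from ceph health detail, something like:

# ceph health detail | grep -i spillover

No output means the BLUEFS_SPILLOVER warning has cleared.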