Ceph Release Overview

| Codename | Version | Status | Released | Support Ends | Notable Features |
|----------|---------|--------|----------|--------------|------------------|
| Octopus  | 15.2 | End of life | 2020-03-20 | 2022-06 | Introduced cephadm |
| Pacific  | 16.2 | LTS, supported | 2021-03-31 | 2025-03 | RBD instant clone |
| Quincy   | 17.2 | LTS, stable | 2022-04-19 | 2026-05 | Security hardening |
| Reef     | 18.2 | STS, stable | 2023-05-31 | 2024-05 | Performance optimizations |
| Squid    | 19.2 | In development | - | - | Next-generation release |
Note that Red Hat Ceph Storage 5 corresponds to upstream Ceph Pacific (v16.2.x). Octopus is worth keeping in mind because it is the last release that can still be deployed on CentOS 7.9; the 7.9 kernel is simply too old for anything newer.
Deployment Methods

| Method | Automation | Learning Curve | K8s Integration | Suitable Scale | Ops Complexity |
|--------|------------|----------------|-----------------|----------------|----------------|
| ceph-deploy | ❌ Lowest | ⭐⭐⭐⭐⭐ | ❌ | <10 nodes | High |
| cephadm | ✅ High | ⭐⭐ | ✅ | Any scale | Low |
| Rook | ✅ Very high | ⭐ | ✅ Native | Cloud-native environments | Very low |
| ceph-ansible | ✅ Medium | ⭐⭐⭐ | ⚠️ Needs adaptation | 50-500 nodes | Medium |
| Commercial tools | ✅ High | ⭐ | ✅ | Large enterprise deployments | Low |

(More stars = steeper learning curve.)
I have deployed with ceph-deploy before, and it was genuinely painful.
cephadm Components

cephadm shell: a command-line environment for running cluster deployment and management tasks
cephadm orchestrator: coordinates configuration changes across the cluster
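Day-to-day administration mostly happens inside cephadm shell, which starts a container with the cluster's config and admin keyring mounted. A minimal sketch using standard cephadm/orchestrator commands:

```bash
# run a one-off command inside the containerized shell
cephadm shell -- ceph -s

# or work interactively
cephadm shell
ceph orch status   # which orchestrator backend is active and whether it is available
ceph orch ps       # every daemon the orchestrator manages, per host
```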
cephadm Highlights

It can log in to a container registry, pull the Ceph image, and deploy that image onto the Ceph nodes.
It connects to hosts over SSH, so new hosts, storage, and monitoring hosts can be added to the cluster.
Deployment is fully containerized: the bootstrap node needs no extra software, and the whole cluster is bootstrapped from the command line on that node.
It provides two management interfaces, a CLI and a GUI (the dashboard); both are deployed by default when the cluster is initialized.
It runs the ceph orchestrator as a daemon and supports scaling the cluster out and back in (see the sketch below).
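Scaling with the orchestrator is mostly a matter of re-declaring placements. A hedged sketch using standard ceph orch commands (host names are from this article's lab):

```bash
# scale a service out by re-declaring where it should run
ceph orch apply mon --placement="cephadm-1,cephadm-2,cephadm-3"

# scale in: drain a host's daemons first, then remove the host
ceph orch host drain cephadm-3
ceph orch host rm cephadm-3
```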
Cluster Deployment Flow

1. Install the cephadm-ansible package on the bootstrap node.
2. Run cephadm's preflight playbook.
3. Bootstrap the cluster with cephadm, which:
   (1) installs and starts the Ceph MON and MGR on the bootstrap node;
   (2) creates the /etc/ceph directory;
   (3) generates an SSH key, writes it to /etc/ceph/ceph.pub, and appends it to authorized_keys for passwordless login;
   (4) writes a minimal cluster-communication config to /etc/ceph/ceph.conf;
   (5) writes the client.admin administrative key to /etc/ceph/ceph.client.admin.keyring;
   (6) deploys the monitoring components, e.g. Prometheus and Grafana.
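Once bootstrap finishes, steps (2)-(5) are easy to sanity-check on the bootstrap node; file names are exactly those described above:

```bash
ls -l /etc/ceph/
# expect: ceph.conf  ceph.client.admin.keyring  ceph.pub
cat /etc/ceph/ceph.conf   # minimal config: essentially just fsid and mon_host
```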
Cluster Deployment

Environment

```
192.168.10.110  ansible
192.168.10.141  cephadm-1
192.168.10.142  cephadm-2
192.168.10.143  cephadm-3
```

The ansible host and all Ceph hosts run Rocky Linux 8.10; each Ceph node has 3 × 60 GB SATA disks. Ceph version: v17.2.8 Quincy (stable).

```bash
# cephadm-1
nmcli con modify ens18 ipv4.addresses 192.168.10.141/24 ipv4.gateway 192.168.10.1 \
  ipv4.method manual ipv4.dns 8.8.8.8
nmcli connection up ens18
hostnamectl set-hostname cephadm-1 && bash

# cephadm-2
nmcli con modify ens18 ipv4.addresses 192.168.10.142/24 ipv4.gateway 192.168.10.1 \
  ipv4.method manual ipv4.dns 8.8.8.8
nmcli connection up ens18
hostnamectl set-hostname cephadm-2 && bash

# cephadm-3
nmcli con modify ens18 ipv4.addresses 192.168.10.143/24 ipv4.gateway 192.168.10.1 \
  ipv4.method manual ipv4.dns 8.8.8.8
nmcli connection up ens18
hostnamectl set-hostname cephadm-3 && bash
```
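The ad-hoc Ansible commands and the preflight playbook below assume the ansible node can resolve the storage hosts and reach them over passwordless SSH. A minimal sketch of that prerequisite (key type and paths are my choice, not part of the original setup):

```bash
# on the ansible node: name resolution plus passwordless SSH to every storage node
cat >> /etc/hosts << 'EOF'
192.168.10.110 ansible
192.168.10.141 cephadm-1
192.168.10.142 cephadm-2
192.168.10.143 cephadm-3
EOF

ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519
for h in cephadm-1 cephadm-2 cephadm-3; do ssh-copy-id root@"$h"; done
```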
cephadm-ansible Environment Preparation

```bash
# Docker on the ansible node
yum -y install yum-utils
yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

cat > /etc/docker/daemon.json << EOF
{
  "registry-mirrors": [
    "https://a88uijg4.mirror.aliyuncs.com",
    "https://docker.lmirror.top",
    "https://docker.m.daocloud.io",
    "https://hub.uuuadc.top",
    "https://docker.anyhub.us.kg",
    "https://dockerhub.jobcher.com",
    "https://dockerhub.icu",
    "https://docker.ckyl.me",
    "https://docker.awsl9527.cn",
    "https://docker.laoex.link"
  ]
}
EOF

systemctl daemon-reload
systemctl restart docker
systemctl enable docker

# fetch cephadm-ansible
yum -y install python3 ansible-core
git clone https://github.com/ceph/cephadm-ansible.git
cd cephadm-ansible
vim inventory
```

inventory:

```
[admin]
ansible ansible_host=192.168.10.110

[storage]
cephadm-1 ansible_host=192.168.10.141
cephadm-2 ansible_host=192.168.10.142
cephadm-3 ansible_host=192.168.10.143
```

```bash
ansible-galaxy collection install community.general

# replace podman with docker on the storage nodes
ansible -i inventory storage -m yum -a 'name=podman,runc state=absent'
ansible -i inventory storage -m yum -a 'name=yum-utils state=present'
ansible -i inventory storage -m shell -a 'yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo'
ansible -i inventory storage -m yum -a 'name=docker-ce,docker-ce-cli,containerd.io,docker-buildx-plugin,docker-compose-plugin state=present disable_gpg_check=yes'
ansible -i inventory storage -m copy -a 'src=/etc/docker/daemon.json dest=/etc/docker/daemon.json'
ansible -i inventory storage -m shell -a 'systemctl daemon-reload'
ansible -i inventory storage -m service -a 'name=docker state=restarted'
ansible -i inventory storage -m service -a 'name=docker enabled=yes'
```

Key variables in ceph_defaults/defaults/main.yml:

```yaml
ceph_release: 17.2.8
ceph_mirror: https://mirrors.aliyun.com/ceph
ceph_stable_key: https://mirrors.aliyun.com/ceph/keys/release.asc
```

Check https://mirrors.aliyun.com/ceph first to confirm the mirror actually carries the release you want. Then run the preflight playbook:

```bash
ansible-playbook -i inventory cephadm-preflight.yml
```

If it runs to completion you are set: cephadm and ceph-common are installed on every node.
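Before moving on, it is worth spot-checking what the preflight actually installed; a couple of ad-hoc probes against the same inventory (a sketch, adjust to taste):

```bash
# confirm the packages landed on every storage node
ansible -i inventory storage -m command -a 'cephadm version'
ansible -i inventory storage -m command -a 'rpm -q ceph-common'
```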
cephadm Cluster Bootstrap

With the Docker registry mirrors configured, pulling the Ceph images is no longer a problem.
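If you want to pin the exact container image rather than letting cephadm pick its default, the image can be passed explicitly; a sketch (quay.io/ceph/ceph is the upstream default path, substitute a mirror if needed):

```bash
cephadm --image quay.io/ceph/ceph:v17.2.8 bootstrap --mon-ip 192.168.10.110 ...
```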
Bootstrapping a single node, then adding hosts one by one (not recommended)
```bash
cephadm bootstrap \
  --mon-ip 192.168.10.110 \
  --initial-dashboard-user admin \
  --initial-dashboard-password wangsheng
...
URL: https://ansible:8443/
User: admin
Password: wangsheng
...
Bootstrap complete.

# every remaining host has to be added by hand
ceph orch host add cephadm-1 192.168.10.141
```
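ceph orch host add only succeeds once the new host trusts the cluster SSH key that bootstrap wrote to /etc/ceph/ceph.pub; the usual sequence looks like this (a sketch):

```bash
# copy the cluster's public key to the new host, then register it
ssh-copy-id -f -i /etc/ceph/ceph.pub root@cephadm-1
ceph orch host add cephadm-1 192.168.10.141
ceph orch host ls   # verify the host shows up
```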
Adding everything from a service spec file (recommended)
```bash
vim /etc/ceph/initial-config-primary-cluster.yaml
```

```yaml
service_type: host
addr: 192.168.10.110
hostname: ansible
labels: [admin]
---
service_type: host
addr: 192.168.10.141
hostname: cephadm-1
labels:
  - ssd-node
---
service_type: host
addr: 192.168.10.142
hostname: cephadm-2
labels:
  - ssd-node
---
service_type: host
addr: 192.168.10.143
hostname: cephadm-3
labels:
  - ssd-node
  - high-memory
---
service_type: mon
placement:
  hosts:
    - cephadm-1
    - cephadm-2
    - cephadm-3
---
service_type: rgw
service_id: realm.zone
placement:
  hosts:
    - cephadm-1
    - cephadm-2
    - cephadm-3
---
service_type: mgr
placement:
  hosts:
    - cephadm-1
    - cephadm-2
    - cephadm-3
---
service_type: osd
service_id: default_drive_group
placement:
  host_pattern: 'cephadm-*'
  label: "ssd-node"
  count: 3
data_devices:
  paths:
    - /dev/sdb
    - /dev/sdc
    - /dev/sdd
```

```bash
cephadm bootstrap --mon-ip=192.168.10.110 \
  --initial-dashboard-password=wangsheng \
  --initial-dashboard-user admin \
  --dashboard-password-noupdate --allow-fqdn-hostname \
  --apply-spec /etc/ceph/initial-config-primary-cluster.yaml \
  --allow-overwrite
...
Enabling autotune for osd_memory_target
You can access the Ceph CLI with:
    sudo /usr/sbin/cephadm shell --fsid 8b27889e-52af-11f0-bedb-bc2411f9a113 -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring
Please consider enabling telemetry to help improve Ceph:
    ceph telemetry on
...
Bootstrap complete.
```

```bash
ceph orch ls
NAME                     PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager             ?:9093,9094      1/1  3m ago     10m  count:1
crash                                     4/4  4m ago     10m  *
grafana                  ?:3000           1/1  3m ago     10m  count:1
mgr                                       3/3  4m ago     9m   cephadm-1;cephadm-2;cephadm-3
mon                                       3/3  4m ago     9m   cephadm-1;cephadm-2;cephadm-3
node-exporter            ?:9100           4/4  4m ago     10m  *
osd.default_drive_group                     3  4m ago     9m   count:3;label:ssd-node;cephadm-*
prometheus               ?:9095           1/1  3m ago     10m  count:1
rgw.realm.zone           ?:80             3/3  4m ago     7m   cephadm-1;cephadm-2;cephadm-3
```

Note that mon and mgr run only on the three storage nodes; the admin node has neither. To run an mgr on the admin node as well, extend the mgr placement in the spec and re-apply it:

```yaml
service_type: mgr
placement:
  hosts:
    - ansible
    - cephadm-1
    - cephadm-2
    - cephadm-3
```

```bash
ceph orch apply -i /etc/ceph/initial-config-primary-cluster.yaml
ceph orch ls
NAME                     PORTS        RUNNING  REFRESHED  AGE   PLACEMENT
alertmanager             ?:9093,9094      1/1  25s ago    23m   count:1
crash                                     4/4  7m ago     23m   *
grafana                  ?:3000           1/1  25s ago    23m   count:1
mgr                                       4/4  7m ago     104s  ansible;cephadm-1;cephadm-2;cephadm-3
mon                                       3/3  7m ago     104s  cephadm-1;cephadm-2;cephadm-3
node-exporter            ?:9100           4/4  7m ago     23m   *
osd.default_drive_group                     3  7m ago     104s  count:3;label:ssd-node;cephadm-*
prometheus               ?:9095           1/1  25s ago    23m   count:1
rgw.realm.zone           ?:80             3/3  7m ago     87s   cephadm-1;cephadm-2;cephadm-3
```

Ceph MGR runs in a high-availability mode: the instances elect one leader (the active mgr). When a client opens the dashboard, the MGR answers with the leader's address via an HTTP 302 redirect. In my cluster the current leader is cephadm-3 (192.168.10.143).
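You can confirm which mgr is active, and where the dashboard actually lives, straight from the CLI; a short sketch:

```bash
ceph mgr stat        # the active mgr and whether it is available
ceph mgr services    # URLs served by the active mgr, e.g. the dashboard
ceph mgr fail        # force a failover to a standby (use with care)
```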
Reinstalling the Cluster

On a node that has already been bootstrapped, running cephadm bootstrap again fails the port check, because services such as the mon are still running:

```bash
cephadm bootstrap --mon-ip=192.168.10.110 --initial-dashboard-password=wangsheng --initial-dashboard-user admin --dashboard-password-noupdate --allow-fqdn-hostname --config /etc/ceph/initial-config-primary-cluster.yaml --allow-overwrite
...
Verifying IP 192.168.10.110 port 3300 ...
Cannot bind to IP 192.168.10.110 port 3300: [Errno 98] Address already in use
ERROR: Cannot bind to IP 192.168.10.110 port 3300: [Errno 98] Address already in use
```

Tear the old cluster down by its fsid first:

```bash
cephadm ls | grep -i FSID
    "fsid": "265fdbaa-528c-11f0-9401-bc2411f9a113",
cephadm rm-cluster --force --fsid 265fdbaa-528c-11f0-9401-bc2411f9a113
ss -tunlp | grep 3300
```

The last command now prints nothing: port 3300 is free again.
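One gotcha: if OSDs had already been created, rm-cluster leaves Ceph's LVM volumes on the data disks. Recent cephadm releases can wipe them in the same step; a hedged sketch (check cephadm rm-cluster --help for the flag on your version):

```bash
# remove the cluster and zap its OSD disks in one go
cephadm rm-cluster --force --zap-osds --fsid 265fdbaa-528c-11f0-9401-bc2411f9a113
```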
Troubleshooting

Daemons in error state

```bash
ceph orch ps | grep error
rgw.realm.zone.cephadm-1.qfehta  cephadm-1  *:80  error  112s ago  32m  -  -  <unknown>  <unknown>  <unknown>
rgw.realm.zone.cephadm-2.vluxwt  cephadm-2  *:80  error  112s ago  32m  -  -  <unknown>  <unknown>  <unknown>

ceph orch daemon restart rgw.realm.zone.cephadm-1.qfehta
Scheduled to restart rgw.realm.zone.cephadm-1.qfehta on host 'cephadm-1'
ceph orch daemon restart rgw.realm.zone.cephadm-2.vluxwt
Scheduled to restart rgw.realm.zone.cephadm-2.vluxwt on host 'cephadm-2'
```
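If a restart does not clear the error, the daemon's container logs usually say why; a sketch of the next steps (daemon names taken from the output above):

```bash
# on the affected host: inspect the daemon's logs
cephadm logs --name rgw.realm.zone.cephadm-1.qfehta

# or, via the orchestrator, recreate the daemon from scratch
ceph orch daemon redeploy rgw.realm.zone.cephadm-1.qfehta
```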