Akemi

Docker Storage Migration in Practice

2025/12/01

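In many situations, such as running GPU training or processing jobs in Docker, or using registry as a private container registry, it is easy to run out of disk space.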

After all, Docker stores its data under /var/lib/docker by default, which eats into the root filesystem.

The developers reported that this k8s node kept running out of ephemeral storage.

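I ran ncdu / --exclude /data --exclude /mnt to scan the root directory with the mounted filesystems excluded. Even after deleting a pile of logs, ncdu showed only 40G in use, while the filesystem actually had more than 100G used.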
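A gap like this between a scanner and the filesystem commonly comes from one of two things: the scan could not read root-only directories such as /var/lib/docker, or deleted files are still held open by a process. The sketch below is one way to put a number on the gap. The function names are made up for illustration, and it assumes a POSIX shell with df, du and awk available.

```shell
#!/bin/sh
# fs_used_kb PATH
# Used space (KiB) on the filesystem that holds PATH, as the filesystem reports it.
fs_used_kb() {
    df -Pk "$1" | awk 'NR==2 {print $3}'
}

# tree_used_kb DIR
# Space (KiB) that a directory walk can see under DIR without crossing
# into other mounted filesystems (-x). Unreadable directories are skipped.
tree_used_kb() {
    du -skx "$1" 2>/dev/null | awk '{print $1}'
}

# unseen_kb DIR
# Difference between the two numbers above. A large value when run as a
# normal user, and a small one under sudo, points at unreadable directories.
unseen_kb() {
    echo $(( $(fs_used_kb "$1") - $(tree_used_kb "$1") ))
}
```

On the node this would be called as unseen_kb /, once as a normal user and once under sudo. If the gap stays large even as root, lsof +L1 lists files that were deleted but are still open.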

sudo journalctl --disk-usage
# The journal is not taking up much space either
Archived and active journals take up 2.9G in the file system.

# Check the top-level directories; /var is the one that stands out
sudo du -sh /usr /var /opt /snap /home /data /boot
16G /usr
114G /var
8.4G /opt
8.5G /snap
226G /home
158G /data
194M /boot

# Check Docker's disk usage
docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          32        2         11.08GB   5.656GB (51%)
Containers      2         2         276.5MB   0B (0%)
Local Volumes   14        1         70.79GB   980.2MB (1%)
Build Cache     31        0         311.1MB   311.1MB

docker system df -v
Images space usage:

REPOSITORY   TAG      IMAGE ID       CREATED       SIZE     SHARED SIZE   UNIQUE SIZE   CONTAINERS
registry2    backup   2b893ab365df   2 weeks ago   25.4MB   25.44MB       0B            0
xxxxx
Containers space usage:

CONTAINER ID   IMAGE        COMMAND                  LOCAL VOLUMES   SIZE   CREATED        STATUS      NAMES
dabec99b0609   registry:2   "/entrypoint.sh /etc…"   1               0B     5 months ago   Up 8 days   registry2

Local Volumes space usage:

VOLUME NAME                                                        LINKS   SIZE
98793e91065d2d274f57f8bd9147607b86a9caefb4b3fe152ed1aa29120ea765   0       0B
e9c10d6727913682bc4c0cd76bf3c69ae234ab1aa7f40699ba25e0c1ac4d6dbe   0       0B
f2f9881bfcbd4b7ea6a05d29fb5fd6b2ed2db5c13ce8bdbceae89c31743c1832   0       0B
26107f7c3e8c73c51db4ddb71fd7aebfeefafd0688380f1988a2bf01059623c6   0       0B
5b8d8a32aa97a8f8d804f38cf64b23605ac61b617f195c3f4aedadee8a29e601   0       0B
7f94a19a4caf26f7c8e343cef2a3de92b01502de448304b3de111e2987070338   0       0B
09d2724f32128edecf137193d7ead873fa91b9c03e649efbbe6c3c424cd87114   0       0B
fa4712f2b368456ee519bfdd384b97484b891a84335d6aa35ea0c1f8124cc620   0       0B
5d0b8eec88dcf18b23a4f2e5d16b6e4a9e4c78eadfc0a57363c8f96f0838b790   1       69.81GB
996382fc26991266de5510f0d53ac8040451e55f51136beb1d9f1a0ff5a00b9b   0       0B
bd282d6dc523171d96845384fbb7d8c4dda94945639286040c73dfe582edde9c   0       980.2MB
3ad7d780120ad29d7b29763e309ee58226b3a9cba6ecde63ab8da3cd8b17a018   0       0B
95c9901058ab3c5290e0b90e9815c6f2d9a6e88a8d8adb48153b66f2a838443b   0       0B
cc3d1da4d0e36a70e63332ce795c036a7939841ba5d690a204fb45ebfc70e1f4   0       0B

Build cache usage: 311.1MB

So the local volume used by the registry2 container takes up 69.81GB. The volume is named 5d0b8eec88dcf18b23a4f2e5d16b6e4a9e4c78eadfc0a57363c8f96f0838b790
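The link between a container and its volume can be confirmed directly instead of being inferred from the LINKS column. The sketch below is one way to do that; container_mounts and volume_sizes are illustrative names, and volume_sizes assumes volumes live in Docker's default directory unless another root is passed.

```shell
#!/bin/sh
# container_mounts NAME
# Prints one line per mount of the container:
# volume name, host path, path inside the container.
container_mounts() {
    docker inspect -f '{{ range .Mounts }}{{ println .Name .Source .Destination }}{{ end }}' "$1"
}

# volume_sizes [ROOT]
# Lists every directory under ROOT with its size in KiB, largest first.
# ROOT defaults to /var/lib/docker/volumes, which only root can read.
volume_sizes() {
    du -sk "${1:-/var/lib/docker/volumes}"/*/ 2>/dev/null | sort -rn
}
```

Running container_mounts registry2 should show the volume name next to its host path, and volume_sizes (under sudo) should put the same directory at the top of the list.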

Storage migration

So the goal is to move the volume's data onto /data and leave a symlink at the original path.
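Since the procedure keeps both a backup and the moved data on /data, the target needs room for roughly two copies of the volume. A minimal pre-flight check might look like the sketch below; the function names and the default of two copies are assumptions made for illustration.

```shell
#!/bin/sh
# dir_size_kb DIR
# Size of DIR in KiB (run under sudo for root-owned trees).
dir_size_kb() {
    du -sk "$1" 2>/dev/null | awk '{print $1}'
}

# free_kb PATH
# KiB still available on the filesystem that holds PATH.
free_kb() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# has_room SRC DEST [COPIES]
# Succeeds if DEST's filesystem can hold COPIES copies of SRC.
# COPIES defaults to 2: the backup plus the moved data.
# Returns 2 if SRC cannot be measured.
has_room() {
    size=$(dir_size_kb "$1")
    [ -n "$size" ] || return 2
    need=$(( size * ${3:-2} ))
    [ "$(free_kb "$2")" -gt "$need" ]
}
```

For the volume in this post, the call would be has_room with the volume directory as SRC and /data as DEST, run as root before stopping the container.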

# Stop the container
docker stop registry2

# Back up the volume first (cp -a keeps ownership, permissions and timestamps)
sudo cp -a \
/var/lib/docker/volumes/5d0b8eec88dcf18b23a4f2e5d16b6e4a9e4c78eadfc0a57363c8f96f0838b790 \
/data/registry_data.bak

# Move the volume data to /data
sudo mv /var/lib/docker/volumes/5d0b8eec88dcf18b23a4f2e5d16b6e4a9e4c78eadfc0a57363c8f96f0838b790 \
/data/registry_data

# Create a symlink at the original location
sudo ln -s /data/registry_data /var/lib/docker/volumes/5d0b8eec88dcf18b23a4f2e5d16b6e4a9e4c78eadfc0a57363c8f96f0838b790

# Check the symlink
sudo ls -l /var/lib/docker/volumes/
total 84
...
5d0b8eec88dcf18b23a4f2e5d16b6e4a9e4c78eadfc0a57363c8f96f0838b790 -> /data/registry_data

# Start the container again
docker start registry2

# Test by removing the local image and pulling it again from the registry
docker rmi easzlab.io.local:5000/bitnami/minio:2025.4.22-debian-12-r1
docker pull easzlab.io.local:5000/bitnami/minio:2025.4.22-debian-12-r1

If the pull succeeds, the registry container is running normally and serving images from the migrated data.
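Besides the pull test, the symlink and the backup can be checked on the filesystem itself. The sketch below is one possible check, with hypothetical helper names; it assumes readlink -f and diff -r -q are available, as they are on typical Linux hosts.

```shell
#!/bin/sh
# link_points_to LINK TARGET
# Succeeds only if LINK is a symlink that resolves to the same real path as TARGET.
link_points_to() {
    [ -L "$1" ] || return 1
    [ "$(readlink -f "$1")" = "$(readlink -f "$2")" ]
}

# same_tree A B
# Succeeds if both directory trees contain the same files with the same
# contents. Ownership and permissions are not compared.
same_tree() {
    diff -r -q "$1" "$2" >/dev/null 2>&1
}
```

On the node, link_points_to would take the volume path under /var/lib/docker/volumes and /data/registry_data, and same_tree would compare /data/registry_data with /data/registry_data.bak, both as root. Once everything checks out, the backup can be removed to free the space it occupies on /data.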