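In many situations, such as running GPU training or processing jobs in Docker, or using registry as a private container registry, it is easy to end up with disk usage blowing up.
After all, Docker stores its data under /var/lib (in /var/lib/docker) by default, which eats into the root filesystem.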
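Developers reported that this k8s node kept running short of ephemeral storage.
I ran ncdu / --exclude /data --exclude /mnt to scan the root directory with the mounted filesystems excluded. After deleting a pile of logs, ncdu showed only 40G in use, while the actual usage was over 100G.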
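A gap like this can be put into numbers by comparing what the filesystem itself reports with what a directory walk can see. The sketch below is only an illustration: fs_gap is a made-up helper name, and the two usual suspects it hints at (a scan without root cannot descend into /var/lib/docker, and deleted-but-still-open files are invisible to a directory walk) are assumptions, not something confirmed on this node.

```shell
# fs_gap: hypothetical helper, not part of any tool.
# Prints used space as reported by df, space visible to du, and the difference.
fs_gap() {
  local mount_point="${1:-/}"
  local used_kb seen_kb
  # Used space according to the filesystem (POSIX format, 1K blocks).
  used_kb=$(df -Pk "$mount_point" | awk 'NR==2 {print $3}')
  # Space reachable by walking the tree; -x stays on this filesystem.
  # Errors are discarded, so unreadable directories are silently skipped.
  seen_kb=$(du -skx "$mount_point" 2>/dev/null | awk '{print $1}')
  used_kb=${used_kb:-0}
  seen_kb=${seen_kb:-0}
  echo "df_used_kb=$used_kb du_seen_kb=$seen_kb gap_kb=$((used_kb - seen_kb))"
}

# Usage: fs_gap /
```

Running it once as a normal user and once as root should show whether permissions explain most of the difference.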
    sudo journalctl --disk-usage
    Archived and active journals take up 2.9G in the file system.

    sudo du -sh /usr /var /opt /snap /home /data /boot
    16G     /usr
    114G    /var
    8.4G    /opt
    8.5G    /snap
    226G    /home
    158G    /data
    194M    /boot

    docker system df
    TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
    Images          32        2         11.08GB   5.656GB (51%)
    Containers      2         2         276.5MB   0B (0%)
    Local Volumes   14        1         70.79GB   980.2MB (1%)
    Build Cache     31        0         311.1MB   311.1MB

    docker system df -v
    Images space usage:

    REPOSITORY   TAG      IMAGE ID       CREATED       SIZE     SHARED SIZE   UNIQUE SIZE   CONTAINERS
    registry2    backup   2b893ab365df   2 weeks ago   25.4MB   25.44MB       0B            0
    xxxxx

    Containers space usage:

    CONTAINER ID   IMAGE        COMMAND                  LOCAL VOLUMES   SIZE   CREATED        STATUS      NAMES
    dabec99b0609   registry:2   "/entrypoint.sh /etc…"   1               0B     5 months ago   Up 8 days   registry2

    Local Volumes space usage:

    VOLUME NAME                                                        LINKS   SIZE
    98793e91065d2d274f57f8bd9147607b86a9caefb4b3fe152ed1aa29120ea765   0       0B
    e9c10d6727913682bc4c0cd76bf3c69ae234ab1aa7f40699ba25e0c1ac4d6dbe   0       0B
    f2f9881bfcbd4b7ea6a05d29fb5fd6b2ed2db5c13ce8bdbceae89c31743c1832   0       0B
    26107f7c3e8c73c51db4ddb71fd7aebfeefafd0688380f1988a2bf01059623c6   0       0B
    5b8d8a32aa97a8f8d804f38cf64b23605ac61b617f195c3f4aedadee8a29e601   0       0B
    7f94a19a4caf26f7c8e343cef2a3de92b01502de448304b3de111e2987070338   0       0B
    09d2724f32128edecf137193d7ead873fa91b9c03e649efbbe6c3c424cd87114   0       0B
    fa4712f2b368456ee519bfdd384b97484b891a84335d6aa35ea0c1f8124cc620   0       0B
    5d0b8eec88dcf18b23a4f2e5d16b6e4a9e4c78eadfc0a57363c8f96f0838b790   1       69.81GB
    996382fc26991266de5510f0d53ac8040451e55f51136beb1d9f1a0ff5a00b9b   0       0B
    bd282d6dc523171d96845384fbb7d8c4dda94945639286040c73dfe582edde9c   0       980.2MB
    3ad7d780120ad29d7b29763e309ee58226b3a9cba6ecde63ab8da3cd8b17a018   0       0B
    95c9901058ab3c5290e0b90e9815c6f2d9a6e88a8d8adb48153b66f2a838443b   0       0B
    cc3d1da4d0e36a70e63332ce795c036a7939841ba5d690a204fb45ebfc70e1f4   0       0B

    Build cache usage: 311.1MB
So the local volume used by the registry2 container takes up about 69G (69.81GB). The volume is named 5d0b8eec88dcf18b23a4f2e5d16b6e4a9e4c78eadfc0a57363c8f96f0838b790.
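The same conclusion can be cross-checked straight from the disk, without reading through the docker system df -v listing. This is a rough sketch: largest_dirs is a hypothetical helper name, and it assumes the default volume location /var/lib/docker/volumes, which only root can read.

```shell
# largest_dirs: hypothetical helper, not a Docker command.
# Lists the immediate subdirectories of a directory by size in KiB, largest first.
largest_dirs() {
  local parent="$1" count="${2:-5}"
  # du -sk prints "<KiB><TAB><path>" for each directory; sort numerically, descending.
  du -sk "$parent"/*/ 2>/dev/null | sort -rn | head -n "$count"
}

# On a Docker host the glob must expand as root, so wrap the pipeline in one shell:
#   sudo sh -c 'du -sk /var/lib/docker/volumes/*/ | sort -rn | head -n 3'
# To see which volume a container mounts (registry2 is the container from this post):
#   docker inspect --format '{{range .Mounts}}{{println .Name .Destination}}{{end}}' registry2
```

The docker inspect line prints the volume name next to its mount point inside the container, which ties the large directory back to the container that owns it.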
Storage migration
The goal, then, is to move this volume's data onto /data and leave a symlink at the original path.
    # Stop the registry so nothing writes to the volume during the move
    docker stop registry2

    # Keep a backup copy on /data first
    sudo cp -r \
      /var/lib/docker/volumes/5d0b8eec88dcf18b23a4f2e5d16b6e4a9e4c78eadfc0a57363c8f96f0838b790 \
      /data/registry_data.bak

    # Move the volume directory onto /data
    sudo mv /var/lib/docker/volumes/5d0b8eec88dcf18b23a4f2e5d16b6e4a9e4c78eadfc0a57363c8f96f0838b790 \
      /data/registry_data

    # Leave a symlink at the original path so Docker still finds the volume
    sudo ln -s /data/registry_data /var/lib/docker/volumes/5d0b8eec88dcf18b23a4f2e5d16b6e4a9e4c78eadfc0a57363c8f96f0838b790

    sudo ls -l /var/lib/docker/volumes/
    total 84
    ...
    5d0b8eec88dcf18b23a4f2e5d16b6e4a9e4c78eadfc0a57363c8f96f0838b790 -> /data/registry_data

    docker start registry2

    # Drop a local image and pull it back through the registry as a check
    docker rmi easzlab.io.local:5000/bitnami/minio:2025.4.22-debian-12-r1
    docker pull easzlab.io.local:5000/bitnami/minio:2025.4.22-debian-12-r1

If the pull succeeds, the container is running normally again.
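For repeating the move on other nodes, the steps can be wrapped into a small function with a few safety checks. This is a sketch of a variation, not the exact procedure used above: relocate_with_symlink is a hypothetical name, it copies and verifies before touching the original, and it keeps the original under a .old suffix instead of making a separate .bak copy. The service that uses the directory is assumed to be stopped first.

```shell
# relocate_with_symlink: hypothetical helper.
# Copies SRC to DST, verifies the copy, then replaces SRC with a symlink to DST.
relocate_with_symlink() {
  local src="${1%/}" dst="${2%/}"
  # The symlink target must be absolute, otherwise it resolves relative to SRC's parent.
  case "$dst" in
    /*) ;;
    *) echo "destination must be an absolute path: $dst" >&2; return 1 ;;
  esac
  [ -d "$src" ] && [ ! -L "$src" ] || { echo "source must be a real directory: $src" >&2; return 1; }
  [ ! -e "$dst" ] || { echo "destination already exists: $dst" >&2; return 1; }
  # -a preserves ownership, modes and timestamps.
  cp -a "$src" "$dst" || return 1
  # Leave the original untouched if the copy differs (this reads everything, so it is slow).
  diff -r "$src" "$dst" >/dev/null || { echo "copy verification failed" >&2; return 1; }
  # Keep the original until the service is confirmed healthy.
  mv "$src" "$src.old" || return 1
  ln -s "$dst" "$src"
}

# Usage (as root, with the container stopped; the volume name is elided here):
#   relocate_with_symlink /var/lib/docker/volumes/<volume-name> /data/registry_data
```

Note that the .old directory stays on the root filesystem, so no space is freed until it is deleted after the pull check passes.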