Akemi

Changing the IP Address of a kubeadm Kubernetes Cluster - Single-Master

2025/06/20

This took me three days to sort out and was exhausting. Multi-master clusters will have to wait; I'll look into them when I have time.

In a Kubernetes cluster, a node's IP address is a key part of its identity. When the IP address changes, the following core problems arise (a quick check for where the old IP is embedded follows the list):

  • Certificate invalidation
    Kubeadm embeds node IP addresses in the certificates it generates, so all certificate-based communication, such as with the apiserver and etcd, fails
  • kubelet configuration invalidation
    The kubelet on each worker node uses a kubeconfig file to connect to the apiserver; after the change, workers can no longer find the apiserver
  • Control-plane component configuration invalidation
    The master node holds configuration bound to the old IP, such as the apiserver's --advertise-address and etcd's --listen-peer-urls, --listen-client-urls, --initial-advertise-peer-urls, and --advertise-client-urls
  • Inconsistent node object state
    Worker node status fields (such as InternalIP) still report the old IP, breaking networking
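
Before starting, it helps to confirm exactly where the old IP is baked in. The snippet below greps the kubeadm-managed files and dumps the apiserver certificate's SANs; the paths are the kubeadm defaults used throughout this post.

# Where does the old IP still appear? (kubeadm default paths)
grep -rn "192.168.10.151" /etc/kubernetes/ 2>/dev/null

# Which IPs does the apiserver serving certificate cover?
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text \
  | grep -A1 "Subject Alternative Name"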

Only the Master Node IP Changes

Simulating the Failure

# Old IP: 192.168.10.151
# New IP: 192.168.10.152
nmcli connection modify ens18 ipv4.method manual \
ipv4.addresses 192.168.10.152/24 ipv4.gateway 192.168.10.1 \
ipv4.dns 8.8.8.8
nmcli connection up ens18

kubectl get nodes
Unable to connect to the server: dial tcp 192.168.10.151:6443: connect: no route to host

Re-editing the kubeadm Configuration File

Change 192.168.10.151 to 192.168.10.152 in this file; then we use kubeadm to regenerate the certificates and configuration files.

kubeadm init phase runs the cluster initialization flow in stages. Its core purpose is to let you run a single, independent step (phase) of the initialization on demand instead of executing the whole flow at once.
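
kubeadm can list the available phases and sub-phases itself, which is handy for finding the exact step to rerun:

# List all init phases
kubeadm init phase --help

# List the sub-phases of certificate generation
kubeadm init phase certs --help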

cat kubeadm.yaml
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.10.152
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: test
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.30.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
scheduler: {}
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
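
Before running any phase, recent kubeadm releases can sanity-check this file (an optional step; it assumes the kubeadm config validate subcommand exists in your version):

kubeadm config validate --config kubeadm.yaml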

Regenerating the apiserver Certificate

Do not touch the CA certificate, and do not touch the etcd certificates either.

# Back up the certificates and configuration files
cp -r /etc/kubernetes /etc/kubernetes.bak

# Only the apiserver certificate needs replacing; do not touch anything else
cd /etc/kubernetes/pki
mv apiserver.key apiserver.key.bak
mv apiserver.crt apiserver.crt.bak

# Regenerate the certificate with kubeadm
kubeadm init phase certs all --config kubeadm.yaml

[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Using existing ca certificate authority
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local test] and IPs [10.96.0.1 192.168.10.152]
[certs] Using existing apiserver-kubelet-client certificate and key on disk
[certs] Using existing front-proxy-ca certificate authority
[certs] Using existing front-proxy-client certificate and key on disk
[certs] Using existing etcd/ca certificate authority
[certs] Using existing etcd/server certificate and key on disk
[certs] Using existing etcd/peer certificate and key on disk
[certs] Using existing etcd/healthcheck-client certificate and key on disk
[certs] Using existing apiserver-etcd-client certificate and key on disk
[certs] Using the existing "sa" key

ls -l /etc/kubernetes/pki/
-rw-r--r-- 1 root root 1277 Jun 18 01:54 apiserver.crt
...

As the output shows, only the apiserver certificate was regenerated; the CA, etcd, and other existing certificates are reused as-is.
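
To confirm the new IP actually made it into the serving certificate, repeat the openssl check from earlier:

openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text \
  | grep -A1 "Subject Alternative Name"
# 192.168.10.152 should now appear among the IP SANs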

Updating the Control-Plane Manifests

# Make a backup first
cp -a /etc/kubernetes/manifests /etc/kubernetes/manifests.backup

# Generate new control-plane manifests
kubeadm init phase control-plane all --config kubeadm.yaml

[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"

cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep 152
kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 192.168.10.152:6443
- --advertise-address=192.168.10.152
host: 192.168.10.152
host: 192.168.10.152
host: 192.168.10.152
The control-plane manifests have been updated.
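
The kubelet watches /etc/kubernetes/manifests and recreates these static pods on its own once the files change. A quick way to confirm that no manifest still references the old address:

# Prints nothing if every manifest was rewritten
grep -rn "192.168.10.151" /etc/kubernetes/manifests/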

Updating the etcd Configuration

kubeadm init phase etcd local --config kubeadm.yaml

# sed works just as well
# sed -i 's/192.168.10.151/192.168.10.152/g' /etc/kubernetes/manifests/etcd.yaml

cat /etc/kubernetes/manifests/etcd.yaml | grep -E 'advertise-client-urls|listen-client-urls'
kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.10.152:2379
- --advertise-client-urls=https://192.168.10.152:2379
- --listen-client-urls=https://127.0.0.1:2379,https://192.168.10.152:2379

systemctl restart kubelet
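
Once the etcd static pod has restarted, its health can be checked against the loopback endpoint (a sketch assuming the etcdctl client is installed on the host; the stacked etcd always listens on 127.0.0.1 as well):

ETCDCTL_API=3 etcdctl \
  --endpoints https://127.0.0.1:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  endpoint health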

Updating the kubelet Configuration

# Regenerate /etc/kubernetes/kubelet.conf - this file holds the certificate,
# key, and CA paths kubelet uses to connect to the apiserver
rm -f /etc/kubernetes/kubelet.conf

kubeadm init phase kubeconfig kubelet --config kubeadm.yaml
[kubeconfig] Writing "kubelet.conf" kubeconfig file

# The kubelet service configuration file
rm -f /var/lib/kubelet/config.yaml

kubeadm init phase kubelet-start --config kubeadm.yaml

[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet

# Write the changes to disk
kubeadm init phase kubelet-finalize all --config kubeadm.yaml

systemctl daemon-reload
systemctl restart kubelet
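
A quick check that the regenerated kubeconfig points at the new endpoint, and that the kubelet came up cleanly:

grep server: /etc/kubernetes/kubelet.conf
#     server: https://192.168.10.152:6443

journalctl -u kubelet --no-pager -n 20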

Updating the kubeconfigs

# Remove all of the old kubeconfig files
rm -f /etc/kubernetes/*.conf

kubeadm init phase kubeconfig all --config kubeadm.yaml

[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file

# Update the kubectl config
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

# Restart kubelet
systemctl restart kubelet

# Verify
kubectl get nodes
NAME STATUS ROLES AGE VERSION
slave Ready <none> 14h v1.30.0
test Ready control-plane 14h v1.30.0
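
Each regenerated kubeconfig should now name the new apiserver endpoint, which can be confirmed without opening the files:

# Every file should report https://192.168.10.152:6443
for f in /etc/kubernetes/*.conf; do
  echo -n "$f: "
  kubectl --kubeconfig "$f" config view -o jsonpath='{.clusters[0].cluster.server}'
  echo
done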

Updating the ConfigMaps

kubectl get cm -n kube-system
NAME DATA AGE
coredns 1 35m
extension-apiserver-authentication 6 35m
kube-apiserver-legacy-service-account-token-tracking 1 35m
kube-proxy 2 35m
kube-root-ca.crt 1 35m
kubeadm-config 1 35m
kubelet-config 1 35m

kubeadm init phase upload-config kubeadm --config kubeadm.yaml

kubectl -n kube-system edit cm kube-proxy
# Change the old IP to the new IP
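
If you'd rather not edit interactively, the same change can be scripted; the rollout restart just forces the kube-proxy pods to pick up the new ConfigMap (a sketch):

kubectl -n kube-system get cm kube-proxy -o yaml \
  | sed 's/192.168.10.151/192.168.10.152/g' \
  | kubectl replace -f -

kubectl -n kube-system rollout restart daemonset kube-proxy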

Adjusting the kubelet Configuration on the Worker Node

vim /etc/kubernetes/kubelet.conf
# or
sed -i 's/192.168.10.151/192.168.10.152/g' /etc/kubernetes/kubelet.conf

systemctl restart kubelet

Reinstalling the Network Plugin (Optional)

A straight reboot also works.
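
A gentler alternative to force-deleting the pods is to let the controllers restart them (a sketch, assuming the standard tigera-operator layout shown below):

kubectl -n calico-system rollout restart daemonset calico-node
kubectl -n calico-system rollout restart deployment calico-kube-controllers
kubectl -n tigera-operator rollout restart deployment tigera-operator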

# I'm using Calico
# Force-delete all of the Calico pods
kubectl delete pods --all --namespace=calico-apiserver --force --grace-period=0
kubectl delete pods --all --namespace=calico-system --force --grace-period=0
kubectl delete pods --all --namespace=tigera-operator --force --grace-period=0

# Re-apply the resources
kubectl apply --server-side -f tigera-operator.yaml
kubectl apply -f custom-resources.yaml

# Reboot the machine
systemctl reboot

Verification

kubectl get pods -A -owide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-apiserver calico-apiserver-7dbb565dd-7fbx2 1/1 Running 1 (6m40s ago) 7m12s 10.244.25.8 slave <none> <none>
calico-apiserver calico-apiserver-7dbb565dd-tq8rt 1/1 Running 0 7m12s 10.244.27.199 test <none> <none>
calico-system calico-kube-controllers-64c85b8c9f-vf274 1/1 Running 0 7m12s 10.244.25.7 slave <none> <none>
calico-system calico-node-4lj9h 1/1 Running 0 7m12s 192.168.10.231 slave <none> <none>
calico-system calico-node-lzxfq 1/1 Running 0 7m12s 192.168.10.152 test <none> <none>
calico-system calico-typha-84b5fbbc4f-skkpj 1/1 Running 0 7m12s 192.168.10.231 slave <none> <none>
calico-system csi-node-driver-92mhf 2/2 Running 0 7m12s 10.244.25.9 slave <none> <none>
calico-system csi-node-driver-lptz7 2/2 Running 0 7m12s 10.244.27.200 test <none> <none>
kube-system coredns-6d58d46f65-2rw4q 1/1 Running 1 (7m46s ago) 75m 10.244.27.197 test <none> <none>
kube-system coredns-6d58d46f65-tqxvb 1/1 Running 1 (7m46s ago) 75m 10.244.27.198 test <none> <none>
kube-system etcd-test 1/1 Running 1 (7m46s ago) 60m 192.168.10.152 test <none> <none>
kube-system kube-apiserver-test 1/1 Running 1 (7m46s ago) 60m 192.168.10.152 test <none> <none>
kube-system kube-controller-manager-test 1/1 Running 13 (7m46s ago) 75m 192.168.10.152 test <none> <none>
kube-system kube-proxy-5pg9g 1/1 Running 1 (7m49s ago) 73m 192.168.10.231 slave <none> <none>
kube-system kube-proxy-dkvk4 1/1 Running 1 (7m46s ago) 75m 192.168.10.152 test <none> <none>
kube-system kube-scheduler-test 1/1 Running 13 (7m46s ago) 75m 192.168.10.152 test <none> <none>
tigera-operator tigera-operator-767c6b76db-kjd65 1/1 Running 0 7m12s 192.168.10.231 slave <none> <none>

Quick Script

#!/bin/bash
# Note: this run switches back, from 192.168.10.152 to 192.168.10.151
export oldip1=192.168.10.152
export newip1=192.168.10.151

find /etc/kubernetes -type f | xargs sed -i "s/$oldip1/$newip1/"
sed -i "s/$oldip1/$newip1/" /root/.kube/config

cd /root/.kube/cache/discovery
mv ${oldip1}_6443 ${newip1}_6443

cd /etc/kubernetes/pki
mv -f apiserver.key apiserver.key.bak
mv -f apiserver.crt apiserver.crt.bak

cd ~
kubeadm init phase certs all --config kubeadm.yaml
systemctl restart kubelet

# Change the old IP to the new IP (interactive)
kubectl -n kube-system edit cm kube-proxy

systemctl reboot
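
After the reboot it can take a moment for the apiserver to answer on the new address; when scripting the verification, a small wait loop helps:

# Block until the apiserver responds on the new address
until kubectl get nodes >/dev/null 2>&1; do
  echo "waiting for apiserver..."
  sleep 2
done
kubectl get nodes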

Both Master and Worker Node IPs Change

These steps build on the master-node procedure above.

# Old IP: 192.168.10.231
# New IP: 192.168.10.232
nmcli connection modify ens18 ipv4.method manual \
ipv4.addresses 192.168.10.232/24 ipv4.gateway 192.168.10.1 \
ipv4.dns 8.8.8.8
nmcli connection up ens18

# Test that workloads still run
kubectl apply -f test-deploy.yaml

kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-test-8458968cc8-4z78w 0/1 Running 2 (17s ago) 5m58s
nginx-test-8458968cc8-vjlps 0/1 Running 1 (37s ago) 5m58s
nginx-test-8458968cc8-xgcr4 0/1 Running 2 (17s ago) 5m58s
# The images are present but the pods won't come up - a networking problem

# On the master, delete the stale worker node record
# (kubectl get nodes -o name | xargs kubectl delete node would wipe every node record)
kubectl get nodes -o name
node/slave
node/test

kubectl delete node slave
node "slave" deleted

# Pin Calico's node-IP autodetection to the interface that can reach the master
kubectl set env daemonset/calico-node -n calico-system IP_AUTODETECTION_METHOD=can-reach=192.168.10.151

# Reset the worker node
kubeadm reset -f
iptables -F && iptables -t nat -F
rm -rf /etc/cni/net.d

# Rejoin the cluster
kubeadm join 192.168.10.151:6443 --token w9o87t.mmpxgomqej2h1ifz \
--discovery-token-ca-cert-hash \
sha256:52502b8be55539e174c2a3ebdafaecc4b94ee6e976cf0c72aaa13c26aff6023c
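
# If the original bootstrap token has expired (the default TTL is 24h), a fresh
# join command can be printed on the master:
#   kubeadm token create --print-join-command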

# At this point the Calico install hangs; the worker node needs a reboot
kubectl get pods -A -w -owide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-apiserver calico-apiserver-7dbb565dd-chvhh 1/1 Running 0 7m6s 10.244.27.206 test <none> <none>
calico-apiserver calico-apiserver-7dbb565dd-tq8rt 1/1 Running 1 (135m ago) 147m 10.244.27.201 test <none> <none>
calico-system calico-kube-controllers-64c85b8c9f-tqttr 1/1 Running 0 7m6s 10.244.27.205 test <none> <none>
calico-system calico-node-k68rj 0/1 Running 0 7s 192.168.10.151 test <none> <none>
calico-system calico-node-qjk68 1/1 Running 0 6m20s 192.168.10.232 slave <none> <none>
calico-system calico-typha-84b5fbbc4f-4xsl7 1/1 Running 0 7m6s 192.168.10.151 test <none> <none>
calico-system csi-node-driver-cx5ds 0/2 ContainerCreating 0 6m20s <none> slave <none> <none>
calico-system csi-node-driver-lptz7 2/2 Running 3 (135m ago) 147m 10.244.27.202 test <none> <none>

# After the reboot, Calico and nginx both run normally and the nodes are Ready
kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
calico-apiserver calico-apiserver-7dbb565dd-chvhh 1/1 Running 0 10m
calico-apiserver calico-apiserver-7dbb565dd-tq8rt 1/1 Running 1 (138m ago) 151m
calico-system calico-kube-controllers-64c85b8c9f-tqttr 1/1 Running 0 10m
calico-system calico-node-k68rj 1/1 Running 0 3m22s
calico-system calico-node-qjk68 1/1 Running 1 (2m9s ago) 9m35s
calico-system calico-typha-84b5fbbc4f-4xsl7 1/1 Running 0 10m
calico-system csi-node-driver-cx5ds 2/2 Running 0 9m35s
calico-system csi-node-driver-lptz7 2/2 Running 3 (138m ago) 151m
default nginx-test-8458968cc8-8qp65 1/1 Running 0 10m
default nginx-test-8458968cc8-gmwsw 1/1 Running 0 10m
default nginx-test-8458968cc8-m7j7f 1/1 Running 0 10m
kube-system coredns-6d58d46f65-2rw4q 1/1 Running 2 (138m ago) 3h39m
kube-system coredns-6d58d46f65-tqxvb 1/1 Running 2 (138m ago) 3h39m
kube-system etcd-test 1/1 Running 1 (138m ago) 141m
kube-system kube-apiserver-test 1/1 Running 1 (138m ago) 141m
kube-system kube-controller-manager-test 1/1 Running 15 (138m ago) 3h39m
kube-system kube-proxy-6dgxp 1/1 Running 1 (2m9s ago) 9m35s
kube-system kube-proxy-dkvk4 1/1 Running 2 (138m ago) 3h39m
kube-system kube-scheduler-test 1/1 Running 15 (138m ago) 3h39m
tigera-operator tigera-operator-767c6b76db-q4j2q 1/1 Running 0 10m

kubectl get nodes
NAME STATUS ROLES AGE VERSION
slave Ready <none> 8m21s v1.30.0
test Ready control-plane 3h38m v1.30.0