
Kubespray: using HAProxy + keepalived for cluster high availability

2025/09/11

Following up on the previous post: I just deployed a multi-master cluster with kubespray, so now let's add high availability with haproxy + keepalived, and write the steps up as ansible commands while we're at it.

The overall flow is: client → VIP (10.163.2.150:6444) → HAProxy on the node currently holding the VIP → kube-apiserver on one of the master nodes.

All configuration files in this post are written in a simplified form for convenience; they can easily be expanded, or turned into j2 templates.

keepalived configuration

10.163.2.106 master1
10.163.2.102 master2
10.163.2.101 master3
10.163.2.109 worker1
10.163.2.108 worker2
10.163.2.131 worker3
10.163.2.150 vip

# I used a proxy during the install, so remove the proxy settings first
ansible -i inventory/sample/inventory.ini -m shell -a \
"sed -i '/^proxy.*/d' /etc/yum.conf" kube-master
ansible -i inventory/sample/inventory.ini -m shell -a \
"sed -i '/^proxy.*/d' /etc/dnf/dnf.conf" kube-master

# Install keepalived
ansible -i inventory/sample/inventory.ini -m yum -a \
"name=keepalived state=latest" kube-master

# Create a keepalived configuration file
cat > keepalived.conf <<EOF
! Configuration File for keepalived

global_defs {
    router_id LVS_DEVEL
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.163.2.150/24
    }
}
EOF

# Copy the file to the master nodes
ansible -i inventory/sample/inventory.ini -m copy -a \
"src=keepalived.conf dest=/etc/keepalived/keepalived.conf" kube-master

ansible -i inventory/sample/inventory.ini -m service -a \
"name=keepalived state=restarted enabled=true" kube-master

haproxy configuration

Port 6444 on every master forwards to port 6443 on all three masters.

ansible -i inventory/sample/inventory.ini -m yum -a \
"name=haproxy state=latest" kube-master

# Create the config file
cat > haproxy.cfg <<EOF
global
    daemon
    maxconn 4000
    user haproxy
    group haproxy
    pidfile /var/run/haproxy.pid

defaults
    mode tcp
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    retries 3
    maxconn 3000

frontend fe_k8s
    bind *:6444
    default_backend be_k8s

backend be_k8s
    balance roundrobin
    server master1 10.163.2.106:6443 check
    server master2 10.163.2.102:6443 check
    server master3 10.163.2.101:6443 check
EOF

# Copy the file to the master nodes
ansible -i inventory/sample/inventory.ini -m copy -a \
"src=haproxy.cfg dest=/etc/haproxy/haproxy.cfg" kube-master

ansible -i inventory/sample/inventory.ini -m service -a \
"name=haproxy state=restarted enabled=true" kube-master

k8s cluster configuration

At this point the VIP 10.163.2.150 already responds to ping, but if you point the server address in the kubeconfig at the VIP, it still fails because of the certificate:

kubectl get pods -A
E0910 20:01:15.399627 83677 memcache.go:265] couldn't get current server API group list: Get "https://10.163.2.150:6443/api?timeout=32s": tls: failed to verify certificate: x509: certificate is valid for 10.233.0.1, 10.163.2.102, 10.163.2.106, 127.0.0.1, 10.163.2.101, not 10.163.2.150
E0910 20:01:15.403011 83677 memcache.go:265] couldn't get current server API group list: Get "https://10.163.2.150:6443/api?timeout=32s": tls: failed to verify certificate: x509: certificate is valid for 10.233.0.1, 10.163.2.102, 10.163.2.106, 127.0.0.1, 10.163.2.101, not 10.163.2.150
E0910 20:01:15.405688 83677 memcache.go:265] couldn't get current server API group list: Get "https://10.163.2.150:6443/api?timeout=32s": tls: failed to verify certificate: x509: certificate is valid for 10.233.0.1, 10.163.2.102, 10.163.2.106, 127.0.0.1, 10.163.2.101, not 10.163.2.150
E0910 20:01:15.408948 83677 memcache.go:265] couldn't get current server API group list: Get "https://10.163.2.150:6443/api?timeout=32s": tls: failed to verify certificate: x509: certificate is valid for 10.233.0.1, 10.163.2.102, 10.163.2.106, 127.0.0.1, 10.163.2.101, not 10.163.2.150
E0910 20:01:15.411453 83677 memcache.go:265] couldn't get current server API group list: Get "https://10.163.2.150:6443/api?timeout=32s": tls: failed to verify certificate: x509: certificate is valid for 10.233.0.1, 10.163.2.102, 10.163.2.106, 127.0.0.1, 10.163.2.101, not 10.163.2.150
Unable to connect to the server: tls: failed to verify certificate: x509: certificate is valid for 10.233.0.1, 10.163.2.102, 10.163.2.106, 127.0.0.1, 10.163.2.101, not 10.163.2.150

Remember to change it back after testing, otherwise things will break in a moment.
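To see exactly which SANs the current apiserver certificate carries (and confirm that the VIP really isn't one of them), you can check it over the wire; this is just a generic openssl probe, nothing kubespray-specific:

echo | openssl s_client -connect 10.163.2.106:6443 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"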

Modify the kubespray configuration

vim inventory/sample/group_vars/all/all.yml
# Add the following configuration
loadbalancer_apiserver:
  address: 10.163.2.150
  port: 6444
loadbalancer_apiserver_type: haproxy
loadbalancer_apiserver_port: 6443

# Re-run the k8s deployment to apply the updated configuration
ansible-playbook -i inventory/sample/inventory.ini cluster.yml -v -u root --private-key=~/.ssh/id_rsa

PLAY RECAP ****************************************************************************************
localhost : ok=3 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
master1 : ok=731 changed=33 unreachable=0 failed=0 skipped=1259 rescued=0 ignored=1
master2 : ok=632 changed=28 unreachable=0 failed=0 skipped=1105 rescued=0 ignored=1
master3 : ok=634 changed=28 unreachable=0 failed=0 skipped=1103 rescued=0 ignored=1
worker1 : ok=482 changed=9 unreachable=0 failed=0 skipped=756 rescued=0 ignored=1
worker2 : ok=482 changed=9 unreachable=0 failed=0 skipped=756 rescued=0 ignored=1
worker3 : ok=482 changed=9 unreachable=0 failed=0 skipped=756 rescued=0 ignored=1
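Once the playbook finishes, you can repeat the certificate check, this time through the VIP on the haproxy port; the SAN list should now include the load-balancer name that kubespray configured, and a successful handshake here also proves the 6444 → 6443 forwarding works:

echo | openssl s_client -connect 10.163.2.150:6444 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"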

Testing the VIP

After the update completes, the api-server address in the kubeconfig is switched to a domain name.

The corresponding entry is also added to the hosts file.

grep server ~/.kube/config
server: https://lb-apiserver.kubernetes.local:6444

cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost6 localhost6.localdomain6 localhost6.localdomain
# Ansible inventory hosts BEGIN
10.163.2.106 master1.cluster.local master1
10.163.2.102 master2.cluster.local master2
10.163.2.101 master3.cluster.local master3
10.163.2.109 worker1.cluster.local worker1
10.163.2.108 worker2.cluster.local worker2
10.163.2.131 worker3.cluster.local worker3
# Ansible inventory hosts END
10.163.2.150 lb-apiserver.kubernetes.local

I can see the VIP is currently on master3, so let's power that node off
[root@master3 ~]# ip a |grep eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
inet 10.163.2.101/24 brd 10.163.2.255 scope global dynamic noprefixroute eth0
inet 10.163.2.150/24 scope global secondary eth0
[root@master3 ~]# systemctl poweroff

The VIP has failed over to master1; running kubectl again from master2 still works
[root@master2 ~]# ip a |grep eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
inet 10.163.2.102/24 brd 10.163.2.255 scope global dynamic noprefixroute eth0
[root@master2 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane 80m v1.27.5
master2 Ready control-plane 80m v1.27.5
master3 NotReady control-plane 80m v1.27.5
worker1 Ready <none> 79m v1.27.5
worker2 Ready <none> 79m v1.27.5
worker3 Ready <none> 79m v1.27.5
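As a last check you can exercise the whole client → VIP → HAProxy → apiserver path without kubectl. The version and health endpoints are normally readable without authentication, so a quick probe against the VIP on the haproxy port is enough to confirm the chain is up:

curl -k https://lb-apiserver.kubernetes.local:6444/version
curl -k https://10.163.2.150:6444/healthz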
