Akemi

MySQL高可用之MHA

2024/09/08

MHA介绍

MHA是是Mysql高可用的解决方案

在mysql故障切换过程中,能在0-30秒内完成数据库故障切换操作,且能最大程度上保证数据的一致性

MHA角色

主要包括MHA Manager管理节点与MHA Node数据节点,MHA Manager可以部署在单独节点,也可以部署在slave节点

MHA Manager功能:
master自动切换与故障转移
实现其他脚本的运行,比如状态监测

MHA node功能:
复制主节点的binlog数据
对比从节点的中继日志文件
定时删除中继日志

MHA工作原理

1.在崩溃的master保存二进制日志
2.识别含有最新更新的slave
3.应用差异的中继日志到其他slave
4.应用从master保存的二进制日志
5.提升一个slave作为新的master
6.使其他slave连接新的master进行复制

MHA工具

是由Perl开发的开源工具

优点:
支持master自动监控与故障转移
支持基于GTID的复制模式
不易产生数据丢失
一个监控节点可以监控多个集群
增强的数据安全性

缺点:
原生不支持VIP,需要脚本或第三方工具
只对主数据库进行监控
需要基于SSH互信
没有提供从服务器的读负载均衡

MHA工具组件

由Manager工具包和Node工具包组成

Manager工具包:

  • masterha_check_ssh——检查MHA ssh配置
  • masterha_check_repl——检查mysql复制
  • masterha_manager——启动MHA
  • masterha_check_status——检查MHA运行状态
  • masterha_master_monitor——检测master是否宕机
  • masterha_master_switch——控制故障转移
  • masterha_conf_host——添加或删除server信息

Node工具包:通常由MHA Manager脚本触发

  • save_binary_logs——保存和复制master二进制日志
  • apply_diff_relay_logs——识别差异中继日志并应用于其他slave
  • purge_relay_logs——清除中继日志

部署MHA

环境说明

1
2
3
4
5
6
7
8
9
10
11
centos 7.9.2009
mysql 5.7.44
192.168.10.125 mysql-1 master-1
192.168.10.126 mysql-2 salve-1 master-2
192.168.10.127 mysql-3 salve-2
192.168.10.128 mysql-4 slave-3
192.168.10.129 mha mha-manger
架构:
基于MHA实现主从复制,采用GTID+无损同步复制,双主多从
优先:双主热备,读写分离,SLAVE可扩展
缺点:读写分离需要再程序端解决,master大量写时会产生主从延时

主从复制部署

先做master1和slave1-3的无损同步主从复制,其实另外主从同步方式也可以

主库配置

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
\cp /etc/my.cnf /etc/my.cnf.bak
cat >/etc/my.cnf<<EOF
[mysql]
default-character-set=utf8
socket=/usr/local/mysql/logs/mysql.sock

[mysqld]
#基本配置
port=3306
server-id=253306
character-set-server=utf8mb4
default-storage-engine=INNODB
pid-file=/usr/local/mysql/data/mysqld.pid
basedir=/usr/local/mysql
datadir=/usr/local/mysql/data
socket=/usr/local/mysql/logs/mysql.sock
log-error=/usr/local/mysql/logs/mysqld.log

skip_name_resolve=on
expire_logs_days=7
innodb_support_xa=1
log_bin_trust_function_creators=1
innodb_flush_log_at_trx_commit=1
sync_binlog=1
transaction_isolation=READ-COMMITTED

#慢日志与通用日志参数
log-output=FILE
general_log=0
general_log_file=/usr/local/mysql/logs/general.log
slow_query_log=ON
slow_query_log_file=/usr/local/mysql/logs/slow_query.log
long_query_time=10

#default-authentication-plugin=mysql_native_password 密码策略
check_proxy_users=on
mysql_native_password_proxy_users=on

#binlog参数
log_bin=/usr/local/mysql/logs/bin.log
log_bin_index=/usr/local/mysql/logs/binlog.index
binlog_format=ROW
binlog_rows_query_log_events=on
max_binlog_size=2024M
binlog_cache_size=1M

#无损同步复制
gtid_mode=on
enforce_gtid_consistency=1
log_slave_updates=1
binlog_gtid_simple_recovery=1

#无损半同步复制参数
plugin_load="rpl_semi_sync_slave=semisync_slave.so;rpl_semi_sync_master=semisync_master.so"
loose_rpl_semi_sync_master_enabled=1
loose_rpl_semi_sync_slave_enabled=1
loose_rpl_semi_sync_master_timeout=5000
rpl_semi_sync_master_wait_for_slave_count=1
rpl_semi_sync_master_wait_point=AFTER_SYNC

#异步复制参数
#replicate-do-db=xxx
#binlog-ignore-db=mysql
#auto_increment_increment=3
#auto_increment_offset=1

#federated 远程数据库
EOF
systemctl restart mysqld.service

从库配置(需要修改server_id)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
\cp /etc/my.cnf /etc/my.cnf.bak
cat >/etc/my.cnf<<EOF
[mysql]
default-character-set=utf8
socket=/usr/local/mysql/logs/mysql.sock

[mysqld]
port=3306
bind-address=0.0.0.0
server-id=263306
pid-file=/usr/local/mysql/data/mysqld.pid
basedir=/usr/local/mysql
datadir=/usr/local/mysql/data
socket=/usr/local/mysql/logs/mysql.sock
log-error=/usr/local/mysql/logs/mysqld.log
character-set-server=utf8mb4
default-storage-engine=INNODB
skip_name_resolve=on
expire_logs_days=7
innodb_support_xa=1
innodb_flush_log_at_trx_commit=1
transaction-isolation=read-committed

#日志配置
log-output=FILE
general_log=0
general_log_file=/usr/local/mysql/logs/general.log
slow_query_log=ON
slow_query_log_file=/usr/local/mysql/logs/slow_query.log
long_query_time=10

#中继日志
relay_log=/usr/local/mysql/logs/relay.log
relay_log_purge=0 #不自动清除
relay_log_info_repository=table
relay_log_recovery=1

#从库多线程参数
slave-parallel-type=LOGICAL_CLOCK
slave-parallel-workers=4
master_info_repository=table
slave_skip_errors=ddl_exist_errors #忽略ddl中"是否存在"的报错
slave_preserve_commit_order=1

#GTID参数
gtid_mode=on
enforce_gtid_consistency=1
log_slave_updates=1
binlog_gtid_simple_recovery=1

#binlog参数
log_bin=/usr/local/mysql/logs/bin.log
log_bin_index=/usr/local/mysql/logs/binlog.index
binlog_format=ROW
binlog_rows_query_log_events=on
max_binlog_size=2024M
binlog_cache_size=1M
read_only=1
log_bin_trust_function_creators=1
sync_binlog=1

#半同步复制参数
plugin_load="rpl_semi_sync_slave=semisync_slave.so;rpl_semi_sync_master=semisync_master.so"
loose_rpl_semi_sync_master_enabled=1 #旧配置,适用于5.6
loose_rpl_semi_sync_slave_enabled=1 #旧配置,适用于5.6
loose_rpl_semi_sync_master_timeout=5000 #旧配置,适用于5.6
rpl_semi_sync_master_wait_for_slave_count=1
rpl_semi_sync_master_wait_point=AFTER_SYNC

#异步复制参数
#replicate-do-db=xxx
#binlog-ignore-db=mysql
#auto_increment_increment=3
#auto_increment_offset=2

#default-authentication-plugin=mysql_native_password
#check_proxy_users=on
#mysql_native_password_proxy_users=on
#federated
EOF
systemctl restart mysqld.service

主从复制建立连接

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
mysql -uroot -proot
grant replication client,replication slave on *.* to 'repuser'@'%' identified by 'repuser123';
flush privileges;

stop slave;
reset slave;
change master to
master_host='192.168.10.125',
master_port=3306,
master_user='repuser',
master_password='repuser123',
master_auto_position=1; -- auto_position需要GTID支持
start slave;
show slave status\G

报错处理

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
即主库没有所需的binlog

可以尝试
change master to
master_host='192.168.10.125',
master_port=3306,
master_user='repuser',
master_password='repuser123',
master_log_file='bin.000014',
master_log_pos=234;
的方式指定pos

如果还是报错
查看master的gtid
show variables like 'gtid_purged';
+---------------+---------------------------------------------+
| Variable_name | Value |
+---------------+---------------------------------------------+
| gtid_purged | 793623a6-4c20-11ef-aa71-bc241123c3a6:1-3479 |
+---------------+---------------------------------------------+
将slave节点的gtid设置与主节点一致
RESET MASTER;
set global gtid_purged='793623a6-4c20-11ef-aa71-bc241123c3a6:1-3479';
stop slave;
start slave;

测试

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
查看主节点连接
show processlist;
+----+---------+----------------------+------+------------------+------+---------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+---------+----------------------+------+------------------+------+---------------------------------------------------------------+------------------+
| 5 | root | localhost | NULL | Query | 0 | starting | show processlist |
| 8 | repuser | 192.168.10.126:44730 | NULL | Binlog Dump GTID | 206 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 11 | repuser | 192.168.10.127:46398 | NULL | Binlog Dump GTID | 157 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 13 | repuser | 192.168.10.128:42122 | NULL | Binlog Dump GTID | 97 | Master has sent all binlog to slave; waiting for more updates | NULL |
+----+---------+----------------------+------+------------------+------+---------------------------------------------------------------+------------------+

主库导入一个sql或自建数据库进行测试
create database wsdb01;
create table wst01 (id int,name varchar(40));
insert into wst01 values (1,'ws'),(2,'xhy');
commit;
select * from wsdb01.wst01;
+------+------+
| id | name |
+------+------+
| 1 | ws |
| 2 | xhy |
+------+------+

备库验证
select * from wsdb01.wst01;
+------+------+
| id | name |
+------+------+
| 1 | ws |
| 2 | xhy |
+------+------+

安装MHA

安装MHA node包

1
2
3
4
5
6
7
8
9
所有节点
\cp /usr/local/mysql/lib/libmysqlclient.so.20 /lib64/
yum -y install perl-DBD-MySQL perl-devel perl-CPAN
cd /root/MHA
tar -xf mha4mysql-node-0.58.tar.gz
cd mha4mysql-node-0.58/
perl Makefile.PL
make && make install
ln -s /root/MHA/mha4mysql-node-0.58/bin/* /bin/

安装MHA manager包

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
仅在manager节点上做
#安装
yum -y install perl-DBD-MySQL perl-devel perl-CPAN
yum -y install perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes

cd /root/MHA
tar -xf mha4mysql-manager-0.58.tar.gz
cd mha4mysql-manager-0.58/
perl Makefile.PL
make && make install

#配置
mkdir -p /etc/masterha
\cp /root/MHA/mha4mysql-manager-0.58/samples/conf/app1.cnf /etc/masterha/app1.cnf
cat >/etc/masterha/app1.cnf<<EOF
[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/manager.log
master_binlog_dir=/usr/local/mysql/logs/bin.log
master_ip_failover_script=/usr/local/bin/master_ip_failover #自动切换脚本
master_ip_online_change_script=/usr/local/bin/master_ip_online_change # 手动切换脚本
password=root
user=root
ping_interval=1 #ping间隔时间
remote_workdir=/tmp
repl_user=repuser
repl_password=repuser123
report_script=/usr/local/bin/send_report
secondary_check_script=/usr/local/bin/masterha_secondary_check -s mysql-2 -s mysql-3 -s mysql-4
shutdown_script=""
ssh_user=root

[server1]
hostname=192.168.10.125
port=3306

[server2]
hostname=192.168.10.126
port=3306
candidate_master=1
check_repl_delay=0

[server3]
hostname=192.168.10.127
port=3306

[server4]
hostname=192.168.10.128
port=3306
EOF

配置脚本清理relay

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
每个slave节点
mysql -uroot -proot -e "set global relay_log_purge=0;"

cat >/usr/local/bin/purge_relay_log.sh<<'EOF'
#!/bin/bash
user=root
passwd=root
port=3306
socket=/usr/local/mysql/logs/mysql.sock
log_dir='/var/log/masterha/log'
work_dir='/usr/local/mysql/logs/relay.log'
purge='/usr/local/bin/purge_relay_logs'

if [ ! -d $log_dir ]
then
mkdir $log_dir -p
fi
$purge --user=$user --password=$passwd -S $socket --host=localhost --disable_relay_log_purge --port=$port
EOF
chmod +x /usr/local/bin/purge_relay_log.sh

echo "0 4 * * * root sh /usr/local/bin/purge_relay_log.sh" >>/etc/crontab
systemctl restart crond.service

MHA manager启动

启动前测试

ssh连接测试
masterha_check_ssh --conf=/etc/masterha/app1.cnf
#[info] All SSH connection tests passed successfully.

复制环境测试
先注释掉app1.cnf中的master_ip_failver_script,因为此时还没有配置自动切换的参数
如果不注释就会报错
masterha_check_repl --conf=/etc/masterha/app1.cnf
#MySQL Replication Health is OK.

#查看当前状态
masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).

启动MHA

1
2
3
4
5
6
7
8
9
10
11
12
13
#创建日志目录
mkdir -p /var/log/masterha/app1/

#nohop启动
nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf \
--ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

--remove_dead_master_conf表示自动删除已经失效的配置
--ignore_last_failover表示忽略8小时内两次宕机的情况,避免反复切换

#查看当前状态
masterha_check_status --conf=/etc/masterha/app1.cnf
#app1 (pid:14416) is running(0:PING_OK), master:mysql-1

MHA切换VIP

MHA切换VIP的过程

1.配置文件检查

2.宕机master处理,删除VIP

3.保存有问题的master和和最新的slave库有区别的中继日志

4.识别最新数据的slave库

5.应用3中保存的日志

6.将最新的slave提升为新master

7.将其他slave连接到新master

配置VIP

使用mha自带的功能,需要配置脚本与mha结合来实现,是一种手动切换的用法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
master节点:
/sbin/ifconfig ens18:1 192.168.10.198/24

配置master_ip_failover手动切换脚本
cat >/usr/local/bin/master_ip_failover <<'EOF'
#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);

my $vip = '192.168.10.198/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens18:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens18:$key down";

GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);

exit &main();

sub main {

print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

if ( $command eq "stop" || $command eq "stopssh" ) {

my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {

my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}

sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
return 0 unless ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
EOF
chmod +x /usr/local/bin/*

重新在/etc/masterha/app1.cnf中添加脚本的参数
master_ip_failover_script=/usr/local/bin/master_ip_failover #自动切换脚本
重新进行健康检查
masterha_check_repl --conf=/etc/masterha/app1.cnf
#MySQL Replication Health is OK.
此时失败切换脚本经过检查没有问题就不会报错

新建一个启动脚本:
cat >/usr/local/bin/manager_start<<EOF
#!/bin/bash
nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf \
--ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
EOF
chmod +x /usr/local/bin/*

重启监控
masterha_stop --conf=/etc/masterha/app1.cnf
manager_start

查看日志,可以看到现在已经监听我的125主机即主数据库
tail -10 /var/log/masterha/app1/manager.log
IN SCRIPT TEST====/sbin/ifconfig ens18:1 down==/sbin/ifconfig ens18:1 192.168.10.198/24===

Checking the Status of the script.. OK
Sat Sep 7 16:03:06 2024 - [info] OK.
Sat Sep 7 16:03:06 2024 - [warning] shutdown_script is not defined.
Sat Sep 7 16:03:06 2024 - [info] Set master ping interval 1 seconds.
Sat Sep 7 16:03:06 2024 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s mysql-2 -s mysql-3 -s mysql-4
Sat Sep 7 16:03:06 2024 - [info] Starting ping health check on mysql-1(192.168.10.125:3306)..
Sat Sep 7 16:03:06 2024 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

MHA主库切换测试(脚本)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
手动停止主节点mysql
systemctl stop mysqld

查看日志
tail -f /var/log/masterha/app1/manager.log

Started automated(non-interactive) failover.
Invalidated master IP address on mysql-1(192.168.10.125:3306)
Selected mysql-2(192.168.10.126:3306) as a new master.
mysql-2(192.168.10.126:3306): OK: Applying all logs succeeded.
mysql-2(192.168.10.126:3306): OK: Activated master IP address.
mysql-3(192.168.10.127:3306): OK: Slave started, replicating from mysql-2(192.168.10.126:3306)
mysql-4(192.168.10.128:3306): OK: Slave started, replicating from mysql-2(192.168.10.126:3306)
mysql-2(192.168.10.126:3306): Resetting slave info succeeded.
Master failover to mysql-2(192.168.10.126:3306) completed successfully.

查看mysql-2,可以看到mysql-3和-4已经指向了mysql-2
mysql -uroot -proot -e "show processlist;"
mysql: [Warning] Using a password on the command line interface can be insecure.
+----+---------+----------------------+------+------------------+------+---------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+---------+----------------------+------+------------------+------+---------------------------------------------------------------+------------------+
| 17 | repuser | 192.168.10.127:49552 | NULL | Binlog Dump GTID | 99 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 18 | repuser | 192.168.10.128:48006 | NULL | Binlog Dump GTID | 99 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 21 | root | localhost | NULL | Query | 0 | starting | show processlist |
+----+---------+----------------------+------+------------------+------+---------------------------------------------------------------+------------------+

且此时VIP已经漂移到了mysql-2节点
ip a | grep ens -A 10 | grep -w inet
inet 192.168.10.126/24 brd 192.168.10.255 scope global noprefixroute ens18
inet 192.168.10.198/24 brd 192.168.10.255 scope global secondary ens18:1

后续问题:避免MHA自动关闭脚本

MHA官方bug,每次切换完VIP,manager进程都会自动嘎掉,所以需要一个脚本来避免这种情况

写完之后直接在master节点中运行即可

1
2
3
4
5
6
7
8
9
10
11
#!/bin/bash
while true
do
COUNT=$(ps -ef | grep manager | egrep -v -c grep)
if [ $COUNT -eq 0 ];then
manager_start
else
echo 'MHA manager start'
fi
sleep 5
done

后续问题:重新使原主库正常运行

此时原主库已经脱离集群,此时如果需要将主库重新加入服务器组,需要重新配置其为从库

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
1.修改my.cnf配置文件

#从库参数
relay_log=/usr/local/mysql/logs/relay.log
relay_log_purge=0 #不自动清除
relay_log_info_repository=table
relay_log_recovery=1
slave-parallel-type=LOGICAL_CLOCK
slave-parallel-workers=4
master_info_repository=table
slave_skip_errors=ddl_exist_errors #忽略ddl中"是否存在"的报错
slave_preserve_commit_order=1

systemctl start mysqld.service

2.重新进行指向到mysql-2

stop slave;
reset slave;
change master to
master_host='192.168.10.126',
master_port=3306,
master_user='repuser',
master_password='repuser123',
master_auto_position=1;
start slave;
show slave status\G

3.配置mysql-1库为只读
自动切换后默认还是可写状态,不会自动关闭可写
mysql -uroot -proot -e "show variables like 'read_only';"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only | OFF |
+---------------+-------+

所以需要重新设置为只读模式
my.cnf配置文件
read_only=1

4.重新修改/etc/masterha/app1.cnf
加入
[server1]
candidate_master=1
check_repl_delay=0
hostname=mysql-1
port=3306

在线切换脚本

mha支持集成的自定义脚本,和上面的检测脚本一样

/etc/masterha/app1.cnf中的master_ip_online_change参数脚本

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
cat >/usr/local/bin/master_ip_online_change<<'EOF'
#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;

my $_tstart;
my $_running_interval = 0.1;
my (
$command, $orig_master_host, $orig_master_ip,
$orig_master_port, $orig_master_user,
$new_master_host, $new_master_ip, $new_master_port,
$new_master_user,
);

my $vip = '192.168.10.198/24'; # Virtual IP
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig ens18:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens18:$key down";
my $ssh_user = "root";
my $new_master_password='root';
my $orig_master_password='root';
GetOptions(
'command=s' => \$command,
#'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'orig_master_user=s' => \$orig_master_user,
#'orig_master_password=s' => \$orig_master_password,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
'new_master_user=s' => \$new_master_user,
#'new_master_password=s' => \$new_master_password,
);

exit &main();

sub current_time_us {
my ( $sec, $microsec ) = gettimeofday();
my $curdate = localtime($sec);
return $curdate . " " . sprintf( "%06d", $microsec );
}

sub sleep_until {
my $elapsed = tv_interval($_tstart);
if ( $_running_interval > $elapsed ) {
sleep( $_running_interval - $elapsed );
}
}

sub get_threads_util {
my $dbh = shift;
my $my_connection_id = shift;
my $running_time_threshold = shift;
my $type = shift;
$running_time_threshold = 0 unless ($running_time_threshold);
$type = 0 unless ($type);
my @threads;

my $sth = $dbh->prepare("SHOW PROCESSLIST");
$sth->execute();

while ( my $ref = $sth->fetchrow_hashref() ) {
my $id = $ref->{Id};
my $user = $ref->{User};
my $host = $ref->{Host};
my $command = $ref->{Command};
my $state = $ref->{State};
my $query_time = $ref->{Time};
my $info = $ref->{Info};
$info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
next if ( $my_connection_id == $id );
next if ( defined($query_time) && $query_time < $running_time_threshold );
next if ( defined($command) && $command eq "Binlog Dump" );
next if ( defined($user) && $user eq "system user" );
next
if ( defined($command)
&& $command eq "Sleep"
&& defined($query_time)
&& $query_time >= 1 );

if ( $type >= 1 ) {
next if ( defined($command) && $command eq "Sleep" );
next if ( defined($command) && $command eq "Connect" );
}

if ( $type >= 2 ) {
next if ( defined($info) && $info =~ m/^select/i );
next if ( defined($info) && $info =~ m/^show/i );
}

push @threads, $ref;
}
return @threads;
}

sub main {
if ( $command eq "stop" ) {
## Gracefully killing connections on the current master
# 1. Set read_only= 1 on the new master
# 2. DROP USER so that no app user can establish new connections
# 3. Set read_only= 1 on the current master
# 4. Kill current queries
# * Any database access failure will result in script die.
my $exit_code = 1;
eval {
## Setting read_only=1 on the new master (to avoid accident)
my $new_master_handler = new MHA::DBHelper();

# args: hostname, port, user, password, raise_error(die_on_error)_or_not
$new_master_handler->connect( $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, 1 );
print current_time_us() . " Set read_only on the new master.. ";
$new_master_handler->enable_read_only();
if ( $new_master_handler->is_read_only() ) {
print "ok.\n";
}
else {
die "Failed!\n";
}
$new_master_handler->disconnect();

# Connecting to the orig master, die if any database error happens
my $orig_master_handler = new MHA::DBHelper();
$orig_master_handler->connect( $orig_master_ip, $orig_master_port,
$orig_master_user, $orig_master_password, 1 );

## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
#$orig_master_handler->disable_log_bin_local();
#print current_time_us() . " Drpping app user on the orig master..\n";
#FIXME_xxx_drop_app_user($orig_master_handler);

## Waiting for N * 100 milliseconds so that current connections can exit
my $time_until_read_only = 15;
$_tstart = [gettimeofday];
my @threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
while ( $time_until_read_only > 0 && $#threads >= 0 ) {
if ( $time_until_read_only % 5 == 0 ) {
printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
current_time_us(), $#threads + 1, $time_until_read_only * 100;
if ( $#threads < 5 ) {
print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
foreach (@threads);
}
}
sleep_until();
$_tstart = [gettimeofday];
$time_until_read_only--;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
}

## Setting read_only=1 on the current master so that nobody(except SUPER) can write
print current_time_us() . " Set read_only=1 on the orig master.. ";
$orig_master_handler->enable_read_only();
if ( $orig_master_handler->is_read_only() ) {
print "ok.\n";
}
else {
die "Failed!\n";
}

## Waiting for M * 100 milliseconds so that current update queries can complete
my $time_until_kill_threads = 5;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
if ( $time_until_kill_threads % 5 == 0 ) {
printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
if ( $#threads < 5 ) {
print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
foreach (@threads);
}
}
sleep_until();
$_tstart = [gettimeofday];
$time_until_kill_threads--;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
}

print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();

## Terminating all threads
print current_time_us() . " Killing all application threads..\n";
$orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
print current_time_us() . " done.\n";
#$orig_master_handler->enable_log_bin_local();
$orig_master_handler->disconnect();

## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
## Activating master ip on the new master
# 1. Create app user with write privileges
# 2. Moving backup script if needed
# 3. Register new master's ip to the catalog database

# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
# If exit code is 0 or 10, MHA does not abort
my $exit_code = 10;
eval {
my $new_master_handler = new MHA::DBHelper();

# args: hostname, port, user, password, raise_error_or_not
$new_master_handler->connect( $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, 1 );

## Set read_only=0 on the new master
#$new_master_handler->disable_log_bin_local();
print current_time_us() . " Set read_only=0 on the new master.\n";
$new_master_handler->disable_read_only();

## Creating an app user on the new master
#print current_time_us() . " Creating app user on the new master..\n";
#FIXME_xxx_create_app_user($new_master_handler);
#$new_master_handler->enable_log_bin_local();
$new_master_handler->disconnect();

## Update master ip on the catalog database, etc
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {

# do nothing
exit 0;
}
else {
&usage();
exit 1;
}
}

# A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
die;
}
EOF

chmod +x /usr/local/bin/*

手动切换

1
2
3
masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive \
--new_master_host=192.168.10.125 --new_master_port=3306 --orig_master_is_new_slave \
--running_updates_limit=10000

宕机切换

1
2
masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive \
--dead_master_host=192.168.10.125

keepalived切换VIP

只需要修改

master_ip_online_change与master_ip_failover_script脚本内容即可

master_ip_failover脚本

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);

my $vip = '192.168.1.198/24';
my $key = '1';
my $ssh_start_vip ="systemctl start keepalived";
my $ssh_stop_vip ="systemctl stop keepalived";

GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);

exit &main();

sub main {

print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

if ( $command eq "stop" || $command eq "stopssh" ) {

my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {

my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}

sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
return 0 unless ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

master_ip_online_change脚本

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;

my $_tstart;
my $_running_interval = 0.1;
my (
$command, $orig_master_host, $orig_master_ip,
$orig_master_port, $orig_master_user,
$new_master_host, $new_master_ip, $new_master_port,
$new_master_user,
);

my $vip = '192.168.1.198/24';
my $key = '1';
my $ssh_start_vip ="systemctl start keepalived";
my $ssh_stop_vip ="systemctl stop keepalived";
my $orig_master_ssh_port = 22;
my $new_master_ssh_port = 22;
my $ssh_user = "root";
my $new_master_password='root';
my $orig_master_password='root';
GetOptions(
'command=s' => \$command,
#'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'orig_master_user=s' => \$orig_master_user,
#'orig_master_password=s' => \$orig_master_password,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
'new_master_user=s' => \$new_master_user,
#'new_master_password=s' => \$new_master_password,
);

exit &main();

sub current_time_us {
my ( $sec, $microsec ) = gettimeofday();
my $curdate = localtime($sec);
return $curdate . " " . sprintf( "%06d", $microsec );
}

sub sleep_until {
my $elapsed = tv_interval($_tstart);
if ( $_running_interval > $elapsed ) {
sleep( $_running_interval - $elapsed );
}
}

sub get_threads_util {
my $dbh = shift;
my $my_connection_id = shift;
my $running_time_threshold = shift;
my $type = shift;
$running_time_threshold = 0 unless ($running_time_threshold);
$type = 0 unless ($type);
my @threads;

my $sth = $dbh->prepare("SHOW PROCESSLIST");
$sth->execute();

while ( my $ref = $sth->fetchrow_hashref() ) {
my $id = $ref->{Id};
my $user = $ref->{User};
my $host = $ref->{Host};
my $command = $ref->{Command};
my $state = $ref->{State};
my $query_time = $ref->{Time};
my $info = $ref->{Info};
$info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
next if ( $my_connection_id == $id );
next if ( defined($query_time) && $query_time < $running_time_threshold );
next if ( defined($command) && $command eq "Binlog Dump" );
next if ( defined($user) && $user eq "system user" );
next
if ( defined($command)
&& $command eq "Sleep"
&& defined($query_time)
&& $query_time >= 1 );

if ( $type >= 1 ) {
next if ( defined($command) && $command eq "Sleep" );
next if ( defined($command) && $command eq "Connect" );
}

if ( $type >= 2 ) {
next if ( defined($info) && $info =~ m/^select/i );
next if ( defined($info) && $info =~ m/^show/i );
}

push @threads, $ref;
}
return @threads;
}

sub main {
if ( $command eq "stop" ) {
## Gracefully killing connections on the current master
# 1. Set read_only= 1 on the new master
# 2. DROP USER so that no app user can establish new connections
# 3. Set read_only= 1 on the current master
# 4. Kill current queries
# * Any database access failure will result in script die.
my $exit_code = 1;
eval {
## Setting read_only=1 on the new master (to avoid accident)
my $new_master_handler = new MHA::DBHelper();

# args: hostname, port, user, password, raise_error(die_on_error)_or_not
$new_master_handler->connect( $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, 1 );
print current_time_us() . " Set read_only on the new master.. ";
$new_master_handler->enable_read_only();
if ( $new_master_handler->is_read_only() ) {
print "ok.\n";
}
else {
die "Failed!\n";
}
$new_master_handler->disconnect();

# Connecting to the orig master, die if any database error happens
my $orig_master_handler = new MHA::DBHelper();
$orig_master_handler->connect( $orig_master_ip, $orig_master_port,
$orig_master_user, $orig_master_password, 1 );

## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
#$orig_master_handler->disable_log_bin_local();
#print current_time_us() . " Drpping app user on the orig master..\n";
#FIXME_xxx_drop_app_user($orig_master_handler);

## Waiting for N * 100 milliseconds so that current connections can exit
my $time_until_read_only = 15;
$_tstart = [gettimeofday];
my @threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
while ( $time_until_read_only > 0 && $#threads >= 0 ) {
if ( $time_until_read_only % 5 == 0 ) {
printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
current_time_us(), $#threads + 1, $time_until_read_only * 100;
if ( $#threads < 5 ) {
print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
foreach (@threads);
}
}
sleep_until();
$_tstart = [gettimeofday];
$time_until_read_only--;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
}

## Setting read_only=1 on the current master so that nobody(except SUPER) can write
print current_time_us() . " Set read_only=1 on the orig master.. ";
$orig_master_handler->enable_read_only();
if ( $orig_master_handler->is_read_only() ) {
print "ok.\n";
}
else {
die "Failed!\n";
}

## Waiting for M * 100 milliseconds so that current update queries can complete
my $time_until_kill_threads = 5;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
if ( $time_until_kill_threads % 5 == 0 ) {
printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
if ( $#threads < 5 ) {
print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
foreach (@threads);
}
}
sleep_until();
$_tstart = [gettimeofday];
$time_until_kill_threads--;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
}

print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();

## Terminating all threads
print current_time_us() . " Killing all application threads..\n";
$orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
print current_time_us() . " done.\n";
#$orig_master_handler->enable_log_bin_local();
$orig_master_handler->disconnect();

## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
## Activating master ip on the new master
# 1. Create app user with write privileges
# 2. Moving backup script if needed
# 3. Register new master's ip to the catalog database

# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
# If exit code is 0 or 10, MHA does not abort
my $exit_code = 10;
eval {
my $new_master_handler = new MHA::DBHelper();

# args: hostname, port, user, password, raise_error_or_not
$new_master_handler->connect( $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, 1 );

## Set read_only=0 on the new master
#$new_master_handler->disable_log_bin_local();
print current_time_us() . " Set read_only=0 on the new master.\n";
$new_master_handler->disable_read_only();

## Creating an app user on the new master
#print current_time_us() . " Creating app user on the new master..\n";
#FIXME_xxx_create_app_user($new_master_handler);
#$new_master_handler->enable_log_bin_local();
$new_master_handler->disconnect();

## Update master ip on the catalog database, etc
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {

# do nothing
exit 0;
}
else {
&usage();
exit 1;
}
}

# A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
die;
}

重启mha
/usr/local/bin/masterha_stop --conf=/etc/masterha/app1.cnf
Stopped app1 successfully.

添加非本机IP绑定

1
2
3
echo 'net.ipv4.ip_nonlocal_bind=1' >> /etc/sysctl.conf
sysctl -p
echo '1' > /proc/sys/net/ipv4/ip_forward

keepalived主从配置

非抢占模式,直接凭优先级来判定

主从配置分别在一台主节点和一台备选主上配置

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
#数据库主keepalived配置
yum -y install keepalived
cat >/etc/keepalived/keepalived.conf <<EOF
! Configuration File for keepalived
global_defs {
router_id mysql-2
}
vrrp_instance VI_1 {
state BACKUP
interface ens18
virtual_router_id 51
priority 80
advert_int 1
nopreempt
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.10.198/24 dev ens18
}
}
virtual_server 192.168.10.125 3306 {
delay_loop 2
lb_algo wrr
lb_kind DR
persistence_timeout 60
protocol TCP
real_server 192.168.10.125 3306 {
weight 3
notify_down /etc/keepalived/keepalived_stop.sh
TCP_CHECK {
connect_timeout 10
nb_get_retry 3
delay_before_retry 3
connect_port 3306
}
}
}
EOF

#从keepalived配置

yum -y install keepalived
cat >/etc/keepalived/keepalived.conf <<EOF
! Configuration File for keepalived
global_defs {
router_id mysql-1
}
vrrp_instance V_mysql_1 {
state BACKUP
interface ens18
virtual_router_id 51
priority 90
advert_int 1
nopreempt
authentication {
auth_type PASS
auth_pass itpux
}
virtual_ipaddress {
192.168.10.198/24
}
}
virtual_server 192.168.10.126 3306 {
delay_loop 2
lb_algo wrr
lb_kind DR
persistence_timeout 60
protocol TCP
real_server 192.168.10.126 3306 {
weight 3
notify_down /etc/keepalived/keepalived_stop.sh
TCP_CHECK {
connect_timeout 10
nb_get_retry 3
delay_before_retry 3
connect_port 3306
}
}
}
EOF

杀死keepalived的脚本

1
2
3
4
5
6
7
8
9
10
cat >/etc/keepalived/keepalived_stop.sh<<EOF
#!/bin/bash
systemctl stop keepalived.service
EOF
chmod +x /etc/keepalived/keepalived_stop.sh

systemctl daemon-reload
systemctl restart keepalived.service

注意主库节点先启动keepalived,否则VIP会被抢占

VIP切换测试

1
2
3
4
5
6
7
8
9
10
11
主(mysql-2)查看ip
inet 192.168.10.126/24 brd 192.168.10.255 scope global noprefixroute ens18
inet 192.168.10.198/24 scope global secondary ens18
从(mysql-1)查看ip
inet 192.168.10.125/24 brd 192.168.10.255 scope global noprefixroute ens18

切换vip
主(mysql-2)
systemctl stop mysqld

vip切换成功,略过了

MHA主库切换测试(keepalived)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
根据需求修改mha的cnf文件:
secondary_check_script=/usr/local/bin/masterha_secondary_check -s mysql-2 -s mysql-3 -s mysql-4
[server2]
candidate_master=1
check_repl_delay=0
hostname=mysql-1
port=3306
manager_start

mysql-1和mysql-2做一个双主:
stop slave;
reset slave;
change master to
master_host='192.168.10.126',
master_port=3306,
master_user='repuser',
master_password='repuser123',
master_auto_position=1;
start slave;
show slave status\G

此时master是mysql-1,VIP也在mysql-1身上
关闭mysql-1
systemctl stop mysqld.service

mysql-2(第二台主)
ip a show ens18 | grep 192
inet 192.168.10.126/24 brd 192.168.10.255 scope global noprefixroute ens18
inet 192.168.10.198/24 scope global secondary ens18
mysql -uroot -proot -e "show processlist;"
+-----+---------+----------------------+------+------------------+------+---------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-----+---------+----------------------+------+------------------+------+---------------------------------------------------------------+------------------+
| 74 | repuser | 192.168.10.127:50230 | NULL | Binlog Dump GTID | 129 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 75 | repuser | 192.168.10.128:48684 | NULL | Binlog Dump GTID | 129 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 141 | root | localhost | NULL | Query | 0 | starting | show processlist |
+-----+---------+----------------------+------+------------------+------+---------------------------------------------------------------+------------------+
mysql -uroot -proot -e "show variables like 'server_id';"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+--------+
| Variable_name | Value |
+---------------+--------+
| server_id | 263306 |
+---------------+--------+
可以看到剩余mysql-3和mysql-4成功指向了mysql-2为新的主库

宕机库恢复同上MHA部分:重新添加cnf等

CATALOG
  1. 1. MHA介绍
  2. 2. 部署MHA
    1. 2.1. 主从复制部署
    2. 2.2. 主从复制建立连接
    3. 2.3. 安装MHA
      1. 2.3.1. 安装MHA node包
      2. 2.3.2. 安装MHA manager包
      3. 2.3.3. MHA manager启动
  3. 3. MHA切换VIP
    1. 3.1. MHA切换VIP的过程
    2. 3.2. 配置VIP
    3. 3.3. MHA主库切换测试(脚本)
    4. 3.4. 后续问题:避免MHA自动关闭脚本
    5. 3.5. 后续问题:重新使原主库正常运行
    6. 3.6. 在线切换脚本
  4. 4. keepalived切换VIP
    1. 4.1. 添加非本机IP绑定
    2. 4.2. keepalived主从配置
    3. 4.3. VIP切换测试
    4. 4.4. MHA主库切换测试(keepalived)