Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.' 即主库没有所需的binlog
查看主节点连接 show processlist; +----+---------+----------------------+------+------------------+------+---------------------------------------------------------------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+---------+----------------------+------+------------------+------+---------------------------------------------------------------+------------------+ | 5 | root | localhost | NULL | Query | 0 | starting | show processlist | | 8 | repuser | 192.168.10.126:44730 | NULL | Binlog Dump GTID | 206 | Master has sent all binlog to slave; waiting for more updates | NULL | | 11 | repuser | 192.168.10.127:46398 | NULL | Binlog Dump GTID | 157 | Master has sent all binlog to slave; waiting for more updates | NULL | | 13 | repuser | 192.168.10.128:42122 | NULL | Binlog Dump GTID | 97 | Master has sent all binlog to slave; waiting for more updates | NULL | +----+---------+----------------------+------+------------------+------+---------------------------------------------------------------+------------------+
主库导入一个sql或自建数据库进行测试 create database wsdb01; create table wst01 (id int,name varchar(40)); insert into wst01 values (1,'ws'),(2,'xhy'); commit; select * from wsdb01.wst01; +------+------+ | id | name | +------+------+ | 1 | ws | | 2 | xhy | +------+------+
备库验证 select * from wsdb01.wst01; +------+------+ | id | name | +------+------+ | 1 | ws | | 2 | xhy | +------+------+
安装MHA
安装MHA node包
1 2 3 4 5 6 7 8 9
所有节点 \cp /usr/local/mysql/lib/libmysqlclient.so.20 /lib64/ yum -y install perl-DBD-MySQL perl-devel perl-CPAN cd /root/MHA tar -xf mha4mysql-node-0.58.tar.gz cd mha4mysql-node-0.58/ perl Makefile.PL make && make install ln -s /root/MHA/mha4mysql-node-0.58/bin/* /bin/
ssh连接测试 masterha_check_ssh --conf=/etc/masterha/app1.cnf #[info] All SSH connection tests passed successfully.
复制环境测试 先注释掉app1.cnf中的master_ip_failver_script,因为此时还没有配置自动切换的参数 如果不注释就会报错 masterha_check_repl --conf=/etc/masterha/app1.cnf #MySQL Replication Health is OK.
#查看当前状态 masterha_check_status --conf=/etc/masterha/app1.cnf app1 is stopped(2:NOT_RUNNING).
my $vip = '192.168.10.198/24'; my $key = '1'; my $ssh_start_vip = "/sbin/ifconfig ens18:$key$vip"; my $ssh_stop_vip = "/sbin/ifconfig ens18:$key down";
重新在/etc/masterha/app1.cnf中添加脚本的参数 master_ip_failover_script=/usr/local/bin/master_ip_failover #自动切换脚本 重新进行健康检查 masterha_check_repl --conf=/etc/masterha/app1.cnf #MySQL Replication Health is OK. 此时失败切换脚本经过检查没有问题就不会报错
Checking the Status of the script.. OK Sat Sep 7 16:03:06 2024 - [info] OK. Sat Sep 7 16:03:06 2024 - [warning] shutdown_script is not defined. Sat Sep 7 16:03:06 2024 - [info] Set master ping interval 1 seconds. Sat Sep 7 16:03:06 2024 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s mysql-2 -s mysql-3 -s mysql-4 Sat Sep 7 16:03:06 2024 - [info] Starting ping health check on mysql-1(192.168.10.125:3306).. Sat Sep 7 16:03:06 2024 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Started automated(non-interactive) failover. Invalidated master IP address on mysql-1(192.168.10.125:3306) Selected mysql-2(192.168.10.126:3306) as a new master. mysql-2(192.168.10.126:3306): OK: Applying all logs succeeded. mysql-2(192.168.10.126:3306): OK: Activated master IP address. mysql-3(192.168.10.127:3306): OK: Slave started, replicating from mysql-2(192.168.10.126:3306) mysql-4(192.168.10.128:3306): OK: Slave started, replicating from mysql-2(192.168.10.126:3306) mysql-2(192.168.10.126:3306): Resetting slave info succeeded. Master failover to mysql-2(192.168.10.126:3306) completed successfully.
查看mysql-2,可以看到mysql-3和-4已经指向了mysql-2 mysql -uroot -proot -e "show processlist;" mysql: [Warning] Using a password on the command line interface can be insecure. +----+---------+----------------------+------+------------------+------+---------------------------------------------------------------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+---------+----------------------+------+------------------+------+---------------------------------------------------------------+------------------+ | 17 | repuser | 192.168.10.127:49552 | NULL | Binlog Dump GTID | 99 | Master has sent all binlog to slave; waiting for more updates | NULL | | 18 | repuser | 192.168.10.128:48006 | NULL | Binlog Dump GTID | 99 | Master has sent all binlog to slave; waiting for more updates | NULL | | 21 | root | localhost | NULL | Query | 0 | starting | show processlist | +----+---------+----------------------+------+------------------+------+---------------------------------------------------------------+------------------+
且此时VIP已经漂移到了mysql-2节点 ip a | grep ens -A 10 | grep -w inet inet 192.168.10.126/24 brd 192.168.10.255 scope global noprefixroute ens18 inet 192.168.10.198/24 brd 192.168.10.255 scope global secondary ens18:1
后续问题:避免MHA自动关闭脚本
MHA官方bug,每次切换完VIP,manager进程都会自动嘎掉,所以需要一个脚本来避免这种情况
写完之后直接在master节点中运行即可
1 2 3 4 5 6 7 8 9 10 11
#!/bin/bash whiletrue do COUNT=$(ps -ef | grep manager | egrep -v -c grep) if [ $COUNT -eq 0 ];then manager_start else echo'MHA manager start' fi sleep 5 done
stop slave; reset slave; change master to master_host='192.168.10.126', master_port=3306, master_user='repuser', master_password='repuser123', master_auto_position=1; start slave; show slave status\G
3.配置mysql-1库为只读 自动切换后默认还是可写状态,不会自动关闭可写 mysql -uroot -proot -e "show variables like 'read_only';" mysql: [Warning] Using a password on the command line interface can be insecure. +---------------+-------+ | Variable_name | Value | +---------------+-------+ | read_only | OFF | +---------------+-------+
use Getopt::Long; use MHA::DBHelper; use MHA::NodeUtil; use Time::HiRes qw( sleep gettimeofday tv_interval ); use Data::Dumper;
my $_tstart; my $_running_interval = 0.1; my ( $command, $orig_master_host, $orig_master_ip, $orig_master_port, $orig_master_user, $new_master_host, $new_master_ip, $new_master_port, $new_master_user, );
my $vip = '192.168.10.198/24'; # Virtual IP my $key = "1"; my $ssh_start_vip = "/sbin/ifconfig ens18:$key$vip"; my $ssh_stop_vip = "/sbin/ifconfig ens18:$key down"; my $ssh_user = "root"; my $new_master_password='root'; my $orig_master_password='root'; GetOptions( 'command=s' => \$command, #'ssh_user=s' => \$ssh_user, 'orig_master_host=s' => \$orig_master_host, 'orig_master_ip=s' => \$orig_master_ip, 'orig_master_port=i' => \$orig_master_port, 'orig_master_user=s' => \$orig_master_user, #'orig_master_password=s' => \$orig_master_password, 'new_master_host=s' => \$new_master_host, 'new_master_ip=s' => \$new_master_ip, 'new_master_port=i' => \$new_master_port, 'new_master_user=s' => \$new_master_user, #'new_master_password=s' => \$new_master_password, );
exit &main();
sub current_time_us { my ( $sec, $microsec ) = gettimeofday(); my $curdate = localtime($sec); return$curdate . " " . sprintf( "%06d", $microsec ); }
sub sleep_until { my $elapsed = tv_interval($_tstart); if ( $_running_interval > $elapsed ) { sleep( $_running_interval - $elapsed ); } }
sub get_threads_util { my $dbh = shift; my $my_connection_id = shift; my $running_time_threshold = shift; my $type = shift; $running_time_threshold = 0 unless ($running_time_threshold); $type = 0 unless ($type); my @threads;
my $sth = $dbh->prepare("SHOW PROCESSLIST"); $sth->execute();
while ( my $ref = $sth->fetchrow_hashref() ) { my $id = $ref->{Id}; my $user = $ref->{User}; my $host = $ref->{Host}; my $command = $ref->{Command}; my $state = $ref->{State}; my $query_time = $ref->{Time}; my $info = $ref->{Info}; $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info); next if ( $my_connection_id == $id ); next if ( defined($query_time) && $query_time < $running_time_threshold ); next if ( defined($command) && $command eq "Binlog Dump" ); next if ( defined($user) && $user eq "system user" ); next if ( defined($command) && $command eq "Sleep" && defined($query_time) && $query_time >= 1 );
if ( $type >= 1 ) { next if ( defined($command) && $command eq "Sleep" ); next if ( defined($command) && $command eq "Connect" ); }
if ( $type >= 2 ) { next if ( defined($info) && $info =~ m/^select/i ); next if ( defined($info) && $info =~ m/^show/i ); }
push @threads, $ref; } return @threads; }
sub main { if ( $command eq "stop" ) { ## Gracefully killing connections on the current master # 1. Set read_only= 1 on the new master # 2. DROP USER so that no app user can establish new connections # 3. Set read_only= 1 on the current master # 4. Kill current queries # * Any database access failure will result in script die. my $exit_code = 1; eval { ## Setting read_only=1 on the new master (to avoid accident) my $new_master_handler = new MHA::DBHelper();
# args: hostname, port, user, password, raise_error(die_on_error)_or_not $new_master_handler->connect( $new_master_ip, $new_master_port, $new_master_user, $new_master_password, 1 ); print current_time_us() . " Set read_only on the new master.. "; $new_master_handler->enable_read_only(); if ( $new_master_handler->is_read_only() ) { print"ok.\n"; } else { die "Failed!\n"; } $new_master_handler->disconnect();
# Connecting to the orig master, die if any database error happens my $orig_master_handler = new MHA::DBHelper(); $orig_master_handler->connect( $orig_master_ip, $orig_master_port, $orig_master_user, $orig_master_password, 1 );
## Drop application user so that nobody can connect. Disabling per-session binlog beforehand #$orig_master_handler->disable_log_bin_local(); #print current_time_us() . " Drpping app user on the orig master..\n"; #FIXME_xxx_drop_app_user($orig_master_handler);
## Waiting for N * 100 milliseconds so that current connections can exit my $time_until_read_only = 15; $_tstart = [gettimeofday]; my @threads = get_threads_util( $orig_master_handler->{dbh}, $orig_master_handler->{connection_id} ); while ( $time_until_read_only > 0 && $#threads >= 0 ) { if ( $time_until_read_only % 5 == 0 ) { printf "%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n", current_time_us(), $#threads + 1, $time_until_read_only * 100; if ( $#threads < 5 ) { print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n" foreach (@threads); } } sleep_until(); $_tstart = [gettimeofday]; $time_until_read_only--; @threads = get_threads_util( $orig_master_handler->{dbh}, $orig_master_handler->{connection_id} ); }
## Setting read_only=1 on the current master so that nobody(except SUPER) can write print current_time_us() . " Set read_only=1 on the orig master.. "; $orig_master_handler->enable_read_only(); if ( $orig_master_handler->is_read_only() ) { print"ok.\n"; } else { die "Failed!\n"; }
## Waiting for M * 100 milliseconds so that current update queries can complete my $time_until_kill_threads = 5; @threads = get_threads_util( $orig_master_handler->{dbh}, $orig_master_handler->{connection_id} ); while ( $time_until_kill_threads > 0 && $#threads >= 0 ) { if ( $time_until_kill_threads % 5 == 0 ) { printf "%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n", current_time_us(), $#threads + 1, $time_until_kill_threads * 100; if ( $#threads < 5 ) { print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n" foreach (@threads); } } sleep_until(); $_tstart = [gettimeofday]; $time_until_kill_threads--; @threads = get_threads_util( $orig_master_handler->{dbh}, $orig_master_handler->{connection_id} ); }
print"Disabling the VIP on old master: $orig_master_host \n"; &stop_vip();
## Terminating all threads print current_time_us() . " Killing all application threads..\n"; $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 ); print current_time_us() . " done.\n"; #$orig_master_handler->enable_log_bin_local(); $orig_master_handler->disconnect();
## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK $exit_code = 0; }; if ($@) { warn "Got Error: $@\n"; exit$exit_code; } exit$exit_code; } elsif ( $command eq "start" ) { ## Activating master ip on the new master # 1. Create app user with write privileges # 2. Moving backup script if needed # 3. Register new master's ip to the catalog database
# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery. # If exit code is 0 or 10, MHA does not abort my $exit_code = 10; eval { my $new_master_handler = new MHA::DBHelper();
## Set read_only=0 on the new master #$new_master_handler->disable_log_bin_local(); print current_time_us() . " Set read_only=0 on the new master.\n"; $new_master_handler->disable_read_only();
## Creating an app user on the new master #print current_time_us() . " Creating app user on the new master..\n"; #FIXME_xxx_create_app_user($new_master_handler); #$new_master_handler->enable_log_bin_local(); $new_master_handler->disconnect();
## Update master ip on the catalog database, etc print"Enabling the VIP - $vip on the new master - $new_master_host \n"; &start_vip(); $exit_code = 0; }; if ($@) { warn "Got Error: $@\n"; exit$exit_code; } exit$exit_code; } elsif ( $command eq "status" ) {
# A simple system call that enable the VIP on the new master sub start_vip() { `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`; } # A simple system call that disable the VIP on the old_master sub stop_vip() { `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`; }
#!/usr/bin/env perl use strict; use warnings FATAL => 'all';
use Getopt::Long; use MHA::DBHelper; use MHA::NodeUtil; use Time::HiRes qw( sleep gettimeofday tv_interval ); use Data::Dumper;
my $_tstart; my $_running_interval = 0.1; my ( $command, $orig_master_host, $orig_master_ip, $orig_master_port, $orig_master_user, $new_master_host, $new_master_ip, $new_master_port, $new_master_user, );
my $vip = '192.168.1.198/24'; my $key = '1'; my $ssh_start_vip ="systemctl start keepalived"; my $ssh_stop_vip ="systemctl stop keepalived"; my $orig_master_ssh_port = 22; my $new_master_ssh_port = 22; my $ssh_user = "root"; my $new_master_password='root'; my $orig_master_password='root'; GetOptions( 'command=s' => \$command, #'ssh_user=s' => \$ssh_user, 'orig_master_host=s' => \$orig_master_host, 'orig_master_ip=s' => \$orig_master_ip, 'orig_master_port=i' => \$orig_master_port, 'orig_master_user=s' => \$orig_master_user, #'orig_master_password=s' => \$orig_master_password, 'new_master_host=s' => \$new_master_host, 'new_master_ip=s' => \$new_master_ip, 'new_master_port=i' => \$new_master_port, 'new_master_user=s' => \$new_master_user, #'new_master_password=s' => \$new_master_password, );
exit &main();
sub current_time_us { my ( $sec, $microsec ) = gettimeofday(); my $curdate = localtime($sec); return$curdate . " " . sprintf( "%06d", $microsec ); }
sub sleep_until { my $elapsed = tv_interval($_tstart); if ( $_running_interval > $elapsed ) { sleep( $_running_interval - $elapsed ); } }
sub get_threads_util { my $dbh = shift; my $my_connection_id = shift; my $running_time_threshold = shift; my $type = shift; $running_time_threshold = 0 unless ($running_time_threshold); $type = 0 unless ($type); my @threads;
my $sth = $dbh->prepare("SHOW PROCESSLIST"); $sth->execute();
while ( my $ref = $sth->fetchrow_hashref() ) { my $id = $ref->{Id}; my $user = $ref->{User}; my $host = $ref->{Host}; my $command = $ref->{Command}; my $state = $ref->{State}; my $query_time = $ref->{Time}; my $info = $ref->{Info}; $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info); next if ( $my_connection_id == $id ); next if ( defined($query_time) && $query_time < $running_time_threshold ); next if ( defined($command) && $command eq "Binlog Dump" ); next if ( defined($user) && $user eq "system user" ); next if ( defined($command) && $command eq "Sleep" && defined($query_time) && $query_time >= 1 );
if ( $type >= 1 ) { next if ( defined($command) && $command eq "Sleep" ); next if ( defined($command) && $command eq "Connect" ); }
if ( $type >= 2 ) { next if ( defined($info) && $info =~ m/^select/i ); next if ( defined($info) && $info =~ m/^show/i ); }
push @threads, $ref; } return @threads; }
sub main { if ( $command eq "stop" ) { ## Gracefully killing connections on the current master # 1. Set read_only= 1 on the new master # 2. DROP USER so that no app user can establish new connections # 3. Set read_only= 1 on the current master # 4. Kill current queries # * Any database access failure will result in script die. my $exit_code = 1; eval { ## Setting read_only=1 on the new master (to avoid accident) my $new_master_handler = new MHA::DBHelper();
# args: hostname, port, user, password, raise_error(die_on_error)_or_not $new_master_handler->connect( $new_master_ip, $new_master_port, $new_master_user, $new_master_password, 1 ); print current_time_us() . " Set read_only on the new master.. "; $new_master_handler->enable_read_only(); if ( $new_master_handler->is_read_only() ) { print"ok.\n"; } else { die "Failed!\n"; } $new_master_handler->disconnect();
# Connecting to the orig master, die if any database error happens my $orig_master_handler = new MHA::DBHelper(); $orig_master_handler->connect( $orig_master_ip, $orig_master_port, $orig_master_user, $orig_master_password, 1 );
## Drop application user so that nobody can connect. Disabling per-session binlog beforehand #$orig_master_handler->disable_log_bin_local(); #print current_time_us() . " Drpping app user on the orig master..\n"; #FIXME_xxx_drop_app_user($orig_master_handler);
## Waiting for N * 100 milliseconds so that current connections can exit my $time_until_read_only = 15; $_tstart = [gettimeofday]; my @threads = get_threads_util( $orig_master_handler->{dbh}, $orig_master_handler->{connection_id} ); while ( $time_until_read_only > 0 && $#threads >= 0 ) { if ( $time_until_read_only % 5 == 0 ) { printf "%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n", current_time_us(), $#threads + 1, $time_until_read_only * 100; if ( $#threads < 5 ) { print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n" foreach (@threads); } } sleep_until(); $_tstart = [gettimeofday]; $time_until_read_only--; @threads = get_threads_util( $orig_master_handler->{dbh}, $orig_master_handler->{connection_id} ); }
## Setting read_only=1 on the current master so that nobody(except SUPER) can write print current_time_us() . " Set read_only=1 on the orig master.. "; $orig_master_handler->enable_read_only(); if ( $orig_master_handler->is_read_only() ) { print"ok.\n"; } else { die "Failed!\n"; }
## Waiting for M * 100 milliseconds so that current update queries can complete my $time_until_kill_threads = 5; @threads = get_threads_util( $orig_master_handler->{dbh}, $orig_master_handler->{connection_id} ); while ( $time_until_kill_threads > 0 && $#threads >= 0 ) { if ( $time_until_kill_threads % 5 == 0 ) { printf "%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n", current_time_us(), $#threads + 1, $time_until_kill_threads * 100; if ( $#threads < 5 ) { print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n" foreach (@threads); } } sleep_until(); $_tstart = [gettimeofday]; $time_until_kill_threads--; @threads = get_threads_util( $orig_master_handler->{dbh}, $orig_master_handler->{connection_id} ); }
print"Disabling the VIP on old master: $orig_master_host \n"; &stop_vip();
## Terminating all threads print current_time_us() . " Killing all application threads..\n"; $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 ); print current_time_us() . " done.\n"; #$orig_master_handler->enable_log_bin_local(); $orig_master_handler->disconnect();
## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK $exit_code = 0; }; if ($@) { warn "Got Error: $@\n"; exit$exit_code; } exit$exit_code; } elsif ( $command eq "start" ) { ## Activating master ip on the new master # 1. Create app user with write privileges # 2. Moving backup script if needed # 3. Register new master's ip to the catalog database
# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery. # If exit code is 0 or 10, MHA does not abort my $exit_code = 10; eval { my $new_master_handler = new MHA::DBHelper();
## Set read_only=0 on the new master #$new_master_handler->disable_log_bin_local(); print current_time_us() . " Set read_only=0 on the new master.\n"; $new_master_handler->disable_read_only();
## Creating an app user on the new master #print current_time_us() . " Creating app user on the new master..\n"; #FIXME_xxx_create_app_user($new_master_handler); #$new_master_handler->enable_log_bin_local(); $new_master_handler->disconnect();
## Update master ip on the catalog database, etc print"Enabling the VIP - $vip on the new master - $new_master_host \n"; &start_vip(); $exit_code = 0; }; if ($@) { warn "Got Error: $@\n"; exit$exit_code; } exit$exit_code; } elsif ( $command eq "status" ) {
# A simple system call that enable the VIP on the new master sub start_vip() { `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`; } # A simple system call that disable the VIP on the old_master sub stop_vip() { `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`; }
mysql-2(第二台主) ip a show ens18 | grep 192 inet 192.168.10.126/24 brd 192.168.10.255 scope global noprefixroute ens18 inet 192.168.10.198/24 scope global secondary ens18 mysql -uroot -proot -e "show processlist;" +-----+---------+----------------------+------+------------------+------+---------------------------------------------------------------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+---------+----------------------+------+------------------+------+---------------------------------------------------------------+------------------+ | 74 | repuser | 192.168.10.127:50230 | NULL | Binlog Dump GTID | 129 | Master has sent all binlog to slave; waiting for more updates | NULL | | 75 | repuser | 192.168.10.128:48684 | NULL | Binlog Dump GTID | 129 | Master has sent all binlog to slave; waiting for more updates | NULL | | 141 | root | localhost | NULL | Query | 0 | starting | show processlist | +-----+---------+----------------------+------+------------------+------+---------------------------------------------------------------+------------------+ mysql -uroot -proot -e "show variables like 'server_id';" mysql: [Warning] Using a password on the command line interface can be insecure. +---------------+--------+ | Variable_name | Value | +---------------+--------+ | server_id | 263306 | +---------------+--------+ 可以看到剩余mysql-3和mysql-4成功指向了mysql-2为新的主库