MySQL High Availability
MySQL High-Availability Solutions
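The MySQL project and its community offer many high-availability solutions. The main ones are listed below, for reference only (availability figures cited from Percona):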
| Method | Level of Availability |
|---|---|
| Simple replication | 98-99.9% |
| Master-Master/MMM | 99% |
| SAN (Storage Area Network) | 99.5-99.9% |
| DRBD, MHA, Tungsten Replicator | 99.9% |
| NDBCluster, Galera Cluster | 99.999% |
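To make the percentages in the table concrete, the yearly downtime budget is (100 - availability) / 100 × 525,600 minutes. The following is a small shell helper under two assumptions: a 365-day year, and a function name (`downtime_minutes_per_year`) that is purely illustrative.

```shell
#!/usr/bin/env bash
# Allowed downtime per year, in minutes, for an availability given in percent.
# Assumption: a 365-day year (525,600 minutes); the function name is hypothetical.
downtime_minutes_per_year() {
  awk -v a="$1" 'BEGIN { printf "%.1f\n", (100 - a) / 100 * 365 * 24 * 60 }'
}

for a in 99 99.9 99.999; do
  echo "${a}% -> $(downtime_minutes_per_year "$a") min/year"
done
```

So 99% allows roughly 3.65 days of downtime per year, 99.9% about 8.8 hours, and 99.999% ("five nines") only about 5 minutes.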
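- MMM: Multi-Master Replication Manager for MySQL. This is a flexible set of Perl scripts. It monitors MySQL replication, performs failover, and manages the configuration of MySQL master-master replication, with only one node writable at any given time.
- MHA: Master High Availability. MHA monitors the master and can fail over automatically to another node by promoting a slave to be the new master. It is built on master-slave replication and also needs cooperation from the client side. MHA currently supports mainly a one-master, multi-slave architecture. An MHA replication cluster needs at least three database servers, one master and two slaves: one server acts as the master, one as the standby master, and one as a slave. To save on hardware, Taobao modified MHA, and Taobao's TMHA now supports one master with a single slave.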
The following technologies can meet financial-grade high-availability requirements:
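- Galera Cluster: based on wsrep (MySQL extended with the Write Set Replication). Replication is performed cluster-wide through the wsrep protocol. Every node can serve both reads and writes, so no master-slave replication is needed and multi-master reads and writes are possible.
- GR (Group Replication): MySQL's official group replication technology, introduced in MySQL 5.7.17. It is built on MySQL's native replication and the Paxos algorithm, and it supports multi-master updates. A replication group consists of several server members. Each server can execute transactions independently, but every read-write transaction commits only after conflict detection succeeds.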
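Take a three-node group as an example. The nodes communicate with one another: whenever an event occurs, it is propagated to the other nodes and negotiated. If a majority of the nodes agree, the event is accepted; otherwise it fails or is rolled back. A group can run in single-primary or multi-primary mode. In single-primary mode only one primary node accepts writes, and a new primary is elected automatically if it fails. In multi-primary mode every node accepts writes, so there is no master-slave distinction.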
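The majority rule decides how many member failures a group can survive. A group of n members needs floor(n/2) + 1 members to agree, so it tolerates floor((n - 1)/2) failures. That is why three nodes is the usual minimum. Here is a shell sketch of the arithmetic; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Majority (quorum) size for a group of n members: floor(n/2) + 1
majority() { echo $(( $1 / 2 + 1 )); }
# Number of member failures the group survives while still keeping a majority
tolerated_failures() { echo $(( ($1 - 1) / 2 )); }

for n in 3 4 5; do
  echo "n=$n majority=$(majority "$n") tolerates=$(tolerated_failures "$n")"
done
```

A four-node group still tolerates only one failure, the same as three nodes, which is why odd group sizes are usually preferred.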
MHA Master High Availability
How MHA works and its architecture
Official documentation
https://github.com/yoshinorim/mha4mysql-manager/wiki
MHA cluster architecture

How MHA works

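- MHA checks the master's health with the statement SELECT 1 As Value. Once the master goes down, MHA saves the binary log events (binlog events) from the crashed master
- Identifies the slave with the most recent updates
- Applies the differential relay logs to the other slaves
- Applies the binlog events saved from the master to all slaves
- Promotes one slave to be the new master
- Makes the other slaves replicate from the new master
- The failed server is automatically removed from the cluster, and its entry is deleted from the configuration (masterha_conf_host)
- The old master's VIP floats to the new master, so applications can reach the new master
- MHA is a one-shot high-availability solution: after a failover, the Manager exits automatically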
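The detection step works roughly like this: a loop runs the health query every ping_interval seconds and declares the master dead after several consecutive failures. The sketch below is simplified and is not MHA's actual implementation, which additionally checks SSH reachability and can run secondary_check_script. The function name and the failure threshold are assumptions.

```shell
#!/usr/bin/env bash
# Simplified sketch of an MHA-style master health check loop.
# NOT MHA's implementation: the function name and failure threshold are assumptions.
# Usage: monitor_master <interval_seconds> <max_consecutive_failures> <check command...>
# The check command must exit 0 while the master is healthy.
monitor_master() {
  local interval=$1 max_fails=$2 fails=0
  shift 2
  while (( fails < max_fails )); do
    if "$@" >/dev/null 2>&1; then
      fails=0                                  # healthy: reset the failure counter
    else
      fails=$(( fails + 1 ))
      echo "ping failed ${fails} time(s)" >&2
    fi
    if (( fails < max_fails )); then
      sleep "$interval"                        # wait ping_interval before the next check
    fi
  done
  echo "master is down"                        # this is where failover would begin
}
```

For example, `monitor_master 1 4 mysql -h 10.0.0.8 -u mhauser -p'<password>' -e 'SELECT 1 As Value'` mirrors ping_interval=1 from the configuration used later. The manager.log at the end of this document shows four failed connection attempts before the master is declared unreachable.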
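Electing the new master
- If a weight is set (candidate_master=1), the weighted node is forced to become the new master. By default, however, a slave that lags the master by more than 100M of relay logs loses its weight. If check_repl_delay=0 is set, that slave is forced to become the new master even if it lags far behind
- If the slaves' data differ, the slave closest to the master becomes the new master
- If all slaves have identical data, the first one in configuration file order becomes the new master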
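The election rules above can be summarized as a small decision procedure. The sketch below is an illustration, not MHA's code. It assumes a hypothetical input format of one slave per line in configuration file order: `host candidate_master lag_bytes`, where lag_bytes is how far the slave is behind the master. It also assumes the 100M threshold means 100 MiB.

```shell
#!/usr/bin/env bash
# Sketch of MHA's new-master selection rules (illustration only, not MHA's code).
# stdin (hypothetical format), one slave per line in config-file order:
#   <host> <candidate_master 0|1> <lag_bytes>
# $1: check_repl_delay (1 = default, 0 = ignore replication delay)
pick_new_master() {
  local check_repl_delay=${1:-1}
  awk -v crd="$check_repl_delay" -v limit=$(( 100 * 1024 * 1024 )) '
    # Rule 1: the first candidate_master=1 slave wins, unless it lags
    # more than 100M while check_repl_delay is on
    $2 == 1 && cand == "" && (crd == 0 || $3 <= limit) { cand = $1 }
    # Rules 2 and 3: otherwise the least-lagging slave; strict "<" keeps
    # the earlier host on ties, which gives config-file order
    NR == 1 || $3 < min { best = $1; min = $3 }
    END { print (cand != "" ? cand : best) }
  '
}
```

For instance, `printf '10.0.0.18 1 209715200\n10.0.0.28 0 0\n' | pick_new_master 1` skips the lagging candidate and picks 10.0.0.28. With `pick_new_master 0`, the candidate 10.0.0.18 is chosen instead.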
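Data recovery
- If the master is still reachable over SSH, each slave's position or GTID is compared with the master's, and the missing binary log events are saved to the slaves and applied (done by save_binary_logs)
- If the master is not reachable over SSH, the differences between the slaves' relay logs are compared and applied (done by apply_diff_relay_logs)
Note:
To minimize data loss when the master fails because of hardware damage, it is recommended to configure MySQL semi-synchronous replication together with MHA.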
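The following is a minimal sketch of enabling semi-synchronous replication on every node. It makes three assumptions. First, it uses the plugin and variable names of MySQL 5.7 and of 8.0 before 8.0.26; 8.0.26 and later also provide semisync_source.so/semisync_replica.so with rpl_semi_sync_source_*/rpl_semi_sync_replica_* variables. Second, it assumes /etc/my.cnf includes /etc/my.cnf.d. Third, the 1000 ms timeout is only an example value. Both plugins are loaded everywhere because MHA may promote any slave to master.

```shell
#!/usr/bin/env bash
# Write a semi-sync replication snippet for mysqld; restart mysqld afterwards.
# Assumptions: pre-8.0.26 plugin names; default target path; example timeout.
write_semisync_cnf() {
  local target=${1:-/etc/my.cnf.d/semisync.cnf}
  cat > "$target" <<'EOF'
[mysqld]
# load both plugins on every node, since any slave may be promoted to master
plugin-load-add=semisync_master.so
plugin-load-add=semisync_slave.so
rpl_semi_sync_master_enabled=1
rpl_semi_sync_slave_enabled=1
# fall back to asynchronous replication if no slave acknowledges within 1000 ms
rpl_semi_sync_master_timeout=1000
EOF
}
```

After restarting mysqld, `SHOW STATUS LIKE 'Rpl_semi_sync%';` shows whether semi-sync is active.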
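MHA software
MHA consists of two parts: the Manager toolkit and the Node toolkit.
The Manager toolkit mainly includes the following tools:
masterha_check_ssh  checks MHA's SSH configuration
masterha_check_repl  checks the MySQL replication status
masterha_manager  starts MHA
masterha_check_status  checks the current running status of MHA
masterha_master_monitor  detects whether the master is down
masterha_master_switch  performs failover (automatic or manual)
masterha_conf_host  adds or removes server entries in the configuration
masterha_stop --conf=app1.cnf  stops MHA
masterha_secondary_check  checks the master's availability over two or more network routes
The Node toolkit mainly includes the following tools. They are normally triggered by MHA Manager scripts and need no manual operation.
save_binary_logs #saves and copies the master's binary logs
apply_diff_relay_logs #identifies differential relay log events and applies them to the other slaves
filter_mysqlbinlog #removes unnecessary ROLLBACK events (no longer used by MHA)
purge_relay_logs #purges relay logs (without blocking the SQL thread)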
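The slaves in the example below run with relay_log_purge=0, so relay logs accumulate, and the MHA documentation suggests running purge_relay_logs periodically on each slave (for example from cron). The sketch below rests on several assumptions: the RPM installs the tool as /usr/bin/purge_relay_logs; the job runs daily at 05:00; and --host is set to the slave's own 10.0.0.x address, because mhauser is only granted for '10.0.0.%'. Check `purge_relay_logs --help` for the exact options of your version.

```shell
#!/usr/bin/env bash
# Install a cron job that purges relay logs once a day on a slave.
# Assumptions: binary path, 05:00 schedule, credentials taken from this example.
# $1: this slave's own IP (mhauser is only allowed from 10.0.0.%)
# $2: cron file to write (defaults to /etc/cron.d/purge_relay_logs)
install_purge_relay_logs_cron() {
  local slave_ip=$1 cron_file=${2:-/etc/cron.d/purge_relay_logs}
  cat > "$cron_file" <<EOF
# m h dom mon dow user command
0 5 * * * root /usr/bin/purge_relay_logs --user=mhauser --password=ayaka --host=${slave_ip} --disable_relay_log_purge >> /var/log/purge_relay_logs.log 2>&1
EOF
}
```

On slave1 this would be run as `install_purge_relay_logs_cron 10.0.0.18`, and on slave2 as `install_purge_relay_logs_cron 10.0.0.28`.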
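MHA custom extensions:
secondary_check_script #checks the master's availability over multiple network routes
master_ip_failover_script #updates the master IP used by the application
shutdown_script #forcibly shuts down the master node
report_script #sends reports
init_conf_load_script #loads initial configuration parameters
master_ip_online_change_script #updates the master node's IP address
MHA configuration files:
Global configuration: provides default settings for every application; default path /etc/masterha_default.cnf
Application configuration: one per master-slave replication cluster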
Hands-on example: implementing MHA
Environment: four hosts
10.0.0.7 CentOS7 MHA manager
10.0.0.8 CentOS8 MySQL8.0 Master
10.0.0.18 CentOS8 MySQL8.0 Slave1
10.0.0.28 CentOS8 MySQL8.0 Slave2
              -> master
MHA manager   -> slave1
              -> slave2
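Install two packages on the manager node: mha4mysql-manager and mha4mysql-node
Notes:
mha4mysql-manager-0.58-0.el7.centos.noarch.rpm can only be installed on CentOS 7, not on CentOS 8. It supports MySQL 5.7 and MySQL 8.0, but it is incompatible with the MariaDB-10.3.17 shipped with CentOS 8
mha4mysql-manager-0.56-0.el6.noarch.rpm does not support CentOS 8; it supports only CentOS 7 and earlier
The two packages:
mha4mysql-manager
mha4mysql-node
#Download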
https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads
https://github.com/yoshinorim/mha4mysql-node/releases/tag/v0.58
Example:
[root@mha-manager ~]#yum -y install mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
[root@mha-manager ~]#yum -y install mha4mysql-node-0.58-0.el7.centos.noarch.rpm
Install the mha4mysql-node package on all MySQL servers
This package supports CentOS 8, 7 and 6
mha4mysql-node
Example:
[root@master ~]#yum -y install mha4mysql-node-0.58-0.el7.centos.noarch.rpm
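Set up mutual SSH key authentication among all nodes. The manager generates a key pair, authorizes it for itself with ssh-copy-id 127.0.0.1, and then copies its whole .ssh directory (key pair plus authorized_keys) to every MySQL server. As a result, all four hosts share one key pair and can log in to one another.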
[root@mha-manager ~]#ssh-keygen
[root@mha-manager ~]#ssh-copy-id 127.0.0.1
[root@mha-manager ~]#rsync -av .ssh 10.0.0.8:/root/
[root@mha-manager ~]#rsync -av .ssh 10.0.0.18:/root/
[root@mha-manager ~]#rsync -av .ssh 10.0.0.28:/root/
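The three rsync commands can also be written as a loop, followed by a check that the manager can log in to each node without a password. This is a sketch. The host list comes from this example, and the DRY_RUN variable (hypothetical) only prints the commands. With BatchMode=yes, ssh fails instead of prompting for a password.

```shell
#!/usr/bin/env bash
MYSQL_NODES=(10.0.0.8 10.0.0.18 10.0.0.28)

# Copy /root/.ssh (key pair + authorized_keys) to every node.
# DRY_RUN=1 (hypothetical switch) prints the commands instead of running them.
distribute_ssh_dir() {
  local run=() host
  [[ ${DRY_RUN:-0} == 1 ]] && run=(echo)
  for host in "$@"; do
    "${run[@]}" rsync -av /root/.ssh "${host}:/root/"
  done
}

# Verify passwordless SSH; returns non-zero if any host fails.
check_ssh_login() {
  local host rc=0
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@${host}" true; then
      echo "ok: ${host}"
    else
      echo "FAILED: ${host}" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Run `distribute_ssh_dir "${MYSQL_NODES[@]}"` and then `check_ssh_login "${MYSQL_NODES[@]}"`. masterha_check_ssh later performs the full node-to-node check.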
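Create the configuration file on the manager node
Note: do not leave spaces or other characters at the end of the lines in this file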
[root@mha-manager ~]#mkdir /etc/mastermha/
[root@mha-manager ~]#vim /etc/mastermha/app1.cnf
[server default]
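user=mhauser #user for remote connections to all MySQL nodes; needs administrator privileges
password=ayaka
manager_workdir=/data/mastermha/app1/ #created automatically; no need to create it manually
manager_log=/data/mastermha/app1/manager.log
remote_workdir=/data/mastermha/app1/
ssh_user=root #user for key-based remote SSH connections, used to access the binary logs
repl_user=repluser #replication user
repl_password=ayaka
ping_interval=1 #interval between health checks, in seconds
master_ip_failover_script=/usr/local/bin/master_ip_failover #Perl script that moves the VIP; does not work across networks; Keepalived can be used instead
report_script=/usr/local/bin/sendmail.sh #alert script run after a failover
check_repl_delay=0 #default 1: a slave whose relay log lags the master by more than 100M is not chosen as the new master, because its recovery would take too long. With check_repl_delay=0, MHA ignores replication delay when switching masters. This is very useful for a slave with candidate_master=1, because it guarantees that slave becomes the new master
master_binlog_dir=/data/mysql/ #directory holding the binary logs; mandatory for mha4mysql-manager-0.58, not needed in earlier versions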
[server1]
hostname=10.0.0.8
port=3306
candidate_master=1
[server2]
hostname=10.0.0.18
port=3306
[server3]
hostname=10.0.0.28
port=3306
candidate_master=1 #preferred candidate master: it takes precedence even if it is not the slave with the newest events in the cluster
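Because of the warning about trailing characters, the annotated file above is not meant to be used verbatim; the final file that follows drops the inline comments. Here is a sketch of stripping `#` comments and trailing whitespace mechanically, and of checking for leftover trailing whitespace. The function names are hypothetical. Note that it would also cut a value, such as a password, that contains `#`.

```shell
#!/usr/bin/env bash
# Remove inline "# ..." comments, trailing whitespace, and the blank lines left behind.
# Caveat: a value containing "#" (e.g. a password) would be truncated.
strip_cnf_comments() {
  sed -E -e 's/[[:space:]]*#.*$//' -e 's/[[:space:]]+$//' -e '/^$/d'
}

# Print lines that still end in whitespace; returns non-zero if any are found.
check_trailing_whitespace() {
  if grep -nE '[[:space:]]+$' "$1"; then
    return 1
  fi
  return 0
}
```

Example: `strip_cnf_comments < app1.cnf.annotated > /etc/mastermha/app1.cnf && check_trailing_whitespace /etc/mastermha/app1.cnf`.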
#Final file contents
[root@mha-manager ~]#cat /etc/mastermha/app1.cnf
[server default]
user=mhauser
password=ayaka
manager_workdir=/data/mastermha/app1/
manager_log=/data/mastermha/app1/manager.log
remote_workdir=/data/mastermha/app1/
ssh_user=root
repl_user=repluser
repl_password=ayaka
ping_interval=1
master_ip_failover_script=/usr/local/bin/master_ip_failover
report_script=/usr/local/bin/sendmail.sh
check_repl_delay=0
master_binlog_dir=/data/mysql/
[server1]
hostname=10.0.0.8
candidate_master=1
[server2]
hostname=10.0.0.18
candidate_master=1
[server3]
hostname=10.0.0.28
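Note: which slave takes over as the new master when the master goes down
1. If all slaves have identical logs, the new master is chosen in configuration file order by default
2. If the slaves' logs differ, the slave closest to the master is chosen automatically
3. If a node has a weight (candidate_master=1), it is preferred. However, if its logs lag the master by more than 100M, it is not chosen either. Setting check_repl_delay=0 disables this check and forces the candidate to be selected
Related scripts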
[root@mha-manager ~]#cat /usr/local/bin/sendmail.sh
#!/bin/bash
echo "MHA is failover!" | mail -s "MHA Warning" root@ayaka.com
[root@mha-manager ~]#chmod +x /usr/local/bin/sendmail.sh
[root@mha-manager ~]#vim /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
# Copyright (C) 2011 DeNA Co.,Ltd.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
## Note: This is a sample script and is not complete. Modify the script based on your environment.
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
use MHA::DBHelper;
my (
$command, $ssh_user, $orig_master_host,
$orig_master_ip, $orig_master_port, $new_master_host,
$new_master_ip, $new_master_port, $new_master_user,
$new_master_password
);
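# Adjust the following values to your environment
my $vip = '10.0.0.100/24'; # virtual IP (VIP) with its prefix length
my $key = "1"; # alias number of the VIP interface, i.e. eth0:1
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip"; # brings the VIP up on eth0
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down"; # takes the VIP down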
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
'new_master_user=s' => \$new_master_user,
'new_master_password=s' => \$new_master_password,
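);
# Avoid a fatal "uninitialized value" warning when --command is missing; fall through to usage()
$command = '' unless defined $command;
exit &main();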
sub main {
if ( $command eq "stop" || $command eq "stopssh" ) {
# $orig_master_host, $orig_master_ip, $orig_master_port are passed.
# If you manage master ip address at global catalog database,
# invalidate orig_master_ip here.
my $exit_code = 1;
eval {
# updating global catalog, etc
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
# all arguments are passed.
# If you manage master ip address at global catalog database,
# activate new_master_ip here.
# You can also grant write access (create user, set read_only=0, etc)here.
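my $exit_code = 10;
eval {
# Remove the VIP from the old master first, then bring it up on the new master,
# so the VIP is not active on both hosts at once when the old host is still up
&stop_vip();
print "Enabling the VIP - $vip on the new master - $new_master_host\n";
&start_vip();
$exit_code = 0;
};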
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
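print "Checking the Status of the script.. OK \n";
# MHA runs the script with --command=status during its startup checks; make sure the current master holds the VIP
`ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;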
exit 0;
}
else {
&usage();
exit 1;
}
}
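# Bring the VIP up on the new master
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disables the VIP on the old master;
# ConnectTimeout keeps a dead host from stalling the failover for long
sub stop_vip() {
`ssh -o ConnectTimeout=5 $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}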
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
[root@mha-manager ~]#chmod +x /usr/local/bin/master_ip_failover
Set up the master
[root@master ~]#dnf -y install mysql-server
[root@master ~]#mkdir /data/mysql/
[root@master ~]#chown mysql.mysql /data/mysql/
[root@master ~]#vim /etc/my.cnf
[mysqld]
server_id=1
log-bin=/data/mysql/mysql-bin
skip_name_resolve=1
general_log #for observing the results; optional, do not enable in production
[root@master ~]#systemctl enable --now mysqld
[root@master ~]#mysql
mysql>show master logs;
#On MySQL 8.0, run the following
mysql> create user repluser@'10.0.0.%' identified by 'ayaka';
mysql> grant replication slave on *.* to repluser@'10.0.0.%';
mysql> create user mhauser@'10.0.0.%' identified by 'ayaka';
mysql> grant all on *.* to mhauser@'10.0.0.%';
#On versions earlier than MySQL 8.0, run the following
mysql>grant replication slave on *.* to repluser@'10.0.0.%' identified by 'ayaka';
mysql>grant all on *.* to mhauser@'10.0.0.%' identified by 'ayaka';
#Configure the VIP
[root@master ~]#ifconfig eth0:1 10.0.0.100/24
Set up the slaves
[root@slave ~]#dnf -y install mysql-server
[root@slave ~]#mkdir /data/mysql
[root@slave ~]#chown mysql.mysql /data/mysql/
[root@slave ~]#vim /etc/my.cnf
[mysqld]
server_id=2 #must be different on every node
log-bin=/data/mysql/mysql-bin
read_only
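relay_log_purge=0 #do not automatically delete relay logs that have already been applied
skip_name_resolve=1 #disable reverse DNS resolution
general_log #for easier observation; do not enable in production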
[root@slave ~]#systemctl enable --now mysqld
[root@slave ~]#mysql
mysql>CHANGE MASTER TO MASTER_HOST='10.0.0.8', MASTER_USER='repluser',
MASTER_PASSWORD='ayaka', MASTER_LOG_FILE='mysql-bin.000001',
MASTER_LOG_POS=156;
mysql>START SLAVE;
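After START SLAVE, both replication threads should report Yes before you move on to the MHA checks; masterha_check_repl verifies this as well. Below is a small sketch that checks it from the `SHOW SLAVE STATUS\G` output. The function name is hypothetical. It also accepts the Replica_* column names printed by SHOW REPLICA STATUS in newer MySQL 8.0 releases.

```shell
#!/usr/bin/env bash
# Return 0 if "SHOW SLAVE STATUS\G" output on stdin shows both replication
# threads running, non-zero otherwise (function name is hypothetical).
replication_ok() {
  awk -F': ' '
    $1 ~ /^ *(Slave|Replica)_IO_Running$/  { io = $2 }
    $1 ~ /^ *(Slave|Replica)_SQL_Running$/ { sql = $2 }
    END { exit !(io == "Yes" && sql == "Yes") }
  '
}
```

On each slave: `mysql -e 'SHOW SLAVE STATUS\G' | replication_ok && echo "replication is running"`.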
Check the MHA environment
#Check the environment
[root@mha-manager ~]#masterha_check_ssh --conf=/etc/mastermha/app1.cnf
[root@mha-manager ~]#masterha_check_repl --conf=/etc/mastermha/app1.cnf
#Check the status
[root@mha-manager ~]#masterha_check_status --conf=/etc/mastermha/app1.cnf
Example:
[root@mha-manager ~]#masterha_check_ssh --conf=/etc/mastermha/app1.cnf
Wed Jun 17 09:59:41 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun 17 09:59:41 2020 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Wed Jun 17 09:59:41 2020 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Wed Jun 17 09:59:41 2020 - [info] Starting SSH connection tests..
Wed Jun 17 09:59:42 2020 - [debug]
Wed Jun 17 09:59:41 2020 - [debug] Connecting via SSH from root@10.0.0.8(10.0.0.8:22) to root@10.0.0.18(10.0.0.18:22)..
Wed Jun 17 09:59:42 2020 - [debug] ok.
Wed Jun 17 09:59:42 2020 - [debug] Connecting via SSH from root@10.0.0.8(10.0.0.8:22) to root@10.0.0.28(10.0.0.28:22)..
Wed Jun 17 09:59:42 2020 - [debug] ok.
Wed Jun 17 09:59:43 2020 - [debug]
Wed Jun 17 09:59:42 2020 - [debug] Connecting via SSH from root@10.0.0.18(10.0.0.18:22) to root@10.0.0.8(10.0.0.8:22)..
Wed Jun 17 09:59:42 2020 - [debug] ok.
Wed Jun 17 09:59:42 2020 - [debug] Connecting via SSH from root@10.0.0.18(10.0.0.18:22) to root@10.0.0.28(10.0.0.28:22)..
Wed Jun 17 09:59:43 2020 - [debug] ok.
Wed Jun 17 09:59:44 2020 - [debug]
Wed Jun 17 09:59:42 2020 - [debug] Connecting via SSH from root@10.0.0.28(10.0.0.28:22) to root@10.0.0.8(10.0.0.8:22)..
Wed Jun 17 09:59:43 2020 - [debug] ok.
Wed Jun 17 09:59:43 2020 - [debug] Connecting via SSH from root@10.0.0.28(10.0.0.28:22) to root@10.0.0.18(10.0.0.18:22)..
Wed Jun 17 09:59:43 2020 - [debug] ok.
Wed Jun 17 09:59:44 2020 - [info] All SSH connection tests passed successfully.
[root@mha-manager ~]#masterha_check_repl --conf=/etc/mastermha/app1.cnf
Wed Jun 17 10:00:56 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun 17 10:00:56 2020 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Wed Jun 17 10:00:56 2020 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Wed Jun 17 10:00:56 2020 - [info] MHA::MasterMonitor version 0.56.
Creating directory /data/mastermha/app1/.. done.
Wed Jun 17 10:00:58 2020 - [info] GTID failover mode = 0
Wed Jun 17 10:00:58 2020 - [info] Dead Servers:
Wed Jun 17 10:00:58 2020 - [info] Alive Servers:
Wed Jun 17 10:00:58 2020 - [info] 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:00:58 2020 - [info] 10.0.0.18(10.0.0.18:3306)
Wed Jun 17 10:00:58 2020 - [info] 10.0.0.28(10.0.0.28:3306)
Wed Jun 17 10:00:58 2020 - [info] Alive Slaves:
Wed Jun 17 10:00:58 2020 - [info] 10.0.0.18(10.0.0.18:3306) Version=10.3.17-MariaDB-log (oldest major version between slaves) log-bin:enabled
Wed Jun 17 10:00:58 2020 - [info] Replicating from 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:00:58 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jun 17 10:00:58 2020 - [info] 10.0.0.28(10.0.0.28:3306) Version=10.3.17-MariaDB-log (oldest major version between slaves) log-bin:enabled
Wed Jun 17 10:00:58 2020 - [info] Replicating from 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:00:58 2020 - [info] Current Alive Master: 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:00:58 2020 - [info] Checking slave configurations..
Wed Jun 17 10:00:58 2020 - [info] Checking replication filtering settings..
Wed Jun 17 10:00:58 2020 - [info] binlog_do_db= , binlog_ignore_db=
Wed Jun 17 10:00:58 2020 - [info] Replication filtering check ok.
Wed Jun 17 10:00:58 2020 - [info] GTID (with auto-pos) is not supported
Wed Jun 17 10:00:58 2020 - [info] Starting SSH connection tests..
Wed Jun 17 10:01:00 2020 - [info] All SSH connection tests passed successfully.
Wed Jun 17 10:01:00 2020 - [info] Checking MHA Node version..
Wed Jun 17 10:01:01 2020 - [info] Version check ok.
Wed Jun 17 10:01:01 2020 - [info] Checking SSH publickey authentication settings on the current master..
Wed Jun 17 10:01:01 2020 - [info] HealthCheck: SSH to 10.0.0.8 is reachable.
Wed Jun 17 10:01:01 2020 - [info] Master MHA Node version is 0.56.
Wed Jun 17 10:01:01 2020 - [info] Checking recovery script configurations on 10.0.0.8(10.0.0.8:3306)..
Wed Jun 17 10:01:01 2020 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/data/mastermha/app1//save_binary_logs_test --manager_version=0.56 --start_file=mariadb-bin.000002
Wed Jun 17 10:01:01 2020 - [info] Connecting to root@10.0.0.8(10.0.0.8:22)..
Creating /data/mastermha/app1 if not exists.. Creating directory /data/mastermha/app1.. done.
ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to mariadb-bin.000002
Wed Jun 17 10:01:02 2020 - [info] Binlog setting check done.
Wed Jun 17 10:01:02 2020 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Jun 17 10:01:02 2020 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=10.0.0.18 --slave_ip=10.0.0.18 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=10.3.17-MariaDB-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Wed Jun 17 10:01:02 2020 - [info] Connecting to root@10.0.0.18(10.0.0.18:22)..
Creating directory /data/mastermha/app1/.. done.
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mariadb-relay-bin.000002
Temporary relay log file is /var/lib/mysql/mariadb-relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Jun 17 10:01:02 2020 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=10.0.0.28 --slave_ip=10.0.0.28 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=10.3.17-MariaDB-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Wed Jun 17 10:01:02 2020 - [info] Connecting to root@10.0.0.28(10.0.0.28:22)..
Creating directory /data/mastermha/app1/.. done.
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mariadb-relay-bin.000002
Temporary relay log file is /var/lib/mysql/mariadb-relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Jun 17 10:01:03 2020 - [info] Slaves settings check done.
Wed Jun 17 10:01:03 2020 - [info]
10.0.0.8(10.0.0.8:3306) (current master)
+--10.0.0.18(10.0.0.18:3306)
+--10.0.0.28(10.0.0.28:3306)
Wed Jun 17 10:01:03 2020 - [info] Checking replication health on 10.0.0.18..
Wed Jun 17 10:01:03 2020 - [info] ok.
Wed Jun 17 10:01:03 2020 - [info] Checking replication health on 10.0.0.28..
Wed Jun 17 10:01:03 2020 - [info] ok.
Wed Jun 17 10:01:03 2020 - [warning] master_ip_failover_script is not defined.
Wed Jun 17 10:01:03 2020 - [warning] shutdown_script is not defined.
Wed Jun 17 10:01:03 2020 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
[root@mha-manager ~]#masterha_check_status --conf=/etc/mastermha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
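Start MHA
#Start MHA. It runs in the foreground by default; in production it is usually run in the background (note the trailing &).
#--remove_dead_master_conf deletes the dead master's section from the configuration file after a successful failover.
#--ignore_last_failover ignores the record of a recent or failed previous failover, which would otherwise block a new failover.
nohup masterha_manager --conf=/etc/mastermha/app1.cnf --remove_dead_master_conf --ignore_last_failover &> /dev/null &
#For testing (foreground):
#masterha_manager --conf=/etc/mastermha/app1.cnf --remove_dead_master_conf --ignore_last_failover
#To stop MHA running in the background, run the following command
[root@mha-manager ~]#masterha_stop --conf=/etc/mastermha/app1.cnf
Stopped app1 successfully.
#Check the status
masterha_check_status --conf=/etc/mastermha/app1.cnf
Example: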
[root@mha-manager ~]#masterha_manager --conf=/etc/mastermha/app1.cnf --remove_dead_master_conf --ignore_last_failover
Wed Jun 17 10:02:58 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun 17 10:02:58 2020 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Wed Jun 17 10:02:58 2020 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
#The health checks appear in the master's general log
[root@master ~]#tail -f /var/lib/mysql/centos8.log
200617 20:14:16 28 Query SELECT 1 As Value
200617 20:14:17 28 Query SELECT 1 As Value
200617 20:14:18 28 Query SELECT 1 As Value
200617 20:14:19 28 Query SELECT 1 As Value
200617 20:14:20 28 Query SELECT 1 As Value
200617 20:14:21 28 Query SELECT 1 As Value
[root@mha-manager ~]#masterha_check_status --conf=/etc/mastermha/app1.cnf
app1 (pid:25994) is running(0:PING_OK), master:10.0.0.8
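For scripting, such as a monitoring check, the state keyword and the current master can be extracted from the masterha_check_status line. The sketch below is based only on the two output formats shown above (running and stopped), and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Extract the state keyword (e.g. PING_OK, NOT_RUNNING) from masterha_check_status output.
mha_state()  { sed -nE 's/.*\([0-9]+:([A-Z_]+)\).*/\1/p' <<< "$1"; }
# Extract the current master's IP address; prints nothing when MHA is not running.
mha_master() { sed -nE 's/.*master:([0-9.]+).*/\1/p' <<< "$1"; }
```

For example: `s=$(masterha_check_status --conf=/etc/mastermha/app1.cnf); [ "$(mha_state "$s")" = PING_OK ] || echo "MHA is not monitoring" | mail -s "MHA Warning" root@ayaka.com`.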
Troubleshooting log
tail /data/mastermha/app1/manager.log
Example:
[root@mha-manager ~]#cat /data/mastermha/app1/manager.log
Wed Jun 17 10:02:58 2020 - [info] MHA::MasterMonitor version 0.56.
Wed Jun 17 10:03:00 2020 - [info] GTID failover mode = 0
Wed Jun 17 10:03:00 2020 - [info] Dead Servers:
Wed Jun 17 10:03:00 2020 - [info] Alive Servers:
Wed Jun 17 10:03:00 2020 - [info] 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:03:00 2020 - [info] 10.0.0.18(10.0.0.18:3306)
Wed Jun 17 10:03:00 2020 - [info] 10.0.0.28(10.0.0.28:3306)
Wed Jun 17 10:03:00 2020 - [info] Alive Slaves:
Wed Jun 17 10:03:00 2020 - [info] 10.0.0.18(10.0.0.18:3306) Version=10.3.17-MariaDB-log (oldest major version between slaves) log-bin:enabled
Wed Jun 17 10:03:00 2020 - [info] Replicating from 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:03:00 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jun 17 10:03:00 2020 - [info] 10.0.0.28(10.0.0.28:3306) Version=10.3.17-MariaDB-log (oldest major version between slaves) log-bin:enabled
Wed Jun 17 10:03:00 2020 - [info] Replicating from 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:03:00 2020 - [info] Current Alive Master: 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:03:00 2020 - [info] Checking slave configurations..
Wed Jun 17 10:03:00 2020 - [info] Checking replication filtering settings..
Wed Jun 17 10:03:00 2020 - [info] binlog_do_db= , binlog_ignore_db=
Wed Jun 17 10:03:00 2020 - [info] Replication filtering check ok.
Wed Jun 17 10:03:00 2020 - [info] GTID (with auto-pos) is not supported
Wed Jun 17 10:03:00 2020 - [info] Starting SSH connection tests..
Wed Jun 17 10:03:02 2020 - [info] All SSH connection tests passed successfully.
Wed Jun 17 10:03:02 2020 - [info] Checking MHA Node version..
Wed Jun 17 10:03:03 2020 - [info] Version check ok.
Wed Jun 17 10:03:03 2020 - [info] Checking SSH publickey authentication settings on the current master..
Wed Jun 17 10:03:03 2020 - [info] HealthCheck: SSH to 10.0.0.8 is reachable.
Wed Jun 17 10:03:03 2020 - [info] Master MHA Node version is 0.56.
Wed Jun 17 10:03:03 2020 - [info] Checking recovery script configurations on 10.0.0.8(10.0.0.8:3306)..
Wed Jun 17 10:03:03 2020 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/data/mastermha/app1//save_binary_logs_test --manager_version=0.56 --start_file=mariadb-bin.000002
Wed Jun 17 10:03:03 2020 - [info] Connecting to root@10.0.0.8(10.0.0.8:22)..
Creating /data/mastermha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to mariadb-bin.000002
Wed Jun 17 10:03:04 2020 - [info] Binlog setting check done.
Wed Jun 17 10:03:04 2020 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Jun 17 10:03:04 2020 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=10.0.0.18 --slave_ip=10.0.0.18 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=10.3.17-MariaDB-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Wed Jun 17 10:03:04 2020 - [info] Connecting to root@10.0.0.18(10.0.0.18:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mariadb-relay-bin.000002
Temporary relay log file is /var/lib/mysql/mariadb-relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Jun 17 10:03:05 2020 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=10.0.0.28 --slave_ip=10.0.0.28 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=10.3.17-MariaDB-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Wed Jun 17 10:03:05 2020 - [info] Connecting to root@10.0.0.28(10.0.0.28:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mariadb-relay-bin.000002
Temporary relay log file is /var/lib/mysql/mariadb-relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Jun 17 10:03:05 2020 - [info] Slaves settings check done.
Wed Jun 17 10:03:05 2020 - [info]
10.0.0.8(10.0.0.8:3306) (current master)
+--10.0.0.18(10.0.0.18:3306)
+--10.0.0.28(10.0.0.28:3306)
Wed Jun 17 10:03:05 2020 - [warning] master_ip_failover_script is not defined.
Wed Jun 17 10:03:05 2020 - [warning] shutdown_script is not defined.
Wed Jun 17 10:03:05 2020 - [info] Set master ping interval 1 seconds.
Wed Jun 17 10:03:05 2020 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Wed Jun 17 10:03:05 2020 - [info] Starting ping health check on 10.0.0.8(10.0.0.8:3306)..
Wed Jun 17 10:03:05 2020 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Simulating a failure
#Simulate a failure
[root@master ~]#systemctl stop mysqld
#After the master goes down, the MHA manager exits automatically
[root@mha-manager ~]#masterha_manager --conf=/etc/mastermha/app1.cnf
Wed Jun 17 10:02:58 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun 17 10:02:58 2020 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Wed Jun 17 10:02:58 2020 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Wed Jun 17 10:06:37 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun 17 10:06:37 2020 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Wed Jun 17 10:06:37 2020 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
[root@mha-manager ~]#cat /data/mastermha/app1/manager.log
Wed Jun 17 10:02:58 2020 - [info] MHA::MasterMonitor version 0.56.
Wed Jun 17 10:03:00 2020 - [info] GTID failover mode = 0
Wed Jun 17 10:03:00 2020 - [info] Dead Servers:
Wed Jun 17 10:03:00 2020 - [info] Alive Servers:
Wed Jun 17 10:03:00 2020 - [info] 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:03:00 2020 - [info] 10.0.0.18(10.0.0.18:3306)
Wed Jun 17 10:03:00 2020 - [info] 10.0.0.28(10.0.0.28:3306)
Wed Jun 17 10:03:00 2020 - [info] Alive Slaves:
Wed Jun 17 10:03:00 2020 - [info] 10.0.0.18(10.0.0.18:3306) Version=10.3.17-MariaDB-log (oldest major version between slaves) log-bin:enabled
Wed Jun 17 10:03:00 2020 - [info] Replicating from 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:03:00 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jun 17 10:03:00 2020 - [info] 10.0.0.28(10.0.0.28:3306) Version=10.3.17-MariaDB-log (oldest major version between slaves) log-bin:enabled
Wed Jun 17 10:03:00 2020 - [info] Replicating from 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:03:00 2020 - [info] Current Alive Master: 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:03:00 2020 - [info] Checking slave configurations..
Wed Jun 17 10:03:00 2020 - [info] Checking replication filtering settings..
Wed Jun 17 10:03:00 2020 - [info] binlog_do_db= , binlog_ignore_db=
Wed Jun 17 10:03:00 2020 - [info] Replication filtering check ok.
Wed Jun 17 10:03:00 2020 - [info] GTID (with auto-pos) is not supported
Wed Jun 17 10:03:00 2020 - [info] Starting SSH connection tests..
Wed Jun 17 10:03:02 2020 - [info] All SSH connection tests passed successfully.
Wed Jun 17 10:03:02 2020 - [info] Checking MHA Node version..
Wed Jun 17 10:03:03 2020 - [info] Version check ok.
Wed Jun 17 10:03:03 2020 - [info] Checking SSH publickey authentication settings on the current master..
Wed Jun 17 10:03:03 2020 - [info] HealthCheck: SSH to 10.0.0.8 is reachable.
Wed Jun 17 10:03:03 2020 - [info] Master MHA Node version is 0.56.
Wed Jun 17 10:03:03 2020 - [info] Checking recovery script configurations on 10.0.0.8(10.0.0.8:3306)..
Wed Jun 17 10:03:03 2020 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/data/mastermha/app1//save_binary_logs_test --manager_version=0.56 --start_file=mariadb-bin.000002
Wed Jun 17 10:03:03 2020 - [info] Connecting to root@10.0.0.8(10.0.0.8:22)..
Creating /data/mastermha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to mariadb-bin.000002
Wed Jun 17 10:03:04 2020 - [info] Binlog setting check done.
Wed Jun 17 10:03:04 2020 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Jun 17 10:03:04 2020 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=10.0.0.18 --slave_ip=10.0.0.18 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=10.3.17-MariaDB-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Wed Jun 17 10:03:04 2020 - [info] Connecting to root@10.0.0.18(10.0.0.18:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mariadb-relay-bin.000002
Temporary relay log file is /var/lib/mysql/mariadb-relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Jun 17 10:03:05 2020 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=10.0.0.28 --slave_ip=10.0.0.28 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=10.3.17-MariaDB-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Wed Jun 17 10:03:05 2020 - [info] Connecting to root@10.0.0.28(10.0.0.28:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mariadb-relay-bin.000002
Temporary relay log file is /var/lib/mysql/mariadb-relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Jun 17 10:03:05 2020 - [info] Slaves settings check done.
Wed Jun 17 10:03:05 2020 - [info]
10.0.0.8(10.0.0.8:3306) (current master)
+--10.0.0.18(10.0.0.18:3306)
+--10.0.0.28(10.0.0.28:3306)
Wed Jun 17 10:03:05 2020 - [warning] master_ip_failover_script is not defined.
Wed Jun 17 10:03:05 2020 - [warning] shutdown_script is not defined.
Wed Jun 17 10:03:05 2020 - [info] Set master ping interval 1 seconds.
Wed Jun 17 10:03:05 2020 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Wed Jun 17 10:03:05 2020 - [info] Starting ping health check on 10.0.0.8(10.0.0.8:3306)..
Wed Jun 17 10:03:05 2020 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Wed Jun 17 10:06:31 2020 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Wed Jun 17 10:06:31 2020 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/data/mastermha/app1//save_binary_logs_test --manager_version=0.56 --binlog_prefix=mariadb-bin
Wed Jun 17 10:06:32 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.0.0.8' (4))
Wed Jun 17 10:06:32 2020 - [warning] Connection failed 2 time(s)..
Wed Jun 17 10:06:33 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.0.0.8' (4))
Wed Jun 17 10:06:33 2020 - [warning] Connection failed 3 time(s)..
Wed Jun 17 10:06:34 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.0.0.8' (4))
Wed Jun 17 10:06:34 2020 - [warning] Connection failed 4 time(s)..
Wed Jun 17 10:06:36 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 10.0.0.8! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Wed Jun 17 10:06:36 2020 - [warning] Master is not reachable from health checker!
Wed Jun 17 10:06:36 2020 - [warning] Master 10.0.0.8(10.0.0.8:3306) is not reachable!
Wed Jun 17 10:06:36 2020 - [warning] SSH is NOT reachable.
Wed Jun 17 10:06:36 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mastermha/app1.cnf again, and trying to connect to all servers to check server status..
Wed Jun 17 10:06:36 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun 17 10:06:36 2020 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Wed Jun 17 10:06:36 2020 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Wed Jun 17 10:06:37 2020 - [info] GTID failover mode = 0
Wed Jun 17 10:06:37 2020 - [info] Dead Servers:
Wed Jun 17 10:06:37 2020 - [info] 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:06:37 2020 - [info] Alive Servers:
Wed Jun 17 10:06:37 2020 - [info] 10.0.0.18(10.0.0.18:3306)
Wed Jun 17 10:06:37 2020 - [info] 10.0.0.28(10.0.0.28:3306)
Wed Jun 17 10:06:37 2020 - [info] Alive Slaves:
Wed Jun 17 10:06:37 2020 - [info] 10.0.0.18(10.0.0.18:3306) Version=10.3.17-MariaDB-log (oldest major version between slaves) log-bin:enabled
Wed Jun 17 10:06:37 2020 - [info] Replicating from 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:06:37 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jun 17 10:06:37 2020 - [info] 10.0.0.28(10.0.0.28:3306) Version=10.3.17-MariaDB-log (oldest major version between slaves) log-bin:enabled
Wed Jun 17 10:06:37 2020 - [info] Replicating from 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:06:37 2020 - [info] Checking slave configurations..
Wed Jun 17 10:06:37 2020 - [info] Checking replication filtering settings..
Wed Jun 17 10:06:37 2020 - [info] Replication filtering check ok.
Wed Jun 17 10:06:37 2020 - [info] Master is down!
Wed Jun 17 10:06:37 2020 - [info] Terminating monitoring script.
Wed Jun 17 10:06:37 2020 - [info] Got exit code 20 (Master dead).
Wed Jun 17 10:06:37 2020 - [info] MHA::MasterFailover version 0.56.
Wed Jun 17 10:06:37 2020 - [info] Starting master failover.
Wed Jun 17 10:06:37 2020 - [info]
Wed Jun 17 10:06:37 2020 - [info] * Phase 1: Configuration Check Phase..
Wed Jun 17 10:06:37 2020 - [info]
Wed Jun 17 10:06:38 2020 - [info] GTID failover mode = 0
Wed Jun 17 10:06:38 2020 - [info] Dead Servers:
Wed Jun 17 10:06:38 2020 - [info] 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:06:38 2020 - [info] Checking master reachability via MySQL(double check)...
Wed Jun 17 10:06:39 2020 - [info] ok.
Wed Jun 17 10:06:39 2020 - [info] Alive Servers:
Wed Jun 17 10:06:39 2020 - [info] 10.0.0.18(10.0.0.18:3306)
Wed Jun 17 10:06:39 2020 - [info] 10.0.0.28(10.0.0.28:3306)
Wed Jun 17 10:06:39 2020 - [info] Alive Slaves:
Wed Jun 17 10:06:39 2020 - [info] 10.0.0.18(10.0.0.18:3306) Version=10.3.17-MariaDB-log (oldest major version between slaves) log-bin:enabled
Wed Jun 17 10:06:39 2020 - [info] Replicating from 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:06:39 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jun 17 10:06:39 2020 - [info] 10.0.0.28(10.0.0.28:3306) Version=10.3.17-MariaDB-log (oldest major version between slaves) log-bin:enabled
Wed Jun 17 10:06:39 2020 - [info] Replicating from 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:06:39 2020 - [info] Starting Non-GTID based failover.
Wed Jun 17 10:06:39 2020 - [info]
Wed Jun 17 10:06:39 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Wed Jun 17 10:06:39 2020 - [info]
Wed Jun 17 10:06:39 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
Wed Jun 17 10:06:39 2020 - [info]
Wed Jun 17 10:06:39 2020 - [info] Forcing shutdown so that applications never connect to the current master..
Wed Jun 17 10:06:39 2020 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Wed Jun 17 10:06:39 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Wed Jun 17 10:06:40 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Wed Jun 17 10:06:40 2020 - [info]
Wed Jun 17 10:06:40 2020 - [info] * Phase 3: Master Recovery Phase..
Wed Jun 17 10:06:40 2020 - [info]
Wed Jun 17 10:06:40 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Wed Jun 17 10:06:40 2020 - [info]
Wed Jun 17 10:06:40 2020 - [info] The latest binary log file/position on all slaves is mariadb-bin.000002:3062073
Wed Jun 17 10:06:40 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Wed Jun 17 10:06:40 2020 - [info] 10.0.0.18(10.0.0.18:3306) Version=10.3.17-MariaDB-log (oldest major version between slaves) log-bin:enabled
Wed Jun 17 10:06:40 2020 - [info] Replicating from 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:06:40 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jun 17 10:06:40 2020 - [info] 10.0.0.28(10.0.0.28:3306) Version=10.3.17-MariaDB-log (oldest major version between slaves) log-bin:enabled
Wed Jun 17 10:06:40 2020 - [info] Replicating from 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:06:40 2020 - [info] The oldest binary log file/position on all slaves is mariadb-bin.000002:3062073
Wed Jun 17 10:06:40 2020 - [info] Oldest slaves:
Wed Jun 17 10:06:40 2020 - [info] 10.0.0.18(10.0.0.18:3306) Version=10.3.17-MariaDB-log (oldest major version between slaves) log-bin:enabled
Wed Jun 17 10:06:40 2020 - [info] Replicating from 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:06:40 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jun 17 10:06:40 2020 - [info] 10.0.0.28(10.0.0.28:3306) Version=10.3.17-MariaDB-log (oldest major version between slaves) log-bin:enabled
Wed Jun 17 10:06:40 2020 - [info] Replicating from 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:06:40 2020 - [info]
Wed Jun 17 10:06:40 2020 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Wed Jun 17 10:06:40 2020 - [info]
Wed Jun 17 10:06:40 2020 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost.
Wed Jun 17 10:06:40 2020 - [info]
Wed Jun 17 10:06:40 2020 - [info] * Phase 3.3: Determining New Master Phase..
Wed Jun 17 10:06:40 2020 - [info]
Wed Jun 17 10:06:40 2020 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Wed Jun 17 10:06:40 2020 - [info] All slaves received relay logs to the same position. No need to resync each other.
Wed Jun 17 10:06:40 2020 - [info] Searching new master from slaves..
Wed Jun 17 10:06:40 2020 - [info] Candidate masters from the configuration file:
Wed Jun 17 10:06:40 2020 - [info] 10.0.0.18(10.0.0.18:3306) Version=10.3.17-MariaDB-log (oldest major version between slaves) log-bin:enabled
Wed Jun 17 10:06:40 2020 - [info] Replicating from 10.0.0.8(10.0.0.8:3306)
Wed Jun 17 10:06:40 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jun 17 10:06:40 2020 - [info] Non-candidate masters:
Wed Jun 17 10:06:40 2020 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Wed Jun 17 10:06:40 2020 - [info] New master is 10.0.0.18(10.0.0.18:3306)
Wed Jun 17 10:06:40 2020 - [info] Starting master failover..
Wed Jun 17 10:06:40 2020 - [info]
From:
10.0.0.8(10.0.0.8:3306) (current master)
+--10.0.0.18(10.0.0.18:3306)
+--10.0.0.28(10.0.0.28:3306)
To:
10.0.0.18(10.0.0.18:3306) (new master)
+--10.0.0.28(10.0.0.28:3306)
Wed Jun 17 10:06:40 2020 - [info]
Wed Jun 17 10:06:40 2020 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Wed Jun 17 10:06:40 2020 - [info]
Wed Jun 17 10:06:40 2020 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Wed Jun 17 10:06:40 2020 - [info]
Wed Jun 17 10:06:40 2020 - [info] * Phase 3.4: Master Log Apply Phase..
Wed Jun 17 10:06:40 2020 - [info]
Wed Jun 17 10:06:40 2020 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Wed Jun 17 10:06:40 2020 - [info] Starting recovery on 10.0.0.18(10.0.0.18:3306)..
Wed Jun 17 10:06:40 2020 - [info] This server has all relay logs. Waiting all logs to be applied..
Wed Jun 17 10:06:40 2020 - [info] done.
Wed Jun 17 10:06:40 2020 - [info] All relay logs were successfully applied.
Wed Jun 17 10:06:40 2020 - [info] Getting new master's binlog name and position..
Wed Jun 17 10:06:40 2020 - [info] mariadb-bin.000002:344
Wed Jun 17 10:06:40 2020 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.0.18', MASTER_PORT=3306, MASTER_LOG_FILE='mariadb-bin.000002', MASTER_LOG_POS=344, MASTER_USER='repluser', MASTER_PASSWORD='xxx';
Wed Jun 17 10:06:40 2020 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Wed Jun 17 10:06:40 2020 - [info] Setting read_only=0 on 10.0.0.18(10.0.0.18:3306)..
Wed Jun 17 10:06:40 2020 - [info] ok.
Wed Jun 17 10:06:40 2020 - [info] ** Finished master recovery successfully.
Wed Jun 17 10:06:40 2020 - [info] * Phase 3: Master Recovery Phase completed.
Wed Jun 17 10:06:40 2020 - [info]
Wed Jun 17 10:06:40 2020 - [info] * Phase 4: Slaves Recovery Phase..
Wed Jun 17 10:06:40 2020 - [info]
Wed Jun 17 10:06:40 2020 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Wed Jun 17 10:06:40 2020 - [info]
Wed Jun 17 10:06:40 2020 - [info] -- Slave diff file generation on host 10.0.0.28(10.0.0.28:3306) started, pid: 24706. Check tmp log /data/mastermha/app1//10.0.0.28_3306_20200617100637.log if it takes time..
Wed Jun 17 10:06:41 2020 - [info]
Wed Jun 17 10:06:41 2020 - [info] Log messages from 10.0.0.28 ...
Wed Jun 17 10:06:41 2020 - [info]
Wed Jun 17 10:06:40 2020 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Wed Jun 17 10:06:41 2020 - [info] End of log messages from 10.0.0.28.
Wed Jun 17 10:06:41 2020 - [info] -- 10.0.0.28(10.0.0.28:3306) has the latest relay log events.
Wed Jun 17 10:06:41 2020 - [info] Generating relay diff files from the latest slave succeeded.
Wed Jun 17 10:06:41 2020 - [info]
Wed Jun 17 10:06:41 2020 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Wed Jun 17 10:06:41 2020 - [info]
Wed Jun 17 10:06:41 2020 - [info] -- Slave recovery on host 10.0.0.28(10.0.0.28:3306) started, pid: 24708. Check tmp log /data/mastermha/app1//10.0.0.28_3306_20200617100637.log if it takes time..
Wed Jun 17 10:06:42 2020 - [info]
Wed Jun 17 10:06:42 2020 - [info] Log messages from 10.0.0.28 ...
Wed Jun 17 10:06:42 2020 - [info]
Wed Jun 17 10:06:41 2020 - [info] Starting recovery on 10.0.0.28(10.0.0.28:3306)..
Wed Jun 17 10:06:41 2020 - [info] This server has all relay logs. Waiting all logs to be applied..
Wed Jun 17 10:06:41 2020 - [info] done.
Wed Jun 17 10:06:41 2020 - [info] All relay logs were successfully applied.
Wed Jun 17 10:06:41 2020 - [info] Resetting slave 10.0.0.28(10.0.0.28:3306) and starting replication from the new master 10.0.0.18(10.0.0.18:3306)..
Wed Jun 17 10:06:41 2020 - [info] Executed CHANGE MASTER.
Wed Jun 17 10:06:42 2020 - [info] Slave started.
Wed Jun 17 10:06:42 2020 - [info] End of log messages from 10.0.0.28.
Wed Jun 17 10:06:42 2020 - [info] -- Slave recovery on host 10.0.0.28(10.0.0.28:3306) succeeded.
Wed Jun 17 10:06:42 2020 - [info] All new slave servers recovered successfully.
Wed Jun 17 10:06:42 2020 - [info]
Wed Jun 17 10:06:42 2020 - [info] * Phase 5: New master cleanup phase..
Wed Jun 17 10:06:42 2020 - [info]
Wed Jun 17 10:06:42 2020 - [info] Resetting slave info on the new master..
Wed Jun 17 10:06:42 2020 - [info] 10.0.0.18: Resetting slave info succeeded.
Wed Jun 17 10:06:42 2020 - [info] Master failover to 10.0.0.18(10.0.0.18:3306) completed successfully.
Wed Jun 17 10:06:42 2020 - [info]
----- Failover Report -----
app1: MySQL Master failover 10.0.0.8(10.0.0.8:3306) to 10.0.0.18(10.0.0.18:3306) succeeded
Master 10.0.0.8(10.0.0.8:3306) is down!
Check MHA Manager logs at mha-manager:/data/mastermha/app1/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 10.0.0.18(10.0.0.18:3306) has all relay logs for recovery.
Selected 10.0.0.18(10.0.0.18:3306) as a new master.
10.0.0.18(10.0.0.18:3306): OK: Applying all logs succeeded.
10.0.0.28(10.0.0.28:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
10.0.0.28(10.0.0.28:3306): OK: Applying all logs succeeded. Slave started, replicating from 10.0.0.18(10.0.0.18:3306)
10.0.0.18(10.0.0.18:3306): Resetting slave info succeeded.
Master failover to 10.0.0.18(10.0.0.18:3306) completed successfully.
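To check a run like the one above without reading every line, the two facts that matter most afterwards, the new master and the CHANGE MASTER statement MHA printed, can be pulled out of manager.log. Below is a minimal POSIX sh sketch, assuming each record sits on a single line in manager.log as MHA writes it; `mha_new_master` and `mha_change_master_stmt` are hypothetical helper names, not part of MHA.

```sh
#!/bin/sh
# Sketch: extract key facts of a finished failover from the MHA manager log.
# Assumes one record per line, as in manager.log itself
# (/data/mastermha/app1/manager.log in this example).

# Print the address of the most recent new master, taken from a line like
# "Master failover to 10.0.0.18(10.0.0.18:3306) completed successfully."
mha_new_master() {
  sed -n 's/.*Master failover to \([^(]*\)(.*/\1/p' "$1" | tail -n 1
}

# Print the CHANGE MASTER statement MHA suggested for the remaining slaves.
# MHA masks the password as 'xxx'; substitute the real one before using it.
mha_change_master_stmt() {
  sed -n 's/.*\(CHANGE MASTER TO .*;\).*/\1/p' "$1" | tail -n 1
}
```

After sourcing these functions, `mha_new_master /data/mastermha/app1/manager.log` would print 10.0.0.18 for this run. The extracted statement is a common starting point for rejoining the repaired old master as a slave, but only after its data has been brought back in line with the new master and the masked password has been replaced.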
[root@mha-manager ~]#masterha_check_status --conf=/etc/mastermha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
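The manager exits after a failover, which is why the status check reports NOT_RUNNING. For scripted monitoring, for example a periodic job that alerts when the manager is no longer running, the status line can be parsed. A minimal sketch, relying only on the `(code:NAME)` shape shown above; the function names are hypothetical, and treating code 0 as "running" is an assumption based on MHA's usual `running(0:PING_OK)` output.

```sh
#!/bin/sh
# Sketch: split a masterha_check_status line such as
#   "app1 is stopped(2:NOT_RUNNING)."
# (read from stdin) into its numeric code and status name.

# Print the numeric status code, e.g. 2.
mha_status_code() {
  sed -n 's/.*(\([0-9][0-9]*\):\([A-Z_][A-Z_]*\)).*/\1/p' | head -n 1
}

# Print the status name, e.g. NOT_RUNNING.
mha_status_name() {
  sed -n 's/.*(\([0-9][0-9]*\):\([A-Z_][A-Z_]*\)).*/\2/p' | head -n 1
}

# Succeed only for status code 0 (assumed to mean the manager is monitoring).
mha_is_running() {
  [ "$(mha_status_code)" = "0" ]
}
```

For example, `masterha_check_status --conf=/etc/mastermha/app1.cnf | mha_status_name` would print NOT_RUNNING at this point.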
#Verify that the VIP has moved to the new master
[root@slave1 ~]#ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:0c:29:e1:0e:53 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.18/24 brd 10.0.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet 10.0.0.100/8 brd 10.255.255.255 scope global eth0:1
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fee1:e53/64 scope link
valid_lft forever preferred_lft forever
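The output above shows the VIP 10.0.0.100 bound as eth0:1 on the new master. The same check can be scripted on every node; the sketch below uses a hypothetical `has_vip` function that reads `ip` output from stdin and succeeds only when the exact address is configured.

```sh
#!/bin/sh
# Sketch: report whether an address is configured on this host, reading
# "ip a" / "ip -4 addr show" output from stdin.
# Exact match only, so 10.0.0.1 does not match 10.0.0.100.
has_vip() {
  awk -v vip="$1" '
    {
      for (i = 1; i < NF; i++) {
        if ($i == "inet") {
          split($(i + 1), addr, "/")   # "10.0.0.100/8" -> "10.0.0.100"
          if (addr[1] == vip) found = 1
        }
      }
    }
    END { exit !found }'   # exit 0 when found, 1 otherwise
}
```

For example, `ip -4 addr show | has_vip 10.0.0.100 && echo "VIP is here"` should succeed on 10.0.0.18 and fail on the other nodes.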
#MHA automatically edited the configuration file on the manager node and removed the failed master
[root@mha-manager ~]#cat /etc/mastermha/app1.cnf
[server2]
hostname=10.0.0.18
port=3306
candidate_master=1
[server3]
hostname=10.0.0.28
port=3306
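Once the failed master has been repaired and rejoined as a slave, its section has to go back into this file before MHA is started again. A minimal sketch with a hypothetical `add_mha_server` helper; the section name `server1`, host and port in the usage line are examples, and the helper refuses to write a duplicate section.

```sh
#!/bin/sh
# Sketch: append a [serverN] section to an MHA application config,
# e.g. to restore the repaired old master that MHA removed.
add_mha_server() {
  local conf="$1" section="$2" host="$3" port="${4:-3306}"
  # Refuse to create a second section with the same name.
  if grep -qxF "[$section]" "$conf"; then
    printf 'section [%s] already exists in %s\n' "$section" "$conf" >&2
    return 1
  fi
  # If the file does not end with a newline, add one so the
  # new section header starts on its own line.
  if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
    printf '\n' >> "$conf"
  fi
  printf '[%s]\nhostname=%s\nport=%s\n' "$section" "$host" "$port" >> "$conf"
}
```

`add_mha_server /etc/mastermha/app1.cnf server1 10.0.0.8` appends the section at the end of the file. Because MHA falls back to configuration-file order when all slaves are equally up to date, it is worth reviewing the resulting order and any `candidate_master` setting afterwards.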
The alert email is received.
Note: if an error occurs, delete the following file before running MHA again
[root@mha-manager ~]#rm -f /data/mastermha/app1/app1.failover.error
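Rather than deleting this marker blindly, a start-up wrapper can stop and point at it, so the failed failover gets reviewed first. A minimal sketch, assuming the working directory and application name used in this example (/data/mastermha/app1, app1); `check_failover_error` is a hypothetical name.

```sh
#!/bin/sh
# Sketch: refuse to restart the MHA manager while the marker left by a
# failed failover (<workdir>/<app>.failover.error) is still present.
check_failover_error() {
  local marker="$1/$2.failover.error"
  if [ -e "$marker" ]; then
    printf '%s exists: review manager.log, then delete it before restarting MHA\n' "$marker" >&2
    return 1
  fi
}
```

Run `check_failover_error /data/mastermha/app1 app1` before starting `masterha_manager` again; it returns non-zero while the marker exists.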
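Repairing master/slave replication
- Repair the failed master and make sure its data is back in sync
- Rebuild replication: manually join the failed server to the new master as a slave
- Fix the configuration file on the manager
- Clean up the related directories
- Check that SSH mutual trust and replication both work
- Check the VIP and reconfigure it if there is a problem
- Restart MHA and query its status to make sure it is running normally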
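For the replication part of these checks, each slave's `SHOW SLAVE STATUS\G` output can be tested in a script, while MHA's own `masterha_check_ssh` and `masterha_check_repl` (with `--conf=/etc/mastermha/app1.cnf`) cover SSH trust and replication across the whole cluster. A minimal sketch with a hypothetical `repl_ok` function; it only checks that both replication threads report Yes and ignores replication lag.

```sh
#!/bin/sh
# Sketch: decide whether replication is healthy from "SHOW SLAVE STATUS\G"
# output on stdin. Healthy here means Slave_IO_Running and
# Slave_SQL_Running are both "Yes"; Seconds_Behind_Master is not checked.
repl_ok() {
  awk -F': ' '
    $1 ~ /^ *Slave_IO_Running$/  { io = $2 }
    $1 ~ /^ *Slave_SQL_Running$/ { sql = $2 }
    END { exit !(io == "Yes" && sql == "Yes") }'
}
```

For example, run `mysql -e 'SHOW SLAVE STATUS\G' | repl_ok && echo replication OK` on each slave.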
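Before running MHA again, delete the files below first: a manager run performs only one failover, and MHA cannot be reused until they are removed.
[root@mha-manager ~]#rm -rf /data/mastermha/app1/ #the MHA manager's own working directory
[root@mha-manager ~]#rm -rf /data/mastermha/app1/manager.log #the MHA manager's own log file (already gone if it was removed with the directory above)
[root@master ~]#rm -rf /data/mastermha/app1/ #the working directory on each remote host, i.e. all three database nodes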
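The `rm -rf` commands above are unforgiving if a path is mistyped or a variable expands to nothing. A minimal sketch of a guarded alternative, with a hypothetical `reset_mha_workdir` function: it refuses empty, root and relative paths and, unlike `rm -rf` on the directory, keeps the directory itself and deletes only its contents, so nothing has to recreate it. Note that `find -mindepth` is a GNU/BSD extension rather than POSIX.

```sh
#!/bin/sh
# Sketch: empty an MHA working directory before MHA is reused.
# Keeps the directory, deletes everything inside it
# (manager.log, *.failover.error, ...).
reset_mha_workdir() {
  local dir="${1%/}"
  # Refuse empty, root ("/" or "//") and relative paths.
  case "$dir" in
    ""|/|[!/]*)
      printf "refusing to clean '%s'\n" "$1" >&2
      return 1 ;;
  esac
  if [ ! -d "$dir" ]; then
    printf '%s is not a directory\n' "$dir" >&2
    return 1
  fi
  find "$dir" -mindepth 1 -delete
}
```

For example, `reset_mha_workdir /data/mastermha/app1` on the manager and on each database node.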