现网一个产品报警主从延时,到备机上检查,果然延时(Seconds_Behind_Master)越来越严重,但是Relay_Log_pos值也是慢慢变大的,说明备机在跟主机数据,但跟得比较慢,特别是有System Lock。分析一下。
一、检查
mysql> show processlist;+-------+-------------+--------------------+------+-------------+--------+-----------------------------------------------------------------------------+------------------+| Id | User | Host | db | Command | Time | State | Info |+-------+-------------+--------------------+------+-------------+--------+-----------------------------------------------------------------------------+------------------+| 1 | system user | | NULL | Connect | 171318 | Waiting for master to send event | NULL || 2 | system user | | NULL | Connect | 312206 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL || 3 | system user | | NULL | Connect | 306 | Waiting for an event from Coordinator | NULL || 4 | system user | | NULL | Connect | 10587 | System Lock | NULL || 5 | system user | | NULL | Connect | 126 | Waiting for an event from Coordinator | NULL || 6 | system user | | NULL | Connect | 0 | Waiting for an event from Coordinator | NULL || 13 | repl | 192.168.17.59:61674 | NULL | Binlog Dump | 171281 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL || 74870 | root | localhost | NULL | Query | 0 | init | show processlist |+-------+-------------+--------------------+------+-------------+--------+-----------------------------------------------------------------------------+------------------+9 rows in set (0.00 sec)mysql> show slave status \G*************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Log_File: mysql-bin.000057 Read_Master_Log_Pos: 792576413 Relay_Log_File: relay-bin.000170 Relay_Log_Pos: 381505193 Relay_Master_Log_File: mysql-bin.000057 Slave_IO_Running: Yes Slave_SQL_Running: Yes Exec_Master_Log_Pos: 792576413 Relay_Log_Space: 561858267 Until_Condition: None Seconds_Behind_Master: 10627 Master_UUID: 41690a50-296b-11e6-a0f1-525400fd6bba Master_Info_File: /data/mysql/datanode1/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: System Lock Master_Retry_Count: 864001 row in set (0.00 sec)
二、定位
[root@db02 datanode1]# mysqlbinlog --no-defaults -v relay-bin.000170 --start-position=381505193 --base64-output=decode-rows |more/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;/*!40019 SET @@session.max_insert_delayed_threads=0*/;/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;DELIMITER /*!*/;# at 381505193#161124 14:29:29 server id 53 end_log_pos 796951500 CRC32 0x490aa291 Query thread_id=96281495 exec_time=0 error_code=0SET TIMESTAMP=1479968969/*!*/;SET @@session.pseudo_thread_id=96281495/*!*/;SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;SET @@session.sql_mode=1075838976/*!*/;SET @@session.auto_increment_increment=3, @@session.auto_increment_offset=1/*!*/;/*!\C utf8 *//*!*/;SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;SET @@session.lc_time_names=0/*!*/;SET @@session.collation_database=DEFAULT/*!*/;BEGIN# at 381507211#161124 14:29:29 server id 53 end_log_pos 796953659 CRC32 0x9c5e6005 Update_rows: table id 11108 flags: STMT_END_F### UPDATE `db_test`.`tbl_user_log`### WHERE### @1=228856### @2='test_mobi'### @3='2346150430'### @4='2016-06-02 14:34:25'### @5='2016-11-24 14:23:34'### @6=NULL### @7=0### @8=0### @9=2709### @10='1.1.1'### @11='client'### @12=NULL### @13='2016-06-24 21:45:22'### @14=282### @15='userinfo_sys'### SET### @1=228856### @2='test_mobi'### @3='2346150430'### @4='2016-06-02 14:34:25'### @5='2016-11-24 14:29:29'### @6=NULL### @7=0### @8=0### @9=2710### @10='1.1.1'### @11='client'### @12=NULL### @13='2016-06-24 21:45:22'### @14=282### @15='userinfo_sys'# at 381507429#161124 14:29:29 server id 53 end_log_pos 796953690 CRC32 0x7eec6df7 Xid = 2585781884COMMIT/*!*/;
三、查看打开的表
mysql> show open tables;Database Table In_use Name_locked| db_test | tbl_1 | 0 | 0 || db_test | tbl_user_log | 1 | 0 || db_test | tbl_2 | 0 | 0 |3 rows in set (0.00 sec)
四、分析排查
基本可以定位是这个大批量的update导致的备机延时。这个tbl_user_log表是建了Btree索引的,索引的数据也是比较均匀分散的,但是上面没有主键或者唯一索引。该表有300多万数据,通过观察,slave上的update在relay回放时是很慢的,索引并没起作用,类似全表扫描。遂在该表上建立主键,如果备机延时比较久了,可以通过忽略记录binglog的方式在主备上加个主键操作。
主机上:set session sql_log_bin=0;alter table tbl_user_log add primary key(id);set session sql_log_bin=1;备机上:set session sql_log_bin=0;alter table tbl_user_log add primary key(id);set session sql_log_bin=1;
五、注意总结
通过对该表添加主键的方式后,备机过了几分钟就把落后几个小时的数据给追回来了。后续对上线的表要做好审计,mysql的innodb表一定得要有主键,设置sql_log_bin时不要加global参数。