MySQL replication error: a fatal error is encountered when it try to get the value of SERVER_ID variable from master


末蓝、 2022-02-03 09:09

Recently, several delayed slaves ran into the following error:
Fatal error: The slave I/O thread stops because a fatal error is encountered when it tries to get the value of SERVER_UUID variable from master.
The detailed slave status (SHOW SLAVE STATUS\G) is as follows:

*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: 192.168.xx.xx
                  Master_User: xx
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.002117
          Read_Master_Log_Pos: 71012880
               Relay_Log_File: mysql-relay-bin.000288
                Relay_Log_Pos: 114227959
        Relay_Master_Log_File: mysql-bin.002112
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: mysql,information_schema,performance_schema
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 214202064
              Relay_Log_Space: 1335422139
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 86400
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1159
                Last_IO_Error: The slave I/O thread stops because a fatal error is encountered when it try to get the value of SERVER_ID variable from master. Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 3210
                  Master_UUID: dc31a493-d8db-11e7-9e4a-286ed488c63d
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 86400
          SQL_Remaining_Delay: 0
      Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master executed event
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 190508 18:35:16
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: dc31a493-d8db-11e7-9e4a-286ed488c63d:678593385-681788219
            Executed_Gtid_Set: 3279fdcd-7069-11e6-afde-286ed488e688:1022823863-1026311077,
dc31a493-d8db-11e7-9e4a-286ed488c63d:1-680979957
                Auto_Position: 1
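If you want to check output like the above from a script (for example, to watch Last_IO_Errno), the vertical \G format can be turned into a dictionary. Below is a minimal Python sketch. The names parse_slave_status and FIELD_RE are hypothetical. It assumes that every field name starts with an uppercase letter and that continuation lines, such as the second line of Executed_Gtid_Set, do not.

```python
import re

# Matches "   Field_Name: value" lines of vertical SHOW SLAVE STATUS output.
# Assumption: field names start with an uppercase letter, while GTID
# continuation lines start with a digit or a lowercase hex letter.
FIELD_RE = re.compile(r"^\s*([A-Z][A-Za-z_]*):\s?(.*)$")


def parse_slave_status(text: str) -> dict[str, str]:
    """Parse vertical (\\G) SHOW SLAVE STATUS output into a field -> value dict."""
    status: dict[str, str] = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("*"):
            continue  # skip blank lines and the "*** 1. row ***" banner
        m = FIELD_RE.match(line)
        if m:
            last_key = m.group(1)
            status[last_key] = m.group(2).strip()
        elif last_key is not None:
            # Multi-line values (e.g. Executed_Gtid_Set) continue on the next line.
            status[last_key] += line.strip()
    return status


sample = """*************************** 1. row ***************************
              Slave_IO_Running: No
                 Last_IO_Errno: 1159
             Executed_Gtid_Set: 3279fdcd-7069-11e6-afde-286ed488e688:1022823863-1026311077,
dc31a493-d8db-11e7-9e4a-286ed488c63d:1-680979957
"""
s = parse_slave_status(sample)
print(s["Slave_IO_Running"], s["Last_IO_Errno"])  # No 1159
```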

The error code is 1159. The official documentation describes this error as follows:

Error: 1159 SQLSTATE: 08S01 (ER_NET_READ_INTERRUPTED)
Message: Got timeout reading communication packets

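In other words, the slave timed out while reading communication packets, which looks like a network problem.
Every time this error has occurred, simply running start slave again has brought replication back to normal.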
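Because restarting has always cleared this error, the manual step could in principle be automated by a watchdog. Here is a minimal Python sketch of the decision rule only. It assumes the status fields have already been read into a dict of strings. restart_statement is a hypothetical helper, and actually executing the returned statement would require a MySQL client library, which is not shown. The sketch returns START SLAVE IO_THREAD, which starts only the stopped I/O thread, rather than the plain start slave used in this post.

```python
from typing import Optional

# MySQL error 1159 (ER_NET_READ_INTERRUPTED): "Got timeout reading communication packets"
ER_NET_READ_INTERRUPTED = 1159


def restart_statement(status: dict[str, str]) -> Optional[str]:
    """Hypothetical watchdog rule: return the statement to run, or None.

    `status` maps SHOW SLAVE STATUS field names to their string values.
    Only the case seen in this post is handled: the I/O thread stopped
    with error 1159 while the SQL thread is still running.
    """
    io_stopped = status.get("Slave_IO_Running") == "No"
    sql_running = status.get("Slave_SQL_Running") == "Yes"
    errno = status.get("Last_IO_Errno", "0")
    if io_stopped and sql_running and errno.isdigit() and int(errno) == ER_NET_READ_INTERRUPTED:
        # Starts only the I/O thread; the (delayed) SQL thread keeps running.
        return "START SLAVE IO_THREAD"
    return None


# Field values taken from the status output in this post.
broken = {"Slave_IO_Running": "No", "Slave_SQL_Running": "Yes", "Last_IO_Errno": "1159"}
healthy = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Last_IO_Errno": "0"}
print(restart_statement(broken))   # START SLAVE IO_THREAD
print(restart_statement(healthy))  # None
```

The rule is deliberately narrow: it only fires on error 1159, so other I/O errors still need a human to look at them.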

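However, the output above shows that a network timeout at 190508 18:35:16 (2019-05-08 18:35:16) stopped the I/O thread abnormally. That raises a question: why has the MySQL slave not tried to reconnect at any point between 18:00 yesterday afternoon and now?
Every time this error has appeared, running start slave has been enough to restore replication. So if the slave had attempted an automatic reconnect, replication should have recovered by itself. It did not reconnect, as Last_IO_Error_Timestamp: 190508 18:35:16 shows. That field records the time of the last abnormal stop, and it is now the morning of May 9, so no automatic reconnect took place in between.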
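Moreover, Connect_Retry is 60, so the reconnect interval is 60 s. In addition, slave_net_timeout on this system is set to 10, so the slave should treat the connection as broken and reconnect automatically once 10 s pass without data from the master.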
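Yet after the 1159 timeout error, the slave did not reconnect automatically. This is strange, and I have not found the cause yet. For now, the fix for this error is to run start slave.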
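The timing argument above can be checked with some simple arithmetic. The following Python sketch assumes that "the morning of May 9" means 09:00, since the post does not give the exact time. It parses Last_IO_Error_Timestamp (format YYMMDD HH:MM:SS), counts how many Connect_Retry intervals fit between the error and that moment, and computes the total retry window allowed by Master_Retry_Count. The function names are hypothetical.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400


def parse_io_error_timestamp(ts: str) -> datetime:
    # Last_IO_Error_Timestamp uses YYMMDD HH:MM:SS, e.g. "190508 18:35:16".
    return datetime.strptime(ts, "%y%m%d %H:%M:%S")


def expected_retries(last_error: datetime, now: datetime, connect_retry: int) -> int:
    """Reconnect attempts that would fit between the last I/O error and `now`
    if the I/O thread retried every `connect_retry` seconds."""
    elapsed = (now - last_error).total_seconds()
    return int(elapsed // connect_retry)


last_error = parse_io_error_timestamp("190508 18:35:16")
now = datetime(2019, 5, 9, 9, 0, 0)  # assumed time; not taken from the status output
connect_retry = 60                   # Connect_Retry
master_retry_count = 86400           # Master_Retry_Count

print(expected_retries(last_error, now, connect_retry))                # 864
print(master_retry_count * connect_retry / SECONDS_PER_DAY, "days")  # 60.0 days
```

With these values, roughly 864 reconnect attempts would have been due by 09:00. The retry window of 86400 attempts at 60 s each spans 60 days, so running out of Master_Retry_Count does not explain why no reconnect happened.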