Nov. 28 @ 8:25 a.m. 2007
Written by: Adam Olsen (synic)
At SendOutCards, we use MySQL replication to ensure that if our main database server goes down because of hardware failure, we'll still have an server that is up to date with our data. In a nutshell, our main database server sends another server every update that is performed on itself.
For non-hardware issues (IE, a bad SQL statement that wipes out everyone's account balance), we've written a script on our slave server that creates a complete backup of the database every night.
This system is great, in theory. Unfortunately, MySQL replication seems to be highly problematic, at least with our database schema and hardware setup. Before long, the slave server always becomes corrupt after an arbitrary length of time (if you have any idea of why this might be happening, please contact me). Due to the size of our database, we've had to change the
To make sure we know about the situation as soon as possible, I've written a simple python script to monitor the slave status via cron that will send an email once an error is detected.
At SendOutCards, we use MySQL replication to ensure that if our main database server goes down because of hardware failure, we'll still have an server that is up to date with our data. In a nutshell, our main database server sends another server every update that is performed on itself.
For non-hardware issues (IE, a bad SQL statement that wipes out everyone's account balance), we've written a script on our slave server that creates a complete backup of the database every night.
This system is great, in theory. Unfortunately, MySQL replication seems to be highly problematic, at least with our database schema and hardware setup. Before long, the slave server always becomes corrupt after an arbitrary length of time (if you have any idea of why this might be happening, please contact me). Due to the size of our database, we've had to change the
expire_log_days (the number of days before the binary log is deleted) in /etc/mysql/my.cnf to 2 days; the reason is that if the slave becomes corrupt and is not repaired immediately the log information can become out of date and a repair without stopping the master database is impossible.To make sure we know about the situation as soon as possible, I've written a simple python script to monitor the slave status via cron that will send an email once an error is detected.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | #!/usr/bin/env python import MySQLdb, MySQLdb.cursors, os ## REPLACE THE FOLLOWING WITH YOUR INFORMATION db_user = 'someuser' db_pass = 'somepassword' email = 'administrators@email.tld' db = MySQLdb.connect(user=db_user, passwd=db_pass, db='mysql', cursorclass=MySQLdb.cursors.DictCursor) cur = db.cursor() cur.execute('SHOW SLAVE STATUS') row = cur.fetchone() errno = row['Last_Errno'] error = row['Last_Error'] if errno > 0: os.system('echo \"%s\" | mail -s \"Slave error on `hostname`\" %s' % (error, email)) |
No comments have been added yet
Add a comment: