Advanced MySQL Replication Techniques
Pages: 1, 2, 3, 4, 5, 6
Fail-Over, Client Side
So far, I have shown how a server replaces a failed one. All is well when seen from within the replication array, but what about the poor client that was pointing at the now defunct master and keeps getting connection time-out errors?
Here comes the bad news: there is no silver bullet for this problem. Because it's on the application side, you must find an application solution. There are a few server-side solutions, but they are either limited to only two nodes or they depend heavily on a specific operating system (such as CARP, UCARP, or Linux-HA).
The consoling news is that MySQL Cluster is no better in this field. It may come as a surprise to many, but the answer for MySQL high availability is to use multiple data sources in Java. Unless some integration between operating system and MySQL comes up, the situation is that you are pretty much on your own when you need a high-availability system.
The good news is that if circular replication is satisfactory for your needs, a simple load balancer will be enough to guarantee at the same time a good spread of resources and a high-availability system. You can buy a hardware load balancer, or you can implement a software one within your application. You may also monitor the logs of the fail-over events we have seen in this article to exclude a dead node from the ones through which your balancer should loop. If you use such an approach and get random connections coming from the balancer, be sure to use them by transaction, not by query. Even if you don't use transactions, use only one connection within the same unit, be it a web page, a module function, or an administration program.
Let me try to explain further. You are using a device that gets you a random database connection to one of the nodes. You could just use such device for each query in your application, thus spreading the load among all nodes. However, this extreme randomization would not be healthy. It may happen that you do an INSERT query in one node, and then a SELECT query in another node, where the effects of your insert may not yet be replicated. To avoid this, your application should get a connection and use it to execute all queries that are logically related.
Missing pieces
What I have shown in this article is, I believe, a big step forward in the direction of a more useful replication system. There are still several missing pieces to extend this proof of concept into a well-behaved engine.
One, I have to mention that MySQL AB is planning an extension of the current MySQL replication so that it will include some devices to resolve conflicts (such as the clash of unique keys). It is still in early stages, but feel free to explore.
I also need to mention some half-baked features in MySQL 5.x. You know that one of MySQL 5.0's major enhancements was the information schema database, which should offer a coherent view of database metadata objects. Unfortunately, a missing piece in this collection of metadata is all the data concerning replication. Therefore, because you can access replication status only through SHOW SOMETHING statements (as of today, in MySQL 5.1.9), stored procedures cannot access this information. A further problem is that the parameters of CHANGE MASTER TO must be text literals. Variables are not accepted, thus reducing the flexibility of stored procedures. This inconsistency has been reported through the appropriate channels, and we hope somebody will act on it. For the time being, it all means that you can achieve fine-grained replication administration only through external procedures.
(Actually, that is not exactly true. There are some undocumented and unofficial--even deprecated--practices that can overcome these limitations. For the brave, I will illustrate these techniques in Higher Order MySQL, a talk at the MySQL Users Conference 2006.)
Enhanced circular replication offers additional features that I have not shown here. It is possible, for instance, to exchange messages between servers. That is quite useful in the event of master replacement, when the slave could ask the master to perform a RESET MASTER before resuming replication. I leave these amenities to some other article, to avoid burdening this one too much.
However, let me remind you that the code in this article is just a proof of concept, which needs some hardening before being used for production. A real-world application will need to double-check whether the master is really dead before switching to a new master, and the new master must be questioned before the switch to ensure that it's ready to take over; the other nodes should be informed; and so on. You can perform all these actions using the currently available technology, although they will be more effective and easier to implement when the currently planned improvements on the data dictionary are available--according to the information at hand, in MySQL version 5.1.
Playing with the System
Experimenting with replication is not so easy. You need to have several servers to play with master and slaves. For a test of fail-over, you need at least three servers.
Because not everyone can enjoy an abundance of hardware, I offer you MySQL 5 Replication Playground, where all the nodes are in the same box, and they just use different ports and sockets to simulate separate machines.
Should you wish to play with this system, all you need to do is install one instance of MYSQL 5.1 and then the replication playground in your home directory. (You don't need root access.) Download it, peruse the readme, run the installation script, and play with it.
Acknowledgments
Thanks to Patrizio Tassone and Alberto Coduti for the graphics. Thanks to Roland Bouman, Jay Pipes, and Lars Thalmann for catching my mistakes and for providing useful comments and corrections. I am also indebted to them for some rephrasing and definitions in this text. Thanks to Massimiliano Stucchi for providing a test environment for FreeBSD and for catching some more mistakes.
Giuseppe Maxia is a QA developer in MySQL community team. A system analyst with 20 years of IT experience, he has worked as a database consultant and designer for several years. He is a frequent speaker at open source events and he's the author of many articles. He lives in Sardinia (Italy).
Return to ONLamp.com.
You must be logged in to the O'Reilly Network to post a talkback.
Showing messages 1 through 32 of 32.
-
not so sure
2009-07-27 14:55:20 gyromastar [Reply | View]
This is a cool idea, but consider statement based replication vs row based replication before you get too far into it.
Consider 4 master dbs each with a 1sec replication lag.
An update statement changes a flag from 0 to 1 on a column over entire table, and is executed on db1. Due to replication lag db4 executes the update ~3 seconds later.
If using statement based replication I'm thinking that there's a 3sec window where inserts get replicated out before the update affects em. Now those few rows are stored differently in db1 than db4.
You'd have to use row based replication so that those inserts in that 3sec window get a second entry in the replication log when the update hits em.
Of course the downside of row based replication is that if your update touches 1mil rows, you just added 1mil statements to your replication log. Not necessarily evil, but something to consider.
-
Existing Database & New PK System
2009-06-28 18:02:01 ChuckPets [Reply | View]
I am trying to confirm my understanding that an existing db with many tables should be no problem for this implementation.
Say an existing db is moved to this system, how will the sequencing go ? Say the db has been stopped, replication set up as suggested on 2 or more servers, then re-started. 2 existing tables, last PK on table1 = 3374, last PK on table2 = 6737. Server ID = 10 (auto_increment = 10, offset = 1); Server ID 20 (auto_increment = 10, offset = 2).
My understanding is that next insert on Server ID 10 into table1 will result in a PK of 3385, second insert will have PK = 3395, and so on.
The next insert on Server ID 10 into table2 will result in a PK of 6748, second insert will have PK = 6758, and so on.
And thet next insert on Server ID 20 into table1 will result in a PK of 3386, second insert will have PK = 3396, and so on.
The next insert on Server ID 20 into table2 will result in a PK of 6749, second insert will have PK = 6759, and so on.
And so the 2 servers should never encounter a attempted PK duplication error, no matter what the starting last PK was ?
And so long as these 2 system variables don't change, the individual table AutoInc settings won't interfere ?
Thanks for your time.
Chuck
-
MySQL Multimaster Replication
2009-02-11 02:23:44 drkjmg [Reply | View]
Hi everyone.
Actually I did it. Its work for me, but on this way.
I use MMM what i just modified for my own needs with combining this article its working perfectly in enterprise class production ;)
-
Circular Replication
2008-09-30 06:39:32 pawankorotane [Reply | View]
I having a problem in circular replication. I need to implement replication in four servers. I implemented master-master replication in two servers. I am unable to add two more master in the replication network. The method used in this articale is different then the method i used for the master master replication. Please help me to adding two more masters in the replication group.
-
SHOW SLAVE STATUS
2008-09-18 04:15:23 Drew81 [Reply | View]
Just a note: In my application i use SHOW SLAVE STATUS to determine whether a servers master db is alive. Was there a reason why this was undesirable and hence satisfied by federated tables?
-
Circular replication failure recovery :BUG
2008-05-27 00:42:55 fafarun [Reply | View]
Hi,
Thanks for you great article on mysql replication.
But I have a big problem.
I want realise a circular replication failure recovery.
A->B->C->A
IF B is down
the check_master() of A change his master.
But the command change master starting replication in log 'FIRT4 and at position 4.
My mysql A want to replay the binlog, or it's impossible.
fafarun.
-
Questions on Circular Fallover
2008-02-21 11:37:31 BasicCharlie [Reply | View]
Thanks for writing this article, it was great and gave a lot of insights I couldn't find anywhere else. I have a question regarding the circular replication recovery though. When you demonstrate the two node recovery using the events, you use the change master command, and specify not only the host, but the log file and position as well. Then in the 4 node example, you only specify the host. Are the log files important? If so how can you get the values?
I'd expect that if you don't specify the log values then it would start at the beginning, and that this would be a problem in the fall over.
-
Distance between servers
2007-08-14 13:01:23 jlowe1 [Reply | View]
I'd just like to point out that you assume that the MySQL replicas are close together - especially when saying that network problems are rare :) I can tell you for 100% that when running replicas across multiple continents network errors (and data corruption) are anything but rare.
Not that this invalidates anything you have said, especially, it is just that the servers being "close" is a hidden requirement of your technique.
-
Virtual testing
2007-05-16 03:35:34 symfrog [Reply | View]
Really good article.
"Because not everyone can enjoy an abundance of hardware, I offer you MySQL 5 Replication Playground, where all the nodes are in the same box, and they just use different ports and sockets to simulate separate machines."
Virtual server from VMWare helps alot in this case.
-
Virtual testing
2007-05-16 04:22:28 Giuseppe Maxia |
[Reply | View]
Virtual Servers eat up a lot of memory when you want to test three or four nodes.
Abundance of hardware includes RAM in a single box. If you have plenty of RAM in your desktop and you can afford to fire four virtual machines, good for you.
I have been using VMWare WS for 7 years. But when I need to do some heavy testing on replicating nodes, I prefer emulation over virtualization.
Cheers
Giuseppe
-
Consistency problem?
2007-03-14 08:43:53 UloPe [Reply | View]
Hi,
thank you for you interesting article.
I have a quation though.
Lets assume we have 4 servers (A,B,C,D) setup to replicate as follows: A->B->C->-D->A
As I understand your setup the following will happen if we pull the plug on C:
D should switch over to B as its master.
But here is the question: what happens if there are updates made on B while C is already dead? As soon (i.e. 30-59 sec if checkinterval is 30 s) as D detects that C is gone it will tell B to "reset master" and switch over, thereby missing all the changes that happend on A and B for maybe nearly a minute.
Am I correct in that assumption or did i miss something?
Thanks.
Ulrich -
Consistency problem?
2007-03-14 09:44:35 Giuseppe Maxia |
[Reply | View]
You are right.
Please be aware that my article is just a proof of concept. It is not an example of what you should do in production, because of this problem.
Currently, you can't detect the master log position using only SQL routines. You should use external applications for that.
At the moment, the technology available in MySQL stored routines is not able to handle such information unassisted.
Best regards
Giuseppe
-
fail-over infinite looping problem
2007-03-09 09:00:49 chazByrne [Reply | View]
The scenario is that four servers A, B, C, and D (with server ids 1, 2, 3, and 4 respectively) replicate in a topology { A->B, B->C, C->D, D->A }. Server A crashes. The DBA wants slave B to connect to master D instead and continue with the circular replication.
The normal loop detection that MySQL has discards events that are generated on the same server, but since A is no longer in the circle, the generated events will loop forever. My current problem is an update statement from the downed server that is replicated in an infinite loop...
Is there a work around for this? Or a command I am missing.
Thank you very much,
Chaz Byrne
-
Circle Replication Question (infinite loops?)
2006-12-05 21:29:21 i_h_m [Reply | View]
In circular replication A->B->C->D->A
What keeps a bin log event from propogating indefinitely?
E.g. If A was updated... that update gets repeated in B, then C, then D, --- When the same update event gets sent back to A, how does A know to not execute that update?
Thanks!
-
Circle Replication Question (infinite loops?)
2006-12-05 23:51:19 Giuseppe Maxia |
[Reply | View]
Read carefully. The article explains it.
"The option replicate-same-server-id has the purpose of avoiding infinite replication loops, effectively telling each node to ignore from its master's binary log any statement that originated with its own server ID."
Giuseppe
-
conceptually great but far from being usable
2006-09-05 01:05:12 Oyku [Reply | View]
Thanks for the great article. Unfortunately it's far from being practically usable after testing in production grade traffic and data environment.
Let's consider this case. A,B,C,D in circular replication. First of all it's crucial to have a RESET MASTER before everything else so that until a problem occurs all hosts will have the same log_pos.
When B fails, failing over to A as master for C requires
1-checking to see if the latest update came from A or D
2-If it came from A then determining the position for that event
3-determining the log file name in A
4-If the latest update came from D, based on the timestamp determining the log file name in A by examining D's logbin events.
It seems perfectly scriptable BUT
1- With B not functioning, A,C,D working in circular replication there is no good way to reintroduce B without disturbing the system
2- Even though the steps above can be scriptible it's perfectly the database server software's responsibility to provide the basic information regarding replication not the db admin's by checking the log positions and timestamps.
My conclusion is having multi master ability with managed auto increment keys is wonderful. But right now out of the box MySQL does not provide the tools to use it safely. The CHANGE MASTER TO very low level. In a multi master situation, you cannot tell your web server to stop updating just because you'll need to figure out what log name and which position to issue the command with.
Having looked at 5.1 NDB replication, I saw good signs that can be brought into plain vanilla replication. Until that day, I'd refrain from using multi master replication.
I hope I'm not missing a point here.
-
Replicating updates
2006-08-31 12:08:58 giuseb [Reply | View]
Hi Giuseppe, thanks for the great article from a beginner MySQL enthusiast.
Here's my question. While the proposed technique to avoid primary key conflicts for inserts seems very convincing, one topic I was sort of expecting to see covered is how to reconcile updates of the same record occurring at multiple nodes between replications.
Please excuse me if the question is off the mark for lack of knowledge...
Ciao,
Giuseppe :) -
Replicating updates
2006-08-31 12:24:21 Giuseppe Maxia |
[Reply | View]
In a circular replication system there is currently no mechanism to solve conflicts. So, if two nodes update the same record, the last one overwrites the previous record.
In case of a INSERT of the same record with a UNIQUE key in two nodes, there is going to be a conflict. The replication paradigm can't handle it.
If you were looking for a distributed database where multiple branches update a central repository, replication, with today's technology, is not your solution.
There are plans to introduce conflict resolution mechanisms and multi-master systems in replication, but we are talking about version 5.2 or 5.3.
Cheers
Giuseppe
-
multiple master replication ambiquity
2006-06-28 12:00:29 andrew_b [Reply | View]
I've been interested in replication using multiple, unrelated masters and one slave. I have confused this with what is referred to as multimaster replication and multiple master replication - where the masters *are* related.
The MySQL 5.1 Reference Manual seems somewhat ambiguous about the use of these terms. It might be worth pointing out that the setup I seek is not currently available directly in MySQL.
-
requiered line mistake
2006-06-09 03:52:51 Pr3d@t0r [Reply | View]
hi
i made a mistake about the line need, it should be log_bin=mysql_bin instead of the server_log i specified.
Sorry, my bad! =)
-
required line
2006-06-09 03:32:27 Pr3d@t0r [Reply | View]
hi
i recently looked at your advanced replication techniques in setting up multimaster replication. i would say great tutorial, but i think you already know that.
I had one issue though. I noticed it was included in the examples of more than two masters but not in the example of the replication between only two server. the line that was missing: server-log=mysql-bin. for some reason without adding that line the replication wouldn't work. you might want to ad that line to the tutorial. i think beginners might find it difficult to figure out what was wrong.
well keep up the good work!
cheers
-
Double masters
2006-05-26 15:54:21 jerry2000 [Reply | View]
Hi there
I would like to set up double masters and I have read your excellent article. I have 2 questions though:
1. Is there any problem setting the increment to 2, so all my ID's would be multiples of 2 instead of one? I mean, server doesn't care if I have 1 milion auto incremented numbers or 1 million numbers in increments of 2?
2. I would use one master for one domain and the other for another domain site (same sites)... What is my concern is what if a link breaks, something goes wrong the the replication stops working. I would end with 2 separate homepages that forum et. would not sync. Is there any way to get the 2 pages resynced again or is the script necesarry that looks all the ID's and inserts accordingly (quite a complex script). Is it possible to "repair" a broken replication link?
Yours
Jerry
-
Network failure
2006-05-23 21:47:39 rjasdf [Reply | View]
Rarer than the Master going down is the network going down. In this case, the slaves might wrongly deduce that the Master is dead and throw the switch. Meanwhile the Master is alive. Some clients might continue to talk to the "dead" master while others are switched to the "new" master. You get a mess. -
Network failure
2006-05-24 00:19:28 Giuseppe Maxia |
[Reply | View]
In the complete solution scenario (See the Replication Playground for a working example), the slave does not change master without being sure that the new master is alive. Thus, in a dead network, a master switch does not happen, because the new master is unreachable.
Implementing such control is trivial.
Cheers
Giuseppe
-
Questions about Automatic Fail-over
2006-05-04 07:17:42 volts [Reply | View]
I have some questions about the failover approach described in this article.
As I understand it, MySQL replication works as follows:
1. Begin with identical baseline copies of the database on both master and slaves;
2. Make changes to the master database, logging the changes in the binlog files.
3. Asynchronously propogate the changes (from the binlogs) to relay logs on each slave and apply the changes to the slave database. Each slave will likely be at a slightly different position in the master's binlogs (master-binlogfile/position), due to factors like local load and propogation delay.
4. If a slave has log-slave-updates turned on, it will write the changes into its own binlogs. Although each slave-binlogfile/position will correspond to a master-binlogfile/position, both the filename and the position will likely be different.
In the 'normal' failover scenario a failed master is replaced with a former slave, starting replication for all other slaves at the new master's 'mysql-bin.000001', position 0. Would this cause to the other slaves to attempt to repeat all previous transactions? I can see many issues with this, starting with unique key collisions.
In the circular replication scenario, if a server fails, its slave changes to another master without specifying a binlogfile/position. Does this mean that the the slave uses the old master's binlog name and position with the new master?
Database changes applied at each server will be propagating asynchronously around the circle. It seems unlikely that all servers will all be at exactly the same binlog/position. Is it possible that a slave will ask its new master for a binlog/position from some time in the past or that doesn't yet exist? -
Questions about Automatic Fail-over
2006-05-04 07:55:20 Giuseppe Maxia |
[Reply | View]
Please read the answer to the other comments.
Your concerns hava already been addressed.
Regards
Giuseppe
-
wrong master_log_pos, master_log_name
2006-05-03 10:21:24 MarkSwanson [Reply | View]
I know the code is for example purposes, but hard coding these values will not work. The manual seems to imply that if no values (master_log_pos, master_log-name) are used the slave will use the same coordinates it was using with the old master. The manual further states this will ensure no discontinuity in replication.
I wasn't able to find anything in the documentation that stated that the log filenames and their contents were byte-for-byte in sync between machines. The log filenames are definately different on my master compared to my slave.
So does an existing slave - when changed to a master - rename its log files and adjust the log file contents so that when a slave connects with previous file/offset coordinates everything will work?
I somehow don't think it works this way, but can't find the documentation on how it does work. Please post if you know.
Cheers.
-
wrong master_log_pos, master_log_name
2006-05-03 11:16:01 Giuseppe Maxia |
[Reply | View]
Hi, thanks for your comments.
Please have a look at the code in the Replication Playground, (http://sourceforge.net/projects/my-repl-play) where there is a working sample (not perfect, but it should work in many cases).
As I said in the article, there are some bits that still prevent an all-SQL solution, namely the lack of slave related data from the information schema, without which it is impossible to set the slave to the current log file and position in the master.
However, with the technology at hand today, it is possible to ask the would be master to do a RESET MASTER (see the example in the replication playground).
According to my sources, this limitation should go away before MySQL 5.1 is released as generally available (I was promised a fix to the information schema by June). Thus, the techniques shown in this articles should be available for general consumption in a more reliable form.
I will post something in my blog as soon as this solution becomes available.
Regards
Giuseppe
-
HA Lamp box
2006-05-03 07:24:10 simon_massey [Reply | View]
the mysql technique in this articles sounds idea for setting up a highly availability lamp (Linux, Apache, mysql, perl/python/php) website. the techniques in the book "The Linux Enterprise Cluster" by Karl Kopper could be used for Apache fail-over.
if you co-locate an Apache and mysql instances on each host (each apache node is also a mysql node) then you wont have to think very hard about the database fail-over with the technique described in this article. each Apache configuration would connect to localhost for mysql. just failing over Apache from that host would then effectively stop access to the co-located mysql instance.
if a particular mysql instance or Apache instance on a given host has failed then the whole hosts would be suspect and it should all be swapped out. as long as the heartbeat hit a scripted web page that touched the database then you would be testing the whole stack on a given node for a failure.
as the techniques for Apache clustering from a client perspective are well documented pushing the load balancing and fail-over up to that level would makes sense. if you read that book when building a dynamic website cluster the problem part is the resilient database. as the technique in this article requires no shared disk or other specialist hardware it appears to complete the picture perfectly.







Let's assume we use similar techniques as this article to ensure there are no auto-increment primary conflicts. Would it be possible then to have each app server writing to a separate MySQL database (to speed up the writes from the app server's perspective) and have all of these "masters" replicate to the slave reporting database? Logically, this works for my application since the app servers themselves never do anything with the log data (other than write it).
Thanks,
:JB