Update databases in a load balancing cluster using a metaphor - Part 2
21 AUG 09
In a previous post I started writing about the problems I encountered when designing the update of databases in a load balancing cluster, and how a metaphor was of great help. In this second part I want to talk about the physical implementation of the architecture.
Using a semaphore for smooth running
The physical implementation uses a database (updatesdb) to describe all components, roles and procedures; it thus constitutes the configuration of the system. In addition, this database serves as a communication channel for all components. It encompasses three tables: hosts, files and locks, which I will describe in more detail.
The hosts table contains all combinations of components and the tasks they have to perform. The first action any task must take when run is to check whether the component it is running on is allowed (has been configured) to do so. Even if a scholar were to run, e.g., the teacher's task TeacherPreparesLesson, nothing would happen because the script would terminate immediately after that check. This is how machines are assigned a particular role, e.g. teacher or scholar. For that reason this table is only modified when configuring the system.
| Application | Hostname |
|---|---|
| BoysLearn | 192.168.xxx.56 |
| BoysLearn | 192.168.xxx.57 |
| BoysLearn | 192.168.xxx.58 |
| GirlsLearn | 192.168.xxx.66 |
| GirlsLearn | 192.168.xxx.67 |
| GirlsLearn | 192.168.xxx.68 |
| TeacherHoldsLesson | 192.168.xxx.3 |
| TeacherPreparesLesson | 192.168.xxx.3 |
| RecsForOnlineDB | 192.168.xxx.3 |
| PDFForOnlineDB | 192.168.xxx.3 |
| TextForOnlineDB | 192.168.xxx.3 |
| XMLForOnlineDB | 192.168.xxx.3 |
| NewsForOnlineDB | 192.168.xxx.3 |
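The enrollment check described above could look roughly like the following. This is only a minimal sketch: the function name `isEnrolled`, the column names, the placeholder IPs and the in-memory SQLite database standing in for updatesdb are all assumptions, not the original implementation.

```php
<?php
// Hypothetical sketch: column names, function name and the SQLite stand-in
// are assumptions; only the table name 'hosts' and the task names are from the post.

function isEnrolled(PDO $db, string $application, string $hostname): bool
{
    // A task may run only if a row pairs it with the current host.
    $stmt = $db->prepare(
        'SELECT COUNT(*) FROM hosts WHERE application = ? AND hostname = ?'
    );
    $stmt->execute([$application, $hostname]);
    return (int) $stmt->fetchColumn() > 0;
}

// In-memory stand-in for updatesdb with placeholder IPs.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE hosts (application TEXT, hostname TEXT)');
$db->exec("INSERT INTO hosts VALUES
    ('BoysLearn', '192.168.0.56'),
    ('TeacherPreparesLesson', '192.168.0.3')");

// The teacher is configured for the task, the scholar is not.
$teacherMayPrepare = isEnrolled($db, 'TeacherPreparesLesson', '192.168.0.3');
$scholarMayPrepare = isEnrolled($db, 'TeacherPreparesLesson', '192.168.0.56');
```

A task script would call such a check first and exit straight away when it returns false.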
The files table is a helper table. It contains, e.g., the file names of full text documents which the teacher has to process, or the ids of queries that users stored in order to get notified about new results; these queries are processed jointly by the satellites.
| Application | Hostname | Filename |
|---|---|---|
| NotifyForOnlineDB | 192.168.xxx.66 | 123 |
| NotifyForOnlineDB | 192.168.xxx.67 | 167 |
| NotifyForOnlineDB | NULL | 177 |
| NotifyForOnlineDB | NULL | 238 |
| NotifyForOnlineDB | NULL | 321 |
| NotifyForOnlineDB | NULL | 464 |
| NotifyForOnlineDB | NULL | 501 |
The locks table is updated by every task that is run. The task puts the name of the component it is running on in the task's row and removes this entry again when it has finished. Any task that is started checks whether its name is 'locked' by another component, in which case it enters a loop checking this 'lock' for a predefined period of time. After this period it reports an error to the task monitor, indicating that the 'locking' task is unexpectedly still running (or hanging). In effect this is an implementation of a semaphore that avoids any collision of tasks.
| Application | Hostname |
|---|---|
| BoysLearn | NULL |
| GirlsLearn | NULL |
| TeacherHoldsLesson | 192.168.xxx.3 |
| TeacherPreparesLesson | NULL |
| RecsForOnlineDB | NULL |
| PDFForOnlineDB | NULL |
| TextForOnlineDB | NULL |
| XMLForOnlineDB | NULL |
| NewsForOnlineDB | NULL |
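The locking behaviour can be sketched as follows, under stated assumptions: the function names, the column names, the retry parameters and the in-memory SQLite stand-in are hypothetical. The sketch also folds "check" and "set" into a single conditional UPDATE so that two hosts cannot both claim the row; the post does not say whether the original does it this way.

```php
<?php
// Hypothetical sketch of the semaphore on the locks table.

function acquireLock(PDO $db, string $application, string $hostname,
                     int $maxAttempts = 5, int $sleepSeconds = 0): bool
{
    // The UPDATE only matches while the task's row is unlocked (NULL).
    $claim = $db->prepare(
        'UPDATE locks SET hostname = ? WHERE application = ? AND hostname IS NULL'
    );
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $claim->execute([$hostname, $application]);
        if ($claim->rowCount() === 1) {
            return true; // we now hold the lock
        }
        sleep($sleepSeconds); // still locked by another component, keep polling
    }
    return false; // caller would report this to the task monitor
}

function releaseLock(PDO $db, string $application, string $hostname): void
{
    $db->prepare(
        'UPDATE locks SET hostname = NULL WHERE application = ? AND hostname = ?'
    )->execute([$application, $hostname]);
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE locks (application TEXT, hostname TEXT)');
$db->exec("INSERT INTO locks VALUES ('TeacherHoldsLesson', NULL)");

$first  = acquireLock($db, 'TeacherHoldsLesson', '192.168.0.3');     // lock is free
$second = acquireLock($db, 'TeacherHoldsLesson', '192.168.0.56', 2); // gives up after 2 tries
releaseLock($db, 'TeacherHoldsLesson', '192.168.0.3');
$third  = acquireLock($db, 'TeacherHoldsLesson', '192.168.0.56', 2); // free again
```

In a real setup the sleep interval and the number of attempts together make up the "predefined period of time" mentioned above.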
Making configuration a piece of cake
The use of a database dedicated to regulating the components and letting them communicate already guarantees the smooth running of tasks. Yet in conjunction with another feature of this architecture it makes configuring the system a piece of cake: all VMs are perfect copies of each other, distinguished only by their IP, which constitutes the machine's name.
Let's say we want to retire the current teacher (reference machine) and "promote" a former scholar to take over his job. The only modification we have to make is to go to the hosts table, delete all rows corresponding to the former scholar and replace the teacher's name with the scholar's name in all rows. Done!
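As a rough illustration of that promotion, here is a minimal sketch. The function name, the column names, the placeholder IPs and the SQLite stand-in are assumptions, and wrapping both statements in a transaction is my addition rather than something the post describes.

```php
<?php
// Hypothetical sketch: promote a scholar by rewriting rows in the hosts table.

function promoteToTeacher(PDO $db, string $scholar, string $teacher): void
{
    $db->beginTransaction();
    try {
        // Drop the scholar's old role(s) ...
        $db->prepare('DELETE FROM hosts WHERE hostname = ?')
           ->execute([$scholar]);
        // ... and hand every teacher task over to the scholar's machine.
        $db->prepare('UPDATE hosts SET hostname = ? WHERE hostname = ?')
           ->execute([$scholar, $teacher]);
        $db->commit();
    } catch (Throwable $e) {
        $db->rollBack();
        throw $e;
    }
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE hosts (application TEXT, hostname TEXT)');
$db->exec("INSERT INTO hosts VALUES
    ('BoysLearn', '192.168.0.56'),
    ('BoysLearn', '192.168.0.57'),
    ('TeacherHoldsLesson', '192.168.0.3'),
    ('TeacherPreparesLesson', '192.168.0.3')");

promoteToTeacher($db, '192.168.0.56', '192.168.0.3');

$rows = $db->query('SELECT application, hostname FROM hosts ORDER BY application')
           ->fetchAll(PDO::FETCH_NUM);
```

After the call, the former scholar's IP appears only next to the teacher tasks, and the remaining scholar is untouched.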
It is also used to easily scale up the entire system. I simply copy one VM, assign a new IP and configure it as
If one VM acts up, I don't even try to investigate and fix the problem - unless the problem is obvious or of a recurring nature - I simply delete it and replace it with a working copy. Done!
To avoid taking undue credit: this architecture was originally conceived by my brilliant ex-colleague Curro (I hope I can link to your blog very soon gg)
Back to school
Now, let's return to the problem at hand. In the first part we looked at the controller of the reference machine (story of the teacher). After the reference machine has successfully updated itself (prepared the lesson) it finishes this task by setting the values for BoysLearn and GirlsLearn from null to 0 and flagging itself as waiting for satellites to copy new files (holding the lesson).
The satellites, which kept polling for that event (wait for teacher), now know it's time to copy (attend the lesson). The first group of satellites (the boys) starts by incrementing the value for BoysLearn (enter classroom) and stopping their web servers (stop chatting). Then each machine checks whether it was the last one to do so (check if last to enter), in which case this (last) satellite creates a robocopy script directly from the updatesdb (write attendance list). After waiting for five minutes to ensure that the load balancer has recognized that all satellites are unresponsive (wait for silence), the last satellite runs the robocopy script (let the lesson begin).
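The "enter classroom" and "last to enter" steps could be sketched like this. The post does not show where the counter lives, so the `value` column, the function name and the rule "last means the counter equals the number of hosts configured for the group" are all assumptions, as is the in-memory SQLite stand-in.

```php
<?php
// Hypothetical sketch: attendance counter for one group of satellites.

function enterClassroom(PDO $db, string $group): bool
{
    // Enter classroom: bump the attendance counter of this group.
    $db->prepare('UPDATE locks SET value = value + 1 WHERE application = ?')
       ->execute([$group]);

    $present = $db->prepare('SELECT value FROM locks WHERE application = ?');
    $present->execute([$group]);
    $enrolled = $db->prepare('SELECT COUNT(*) FROM hosts WHERE application = ?');
    $enrolled->execute([$group]);

    // Last to enter: everybody configured for the group is now present.
    // Increment and read are not atomic here; a real system would need
    // a transaction or row lock around them.
    return (int) $present->fetchColumn() === (int) $enrolled->fetchColumn();
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE hosts (application TEXT, hostname TEXT)');
$db->exec('CREATE TABLE locks (application TEXT, value INTEGER)');
$db->exec("INSERT INTO hosts VALUES
    ('BoysLearn', '192.168.0.56'),
    ('BoysLearn', '192.168.0.57'),
    ('BoysLearn', '192.168.0.58')");
// The teacher has already set the counter from NULL to 0.
$db->exec("INSERT INTO locks VALUES ('BoysLearn', 0)");

$results = [];
for ($boy = 0; $boy < 3; $boy++) {
    $results[] = enterClassroom($db, 'BoysLearn');
}
// $results is [false, false, true]: only the third boy is last to enter.
```

Only the satellite that gets `true` back would go on to write the attendance list and start the copy.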
Put the lesson into practice
I found two options to perform the copying. The first was to copy from the reference machine to each satellite over the network. This would have been the straightforward approach, yet also the most time consuming, since one regular update would result in approximately 50 minutes of copying. Given the three satellites this would take about five hours, during which my system would run at half steam.
The second option was to copy only once from the reference machine's file system to one satellite's, then shut down all satellites and copy the modified VM files on the host. After the copying has finished I power them on again. The first reason to go for that solution was the fact that copying on the host's file system took me about 8 minutes as opposed to 50. So this approach roughly halves the copying time in the case of three satellites. Looking ahead to the need to beef up the system with additional VMs, this option becomes even more appealing since it scales far better.
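To make the comparison concrete, here is a small back-of-the-envelope calculation using the 50 and 8 minute figures from the text. It assumes that copies run one after another and that the second option needs one host-side copy per satellite; both assumptions, like the function and constant names, are mine.

```php
<?php
// Hypothetical sketch: copy time per group of satellites, in minutes.

const NETWORK_COPY_MINUTES = 50; // reference machine -> satellite over the network
const HOST_COPY_MINUTES    = 8;  // VM files copied on the host's file system

// Option 1: every satellite copies over the network.
function networkOnlyMinutes(int $satellites): int
{
    return $satellites * NETWORK_COPY_MINUTES;
}

// Option 2: one network copy, then one host-side copy per satellite.
function hostCopyMinutes(int $satellites): int
{
    return NETWORK_COPY_MINUTES + $satellites * HOST_COPY_MINUTES;
}

$threeNetwork = networkOnlyMinutes(3); // 150
$threeHost    = hostCopyMinutes(3);    // 74
$sixNetwork   = networkOnlyMinutes(6); // 300
$sixHost      = hostCopyMinutes(6);    // 98
```

For a group of three this gives 150 versus 74 minutes, i.e. roughly half; two groups of three at 150 minutes each would add up to the five hours mentioned above, if that figure covers both groups. The gap widens with every additional VM, because each one adds 8 minutes instead of 50.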
The second reason was that an operating system usually starts to act up after a certain time in continuous operation (well, I can only speak for Windows). Be it components that get stuck, processes which are not closed or which develop an increasing appetite for memory, etc.: there are myriads of events that contribute to an overall degradation of performance. I am sure most of you have already experienced similar behavior. At this point I have to admit that my Windows kung fu is not top notch; in the cases just described I tend to fall back on the one move I can perform best: it's an anytimer and I call it Twisting Punch, it's easy to learn and even easier to master: I reboot that machine.
Rebooting the satellites in the course of the updating process not only guarantees at least a weekly refresh of the system but also constitutes the best occasion for that action, since the machines are out of the load balancing loop anyway. And, indeed, it pays off: I have not had to restart any component in the satellites for ages.
Using this approach however, I had to split the update procedure for satellites in two parts: one script that ended with the copying and a second that was run as soon as the machine was powered on again. Below you find the controller for the first part:
    function startLesson() {
        $i = $this;
        // Check that this host is configured (enrolled) for the task.
        $i->c( 'checkEnrollment' );
        $i->c( 'checkIfLessonToday' );
        if( $i->bLessonToday ) {
            $i->c( 'haveLastCigarette' );
            // Keep waiting until the current lesson is for this group
            // or a step reports failure ($i->r['bOk'] is false).
            do {
                $i->c( 'countClassMates' );
                $bcur = ( $i->scurGender === $i->sMyGender );
                if( ! $bcur ) {
                    $i->c( 'waitForTeacher' );
                    $i->c( 'lookAtWatch' );
                }
            } while ( $i->r[ 'bOk' ] && ! $bcur );
            $i->c( 'getBooks' );
            $i->c( 'getInClassroom' );
            $i->c( 'stopChatting' );
            $i->c( 'lastToEnter' );
            // Only the last satellite to enter triggers the copying.
            if( $i->bLast ) {
                $i->c( 'writeAttendanceList' );
                $i->c( 'waitForSilence' );
                $i->c( 'letLessonBegin' );
            }
        }
        $i->c( 'writeReport' );
        $i->c( 'closeNotebook' );
    }
Finishing a lesson
After the copying procedure for the boys finishes by rebooting them, they check whether they still function properly by running Selenium tests and sample queries (take an examination). In case of success they decrement the value for BoysLearn (leave classroom) and wait until all satellites have come up and passed the self tests (check if the door is still open). The last satellite to finish copying and pass the test sets BoysLearn to null (close the classroom door), which is the trigger for the satellites to start their web servers (start chatting) as well as the signal for the second group of satellites (the girls) that their lesson is about to start.
The same procedure of copying is now run by the second group. The only difference is an additional task performed by the last satellite finishing (the last girl to leave the classroom) that is to remove the entry TeacherHoldsLesson from the locks table (lock the classroom door) which resets the reference machine back into an idle state, ready to prepare the next lesson.
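The bookkeeping at the end of a lesson might be sketched as follows. As before, the `value` column, the function name, the placeholder IP and the SQLite stand-in are assumptions, and the increment-then-read sequence is not protected against races.

```php
<?php
// Hypothetical sketch: leaving the classroom and closing/locking the door.

function leaveClassroom(PDO $db, string $group): bool
{
    // Leave classroom: one satellite fewer is still busy.
    $db->prepare('UPDATE locks SET value = value - 1 WHERE application = ?')
       ->execute([$group]);

    $left = $db->prepare('SELECT value FROM locks WHERE application = ?');
    $left->execute([$group]);
    $remaining = (int) $left->fetchColumn();
    $left->closeCursor();
    if ($remaining !== 0) {
        return false; // others are still inside, the door stays open
    }

    // Close the door: the last one out resets the group's value to NULL.
    $db->prepare('UPDATE locks SET value = NULL WHERE application = ?')
       ->execute([$group]);
    if ($group === 'GirlsLearn') {
        // Lock the door: release the teacher so the next lesson can be prepared.
        $db->prepare('UPDATE locks SET hostname = NULL WHERE application = ?')
           ->execute(['TeacherHoldsLesson']);
    }
    return true;
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE locks (application TEXT, hostname TEXT, value INTEGER)');
$db->exec("INSERT INTO locks VALUES
    ('GirlsLearn', NULL, 2),
    ('TeacherHoldsLesson', '192.168.0.3', NULL)");

$first  = leaveClassroom($db, 'GirlsLearn'); // one girl still inside
$second = leaveClassroom($db, 'GirlsLearn'); // last girl closes and locks the door

$teacherLock = $db->query(
    "SELECT hostname FROM locks WHERE application = 'TeacherHoldsLesson'"
)->fetchColumn();
$girlsValue = $db->query(
    "SELECT value FROM locks WHERE application = 'GirlsLearn'"
)->fetchColumn();
```

Once the second call returns, both the group's value and the teacher's lock are NULL again, which corresponds to the idle state described above.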
Below you find the script for this part of the update procedure:
    function finishLesson() {
        $i = $this;
        // Check that this host is configured (enrolled) for the task.
        $i->c( 'checkEnrollment' );
        $i->c( 'checkIfLessonInProgress' );
        if( $i->bLessonInProgress ) {
            $i->c( 'makeExam' );
            if( $i->bPassedExam ) {
                $i->c( 'replaceBooks' );
                $i->c( 'leaveClassroom' );
                $i->c( 'lastToLeave' );
                // The last satellite to leave closes the door;
                // the last girl also locks it (releases the teacher).
                if( $i->bLast ) {
                    $i->c( 'closeDoor' );
                    if( $i->sSex === 'girl' ) {
                        $i->c( 'lockDoor' );
                    }
                }
            }
        }
        if( $i->bPassedExam ) {
            // Wait until the door is closed or a step reports failure.
            do {
                $i->c( 'checkTheDoor' );
                $i->c( 'lookAtWatch' );
            } while ( $i->r[ 'bOk' ] && $i->bDoorOpen );
            $i->c( 'getCoffee' );
            $i->c( 'startChatting' );
        }
        $i->c( 'writeReport' );
        $i->c( 'closeNotebook' );
    }
Lessons learned...
The described architecture has been in operation for almost half a year now and has proved to run extremely well. Sure, my system experienced problems, yet I was not only able to fix them in much shorter time than in the past but also to protect the system from outages. Mostly I could rely on the girls to help out in such cases. All in all I think not even the phalanx of Joel Spolsky's Nobel Prize laureate interns would have built a more stable system last summer :-)
what fun it is to read about our systems (now yours and only yours). Your metaphor is hilarious. Can we try to simplify all this over a beer this afternoon? The satellites are still clean, but the semaphore tables are working double shifts. Bring paper!
A request: how about an article on extending Selenium and automating Selenium tests? One of my interns is having a go at this.
