How to implement a dashboard task monitor to control all your scheduled tasks |
30
JUN
09
I am responsible for a database system available over internet. The architecture comprises of a couple of physical hosts, a bunch of virtual machines and a lot of scheduled tasks and cron-jobs which - until now - created a log file of their activities and sent it via email to a central mailbox. Every day, the sys admin - which would be me - looked into this mailbox and went through the sent reports.
Sounds like a tedious task to perform? Well it was indeed. Moreover I was risking to overlook that a task that should have been performed was not done because the responsible component was down thus neither doing nor sending anything.
Another pitfall was that tasks were not only triggered according to schedule but also by other tasks, e.g. if new full text documents is input all satellite databases have to update themselves. This event happens as soon as someone inputs new data. So I always had to remember what component was supposed to do what and when and what event did trigger what for whom... terrible! To make things even worse I also had to go through the body of emails e.g. examining if a robocopy backup log did raise any exception - I myself! - Horrible!
Now, what exactly were my needs, requirements respectively?
- I wanted to know the status of every single scheduled task: it performed successfully, an error occurred or it didn't do anything at all.
- I wanted an overview that would show me all states at any time for the entire week grouped by weekday in a grid.
- I wanted to view the log file of the scheduled task with one click.
To get an overview I made a list of all scheduled tasks, their name, the component they were running on, at what time they started, at what time they usually finished and on which weekdays they were supposed to run. I ended up with the following table:
TABLE OF SCHEDULED TASKS
Now this table seemed to already satisfy my first and second need because it represented a grid mapping all tasks against weekdays. Since it literally cried for being transposed into a SQL table I had the idea to have every scheduled tasks no longer send emails when they finished but update a database table with the corresponding status i.e. everything went fine (OK) or an error happened (ERR). In case the task was not run because e.g. a component went down the default value (TODO) would remain unchanged. Given the fact that I knew when a task should have been finished (tEnd) I could determine a failure the time I looked at that table.
To satisfy my third requirement I needed another table with similar structure in which the scheduled tasks would put their log files. I now could easily link a specific status with the corresponding log.
Great! But wait, this would work fine for the first week, for every other week I would have to reset all values to default again. For that reason I needed at least two tables of the each kind, one to be a master table holding the information about what should be done by whom and when and another slave table - an exact copy of its master - in which the scheduled tasks would actually update their states, put their logs respectively. Those slave tables would be copied from the master table on a weekly basis.
I decided to name the slave tables after the master and append the week number, which would result in a rotation preserving all states and log files for one year. I now had a database called task_monitor and two master tables states_week and logs_week. Every week I would copy those two tables to their according slaves called e.g. states_week_24 and logs_week_24. In case you, like me, don't know from the top of your head, here is how to copy a table in SQL.
The only thing missing was a proper view of the table. Again, a rather simple task. I loaded the current slave table, looped through the result and created a table structure showing the states. Since I - as mentioned above - also had information about the starting and (normal) ending time of each task I could in addition display time related states. If I was looking at my task monitor grid while a task was supposed to run and I found the default TODO, I replaced it with DOING. If it was later than the normal ending time and I found a default TODO I printed a LATE instead. In case I found a TODO from previous days I replaced it with NOT DONE. I then linked all states with their according log file in the logs table to satisfy my third need.To give my task monitor a shiny touch I reused the states as CSS identifiers to further distinguish by colour and style. I ended up with a dashing dashboard-like task monitor as you can see below :)
SCREENSHOT OF TASK MONITOR TABLE
Up until now the only changes I had to make were to write a script to copy two tables every week and change every scheduled task to update the appropriate slave tables with status and log file instead of sending an email. An of course code the view. Not much really, is it?
In my case I had to additionally modify the task that triggered the execution of other tasks (remember that full text update thing). I implemented a logic that in such a case would - in addition to their own task monitor update - set the value for the components that would have to perform follow-up actions from empty to TODO. By so I ensured that if those "dynamic" tasks would not be performed at all I would get notified.
On last but important step was to slightly modify the tasks that used robocopy for backup jobs. I injected a quick regular expression to search for errors in the log file to determine if an exception occurred and reported it to the task monitor. Now I had to walk through them only in case of an error. Yippy!
All in all it took me two half days to implement this task monitor. Not too much for such work simplification. If you feel like a perl script doing brainless jobs on a more or less regular basis, a bell should ring: Never do jobs that a computer can do for you. And if you are lucky your thirst for action overcomes your sloth and you do something about it, I promise you'll be ever so happy about that :)
