Tuesday, December 16, 2014

Using PHP to do some nasty background jobs

I got a call to build a (not so) simple monitoring system from scratch. I had not touched programming and that sort of thing for a long time, and that is what made me agree to the task.
The environment was full of UNIX/AIX servers, and I got one server running Linux with 3GB of memory and multicore processors.
The task was a bit of a challenge: processing a number of logfiles from those AIX servers. Each one was about 90MB and a new one arrived almost every 4 minutes, and I also had to display the results as statistics.
All to support the monitoring system.
So, what was the preparation?
Nothing fancy, just MySQL and PHP running on Linux, with additional modules to cope with the Oracle DB, plus PEAR, ADOdb and JpGraph.
I had the whole plan in my head and did not bother to write it on paper. Well, I was not willing to :)
Here are the steps of the process.
How to read logfiles on other servers? Read them as a stream, or clone the file locally and then read it?
Ok, I needed to do a quick and dirty assessment. I called Mr. Infra and asked him to check the I/O traffic, and the answer was low to mid, with nothing high or peaking coming and going from the MAC address.
Nailed it! Safe enough to do a simple file open via FTP and read through the file.
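
For the curious, here is a minimal sketch of that kind of read, not the actual code from the project. It relies on PHP's ftp:// stream wrapper (which needs allow_url_fopen enabled). The function name readLines and the host, credentials and path in the usage comment are made up for illustration.

```php
<?php
// Yield one line at a time from an already opened stream, so the whole
// 90MB file never has to sit in memory at once.
function readLines($handle): Generator
{
    while (($line = fgets($handle)) !== false) {
        yield rtrim($line, "\r\n");
    }
}

// Hypothetical usage; host, credentials and path are placeholders:
// $handle = fopen('ftp://user:secret@aix-host.example/logs/app.log', 'r');
// foreach (readLines($handle) as $line) { /* process $line */ }
// fclose($handle);
```

Because the function only needs a stream handle, the same loop works on a local copy of the file if the FTP route ever turns out to be too slow.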
Next, I needed a function to protect against the process being interrupted, by saving which files and data had already been processed.
Both should be done in the Controller and the Model at once. (The Controller is the app/PHP and the Model is the MySQL DB.)
When talking about features, MySQL is the richest of the rich. It can handle an insert that hits a duplicate key by updating the row instead, or you can just ignore the duplicate and continue the process without throwing any error.
So the database side was easy. The table with proper indexes and a prepared INSERT ... ON DUPLICATE KEY statement were done.
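
A rough sketch of what such a prepared statement can look like with PDO. The table log_counts, its columns minute_bucket and hits, and the function name storeCounts are all hypothetical, and adding to the existing count (instead of overwriting it) is my assumption here, not necessarily what the real system did.

```php
<?php
// Hypothetical table: log_counts(minute_bucket DATETIME PRIMARY KEY, hits INT).
// On a duplicate key the new count is added to the stored one (assumption).
const UPSERT_SQL =
    'INSERT INTO log_counts (minute_bucket, hits) VALUES (:bucket, :hits) '
    . 'ON DUPLICATE KEY UPDATE hits = hits + VALUES(hits)';

// Store an array of "minute => count" pairs using one prepared statement.
function storeCounts(PDO $pdo, array $counts): void
{
    $stmt = $pdo->prepare(UPSERT_SQL);
    $pdo->beginTransaction();
    foreach ($counts as $bucket => $hits) {
        $stmt->execute([':bucket' => $bucket, ':hits' => $hits]);
    }
    $pdo->commit();
}
```

The "just ignore it" variant would be INSERT IGNORE, which skips the duplicate row silently instead of updating it.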
Now the real challenge: processing almost a million lines in under 4 minutes, while still being able to display the progress on the other side.
I had to cut the data down by grouping it per minute, just to save storage and limit memory use.
Each group and its count were temporarily stored in an array, as key and value pairs.
As you can imagine, when arrays become notoriously big they also eat up memory.
The trick is to read 10K lines, group them in an array, iterate and store the array to the DB, then release the array to free memory.
Rinse and repeat until EOF :)
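
The loop above can be sketched roughly like this. It assumes every log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp, which may not match the real AIX logs. The names minuteKey and processInBatches are invented for this example, and the $flush callback stands in for whatever writes the array to the DB.

```php
<?php
// Assumption: every log line starts with "YYYY-MM-DD HH:MM:SS", so the
// first 16 characters identify the minute. The real format may differ.
function minuteKey(string $line): string
{
    return substr($line, 0, 16);
}

// Read the stream in batches of $batchSize lines, count lines per minute,
// hand each batch's counts to $flush, then drop the array to free memory.
// Returns the total number of lines read.
function processInBatches($handle, callable $flush, int $batchSize = 10000): int
{
    $counts = [];
    $inBatch = 0;
    $total = 0;
    while (($line = fgets($handle)) !== false) {
        $key = minuteKey($line);
        $counts[$key] = ($counts[$key] ?? 0) + 1;
        $total++;
        if (++$inBatch >= $batchSize) {
            $flush($counts);
            $counts = [];   // release the batch
            $inBatch = 0;
        }
    }
    if ($counts !== []) {
        $flush($counts);    // leftover lines before EOF
    }
    return $total;
}
```

One thing to watch: a minute can straddle two batches, so the store step has to add to an existing row rather than overwrite it, otherwise the second batch wipes out the count from the first.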
Great, the project is done.

It worked for several months, until the graphs became slow to display and everyone complained about how unresponsive reloading the data was.
Ok, I'll check.
Later on I found, WOW.. the table had grown to 8GB in just 3 months.
I had been too lazy to design a backup system, and now I was forced to build one. ***sh!t***
In the end, I cut the online data down to the last month and stored the rest in a backup table.
If that is not enough to please the lord, I cut the backup table down to 6-9 months and the remainder stays in a dumpfile.
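
A minimal sketch of the move from the online table to the backup table, again with PDO. The table names log_counts and log_counts_backup, the column minute_bucket and both function names are hypothetical. It also assumes InnoDB tables (so the transaction really is atomic) and PDO running in exception mode.

```php
<?php
// Hypothetical tables: log_counts (online) and log_counts_backup, both with
// a minute_bucket DATETIME column and identical layout.
function monthsAgo(DateTimeImmutable $now, int $months): string
{
    return $now->sub(new DateInterval("P{$months}M"))->format('Y-m-d H:i:s');
}

// Copy rows older than $cutoff to the backup table, then delete them from
// the online table. Assumes InnoDB and PDO::ERRMODE_EXCEPTION, so a failure
// in either statement rolls back both.
function archiveOlderThan(PDO $pdo, string $cutoff): void
{
    $pdo->beginTransaction();
    try {
        $copy = $pdo->prepare(
            'INSERT INTO log_counts_backup SELECT * FROM log_counts WHERE minute_bucket < :cutoff'
        );
        $copy->execute([':cutoff' => $cutoff]);
        $purge = $pdo->prepare('DELETE FROM log_counts WHERE minute_bucket < :cutoff');
        $purge->execute([':cutoff' => $cutoff]);
        $pdo->commit();
    } catch (Exception $e) {
        $pdo->rollBack();
        throw $e;
    }
}

// Hypothetical usage, keeping one month online:
// archiveOlderThan($pdo, monthsAgo(new DateTimeImmutable(), 1));
```

Run from cron once a day, something like this keeps the online table at roughly a month of data. Trimming the backup table works the same way with a longer cutoff, after the old rows have been written to the dumpfile.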

Now it has been almost a year with no complaints, good. But I hear requests for enhancements here and there.
No, I won't bother..