I have a design headache here. I’m using PHP and MySQL in conjunction with Java (my project is an Android application), and I have to decide how to run a series of server-side calculations at regular intervals. There is a wealth of material here on SO about creating cron jobs and the like, and that’s great; I may very well end up there. But I’m not sure how to tackle this part of my project in a broader sense.
The application is centred entirely on the geographic locations of users. Users are always organised in clusters of anywhere between 4 and 40, and each cluster forms one instance record in my database. These instances can become active or inactive at any time.
For each record (or, as I prefer to call it, instance) in my database, at each epoch, I want to recompute the instance’s centroid from its users’ locations. That’s easy enough, particularly with a simple scalar average, since the users are close together. This effectively shifts the instance itself: I update its latitude and longitude values in the database. Users then receive the new centroid coordinates at regular intervals when they call home.
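The "scalar approach" can be sketched as a plain arithmetic mean of the members' latitudes and longitudes. This is a minimal Java sketch (Java 16+ for records); the `LatLng` type is hypothetical, and the flat averaging is only an approximation that assumes the users are close together and the cluster does not straddle the 180th meridian.

```java
import java.util.List;

public class CentroidSketch {
    // A user's position in decimal degrees (hypothetical type for illustration).
    record LatLng(double lat, double lng) {}

    // Arithmetic mean of the member positions. Treats lat/lng as planar
    // coordinates, which is reasonable for a small, tightly packed cluster
    // that does not cross the 180th meridian.
    static LatLng centroid(List<LatLng> members) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("cluster has no members");
        }
        double sumLat = 0.0;
        double sumLng = 0.0;
        for (LatLng p : members) {
            sumLat += p.lat();
            sumLng += p.lng();
        }
        return new LatLng(sumLat / members.size(), sumLng / members.size());
    }

    public static void main(String[] args) {
        LatLng c = centroid(List.of(
                new LatLng(51.5000, -0.1200),
                new LatLng(51.5010, -0.1210),
                new LatLng(51.5020, -0.1220),
                new LatLng(51.5030, -0.1230)));
        System.out.printf("%.4f, %.4f%n", c.lat(), c.lng());
    }
}
```

The new pair is what would be written back to the instance's latitude and longitude columns.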
This is where it gets messy, due to my rank inexperience. I started out by writing a relatively simple calculation involving one SQL SELECT query and one subsequent SQL UPDATE operation for each instance at each epoch. If we assume an update interval of around 20 to 30 seconds for now, that is less than one minute, which apparently breaches the one-minute minimum interval for cron jobs. (The time difference between epochs can be hard-coded, if absolutely necessary.)
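One common workaround for cron's one-minute granularity is to let cron start a short-lived program once per minute and have that program run several passes inside the minute, for example two passes 30 seconds apart. A minimal Java sketch of that idea follows; `runPass` is a hypothetical placeholder for the real SELECT/recalculate/UPDATE work, and the crontab entry that launches it every minute is assumed rather than shown.

```java
import java.util.concurrent.TimeUnit;

public class SubMinuteRunner {
    // Hypothetical stand-in for one recalculation pass over the active instances.
    static void runPass(int passNumber) {
        System.out.println("pass " + passNumber + " at " + System.currentTimeMillis());
    }

    // Runs `passes` passes spaced `intervalMillis` apart, then returns.
    // With passes = 2 and intervalMillis = 30_000, a program started by cron
    // every minute effectively does the work every 30 seconds.
    static int runPasses(int passes, long intervalMillis) throws InterruptedException {
        long start = System.nanoTime();
        int completed = 0;
        for (int i = 0; i < passes; i++) {
            // Wait until this pass's offset from the start time, so a slow pass
            // does not push every later pass further back.
            long targetNanos = start + TimeUnit.MILLISECONDS.toNanos(intervalMillis * i);
            long waitNanos = targetNanos - System.nanoTime();
            if (waitNanos > 0) {
                TimeUnit.NANOSECONDS.sleep(waitNanos);
            }
            runPass(i + 1);
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) throws InterruptedException {
        runPasses(2, 30_000);
    }
}
```

If a pass can ever take long enough that the program is still running when cron fires again, two copies would overlap; some form of lock (a lock file, for instance) would then be needed.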
In the short term, this process might take a negligible amount of time, because there would be very few instances/clusters. Later, however, if the number of instances ran into the thousands, it could add up to a large number of SQL queries (two per instance per epoch) and a lot of processing time. To reduce unnecessary load, I naturally want to include some mechanism to exclude inactive instances, though it is still conceivable that the required calculation time could exceed the epoch interval. I guess that’s an issue for (much) later.
As it stands now, the question is two-fold:
- I want to execute the same simple function for all active instances at each epoch. Is there a more efficient way to do this than running one iteration (with its own queries) per instance? Can I somehow update many table rows at once, using one big, final SQL UPDATE query? Is something like mysqli_multi_query() actually helpful here? (At this point I don’t have mysqli.)
- How can I best implement a timer or trigger mechanism to re-fire this process at each epoch, given that it may violate the one-minute limit I’ve been reading about for cron jobs?
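On the first question: MySQL can update many rows with different values in a single UPDATE by using a CASE expression per column. The Java sketch below only builds the SQL text and the ordered parameter list for a prepared statement; the table and column names (`instances`, `id`, `lat`, `lng`) and the `Shift` type are hypothetical. The `WHERE id IN (...)` clause matters: a CASE with no ELSE yields NULL, so without it every other row would have its coordinates wiped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

public class BatchUpdateSketch {
    // New centroid for one instance (hypothetical type).
    record Shift(long id, double lat, double lng) {}

    // SQL text plus the values to bind, in placeholder order.
    record Batch(String sql, List<Object> params) {}

    static Batch build(List<Shift> shifts) {
        if (shifts.isEmpty()) {
            throw new IllegalArgumentException("nothing to update");
        }
        StringBuilder latCase = new StringBuilder("CASE id");
        StringBuilder lngCase = new StringBuilder("CASE id");
        StringJoiner inList = new StringJoiner(", ", "(", ")");
        List<Object> latParams = new ArrayList<>();
        List<Object> lngParams = new ArrayList<>();
        List<Object> idParams = new ArrayList<>();
        for (Shift s : shifts) {
            latCase.append(" WHEN ? THEN ?");
            lngCase.append(" WHEN ? THEN ?");
            inList.add("?");
            latParams.add(s.id());
            latParams.add(s.lat());
            lngParams.add(s.id());
            lngParams.add(s.lng());
            idParams.add(s.id());
        }
        latCase.append(" END");
        lngCase.append(" END");
        // Restrict to the listed ids so rows not in the batch are left untouched.
        String sql = "UPDATE instances SET lat = " + latCase
                + ", lng = " + lngCase
                + " WHERE id IN " + inList;
        // Parameters must follow placeholder order: lat CASE, lng CASE, then IN list.
        List<Object> params = new ArrayList<>(latParams);
        params.addAll(lngParams);
        params.addAll(idParams);
        return new Batch(sql, params);
    }

    public static void main(String[] args) {
        Batch b = build(List.of(new Shift(7, 51.5, -0.12), new Shift(9, 48.85, 2.35)));
        System.out.println(b.sql());
        System.out.println(b.params());
    }
}
```

With JDBC, each value would be bound via `PreparedStatement.setObject(i + 1, params.get(i))`; the same SQL shape works from PHP with prepared statements. If the user locations are in MySQL too, a multi-table UPDATE joined against an `AVG(...) ... GROUP BY` derived table could compute and write every centroid in one statement without round-tripping through PHP at all.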
My current approach is as follows:
1. Run one SQL SELECT query to set everything up for the current epoch, fetching the IDs of the instances that require a centroid shift.
2. Populate a PHP array with those instance IDs.
3. Shift each instance in turn using a loop, writing the new coordinate pairs to the database with either one big SQL UPDATE or many small ones (see above).
4. Schedule this task to run at each epoch (in other words, every x seconds).
Is the above approach sound? At this point, I plan to do it this way unless there’s a better suggestion. However, I really don’t have a solid handle on how to schedule the task to execute at each epoch (point #4). I’ve looked all over the place and can’t solve this myself without some guidance; I’m just not very good yet. 🙂 As always, any suggestions would be greatly appreciated.
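For point #4, one alternative to cron, assuming the server can host a long-running process, is a small Java service that schedules the pass itself. This sketch uses `ScheduledExecutorService.scheduleWithFixedDelay`, which waits the full delay after each pass finishes, so a slow pass stretches the epoch instead of overlapping the next one. `recalculateActiveInstances` is a hypothetical placeholder for the SELECT/recompute/UPDATE work.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class EpochScheduler {
    // Counts completed epochs (used here only to show the task is firing).
    static final AtomicInteger epochs = new AtomicInteger();

    // Hypothetical stand-in for one pass: select active instances, recompute, write back.
    static void recalculateActiveInstances() {
        epochs.incrementAndGet();
    }

    static ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                recalculateActiveInstances();
            } catch (RuntimeException e) {
                // An exception escaping a periodic task suppresses all later runs,
                // so catch it, report it, and let the next epoch proceed.
                System.err.println("epoch failed: " + e);
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) {
        // Runs every 30 seconds until the process is stopped; the executor's
        // non-daemon thread keeps the JVM alive after main returns.
        start(30_000);
    }
}
```

If passes should start on a strict clock instead, `scheduleAtFixedRate` is the alternative; it also never runs two passes concurrently, but late passes start back-to-back rather than after a full delay.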
You might consider moving from a scheduled task to an update-as-needed approach. This is fairly easy to accomplish, but there are tradeoffs:
- Add a datetime field called Last Updated to the instance table.
- Every time you query an instance, check its Last Updated field for “freshness” (in your case, it is stale if it was more than 30 seconds ago).
- If it’s fresh, send the data to the user.
- If it isn’t fresh, recalculate the data and save it to the database (making sure to update the Last Updated field). Then send the new data to the user.
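The freshness check above can be sketched in a few lines. This Java version keeps the state in memory purely for illustration: `InstanceState` mirrors one instance row, its `lastUpdated` field plays the role of the Last Updated column, and `recompute` is a hypothetical hook returning `{lat, lng}`. Real code would also write the new values and timestamp back to MySQL at the marked spot.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

public class LazyCentroid {
    // Maximum age before an instance's stored centroid is considered stale.
    static final Duration MAX_AGE = Duration.ofSeconds(30);

    // In-memory stand-in for one instance row (hypothetical).
    static final class InstanceState {
        double lat;
        double lng;
        Instant lastUpdated;

        InstanceState(double lat, double lng, Instant lastUpdated) {
            this.lat = lat;
            this.lng = lng;
            this.lastUpdated = lastUpdated;
        }
    }

    // Fresh means it was updated no more than MAX_AGE ago.
    static boolean isFresh(Instant lastUpdated, Instant now) {
        return Duration.between(lastUpdated, now).compareTo(MAX_AGE) <= 0;
    }

    // Called on every read: recomputes only when stale, then returns the state.
    static InstanceState read(InstanceState state, Instant now, Supplier<double[]> recompute) {
        if (!isFresh(state.lastUpdated, now)) {
            double[] c = recompute.get();
            state.lat = c[0];
            state.lng = c[1];
            state.lastUpdated = now; // real code: also UPDATE the row here
        }
        return state;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        InstanceState s = new InstanceState(51.50, -0.12, now.minusSeconds(45));
        read(s, now, () -> new double[] {51.51, -0.13});
        System.out.println(s.lat + ", " + s.lng);
    }
}
```

One caveat with this approach: two requests arriving together can both see a stale row and both recompute. In MySQL, a conditional UPDATE whose WHERE clause re-checks the timestamp (for example against `NOW() - INTERVAL 30 SECOND`), followed by a check of the affected-row count, is one way to let only one of them win.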
This eliminates the need for a scheduled task and avoids the waste of updating every row on a timer. However, it can slow down the response for whichever request triggers a recalculation.