Assuming we have to log all user activity in a community, I expect our database to become very large in a short time, so my question is:
is this an acceptable compromise (having a huge DB table) in order to offer this kind of service? Or can we do this in a more efficient way?
The kind of activity to be logged is a “classic” social-networking activity log where people can see what others are doing or have done and vice versa, so it will track, for example, when a user edits their profile, posts something, logs in, logs out, etc.
My table is already optimized to store only:

    log_activity_table (
        id        int,
        user      int,
        ip        varchar,
        event     varchar,  # event name
        time      varchar,
        callbacks text      # some info from the triggered event
    )
I'm actually working on a similar system, so I'm interested in the answers you get.
For my project, having a full historical accounting was not important, so we chose to keep the table fairly lean, much like what you're doing. Our tables look something like this:
    CREATE TABLE `activity_log_entry` (
      `id` bigint(20) NOT NULL AUTO_INCREMENT,
      `event` varchar(50) NOT NULL,
      `subject` text,
      `publisher_id` bigint(20) NOT NULL,
      `created_at` datetime NOT NULL,
      `expires_at` datetime NOT NULL,
      PRIMARY KEY (`id`),
      KEY `event_log_entry_action_idx` (`event`),
      KEY `event_log_entry_publisher_id_idx` (`publisher_id`),
      CONSTRAINT `event_log_entry_publisher_id_user_id` FOREIGN KEY (`publisher_id`) REFERENCES `user` (`id`) ON DELETE CASCADE
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8
We decided that we don't want to store history forever, so we will have a cron job that kills history after a certain time period. We have both created_at and expires_at columns simply out of convenience. When an event is logged, these columns are filled in automatically by the model; for expires_at we use a simple strftime('%F %T', strtotime($expr)), where $expr is a string like '+30 days' that we pull from configuration.
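A rough sketch of the expiry scheme just described, under stated assumptions: it uses Python's built-in sqlite3 as a stand-in for MySQL, a hard-coded 30-day timedelta in place of the '+30 days' configuration string, and hypothetical helper names (create_schema, log_event, purge_expired). purge_expired is the kind of statement the cron job would run; the index on expires_at is an added assumption so the periodic DELETE does not have to scan the whole table.

```python
import sqlite3
from datetime import datetime, timedelta

# Assumption: stands in for the configured '+30 days' strtotime() expression.
RETENTION = timedelta(days=30)
# Same layout PHP's strftime('%F %T') produces: YYYY-MM-DD HH:MM:SS
FMT = "%Y-%m-%d %H:%M:%S"


def create_schema(conn):
    # Simplified SQLite version of activity_log_entry (no FK to a user table here).
    conn.execute("""
        CREATE TABLE IF NOT EXISTS activity_log_entry (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            event TEXT NOT NULL,
            subject TEXT,
            publisher_id INTEGER NOT NULL,
            created_at TEXT NOT NULL,
            expires_at TEXT NOT NULL
        )""")
    # Hypothetical index so the purge job can find expired rows cheaply.
    conn.execute("CREATE INDEX IF NOT EXISTS activity_log_entry_expires_at_idx "
                 "ON activity_log_entry (expires_at)")


def log_event(conn, event, subject, publisher_id, now=None):
    """Insert one entry, filling created_at and expires_at automatically."""
    if now is None:
        now = datetime.now()
    conn.execute(
        "INSERT INTO activity_log_entry "
        "(event, subject, publisher_id, created_at, expires_at) VALUES (?, ?, ?, ?, ?)",
        (event, subject, publisher_id, now.strftime(FMT), (now + RETENTION).strftime(FMT)),
    )


def purge_expired(conn, now=None):
    """What a periodic cron job would run: delete rows whose expiry has passed."""
    if now is None:
        now = datetime.now()
    cur = conn.execute("DELETE FROM activity_log_entry WHERE expires_at < ?",
                       (now.strftime(FMT),))
    return cur.rowcount  # number of rows removed
```

Because both timestamps are written in the fixed-width 'YYYY-MM-DD HH:MM:SS' layout, comparing them as strings orders them chronologically, which is what the DELETE relies on here; with MySQL's datetime columns the comparison is native.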
The subject column is similar to your callbacks one. We also chose not to relate the subject of the activity directly to other tables, because not every event subject will necessarily have a table; besides, holding this relationship isn't even important, because the only thing we do with this event log is display activity-feed messages. We store a serialized value object of data pertinent to the event for use in predetermined message templates. We also directly encode what the event pertained to (i.e. profile, comment, status, etc.).
Events (a.k.a. activities) are simple strings like 'create', etc. These are used in some queries and, of course, to help determine which message to display to a user.
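One possible way to combine the serialized subject with per-event message templates, sketched in Python. JSON stands in for whatever serialization the answer actually uses, and the template texts, the "type" field and the function names are hypothetical; the only idea taken from the answer is that the event string plus what the event pertained to selects a predetermined template.

```python
import json
from string import Template

# Hypothetical templates keyed by (event, what the event pertained to).
MESSAGE_TEMPLATES = {
    ("create", "status"): Template("$actor posted a new status: $text"),
    ("create", "comment"): Template("$actor commented on $target"),
}


def serialize_subject(subject_type, **data):
    """Pack everything the feed message needs into one text value (the subject column)."""
    return json.dumps({"type": subject_type, "data": data}, sort_keys=True)


def render_feed_message(event, subject):
    """Pick a template from the event name and subject type, then fill it in."""
    payload = json.loads(subject)
    template = MESSAGE_TEMPLATES.get((event, payload["type"]))
    if template is None:
        return None  # no template for this combination; the caller can skip the entry
    # safe_substitute leaves unknown placeholders intact instead of raising
    return template.safe_substitute(payload["data"])
```

For example, render_feed_message("create", serialize_subject("status", actor="alice", text="Hello!")) yields "alice posted a new status: Hello!". Since no join is needed, rendering a feed only touches the log table itself.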
We are still in the early stages, so this may change quite a bit (possibly based on comments and answers to this question), but given our requirements it seemed like a good approach.
Case: all user activities have their own tables, e.g. like, comment, post, become a member.
Then each of these tables should have a key associating the entry with a user. Given a user, you can get their recent activities by querying each table by the user_key.
Hence, if you don't have a schema yet or are free to change it, go with separate tables for different activities and search across multiple activity tables.
Case: some activities are, say, generic and don't have an individual table.
Then have a table for generic activities and search it along with the other activity tables.
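A minimal sketch of both cases in Python with sqlite3 (table and column names are assumptions; only user_key comes from the answer): each specific activity has its own table, generic activities share one table with a kind column, and a single UNION ALL query collects a user's recent activity across all of them. The composite (user_key, created_at) indexes are an added assumption meant to keep each branch of the union cheap.

```python
import sqlite3

# Hypothetical schema: per-activity tables plus one catch-all table for generic activities.
SCHEMA = """
CREATE TABLE likes (id INTEGER PRIMARY KEY, user_key INTEGER NOT NULL,
                    item_id INTEGER, created_at TEXT NOT NULL);
CREATE TABLE comments (id INTEGER PRIMARY KEY, user_key INTEGER NOT NULL,
                       body TEXT, created_at TEXT NOT NULL);
CREATE TABLE generic_activities (id INTEGER PRIMARY KEY, user_key INTEGER NOT NULL,
                                 kind TEXT NOT NULL, created_at TEXT NOT NULL);
CREATE INDEX likes_user_idx ON likes (user_key, created_at);
CREATE INDEX comments_user_idx ON comments (user_key, created_at);
CREATE INDEX generic_user_idx ON generic_activities (user_key, created_at);
"""

# One query per table, merged and sorted; the first SELECT names the output columns.
RECENT_ACTIVITY = """
SELECT 'like' AS kind, id, created_at FROM likes WHERE user_key = :user
UNION ALL
SELECT 'comment', id, created_at FROM comments WHERE user_key = :user
UNION ALL
SELECT kind, id, created_at FROM generic_activities WHERE user_key = :user
ORDER BY created_at DESC
LIMIT :limit
"""


def recent_activity(conn, user_key, limit=10):
    """Return (kind, id, created_at) tuples for one user, newest first."""
    return conn.execute(RECENT_ACTIVITY, {"user": user_key, "limit": limit}).fetchall()
```

Adding a new specific activity later means adding one table and one more UNION ALL branch; anything too small to deserve a table goes into generic_activities instead.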
Do you need to store the specific activity of each user, or do you just want to log the kind of activity that is happening over time? If the latter, then you might consider something like RRDtool (or a similar approach) and store the amount of activity over different time steps in a circular buffer, whose size stays constant over time. See http://en.wikipedia.org/wiki/RRDtool.
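To illustrate the constant-size idea, here is a simplified Python sketch of a round-robin buffer; it is not RRDtool's own API and omits its consolidation functions. Each slot counts events in one fixed time step, and once the buffer wraps around, the oldest step's count is overwritten, so storage never grows. The class name and parameters are hypothetical.

```python
class RoundRobinCounter:
    """Fixed-size circular buffer of activity counts, one slot per time step."""

    def __init__(self, step_seconds, slots):
        self.step = step_seconds
        self.slots = slots
        self.counts = [0] * slots
        self.bucket_ids = [None] * slots  # absolute time step each slot currently holds

    def record(self, timestamp, amount=1):
        bucket = int(timestamp // self.step)
        i = bucket % self.slots
        held = self.bucket_ids[i]
        if held is not None and bucket < held:
            return  # too old: this step was already overwritten by a newer one
        if held != bucket:
            # Slot holds an older step (or nothing yet): reuse it, dropping the stale count.
            self.bucket_ids[i] = bucket
            self.counts[i] = 0
        self.counts[i] += amount

    def series(self, now):
        """Counts for the last `slots` steps ending at `now`, oldest first (0 if no activity)."""
        current = int(now // self.step)
        out = []
        for bucket in range(current - self.slots + 1, current + 1):
            i = bucket % self.slots
            out.append(self.counts[i] if self.bucket_ids[i] == bucket else 0)
        return out
```

With step_seconds=60 and slots=1440, for instance, the buffer holds per-minute activity counts for the last 24 hours in 1440 slots, no matter how many events are recorded.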