I have a web application where complex permissions determine whether or not a user has access to each of thousands of different files. A user can see all files, but there is an indicator to open files that they have access to. A user has access to a file if someone else in their organization has access to it, or if someone that they are in a collaboration with has shared access to that file.
Right now, I have a complex PHP function that generates a large PHP session by building arrays of the files a user has access to, either in their organization or their collaborations, and merging these access arrays. When these files are displayed to the user, PHP checks this array to see if they have access, and if they do, it adds the button to open the file. I am doing it this way because running the query to check for access for each individual file ended up taking way too long when displaying long file lists, and PHP’s in_array() was substantially faster.
The problem is…
The PHP session has grown so large that it seems to be slowing simple website functions to a crawl, and I need to think of a new way to do this.
My question is…
What would be the best way to replace PHP sessions for storing file permissions and file locations for thousands of files a user has access to, so that when lists of files are being displayed, PHP can rapidly retrieve this information, without needing to run a query for each individual file?
Hm, without knowing the full scope of the problem, I’d suggest adding a Sessions table to your database that includes a FilePermissions field. This field would store a JSON representation of your permissions structure. This would require only one call to the database, and the majority of the processing would take place while parsing the JSON data server-side (which shouldn’t be much overhead at all).
This is a standard way to reduce the size of client-side session information. A good rule of thumb is to put anything that exposes the logic of your application in the Sessions table.
I would only store the files that they do have access to in the json field. Non-existence can be assumed as prohibiting them from accessing the files. This would again reduce the performance footprint.
This would only work if there isn’t a complex permissions structure (for example, each file having separate read and write permissions). If there isn’t, I’d say you’re in the clear.
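To make the JSON idea concrete, here is a minimal sketch, assuming permissions are a flat list of integer file IDs. The function names (encodePermittedFiles, decodePermittedFiles, canOpen) are hypothetical, and the database read and write of the FilePermissions field is left out so the example stays self-contained.

```php
<?php
// Hypothetical helpers: only permitted file IDs are stored, so a missing ID means "no access".

function encodePermittedFiles(array $fileIds): string
{
    // Store each permitted ID once, as a plain JSON list.
    return json_encode(array_values(array_unique($fileIds)));
}

function decodePermittedFiles(string $json): array
{
    $ids = json_decode($json, true);
    if (!is_array($ids)) {
        return []; // Unreadable data is treated as "no access".
    }
    // Flip IDs into array keys so isset() does a hash lookup
    // instead of the linear scan that in_array() performs.
    return array_flip($ids);
}

function canOpen(array $permitted, int $fileId): bool
{
    return isset($permitted[$fileId]);
}

// The string below is what would be written to the FilePermissions field.
$json = encodePermittedFiles([12, 57, 57, 903]);
$permitted = decodePermittedFiles($json);

foreach ([12, 13, 903] as $fileId) {
    echo $fileId, canOpen($permitted, $fileId) ? " open\n" : " locked\n";
}
```

Decoding once per request and checking with isset() keeps the per-file check cheap even when the list holds thousands of IDs.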
I’m not sure there is much you can do. Perhaps memcached can help, but I haven’t used it (although, from what I’ve heard, that’s what it’s used for).
You could persist the array in a file, although, as far as I know, that’s exactly what sessions do.
You could also try using shared memory to persist user data in-memory between script launches.
Do you really need the entire list of user’s permissions in one single array? That is, do you always display thousands of files to the user? If so, why? Would it be possible to redesign the system using AJAX to lazily fetch only a portion of files?
UPDATE: Another idea.
You could also precalculate each user’s permissions for each file and store them in the database. The table could be called FilesPermittedPerUser and have a two-column primary key, userID / fileID. This will create an index that is sorted first by userID, then by fileID. A two-column key would also enforce uniqueness of entries.
Since the table would then be indexed by user, you can simply filter on userID, ORDER BY fileID, and use LIMIT 10, 10 to list only files 11-20 (an offset of 10, then 10 rows). Fetching only parts of the list via AJAX would mean you’d never cause the terrible memory load that your scripts currently cause.
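As a rough sketch of the table and the paged lookup, assuming PDO is available: an in-memory SQLite database stands in for the real one so the example runs on its own, the sample rows are made up, and fetchPermittedPage is a hypothetical helper name. The query uses LIMIT ... OFFSET, which is equivalent to MySQL's LIMIT 10, 10 form.

```php
<?php
// In-memory SQLite is an assumption made so the sketch is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// The two-column primary key gives an index ordered by userID, then fileID.
$pdo->exec('CREATE TABLE FilesPermittedPerUser (
    userID INTEGER NOT NULL,
    fileID INTEGER NOT NULL,
    PRIMARY KEY (userID, fileID)
)');

// Made-up sample data: user 1 may open files 1..25, user 2 only file 3.
$insert = $pdo->prepare('INSERT INTO FilesPermittedPerUser (userID, fileID) VALUES (?, ?)');
for ($fileId = 1; $fileId <= 25; $fileId++) {
    $insert->execute([1, $fileId]);
}
$insert->execute([2, 3]);

// Hypothetical helper: return one page of permitted file IDs for a user.
function fetchPermittedPage(PDO $pdo, int $userId, int $offset, int $count): array
{
    $stmt = $pdo->prepare(
        'SELECT fileID FROM FilesPermittedPerUser
         WHERE userID = :userID
         ORDER BY fileID
         LIMIT :count OFFSET :offset'
    );
    $stmt->bindValue(':userID', $userId, PDO::PARAM_INT);
    $stmt->bindValue(':count', $count, PDO::PARAM_INT);
    $stmt->bindValue(':offset', $offset, PDO::PARAM_INT);
    $stmt->execute();
    return array_map('intval', $stmt->fetchAll(PDO::FETCH_COLUMN));
}

// Skip the first 10 permitted files, then take the next 10.
$page = fetchPermittedPage($pdo, 1, 10, 10);
echo implode(',', $page), "\n";
```

Only one page of IDs is ever held in memory per request, regardless of how many files the user can access in total.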
The only requirement is that whenever a file’s permissions change (for example, a file is created or deleted, group permissions are changed, a user’s group membership changes, a file’s group membership changes…) you update the table. I suspect this should not be too difficult. Just make sure you do the cache update in a transaction, to preserve atomicity.
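One way the transactional update might look, again assuming PDO with an in-memory SQLite database standing in for the real one. replaceUserPermissions is a hypothetical helper that swaps a user's whole row set; a real system might update individual rows instead.

```php
<?php
// In-memory SQLite is an assumption made so the sketch is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE FilesPermittedPerUser (
    userID INTEGER NOT NULL,
    fileID INTEGER NOT NULL,
    PRIMARY KEY (userID, fileID)
)');

// Hypothetical helper: replace all cached permissions for one user atomically.
function replaceUserPermissions(PDO $pdo, int $userId, array $fileIds): void
{
    $pdo->beginTransaction();
    try {
        $delete = $pdo->prepare('DELETE FROM FilesPermittedPerUser WHERE userID = ?');
        $delete->execute([$userId]);

        $insert = $pdo->prepare('INSERT INTO FilesPermittedPerUser (userID, fileID) VALUES (?, ?)');
        foreach ($fileIds as $fileId) {
            $insert->execute([$userId, $fileId]);
        }
        $pdo->commit();
    } catch (Throwable $e) {
        // Any failure undoes the delete as well, so readers never see a half-built list.
        $pdo->rollBack();
        throw $e;
    }
}

replaceUserPermissions($pdo, 1, [4, 8, 15]);

// A duplicate ID violates the primary key, so the whole replacement is rolled back.
try {
    replaceUserPermissions($pdo, 1, [16, 16]);
} catch (PDOException $e) {
    echo "update rejected, previous permissions kept\n";
}

$rows = array_map('intval', $pdo->query(
    'SELECT fileID FROM FilesPermittedPerUser WHERE userID = 1 ORDER BY fileID'
)->fetchAll(PDO::FETCH_COLUMN));
echo implode(',', $rows), "\n";
```

Because the delete and the inserts commit together, a failed update leaves the earlier rows in place rather than an empty or partial list.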
You may also want to organize the files into folders. It makes very little sense to just throw tons of files at users and have to maintain them at all times. Try throwing 10,000 files at Explorer/Finder/Nautilus and see what happens when you open that folder. Nothing nice, and they get to keep the data in memory, which with PHP you don’t.
Final idea (although you probably don’t have to go to these extremes): rewrite filesystem APIs in something that isn’t PHP and can keep the permission data around. Use PHP only to forward requests to this custom server that runs on a different.