I have a web app that shows a series of posts based on the following table schema. There are thousands of rows like these, plus other columns that I have removed because they are not relevant to this question:
+---------+----------+----------+
| ID      | COL1     | COL2     |
+---------+----------+----------+
| 1       | NULL     | ----     |
| 2       | ---      | NULL     |
| 3       | NULL     | ----     |
| 4       | ---      | NULL     |
| 5       | NULL     | NULL     |
| 6       | ---      | NULL     |
| 7       | NULL     | ----     |
| 8       | ---      | NULL     |
+---------+----------+----------+
And I use this query :-
SELECT * from `TABLE` WHERE `COL1` IS NOT NULL AND `COL2` IS NULL ORDER BY `COL1`;
And the result set I get looks like this:
+---------+----------+----------+
| ID      | COL1     | COL2     |
+---------+----------+----------+
| 12      | ---      | NULL     |
| 1       | ---      | NULL     |
| 6       | ---      | NULL     |
| 8       | ---      | NULL     |
| 11      | ---      | NULL     |
| 13      | ---      | NULL     |
| 5       | ---      | NULL     |
| 9       | ---      | NULL     |
| 17      | ---      | NULL     |
| 21      | ---      | NULL     |
| 23      | ---      | NULL     |
| 4       | ---      | NULL     |
| 32      | ---      | NULL     |
| 58      | ---      | NULL     |
| 61      | ---      | NULL     |
| 43      | ---      | NULL     |
+---------+----------+----------+
Notice that the ID column is out of order because of the ORDER BY clause.
I have proper indexes to optimize these queries.
Now, let me explain the real problem. My web app has a lazy-load feature, so I display around 10 posts per page by appending LIMIT 10 to the query for the first page.
We are good up to here. The real problem comes when I have to load the second page. What do I query now? I do not want posts to be repeated, and new posts arrive almost every 15 seconds and go to the top (literally the first row) of the result set. I do not want to display these latest posts on the second or third pages, but they change the size of the result set, so I cannot use LIMIT 10,10 for the 2nd page and so on, because posts would be repeated.
Now, all I know is the ID of the last post that I displayed, say 21 here. So I want to display the posts with IDs 23, 4, 32, 58, 61, 43 (refer to the result set table above). Do I load all the rows without a LIMIT clause and display the 10 IDs occurring after ID 21? For that I would have to iterate over thousands of useless rows. But I certainly cannot use a LIMIT clause for the 2nd, 3rd… pages. Also, the IDs are jumbled, so I definitely cannot use WHERE ID>.... So, where do we go now?
I thought for a while and came up with 2 solutions:
- Store the IDs of the posts already displayed and query with WHERE ID NOT IN(id1,id2,...). But that would cost you extra memory. And if the user loads 100 pages and the IDs are in the 100000s, a single GET request would not be able to carry them all, at least not in all browsers. A POST request can be used instead.
- Alter the way you display posts from COL1. I don’t know if this would be a good way for you, but it can save you bandwidth and make your code cleaner. It may also be a better way. I would suggest this: SELECT * from TABLE where COL1 IS NOT NULL AND COL2 IS NULL AND Id>.. ORDER BY ID DESC LIMIT 10,10. This can change the way you display your posts by leaps and bounds. But since you said in your comments that you check whether a post meets a criterion and change COL1 from NULL to the current timestamp, I guess that the newer a post is, the higher up you want to display it. It’s just an idea.
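The first idea above (excluding IDs that were already shown) can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module in place of MySQL, and the table name posts, the lowercase column names, the integer col1 values, and the helper fetch_unseen are all assumptions made for the example.

```python
import sqlite3

def fetch_unseen(conn, seen_ids, page_size=10):
    """Return the next page of ids, skipping every id in seen_ids (hypothetical helper)."""
    # One "?" placeholder per already-displayed id, so values are bound, not interpolated.
    placeholders = ",".join("?" for _ in seen_ids)
    exclude = f"AND id NOT IN ({placeholders})" if seen_ids else ""
    sql = (
        "SELECT id FROM posts "
        "WHERE col1 IS NOT NULL AND col2 IS NULL "
        f"{exclude} "
        "ORDER BY col1, id LIMIT ?"
    )
    rows = conn.execute(sql, (*seen_ids, page_size)).fetchall()
    return [row[0] for row in rows]

# Demo data (assumed): 25 posts whose col1 grows with the id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO posts (id, col1, col2) VALUES (?, ?, NULL)",
    [(i, i * 10) for i in range(1, 26)],
)

first = fetch_unseen(conn, [])

# A new post arrives and sorts to the top of the result set.
conn.execute("INSERT INTO posts (id, col1, col2) VALUES (26, 5, NULL)")

second = fetch_unseen(conn, first)
```

In this sketch nothing is repeated between first and second, but the newly arrived post still shows up at the start of second, and the list of excluded IDs grows with every page, which is the memory cost mentioned above.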
I’m not sure if I’ve understood your question correctly, but here’s how I think I would do it:
- Add a timestamp column to your table, let’s call it date_added.
- When displaying the first page, use your query as-is (with LIMIT 10) and hang on to the timestamp of the most recent record; let’s call it last_date_added.
- For the 2nd, 3rd and subsequent pages, modify your query to filter out all records with date_added > last_date_added, and use LIMIT 10, 10, LIMIT 20, 10, LIMIT 30, 10 and so on.
This will have the effect of freezing your resultset in time, and resetting it every time the first page is accessed.
- Depending on the ordering of your resultset, you might need a separate query to obtain the last_date_added. Alternatively, you could just cut off at the current time, i.e. the time when the first page was accessed.
- If your IDs are sequential, you could use the same trick with the ID.
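The steps above can be sketched roughly like this. It is a minimal illustration, not a definitive implementation: it runs on Python's built-in sqlite3 module rather than MySQL, and the table name posts, the integer timestamps, and the helper fetch_page are assumptions. MySQL's LIMIT 10, 10 is written here as LIMIT 10 OFFSET 10.

```python
import sqlite3

PAGE_SIZE = 10

def fetch_page(conn, page, cutoff):
    """Return the ids on a 1-based page, ignoring rows added after cutoff (hypothetical helper)."""
    offset = (page - 1) * PAGE_SIZE
    rows = conn.execute(
        """
        SELECT id FROM posts
        WHERE col1 IS NOT NULL AND col2 IS NULL
          AND date_added <= ?
        ORDER BY col1, id
        LIMIT ? OFFSET ?
        """,
        (cutoff, PAGE_SIZE, offset),
    ).fetchall()
    return [row[0] for row in rows]

# Demo data (assumed): 25 posts, all added at "time" 100.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, col1 INTEGER, col2 TEXT, date_added INTEGER)"
)
conn.executemany(
    "INSERT INTO posts (id, col1, col2, date_added) VALUES (?, ?, NULL, 100)",
    [(i, i * 10) for i in range(1, 26)],
)

# First page: remember the newest timestamp as last_date_added.
last_date_added = conn.execute("SELECT MAX(date_added) FROM posts").fetchone()[0]
page1 = fetch_page(conn, 1, last_date_added)

# A new post arrives later and would sort to the top of the result set.
conn.execute("INSERT INTO posts (id, col1, col2, date_added) VALUES (26, 5, NULL, 101)")

# Second page: the cutoff hides the new row, so the offsets still line up.
page2 = fetch_page(conn, 2, last_date_added)
```

The cutoff has to travel with the client between requests (for example alongside the page number), since it is what keeps the result set frozen.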
I assume new posts will be added with a higher ID than the current max ID, right? So couldn’t you just run your query and grab the current max ID? Then, when you query for page 2, run the same query but with “ID <= max_id”. This should give you the same result set as your page 1 query, because any new rows will have ID > max_id. Hope that helps?
ORDER BY `COL1`,`ID`;
This would always put IDs in order. This will let you use:
for your second page.
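One common way to take advantage of an ORDER BY COL1, ID ordering is keyset (seek) pagination: remember the COL1 and ID of the last row shown and ask only for rows that sort after that pair. The sketch below is an assumption about how this could look, not necessarily the query this answer had in mind. It uses Python's built-in sqlite3 module instead of MySQL, and the table name posts, the integer col1 values, and the helper fetch_after are hypothetical.

```python
import sqlite3

def fetch_after(conn, last_col1, last_id, page_size=10):
    """Return (id, col1) rows that sort after (last_col1, last_id) (hypothetical helper)."""
    return conn.execute(
        """
        SELECT id, col1 FROM posts
        WHERE col1 IS NOT NULL AND col2 IS NULL
          AND (col1 > ? OR (col1 = ? AND id > ?))
        ORDER BY col1, id
        LIMIT ?
        """,
        (last_col1, last_col1, last_id, page_size),
    ).fetchall()

# Demo data (assumed): ids are deliberately not in col1 order, and some col1 values tie.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO posts (id, col1, col2) VALUES (?, ?, NULL)",
    [(12, 1), (1, 2), (6, 2), (8, 3), (21, 4), (23, 4), (4, 5), (32, 6)],
)

# The last row displayed was id 21, whose col1 is 4.
next_rows = fetch_after(conn, 4, 21, page_size=3)

# A new post that sorts before the cursor does not shift the next page.
conn.execute("INSERT INTO posts (id, col1, col2) VALUES (99, 0, NULL)")
next_rows_again = fetch_after(conn, 4, 21, page_size=3)
```

If only the last ID is known, its COL1 value can be looked up first by ID. Because the filter is a position in the ordering rather than an offset, rows inserted ahead of that position do not cause repeats.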