
php – How should I query this in MySQL?

Posted by: admin July 12, 2020

Question:

I have a web app that shows a series of posts based on the table schema below. There are thousands of rows like these, plus other columns that I have removed because they are not relevant to this question:

+---------+----------+----------+
|   ID    |   COL1   |   COL2   |
+---------+----------+----------+
|   1     |    NULL  |   ----   |
|   2     |    ---   |   NULL   |
|   3     |    NULL  |   ----   |
|   4     |    ---   |   NULL   |
|   5     |    NULL  |   NULL   |
|   6     |    ---   |   NULL   |
|   7     |    NULL  |   ----   |
|   8     |    ---   |   NULL   |
+---------+----------+----------+

I use this query:

SELECT * from `TABLE` WHERE `COL1` IS NOT NULL AND `COL2` IS NULL ORDER BY `COL1`;

The result set I get looks like this:

+---------+----------+----------+
|   ID    |   COL1   |   COL2   |
+---------+----------+----------+
|   12    |    ---   |   NULL   |
|   1     |    ---   |   NULL   |
|   6     |    ---   |   NULL   |
|   8     |    ---   |   NULL   |
|  11     |    ---   |   NULL   |
|  13     |    ---   |   NULL   |
|   5     |    ---   |   NULL   |
|   9     |    ---   |   NULL   |
|   17    |    ---   |   NULL   |
|   21    |    ---   |   NULL   |
|   23    |    ---   |   NULL   |
|   4     |    ---   |   NULL   |
|   32    |    ---   |   NULL   |
|   58    |    ---   |   NULL   |
|   61    |    ---   |   NULL   |
|   43    |    ---   |   NULL   |
+---------+----------+----------+

Notice that the IDs are out of order because of the ORDER BY clause.

I have proper indexes to optimize these queries.
Now, let me explain the real problem. My web app has a lazy-load feature, so for the first page I display about 10 posts by appending LIMIT 10 to the query.

That works fine so far. The real problem comes when I have to load the second page: what do I query then? I do not want posts to be repeated. New posts come in almost every 15 seconds and go to the top of the result set (by top I literally mean the first row). I do not want to display these latest posts on the second or third pages, but they change the size of the result set, so I cannot use LIMIT 10,10 for the second page (and so on), because posts would be repeated.

Now, all I know is the ID of the last post I displayed, say 21 here. So next I want to display the posts with IDs 23, 4, 32, 58, 61 and 43 (see the result set above). Should I load all the rows without a LIMIT clause and display the 10 IDs that come after ID 21? That would mean iterating over thousands of useless rows. But I certainly cannot use an offset-based LIMIT for the 2nd, 3rd… pages. Also, because the IDs are out of order, I definitely cannot use WHERE ID>.... So where do we go from here?
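A minimal sketch of the repetition problem described above, using Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL. The table name `posts`, the lower-case column names and the integer `col1` values are assumptions for illustration; SQLite's `LIMIT ? OFFSET ?` plays the role of MySQL's `LIMIT offset, count`. A single new row that sorts first shifts every later row down by one, so the second page repeats the last post of the first page.

```python
import sqlite3

# Offset-based page query; equivalent to MySQL's "... ORDER BY col1 LIMIT offset, size".
PAGE_SQL = (
    "SELECT id FROM posts "
    "WHERE col1 IS NOT NULL AND col2 IS NULL "
    "ORDER BY col1 LIMIT ? OFFSET ?"
)


def page(conn, offset, size=10):
    """Return the ids on one page using LIMIT/OFFSET."""
    return [row[0] for row in conn.execute(PAGE_SQL, (size, offset))]


# In-memory SQLite table standing in for the MySQL one (hypothetical name "posts").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, col1 INTEGER, col2 TEXT)")
# 20 matching rows; col1 = 100 + id, so the sort order is easy to follow.
conn.executemany(
    "INSERT INTO posts (id, col1, col2) VALUES (?, ?, NULL)",
    [(i, 100 + i) for i in range(1, 21)],
)

first_page = page(conn, offset=0)  # ids 1..10

# A new post arrives whose col1 sorts ahead of every existing row.
conn.execute("INSERT INTO posts (id, col1, col2) VALUES (21, 1, NULL)")

second_page = page(conn, offset=10)  # shifted by one: starts at id 10 again
repeated = sorted(set(first_page) & set(second_page))
print(repeated)  # [10]
```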

Answer:

Hmm. I thought about it for a while and came up with two solutions:

  1. Store the IDs of the posts already displayed and query with WHERE ID NOT IN(id1,id2,...). However, that costs extra memory, and if the user loads 100 pages and the IDs are in the 100,000s, the list may be too long for a single GET request, at least in some browsers. A POST request can be used instead.

  2. Change the way you order the posts: sort by ID instead of COL1. I don't know whether this would work for you, but it can save you bandwidth and make your code cleaner, and it may be a better approach overall. I would suggest this: SELECT * FROM TABLE WHERE COL1 IS NOT NULL AND COL2 IS NULL AND ID < .. ORDER BY ID DESC LIMIT 10, where .. is the smallest ID shown so far (no offset is needed, because the ID condition already skips the posts you have shown). This would substantially change the order in which your posts are displayed. But since you said in your comments that you check whether a post meets a criterion and then change COL1 from NULL to the current timestamp, I guess that the newer a post is, the higher you want to display it. It's just an idea.
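A sketch of the second suggestion (pagination keyed on ID rather than on an offset), again using sqlite3 as a stand-in for MySQL with a hypothetical `posts` table. Each request passes the smallest ID already shown (`before_id`, a parameter name chosen here for illustration), and the query returns the next matching rows below it. It assumes new posts always get a higher ID than existing ones, as with an auto-increment key.

```python
import sqlite3


def next_page(conn, before_id=None, size=10):
    """Keyset pagination on id: newest (highest id) first.

    before_id=None fetches the first page; otherwise only rows with
    id < before_id are considered, so no OFFSET is needed.
    """
    sql = "SELECT id FROM posts WHERE col1 IS NOT NULL AND col2 IS NULL"
    params = []
    if before_id is not None:
        sql += " AND id < ?"
        params.append(before_id)
    sql += " ORDER BY id DESC LIMIT ?"
    params.append(size)
    return [row[0] for row in conn.execute(sql, params)]


# In-memory SQLite table standing in for the MySQL one (hypothetical name "posts").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO posts (id, col1, col2) VALUES (?, ?, NULL)",
    [(i, 100 + i) for i in range(1, 26)],
)

page1 = next_page(conn)  # ids 25..16

# A new post gets a higher id than anything already shown.
conn.execute("INSERT INTO posts (id, col1, col2) VALUES (26, 1, NULL)")

page2 = next_page(conn, before_id=page1[-1])  # ids 15..6; the new post does not interfere
```

Because the cursor is a value rather than a row count, posts inserted above it cannot shift later pages.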

Answer:

I’m not sure if I’ve understood your question correctly, but here’s how I think I would do it:

  • Add a timestamp column to your table, let’s call it date_added
  • When displaying the first page, use your query as-is (with LIMIT 10) and hang on to the timestamp of the most recent record; let’s call it last_date_added.
  • For the 2nd, 3rd and subsequent pages, modify your query to filter out all records with date_added > last_date_added, and use LIMIT 10, 10, LIMIT 20, 10, LIMIT 30, 10 and so on.

This will have the effect of freezing your resultset in time, and resetting it every time the first page is accessed.

Notes:

  • Depending on the ordering of your resultset, you might need a separate query to obtain the last_date_added. Alternatively, you could just cut off at the current time, i.e. the time when the first page was accessed.
  • If your IDs are sequential, you could use the same trick with the ID.
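A sketch of this "frozen result set" approach, with sqlite3 standing in for MySQL. The `posts` table name and storing `date_added` as an integer timestamp are assumptions for illustration; a real MySQL table would more likely use a DATETIME or TIMESTAMP column. As the first note suggests, a separate MAX() query obtains last_date_added when page 1 is served, and every later page filters on it while keeping plain LIMIT/OFFSET.

```python
import sqlite3

# Offset pagination over a result set frozen at a cutoff timestamp.
PAGE_SQL = (
    "SELECT id FROM posts "
    "WHERE col1 IS NOT NULL AND col2 IS NULL AND date_added <= ? "
    "ORDER BY col1 LIMIT ? OFFSET ?"
)


def page(conn, last_date_added, page_no, size=10):
    """Return page `page_no` (0-based), ignoring rows added after the cutoff."""
    return [row[0] for row in conn.execute(PAGE_SQL, (last_date_added, size, page_no * size))]


# In-memory SQLite table standing in for the MySQL one (hypothetical name "posts").
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, col1 INTEGER, col2 TEXT, date_added INTEGER)"
)
conn.executemany(
    "INSERT INTO posts (id, col1, col2, date_added) VALUES (?, ?, NULL, ?)",
    [(i, 100 + i, 1000 + i) for i in range(1, 21)],
)

# When page 1 is served, remember the newest date_added (separate query).
last_date_added = conn.execute("SELECT MAX(date_added) FROM posts").fetchone()[0]
first = page(conn, last_date_added, 0)  # ids 1..10

# A newer post arrives that would otherwise sort first.
conn.execute("INSERT INTO posts (id, col1, col2, date_added) VALUES (21, 1, NULL, 2000)")

second = page(conn, last_date_added, 1)  # ids 11..20; the new post is filtered out
```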

Answer:

I assume new posts are added with a higher ID than the current maximum ID, right? Then you could run your page 1 query and also grab the current max ID. When you query page 2, run the same query with an extra "ID <= max_id" condition (and LIMIT 10, 10). This should give you the same result set as your page 1 query, because any new rows will have ID > max_id. Hope that helps.
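A sketch of the max-ID variant, again with sqlite3 as a stand-in for MySQL and a hypothetical `posts` table. It relies on the answer's assumption that new rows always receive a higher ID (an INTEGER PRIMARY KEY here, AUTO_INCREMENT in MySQL). The max ID is captured once alongside page 1 and passed back with each later request; using `<=` keeps the row holding the max ID itself in the result set.

```python
import sqlite3

# Offset pagination restricted to rows that existed when page 1 was served.
PAGE_SQL = (
    "SELECT id FROM posts "
    "WHERE col1 IS NOT NULL AND col2 IS NULL AND id <= ? "
    "ORDER BY col1 LIMIT ? OFFSET ?"
)


def page(conn, max_id, page_no, size=10):
    """Return page `page_no` (0-based) of the result set as of `max_id`."""
    return [row[0] for row in conn.execute(PAGE_SQL, (max_id, size, page_no * size))]


# In-memory SQLite table standing in for the MySQL one (hypothetical name "posts").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO posts (id, col1, col2) VALUES (?, ?, NULL)",
    [(i, 100 + i) for i in range(1, 21)],
)

# Captured together with page 1 and sent back by the client for later pages.
max_id = conn.execute("SELECT MAX(id) FROM posts").fetchone()[0]
first = page(conn, max_id, 0)  # ids 1..10

# New post: higher id, and it would sort first by col1.
conn.execute("INSERT INTO posts (id, col1, col2) VALUES (21, 1, NULL)")

second = page(conn, max_id, 1)  # ids 11..20; id 21 is excluded by id <= max_id
```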

Answer:

How about this?

ORDER BY `COL1`,`ID`;

This breaks ties in COL1 by ID, so rows with the same COL1 value always come back in the same order. This will let you use:

LIMIT 10,10

for your second page.
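A small sketch of the tiebreaker, using sqlite3 as a stand-in for MySQL with a hypothetical `posts` table in which several rows share a COL1 value; the page size of 2 is only to keep the example short. SQL does not guarantee the relative order of rows that tie on every ORDER BY key, so without the second key tied rows could come back in a different order for the page 1 and page 2 queries.

```python
import sqlite3

# ORDER BY col1, id: id is the tiebreaker for rows with equal col1.
PAGE_SQL = (
    "SELECT id FROM posts "
    "WHERE col1 IS NOT NULL AND col2 IS NULL "
    "ORDER BY col1, id LIMIT ? OFFSET ?"
)


def page(conn, page_no, size=2):
    """Return page `page_no` (0-based); ties in col1 are broken by id."""
    return [row[0] for row in conn.execute(PAGE_SQL, (size, page_no * size))]


# In-memory SQLite table standing in for the MySQL one (hypothetical name "posts").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, col1 INTEGER, col2 TEXT)")
# Inserted out of order; ids 1-2 share col1 = 1 and ids 3-5 share col1 = 2.
conn.executemany(
    "INSERT INTO posts (id, col1, col2) VALUES (?, ?, NULL)",
    [(5, 2), (1, 1), (4, 2), (2, 1), (3, 2)],
)

pages = [page(conn, n) for n in range(3)]
print(pages)  # [[1, 2], [3, 4], [5]]
```

The tiebreaker makes the order repeatable; on its own it does not stop rows inserted ahead of the current page from shifting the offset.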