- We have a PHP/MySQL application.
- Some portions of the calculations are done in SQL directly, e.g. all users created in the last 24 hours would be returned via an SQL query (NOW() - 1 day).
There’s a debate going on between a fellow developer and me about which approach we should take:
A. Keep all calculations / code / logic in PHP and treat MySQL as a ‘dumb’ repository of information
B. Do a mix and match depending on what’s easier / faster. http://www.onextrapixel.com/2010/06/23/mysql-has-functions-part-5-php-vs-mysql-performance/
I’m looking at it from a maintainability point of view. He’s looking at speed (and, as the article points out, some operations are faster in MySQL).
@mu is too short
I agree that, quite obviously, efficient WHERE clauses belong at the SQL level. However, what about examples like:
- Calculating a 24-hour period using NOW() - 1 day in SQL to select all users created in the last 24 hours?
- Return capitalized first name and last name of all users?
- Concatenating a string?
- (thoughts, folks?)
Clear examples belonging in the SQL domain:
- specific WHERE selections
- Nested SQL statements
- Ordering / Sorting
- Selecting DISTINCT items
- Counting rows / items
I’d play to the strengths of each system.
Aggregating, joining and filtering logic obviously belongs in the data layer. It’s faster, not only because most DB engines have 10+ years of optimisation for doing just that, but also because you minimise the data shifted between your DB and web server.
On the other hand, most DB platforms I’ve used have very poor functionality for working with individual values. Things like date formatting and string manipulation just suck in SQL; you’re better off doing that work in PHP.
Basically, use each system for what it’s built to do.
In terms of maintainability, as long as the division between what happens where is clear, separating these two types of logic shouldn’t cause much of a problem, and certainly not enough to outweigh the benefits. In my opinion, code clarity and maintainability are more about consistency than about putting all the logic in one place.
Re: specific examples…
On the 24-hour period example: I know this isn’t what you’re referring to, but dates are almost a special case. You want to make sure that all dates generated by the system are created either on the web server OR in the database, never a mix of both. Doing otherwise will cause some insidious bugs if the DB server and web server are ever configured for different timezones (I’ve seen this happen). Imagine, for example, you’ve got a createdDate column with a default of getDate() that is applied on insert by the DB. If you were to insert a record and then, using a date generated in PHP (e.g. date("Y-m-d H:i:s", time() - 3600)), select records created in the last hour, you might not get what you expect. As for which layer you should do this on, I’d favour the DB because, as in the example, it lets you use column defaults.
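A minimal sketch of how that timezone mismatch can bite, in plain PHP with no database involved. The zone names and the fixed "now" are assumptions chosen for illustration; the strings stand in for what a DB column default and a PHP-generated cutoff would look like if the two servers were configured differently.

```php
<?php
// Hypothetical server settings: the two machines disagree about the local zone.
$webTz = new DateTimeZone('America/New_York'); // assumed web server zone
$dbTz  = new DateTimeZone('UTC');              // assumed DB server zone

// A fixed instant so the example is reproducible.
$now = new DateTimeImmutable('2024-01-15 12:00:00', $dbTz);

// What the DB default would have stored for a row inserted three hours ago.
$createdDate = $now->modify('-3 hours')->format('Y-m-d H:i:s');

// "One hour ago" as the web server formats it, in its own local zone.
$cutoff = $now->modify('-1 hour')->setTimezone($webTz)->format('Y-m-d H:i:s');

// The comparison a query like "WHERE createdDate >= :cutoff" would perform
// on the two zone-less strings.
$wronglyIncluded = strcmp($createdDate, $cutoff) >= 0;

echo "createdDate: $createdDate\n";
echo "cutoff:      $cutoff\n";
var_dump($wronglyIncluded);
```

The row is three hours old, yet it passes the "last hour" filter because the cutoff string was produced in a zone five hours behind the stored value. Generating both values on the same layer removes the discrepancy.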
On capitalized first and last names: for most apps I’d do this in PHP. Combining first name and surname sounds simple until you realise you need salutations, titles and middle initials in there sometimes too. Plus you’re almost definitely going to end up in a situation where you want a user’s first name, surname AND a combined salutation + first name + surname. Concatenating them DB-side means you end up moving more data, although really, it’s pretty minor.
On concatenating a string: it depends. As above, if you ever want to use the parts separately, you’re better off performance-wise pulling them out separately and concatenating when needed. That said, unless the datasets you’re dealing with are huge, there are probably other factors (like, as you mention, maintainability) that have more bearing.
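To illustrate the PHP-side approach to names, here is a rough sketch. The function formatName and the array keys (title, first_name, middle_initial, last_name) are hypothetical, not from any particular schema, and ucfirst only handles single-byte characters, so treat it as a starting point rather than a complete solution.

```php
<?php
// Hypothetical helper: builds a display name from whichever parts are present.
function formatName(array $user, bool $withTitle = false): string
{
    $parts = [
        $withTitle ? ($user['title'] ?? '') : '',
        ucfirst($user['first_name'] ?? ''),
        $user['middle_initial'] ?? '',
        ucfirst($user['last_name'] ?? ''),
    ];

    // Drop empty parts so missing fields do not leave double spaces.
    return implode(' ', array_filter($parts, 'strlen'));
}

// The row is fetched once with the fields kept separate...
$user = [
    'title'          => 'Dr',
    'first_name'     => 'ada',
    'middle_initial' => 'K.',
    'last_name'      => 'lovelace',
];

// ...and combined in whatever form a given page needs.
echo formatName($user), "\n";
echo formatName($user, true), "\n";
```

Because the query returns the parts separately, the same result set can serve both the short and the formal form without a second column travelling over the wire.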
A few rules of thumb:
- Generating incremental ids should happen in the DB.
- Personally, I like column defaults to be applied by the DB.
- When selecting, anything that reduces the number of records should be done by the DB.
- It’s usually good to do things that reduce the size of the dataset DB-side (like with the strings example above).
- And as you say, ordering, aggregation, sub-queries, joins, etc. should always be DB-side.
- Also, we haven’t talked about them but triggers are usually bad/necessary.
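As a sketch of the "reduce the number of records in the DB" rule, the snippet below filters and orders inside the query. To keep it runnable without a server it uses an in-memory SQLite database through PDO (this assumes the pdo_sqlite extension is available); the users table and its columns are hypothetical. On MySQL the cutoff would be written as NOW() - INTERVAL 1 DAY instead of SQLite's datetime('now', '-1 day').

```php
<?php
// In-memory SQLite stands in for MySQL so the sketch runs without a server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema; timestamps are generated by the database itself.
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, created_at TEXT)');
$pdo->exec("INSERT INTO users (first_name, created_at) VALUES ('alice', datetime('now', '-2 hours'))");
$pdo->exec("INSERT INTO users (first_name, created_at) VALUES ('bob', datetime('now', '-3 days'))");
$pdo->exec("INSERT INTO users (first_name, created_at) VALUES ('carol', datetime('now', '-5 hours'))");

// Filtering and ordering happen in the database, so only matching rows
// are sent to PHP, already sorted.
$recent = $pdo->query(
    "SELECT first_name
       FROM users
      WHERE created_at >= datetime('now', '-1 day')
      ORDER BY created_at DESC"
)->fetchAll(PDO::FETCH_COLUMN);

print_r($recent);
```

Note that both the stored timestamps and the cutoff come from the database clock, in line with the earlier point about generating all dates on one layer.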
There are a few core trade-offs you’re facing here, and the balance really depends on your application.
Some things should definitely-everytime-always be done in SQL. With some exceptions (like the dates thing), though, for a lot of tasks SQL can be very clunky and can leave you with logic in out-of-the-way places. When searching your codebase for references to a specific column (for example), it is easy to miss those contained in a view or stored procedure.
Performance is always a consideration but, depending on your app and the specific example, maybe not a big one. Your concerns about maintainability are probably very valid, and some of the performance benefits I’ve mentioned are very slight, so beware of premature optimisation.
Also, if other systems are accessing the DB directly (e.g. for reporting, or imports/exports), you’ll benefit from having more logic in the DB. For example, if you want to import users from another data source directly, something like an email validation function would be reusable if implemented in SQL.
Short answer: it depends. 🙂
I don’t like reinventing the wheel. I also like to use the best tool possible for the task needed to be done, so:
- When I can get the result set straight from the DB without further processing, I do it. In your case it’s a simple query with a simple WHERE clause. Imagine what happens when you have 10 million users and you pull them all into PHP just to use 100 of them: you guessed it, your web server may well crash.
- When you need to get data from 2 or more tables at once, again, MySQL is much better than PHP.
- When you need to count records, the DB is great at it.
- I tend to prefer application-level processing to FK constraints.
- Also, I tend to avoid stored procedures, preferring to implement that business logic at the application level (unless, of course, we are talking about huge data sets).
In conclusion, I would say that your colleague is right in the case presented.
If you put half your logic in the database and the other half in the PHP, then 6 months down the track when you come to make a change it will take you twice as long to figure out what is going on.
Having said that though, your database queries should have just enough logic so that they provide your PHP with exactly the data it needs. If you find yourself looping through thousands of MySQL records in your PHP code, then you are doing something wrong. On the other end of the scale though, if you’re running if / else statements in your MySQL queries you are also doing something wrong (you probably just need to rewrite your query).
I’d steer clear of stored procedures. While they are a great concept in theory, you can usually accomplish the same result in PHP with a much faster development time, and you also have the added benefit of knowing where all the logic is.
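A small sketch of the "looping through records in PHP" smell next to the query that replaces it. As an assumption for runnability it uses an in-memory SQLite database via PDO (pdo_sqlite must be available) and a hypothetical users table with a country column; the GROUP BY query itself is standard SQL and should read the same on MySQL.

```php
<?php
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table, with three rows standing in for thousands.
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT)');
$pdo->exec("INSERT INTO users (country) VALUES ('NZ'), ('NZ'), ('AU')");

// Smell: every row is shipped to PHP just to be counted.
$loopCounts = [];
foreach ($pdo->query('SELECT country FROM users') as $row) {
    $country = $row['country'];
    $loopCounts[$country] = ($loopCounts[$country] ?? 0) + 1;
}

// Better: the database aggregates and returns one row per country.
$sqlCounts = [];
$stmt = $pdo->query('SELECT country, COUNT(*) AS total FROM users GROUP BY country');
foreach ($stmt as $row) {
    $sqlCounts[$row['country']] = (int) $row['total'];
}

ksort($loopCounts);
ksort($sqlCounts);
var_dump($loopCounts === $sqlCounts);
```

Both paths give the same counts, but the second transfers a handful of rows regardless of how large the table grows.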
MySQL will scale better as result sets increase. Frankly, treating a database as a “dumb data” repository is a waste of resources…
Maintainability tends to be tainted by familiarity. If you’re not familiar with PHP, it wouldn’t be your initial choice for maintainability — would it?
Fetching the data from SQL is the time-consuming part; once that is done, the calculations cost more or less the same either way. Still, doing the work smartly in SQL can give better results for large data sets.
If you are fetching data from MySQL and then doing calculations in PHP over the fetched data, it is far better to have the query return the required result directly and avoid the PHP processing, which only adds time.
Some basic points:
- Date formatting in MySQL is strong, and most formats are available. If you need a very specific date format, you can do it in PHP.
- String manipulation just sucks in SQL, so it is better to do that work in PHP. If the string manipulation you need is minor, you can do it in MySQL SELECTs.
- When selecting, anything that reduces the number of records should be done by the SQL and not by PHP.
- Ordering data should always be done in MySQL.
- Aggregation should always be done in MySQL because DB engines are specifically designed for this.
- Sub-queries and joins should always be DB-side. This will save you a lot of PHP code. When you need to get data from 2 or more tables at once, again, SQL is much better than PHP.
- If you want to count records, SQL is great.