A general question, without a specific case in mind – is it usually preferred to use MySQL stored procedures over writing a PHP script that performs the same calculations and queries?
What are the benefits of each method?
Point/counterpoint with Jeff Atwood’s “Who Needs Stored Procedures, Anyways?” from 2004:
Response: The “S” in “SQL” means “Structured”, not “Standardized”. PL/SQL and T-SQL are both vendor extensions of SQL, and ANSI SQL comes into play precisely because very little SQL is database agnostic. Generally, if you want a query that performs well, you can’t rely on ANSI SQL alone.
ORM isn’t a silver bullet: because of the database abstraction, most ORMs support running native stored procedures and functions in order to get a well-performing query. That is nice, but it utterly defeats the purpose of an ORM…
2) Stored Procedures typically cannot be debugged in the same IDE you write your UI. Every time I isolate an exception in the procs, I have to stop what I am doing, bust out my copy of Toad, and load up the database packages to see what’s going wrong. Frequently transitioning between two totally different IDEs, with completely different interfaces and languages, is not exactly productive.
3) Stored Procedures don’t provide much feedback when things go wrong. Unless the proc is coded internally with weird T-SQL or PL/SQL exception handling, we get cryptic ‘errors’ returned based on the particular line inside the proc that failed, such as Table has no rows. Uh, ok?
Response: Your lack of familiarity does not a poor language make. Like you’ve never had to google for a weird error in your language of choice… At least Oracle & MySQL give you error reference numbers.
4) Stored Procedures can’t pass objects. So, if you’re not careful, you can end up with a zillion parameters. If you have to populate a table row with 20+ fields using a proc, say hello to 20+ parameters. Worst of all, if I pass a bad parameter– either too many, not enough, or bad datatypes– I get a generic “bad call” error. Oracle can’t tell me which parameters are in error! So I have to pore over 20 parameters, by hand, to figure out which one is the culprit.
Response: SQL is SET based, completely unlike procedural/OO programming. Types are close to objects, but at some point there needs to be a mapping between procedural/OO objects and database entities.
5) Stored Procedures hide business logic. I have no idea what a proc is doing, or what kind of cursor (DataSet) or values it will return to me. I can’t view the source code to the proc (at least, without resorting to #2 if I have appropriate access) to verify that it is actually doing what I think it is– or what the designer intended it to do. Inline SQL may not be pretty, but at least I can see it in context, alongside the other business logic.
Response: This is a Good Thing(tm): that’s how you get Model-View-Controller (MVC), so you can have a front end in any number of languages without having to duplicate the logic every time while dealing with each language’s quirks. Or is it good that the database allows bad data to be added if someone connects directly to the database? Trips back and forth between the application and the database waste time and resources your application will never recoup.
I think Jeff Atwood hit the nail on the head in 2004 regarding stored procs:
Having used both stored procedures and dynamic SQL extensively I definitely prefer the latter: easier to manage, better encapsulation, no BL in the data access layer, greater flexibility and much more. Virtually every major open-source PHP project uses dynamic SQL over stored procs (see: Drupal, WordPress, Magento and many more).
This conversation almost seems archaic: get yourself a good ORM, stop fretting over your data access and start building awesome applications.
For us, using stored procedures is absolutely critical. We have a fairly large .NET app. Redeploying the entire app can take our users offline for a brief period, which simply is not allowed.
However, while the application is running we sometimes have to make minor corrections to our queries: simple things like adding or removing a NOLOCK, or maybe even changing the joins involved. It’s almost always for performance reasons. Just today we had a bug caused by an extraneous NOLOCK. It took 2 minutes to locate the problem, determine the solution, and deploy the new proc, with zero downtime. Doing the same with queries in code would have caused at least a minor outage, potentially pissing off a lot of people.
Another reason is security. With procs we pass the user id (non-sequential, non-guessable) into each proc call. We validate that the user has access to run that function in the web app, and again inside the database itself. This radically raises the barrier for hackers if our web app were compromised. Not only could they not run any SQL they want, but even to run a proc they would need a particular authorization key, which would be difficult to acquire. (And no, that’s not our only defense.)
We have our procs under source control, so that isn’t an issue. Also, I don’t have to worry about how I name things (certain ORMs hate certain naming schemes) and I don’t have to worry about in-flight performance. You have to know more than just SQL to properly tune an ORM: you have to know the ORM’s particular behaviors.
Stored procedures, 99 times out of 100. If I were to pick one reason, it would be this: if your PHP web app does all database access via stored procedures, and the database user only has permission to execute those stored procedures, then you are protected against SQL injection attacks, provided the calls pass their arguments as bound parameters and the procedures do not build dynamic SQL from them.
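The protection described above depends on user input reaching the database only as data, never as SQL text. Here is a minimal sketch of the difference, in Python with the standard-library sqlite3 module standing in for MySQL. SQLite has no stored procedures, and the `account` table and its rows are made up for illustration; the same binding idea applies to a `CALL` issued from PHP through a prepared statement.

```python
import sqlite3

# In-memory stand-in database; table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, login TEXT)")
conn.executemany("INSERT INTO account (login) VALUES (?)", [("john",), ("jane",)])


def find_concatenated(login):
    # Unsafe: the input becomes part of the SQL text itself.
    sql = "SELECT login FROM account WHERE login = '" + login + "'"
    return [row[0] for row in conn.execute(sql)]


def find_bound(login):
    # Safe: the input is sent as a bound parameter and is never parsed as SQL.
    sql = "SELECT login FROM account WHERE login = ?"
    return [row[0] for row in conn.execute(sql, (login,))]


malicious = "nobody' OR '1'='1"
leaked = find_concatenated(malicious)   # the injected OR clause matches every row
blocked = find_bound(malicious)         # no account has that literal login
```

With the concatenated query the crafted input returns every login; with the bound parameter it returns nothing.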
For me, the advantage of keeping anything to do with the database within the database is debugging. If you have your calculations (at least most of them) done within the stored procedure, and you need to make a change, then you just modify it, test it, save it. There would be no changes to your PHP code.
If you’re storing major calculations within your PHP code, you need to take the SQL statements out of the code, clean them up, modify them, test them, and then copy them back in and test again.
Ease of maintenance comes to mind with keeping things separate. The code will look cleaner and be easier to read if you use stored procedures, because we all know that some SQL scripts just get to be ridiculously large. Keep all that database logic in the database.
If the database is properly tuned, you’ll probably see slightly quicker query execution. Rather than having PHP build the query string and send it to the database, which then parses it, executes it and sends the result back, you simply pass parameters to the stored procedure, which the database may already have parsed and cached. A few carefully placed indexes can speed up data retrieval further, because really the web server is just a conduit, and PHP scripts don’t load it up that much.
I would say “don’t make too much magic with the database”. The worst case would be for a new developer on the project to notice that an operation is done, but not be able to see where in the code it happens. So he keeps looking for it, but it’s done in the database.
So if you do some “invisible” database operations (I’m thinking about triggers), just document them in the code.
    // add a new user
    $user = new User("john", "doe");
    $user->save();
    // The id is computed by the database, see MYPROC_ID_COMPUTATION
    print $user->getId();
On the other hand, writing functions for the DB is a good idea, and would provide the developer with a good abstraction layer.
    // Computes an ID for the given user; the login is passed as a bound parameter
    $db->execute("SELECT COMPUTE_ID(?) FROM DUAL", [$user->getLogin()]);
Of course this is all pseudo-code, but I hope you understand my obscure idea.
Well, there’s a side of this argument that I very rarely hear, so I’ll write it here…
Code is version controlled. Databases are not. So if you have more than one instance of your code, you’ll need some way of performing migrations automagically upon update, or you’ll risk breaking things. And even with that, you still face the problem of “forgetting” to add an updated SP to the migration script and then breaking a build (potentially without even realizing it if you aren’t testing REALLY in depth).
For debugging and maintenance, I find SPs 100x as hard to dissect as raw SQL. The reason is that it takes at least three steps: first, look in the PHP code to see which procedure is called; then go into the database and find that procedure; and finally read the procedure’s code.
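One way to reduce the risk of a forgotten procedure update is to treat every schema or procedure change as a numbered migration and record which ones have been applied. This is a minimal sketch, assuming Python with sqlite3 as a stand-in database. The `schema_migrations` table name and the migration contents are hypothetical; against MySQL the migration bodies would hold the `CREATE PROCEDURE` statements.

```python
import sqlite3

# Ordered list of (version, SQL). Contents are made up for illustration.
MIGRATIONS = [
    (1, "CREATE TABLE account (id INTEGER PRIMARY KEY, status_id INTEGER)"),
    (2, "ALTER TABLE account ADD COLUMN updated_by_id INTEGER"),
]


def migrate(conn, migrations):
    """Apply every migration not yet recorded; return the versions applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    done = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    applied = []
    for version, sql in sorted(migrations):
        if version in done:
            continue  # already applied on this database instance
        conn.execute(sql)
        conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
        applied.append(version)
    conn.commit()
    return applied


conn = sqlite3.connect(":memory:")
first_run = migrate(conn, MIGRATIONS)    # applies both migrations
second_run = migrate(conn, MIGRATIONS)   # nothing left to apply
```

Running the same script on every instance at deploy time brings each database to the same version, and running it twice is harmless.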
Another argument (along the lines of version control) is that there’s no svn st command for SPs. So if a developer manually modifies an SP, you’re going to have a hell of a time figuring that out (assuming the SPs are not all managed by a single DBA).
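Lacking a built-in status command, drift can still be detected by comparing the procedure definitions kept in version control with what the database currently holds. Here is a rough sketch in Python. The two dictionaries are hypothetical inputs (in MySQL the live side could be read with `SHOW CREATE PROCEDURE` or from `information_schema.ROUTINES`), and whitespace is normalised so that reformatting alone is not reported.

```python
import hashlib


def fingerprint(definition):
    """Hash a procedure body, ignoring differences in whitespace."""
    normalised = " ".join(definition.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def procedure_status(in_repo, in_database):
    """Compare two {name: definition} mappings, a bit like `svn st`."""
    status = {}
    for name in sorted(set(in_repo) | set(in_database)):
        if name not in in_database:
            status[name] = "missing in database"
        elif name not in in_repo:
            status[name] = "not under version control"
        elif fingerprint(in_repo[name]) != fingerprint(in_database[name]):
            status[name] = "modified"
    return status


# Hypothetical definitions, for illustration only.
repo = {
    "account_close_by_id": "BEGIN SELECT 1; END",
    "account_get_current_plan_id": "BEGIN SELECT 2; END",
}
live = {
    "account_close_by_id": "BEGIN\n  SELECT 1;\nEND",       # same body, reformatted
    "account_get_current_plan_id": "BEGIN SELECT 3; END",   # edited by hand
    "tmp_fix": "BEGIN SELECT 4; END",                        # never committed
}
report = procedure_status(repo, live)
```

Only procedures that differ appear in the report, so an empty result means the database matches the repository.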
Where SPs really shine is when you have multiple applications talking to the same database schema. Then you have only one place where the DDL and DML are stored, and both applications can share it without having to add a cross dependency in one or more libraries.
So, in short, my view is as follows:
Use Stored Procedures:
- When you have multiple applications working off the same dataset
- When you have the need to loop over queries and execute other queries (avoiding the TCP layer losses can GREATLY improve efficiency)
- When you have a really good DBA, as it will enforce all SQL being handled by him/her.
Use raw SQL/ORM/Generated SQL just about in any other case (Just about, since there are bound to be edge cases I am not thinking about)…
Again, that’s just my $0.02…
I use stored procedures as much as possible for a number of reasons.
Reduce round trips to the database
If you need to alter multiple related tables at once, then you can use a single stored procedure so that only one call to the database is made.
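The saving comes from replacing several statements sent one at a time with a single call. Below is a minimal sketch of the counting argument, using Python’s sqlite3 as a stand-in. There is no network here, so each `execute` merely represents one round trip; the table contents and the `Counter` helper are hypothetical.

```python
import sqlite3


class Counter:
    """Wraps a connection and counts the statements sent through it."""

    def __init__(self, conn):
        self.conn = conn
        self.calls = 0

    def execute(self, sql, params=()):
        self.calls += 1
        return self.conn.execute(sql, params)


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subscription (account_id INTEGER, active INTEGER)")
    conn.executemany(
        "INSERT INTO subscription VALUES (?, 1)", [(23,), (23,), (23,), (7,)]
    )
    return conn


def deactivate_row_by_row(db, account_id):
    # One query to find the rows, then one UPDATE per row.
    rows = db.execute(
        "SELECT rowid FROM subscription WHERE account_id = ?", (account_id,)
    ).fetchall()
    for (rowid,) in rows:
        db.execute("UPDATE subscription SET active = 0 WHERE rowid = ?", (rowid,))


def deactivate_in_one_call(db, account_id):
    # A single set-based statement, much as one stored procedure call would be.
    db.execute(
        "UPDATE subscription SET active = 0 WHERE account_id = ?", (account_id,)
    )


slow = Counter(make_db())
deactivate_row_by_row(slow, 23)
fast = Counter(make_db())
deactivate_in_one_call(fast, 23)
```

Both variants leave the table in the same state; `slow.calls` and `fast.calls` show how many statements each one needed.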
Clearly define business logic
If certain things must be true about a query, then a stored procedure lets someone who knows SQL (a fairly simple language) ensure that things are done right.
Create simple interfaces for other programmers
Teammates who are less comfortable with SQL can use much simpler interfaces to the database, and you can be certain that they cannot put relationships into a bad state by accident.
Instead of writing:
    SELECT a.first_name, IFNULL( s.plan_id, 0 ) AS plan_id
    FROM account AS a
    LEFT JOIN subscription AS s ON s.account_id = a.id
    WHERE a.id = 23;
they can simply call:
    CALL account_get_current_plan_id( 23 );
Write them a nice little wrapper to take care of handling stored procedure calls and they are in business.
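Such a wrapper can be very small. This sketch is in Python and assumes a MySQL driver whose placeholder style is `%s` (as in PyMySQL or mysqlclient). The function name `build_call` is hypothetical, and the procedure name is checked against a strict pattern because identifiers cannot be sent as bound parameters.

```python
import re

# Only plain identifiers are accepted as procedure names.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_call(procedure, args=()):
    """Return (sql, params) for calling a stored procedure with bound arguments."""
    if not _NAME.fullmatch(procedure):
        raise ValueError("invalid procedure name: %r" % (procedure,))
    placeholders = ", ".join(["%s"] * len(args))
    return "CALL %s(%s)" % (procedure, placeholders), tuple(args)


sql, params = build_call("account_get_current_plan_id", [23])
# The pair would then be handed to the driver, e.g. cursor.execute(sql, params).
```

Keeping the statement text and the arguments separate until the driver binds them means the wrapper never splices caller-supplied values into SQL.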
Update all usages in a system at once
If everyone uses stored procedures to query the database and you need to change how something works, you can update the stored procedure and it is updated everywhere as long as you don’t change the interface.
Restrict permissions
If you can use just stored procedures to do everything within your system, then you can give seriously restricted permissions to the user account which accesses the data. No need to give it UPDATE, DELETE, or even SELECT permissions.
Easy error handling
Many people don’t realize it, but you can create your stored procedures in such a manner that tracking down most problems becomes very easy.
You can even integrate your code base to properly handle the errors returned if you use a good structure.
Here is an example that does the following:
- Uses an exit handler for serious problems
- Uses a continue handler for less serious problems
- Does non table scan validation up front
- Does table scan validation next if validation has not failed
- Does processing in a transaction if things validate
- Rolls back everything if there is a problem
- Reports any problems found
- Avoids unnecessary updates
Here are the internals of a made-up stored procedure that accepts an account id, the id of the account doing the closing, and an IP address, and then uses them to close the account. The delimiter has already been set to $$:
    BEGIN
        # Helper variables
        DECLARE r_code INT UNSIGNED;
        DECLARE r_message VARCHAR(128);
        DECLARE it_exists INT UNSIGNED;
        DECLARE n_affected INT UNSIGNED;

        # Exception handler - for when you have written bad code
        # - or something really bad happens to the server
        DECLARE EXIT HANDLER FOR SQLEXCEPTION
        BEGIN
            ROLLBACK;
            SELECT 0 as `id`, 10001 as `code`, CONCAT(r_message, ' Failed with exception') as `message`;
        END;

        # Warning handler - to tell you exactly where problems are
        DECLARE CONTINUE HANDLER FOR SQLWARNING
        BEGIN
            SET r_code = 20001, r_message = CONCAT( r_message, 'WARNING' );
        END;

        SET r_code = 0, r_message = '', it_exists = 0, n_affected = 0;

        # STEP 1 - Obvious basic sanity checking (no table scans needed)
        IF ( 0 = i_account_id ) THEN
            SET r_code = 40001, r_message = 'You must specify an account to close';
        ELSEIF ( 0 = i_updated_by_id ) THEN
            SET r_code = 40002, r_message = 'You must specify the account doing the closing';
        END IF;

        # STEP 2 - Any checks requiring table scans

        # Given account must exist in system
        IF ( 0 = r_code ) THEN
            SELECT COUNT(id) INTO it_exists FROM account WHERE id = i_account_id;
            IF ( 0 = it_exists ) THEN
                SET r_code = 30001, r_message = 'Account to close does not exist in the system';
            END IF;
        END IF;

        # Given account must not already be closed
        # - if already closed, we simply treat the call as a success
        # - and don't bother with further processing
        IF ( 0 = r_code ) THEN
            SELECT COUNT(id) INTO it_exists FROM account WHERE id = i_account_id AND status_id = 2;
            IF ( 0 < it_exists ) THEN
                SET r_code = 1, r_message = 'already closed';
            END IF;
        END IF;

        # Given closer account must be valid
        IF ( 0 = r_code ) THEN
            SELECT COUNT(id) INTO it_exists FROM account WHERE id = i_updated_by_id;
            IF ( 0 = it_exists ) THEN
                SET r_code = 30002, r_message = 'Account doing the closing does not exist in the system';
            END IF;
        END IF;

        # STEP 3 - The actual update and related updates
        # r_message stages are used in case of warnings to tell exactly where a problem occurred
        IF ( 0 = r_code ) THEN
            SET r_message = CONCAT(r_message, 'a');
            START TRANSACTION;

            # Add the unmodified account record to our log
            INSERT INTO account_log ( field_list )
            SELECT field_list FROM account WHERE id = i_account_id;

            IF ( 0 = r_code ) THEN
                SET n_affected = ROW_COUNT();
                IF ( 0 = n_affected ) THEN
                    SET r_code = 20002, r_message = 'Failed to create account log record';
                END IF;
            END IF;

            # Update the account now that we have backed it up
            IF ( 0 = r_code ) THEN
                SET r_message = CONCAT( r_message, 'b' );
                UPDATE account
                SET status_id = 2,
                    updated_by_id = i_updated_by_id,
                    updated_by_ip = i_updated_by_ip
                WHERE id = i_account_id;

                IF ( 0 = r_code ) THEN
                    SET n_affected = ROW_COUNT();
                    IF ( 0 = n_affected ) THEN
                        SET r_code = 20003, r_message = 'Failed to update account status';
                    END IF;
                END IF;
            END IF;

            # Delete some related data
            IF ( 0 = r_code ) THEN
                SET r_message = CONCAT( r_message, 'c' );
                DELETE FROM something WHERE account_id = i_account_id;
            END IF;

            # Commit or roll back our transaction based on our current code
            IF ( 0 = r_code ) THEN
                SET r_code = 1, r_message = 'success';
                COMMIT;
            ELSE
                ROLLBACK;
            END IF;
        END IF;

        SELECT r_code as `code`, r_message as `message`, n_affected as `affected`;
    END$$
Status code meanings:
- 0: should never happen – bad result
- 1: success – the account was either already closed or properly closed
- 2XXXX – problems with logic or syntax
- 3XXXX – problems with unexpected data values in system
- 4XXXX – missing required fields
Rather than trust programmers who are not familiar with databases (or simply not familiar with the schema), it is much simpler to provide them with interfaces.
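Instead of doing all of the above checks, they can simply use (the second and third arguments are illustrative values for the closing account id and the IP address):
    CALL account_close_by_id( 23, 42, '192.0.2.10' );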
And then check the result code and take appropriate action.
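On the application side, the code ranges from the status code list above can be mapped to a handful of actions. This is a sketch in Python; the category names and the `classify` and `handle` helpers are hypothetical, and only the numeric ranges come from that list.

```python
def classify(code):
    """Map a stored procedure result code to a coarse category."""
    if code == 1:
        return "success"
    if 20000 <= code <= 29999:
        return "logic_or_syntax_error"
    if 30000 <= code <= 39999:
        return "unexpected_data"
    if 40000 <= code <= 49999:
        return "missing_field"
    # 0 and anything unrecognised, including the 10001 exception code.
    return "bad_result"


def handle(row):
    """Decide what the caller should do with a (code, message) result row."""
    code, message = row
    category = classify(code)
    if category == "success":
        return "ok"
    if category == "missing_field":
        # The caller can fix these, so the message is worth showing.
        return "show to user: " + message
    return "log and alert: " + message


outcome = handle((40001, "You must specify an account to close"))
```

Keeping the mapping in one function means a new code range only has to be taught to the application once.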
Personally, I believe that if you have access to stored procedures and you are not using them, then you really should look into using them.
Wherever possible, the end user will benefit from the abstraction of the data from the UI. Therefore, you should try to leverage stored procedures as much as possible.
If you don’t need the underlying values, let the database perform the calculations. This helps keep the volume of data transferred between the database and the PHP script to a minimum; in general, calculations on database data are best performed by the database itself.
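The difference in transferred data is easy to see with an aggregate. Here is a minimal sketch in Python with sqlite3 standing in for MySQL; the `payment` table and its values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (account_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?)",
    [(23, 100), (23, 250), (23, 50), (7, 999)],
)

# Option 1: pull every matching row into the application and add them up there.
rows = conn.execute(
    "SELECT amount FROM payment WHERE account_id = ?", (23,)
).fetchall()
total_in_app = sum(amount for (amount,) in rows)

# Option 2: let the database aggregate and return a single row.
(total_in_db,) = conn.execute(
    "SELECT COALESCE(SUM(amount), 0) FROM payment WHERE account_id = ?", (23,)
).fetchone()
```

Both options produce the same total, but the first transfers one row per payment while the second always transfers exactly one row.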
I’ve heard people say “let the database do as much as it can”, and others cry “wtf, what are you doing to my database performance”.
So I guess it should mostly be a decision of usage rate (stored procedures will stress the MySQL process and PHP code will stress the web server process).