I am working with a team of developers on a website. The website will be using classes. I am in charge of creating the data access layer for the classes. There is an understanding that all user input will be escaped upon retrieval (from the post or get). Having little control over the input level (unless I personally review everyone’s code), I thought it would be cool to throw in escaping on my end as well (right before it hits the database). The problem is that I don’t know how to use mysql_real_escape_string without adding even more slashes.
Since user input may very well contain a slash I can’t check to make sure there are slashes in it. I might be able to check for all the things that need escaping and make sure they have a slash in front of them, but that doesn’t seem like the best way to do it.
Have you considered not escaping the data until it hits the data access layer? I ask, because their are some gotchas with the approach your team is taking:
- If you need to display form data to the user (e.g., to redisplay the form with an error message because some validation failed), you need to de-escape the data (because
'is not special to HTML) and then re-escape the data (because
<is special). If you need to display form data to the user pulled from the database, you mustn’t do that de-escape step (because it was done by the database, when the data was saved), but still must do the HTML escape step. If you make a mistake and do the wrong procedure, you corrupt data or worse introduce security problems.
- You can deal with the different formats from different sources problem by decided all data passed around your app will be escaped. So, your data access layer will re-escape the data upon getting it from the database. But, as different parts of the app need slightly (or completely) different escapes, this quickly leads to a lot of de-escape/re-escape nonsense. Grab the data from the database, escape it, de-escape it, escape it for HTML, output it.
- Your front-end form handling code has to have intimate knowledge of your database. For example, what does
\'mean to your database? How should a
\be escape — if at all? If you ever change your database engine, or even change its settings, those may change. And then you have a bunch of escaping/de-escaping code to find. Missing a single escape/de-escape may lead to SQL injection.
- Alternatively, you can take that knowledge of the database out of the front-end code by having the database layer do a de-escape/escape cycle to convert from your app-standard escape sequence to your database’s. But this seems rather silly!
There is another way: Let whichever layer needs the data escaped escape it itself. Data is always passed between layers in raw, unescaped form. So your data access layer does all database escaping. Your HTML output code does all HTML escaping. When you decide you want to generate PDFs, your PDF code does all PDF escaping.
- Now, when you do form output, its clear what to do: always HTML escape the data. No matter where it came from. Never run a de-escape.
- There is now no de-escape/escape nonsense, as everything is passed around raw. It is only escaped when necessary.
- Your front-end code doesn’t care about the data access layer’s implementation. The data access layer stores and returns any arbitrary string.
- You have only one place to look in your app to make sure you have no SQL injection problems.
- You can easily make use of database driver features such as placeholders. Then not even your data access layer needs to be aware of each database’s escaping requirements; the database driver handles it.
There is no way you could add an automatic decision to escape or not if you don’t know if the input has been escaped. You can attempt to analyze it but it will never be good and you will encounter double backslash pairs and such.
Take the decision once that data sent to your access layer should be clean and handle the escaping in one place. If you do, the other developers will not have to worry about it (they probably don’t want to anyway) and it will be much easier to move to another database in the future. It will also give you the freedom to move over to prepared statements at any time.
Edit: Forgot this:
Having little control over the input
level (unless I personally review
I think it is worth the pain to have them discover it themselves if you just make it very clear that escaping is something that belongs to the database layer and should not be done elsewhere.
If I were in your position, I would not be lazy enough to not review everyone’s code. Even if you’re not reviewing for the escaping of the user input, you still might want to see if their code is done efficiently. Or perhaps, it’s not you to do the reviewing, but someone has to do it.
I’ve experienced an almost similar setup no so long ago where we divided the tasks by layers. One worked on the model, I worked on the controller, and the other one worked on the views. Because we trusted everyone so much that everybody else’s code will work the way we expected it to work, we didn’t bother reviewing the other’s code until the time that we needed to merge them. What happened was we discovered inefficient code in the model late in the development. And it wasn’t just inefficient, it didn’t work! Because of that, we had to overhaul huge chunks of the code which cost us more time.
I suggest that you create a technical requirements specifications document where it is specified in it the acceptable inputs from the users. This document should be followed by the ones who will code the part that will accept the user input. Better yet, create unit tests to see if those requirements are being followed strictly so you do not have to worry if the data they’re going to pass to you is invalid.
Another thing… since you’re using PHP, why not use a good framework? Most available frameworks come with their own DAL where you no longer need to worry a lot about escaping database input (well, not that much). The frameworks should do it for you.
Also, you might want to look at ‘prepared statements’.