I’ve had this question for a while: how exactly is the mime type of a file determined? I believe this is done by checking if specific bytes of the file contain any of the known magic numbers / file signatures, right?
If so, this raises another question. Let’s say I upload a bash script with a fake GIF file signature to a website that only allows images to be uploaded. What is going to happen? Either:
- the mimetype detection routine is smart enough to detect fake signatures, or
- image/gif is wrongly returned as the mimetype and the upload is allowed to continue
I don’t have a hex editor installed ATM, and I don’t like to form security-related conclusions from tests as I might miss (or misinterpret) something, so my question is: which one of the above options is correct?
Also, are there any other best practices (besides checking the mimetype) to ensure that any given file is in fact what it seems / needs (or is allowed) to be? Thanks in advance.
PS: Just to be clear, I’m not asking about the type index in the
My understanding is the MIME determination routines in the file upload code are extremely crude and that the MIME type in the $_FILES array simply can’t be trusted. It’s been my experience that it’s easily foxed.
You’re better off using the Fileinfo library, which provides more robust file type detection.
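For what it’s worth, here is a minimal sketch of what a Fileinfo check could look like. The helper name detectMimeType() is made up for illustration; finfo, FILEINFO_MIME_TYPE and finfo::file() are the parts that come from the extension itself.

```php
<?php
// Sketch: detect a file's MIME type from its content with Fileinfo.
// detectMimeType() is a hypothetical helper name, not part of PHP.
function detectMimeType(string $path): ?string
{
    // Bail out early if there is nothing readable to inspect.
    if (!is_file($path) || !is_readable($path)) {
        return null;
    }

    // FILEINFO_MIME_TYPE asks for just "type/subtype", without the charset.
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->file($path);

    // finfo::file() returns false on failure; normalise that to null.
    return $mime === false ? null : $mime;
}
```

You would call it on the temporary upload, e.g. detectMimeType($_FILES['userfile']['tmp_name']), and compare the result against your own list of allowed types. Keep in mind that this kind of detection looks at the file’s content (signatures in the magic database), so a file that starts with a forged signature can still be reported as that type. It is more robust than the browser-supplied value, but probably shouldn’t be your only check.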
If you’re talking about $_FILES['userfile']['type'], then this information is sent by the browser. It may or may not be present, and even if it’s present you should treat it just like any other user input.
If you’re interested in checking for images you can use the getimagesize function to determine file type. This function returns FALSE for files it cannot recognise as images. Even if it returns a valid image type you can still reject the file, e.g. if you’re expecting GIFs and JPEGs and you get a TIFF instead.
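A rough sketch of that check, assuming you only want GIFs and JPEGs. The function name isAllowedImage() and the default allow-list are my own choices for illustration; getimagesize() and the IMAGETYPE_* constants are standard PHP.

```php
<?php
// Sketch: accept only the image formats we expect.
// isAllowedImage() and its default allow-list are illustrative assumptions.
function isAllowedImage(
    string $path,
    array $allowedTypes = [IMAGETYPE_GIF, IMAGETYPE_JPEG]
): bool {
    // @ suppresses the warning/notice PHP may raise for missing or truncated files.
    $info = @getimagesize($path);

    if ($info === false) {
        return false; // not an image format PHP recognises
    }

    // Index 2 of the result holds one of the IMAGETYPE_* constants.
    return in_array($info[2], $allowedTypes, true);
}
```

Usage would be something like isAllowedImage($_FILES['userfile']['tmp_name']). Note that getimagesize() reads the image header rather than validating the whole file, so treat a positive result as one check among several.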
Also, a webserver will determine whether to execute a file or not depending on file permissions (the execute bit and the shebang line) and file extension. If you keep a check on these two you’re probably OK.
My understanding is that this (vulnerable MIME types) is the reason that filenames should be encrypted through various means when they’re uploaded and then stored in a database to be retrieved via ID numbers. Basically, should someone manage to upload a malicious script, they’ll never be able to find it to run it?
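One way to read "encrypted" here is simply replacing the client-supplied name with an unguessable one. A minimal sketch under that assumption follows; generateStoredName() is a hypothetical helper, and the extension is assumed to come from your own validation rather than from the uploaded filename.

```php
<?php
// Sketch: build an unguessable storage name for an uploaded file.
// generateStoredName() is a hypothetical helper; $extension should be chosen
// by the server after validating the file, never copied from the client.
function generateStoredName(string $extension): string
{
    // 16 bytes from PHP's CSPRNG, rendered as 32 hex characters.
    return bin2hex(random_bytes(16)) . '.' . $extension;
}
```

The generated name, together with the original filename, could then be saved in a database row and the file looked up by that row’s ID, as described above.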