php – How to detect MIME type of plain text files: CSS, Javascript, ini, sql?

Posted by: admin, July 12, 2020

Questions:

Detecting the MIME type of a file with PHP is trivial – just use PEAR’s MIME_Type package, PHP’s fileinfo or call file -i on a Unix machine.
This works really well for binary files and all others that have some kind of “magic bytes” through which they can be detected easily.
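For reference, here is a minimal sketch of content-based detection with PHP's fileinfo extension. It assumes the extension is loaded, and the helper name detectMimeFromContent is made up for this example. finfo::buffer() works on a string, so no filename or extension is needed.

```php
<?php
// Hypothetical helper (not part of any library): detect a MIME type
// from raw content using the fileinfo extension's default magic database.
function detectMimeFromContent(string $content): string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $type = $finfo->buffer($content);

    // buffer() returns false on failure; fall back to a generic type.
    return $type === false ? 'application/octet-stream' : $type;
}

// Content that starts with magic bytes, such as a PDF header.
echo detectMimeFromContent("%PDF-1.4\n"), "\n";
```

Content with a recognizable signature is identified this way, while source text without one generally comes back as a generic text type.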

What I’m failing at is detecting the correct MIME type of plain text files:

  • CSS
  • Diff
  • INI (configuration)
  • Javascript
  • rST
  • SQL

All of them are identified as “text/plain”, which is correct, but too unspecific for me. I need the real type, even if it costs some time to analyze the file content.

So my question: Which solutions exist to detect the MIME type of such plain text files? Any Libraries? Code snippets?


Note that I neither have a filename nor a file extension, but I have the file content.


If I used Ruby, I could integrate GitHub’s linguist. Ohloh’s ohcount is written in C, but has a command line tool to detect the type: ohcount -d $file

What I’ve tried

ohcount

Detects XML and PHP files correctly, but none of the others.

Apache tika

Detects XML and HTML; all other test files were only seen as text/plain.

Answers:

Since I didn’t find a proper library, I wrote my own magic file that detects all of my test files properly.

My application first tries my custom magic file for detection and falls back to the normal/system magic file if no type is detected.

The code is on GitHub, see https://github.com/cweiske/MIME_Type_PlainDetect
The magic file is at data/programming.magic and can be used with file -f programming.magic /path/to/source
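The "custom magic file first, system magic file second" approach could be sketched in PHP roughly as follows. This is an illustration, not the code from the repository: the function names makeFinfoDetector and detectWithFallback are invented here. It has also not been verified that PHP's bundled libmagic accepts programming.magic unchanged.

```php
<?php
// Hypothetical factory: wrap fileinfo in a callable detector.
// $magicFile = null means "use the default/system magic database".
function makeFinfoDetector(?string $magicFile = null): callable
{
    return function (string $content) use ($magicFile): ?string {
        $finfo = $magicFile === null
            ? new finfo(FILEINFO_MIME_TYPE)
            : new finfo(FILEINFO_MIME_TYPE, $magicFile);
        $type = $finfo->buffer($content);

        return $type === false ? null : $type;
    };
}

// Try each detector in order. A missing result or the generic
// "text/plain" counts as "no type detected", so the next one is tried.
function detectWithFallback(string $content, array $detectors): string
{
    foreach ($detectors as $detect) {
        $type = $detect($content);
        if ($type !== null && $type !== 'text/plain') {
            return $type;
        }
    }

    return 'text/plain';
}

// Intended usage (the path is an assumption about where the file lives):
// $detectors = [makeFinfoDetector('data/programming.magic'), makeFinfoDetector()];
// echo detectWithFallback(file_get_contents('/path/to/source'), $detectors);
```

Keeping the detectors as plain callables means the order can be changed or further checks added without touching the fallback loop.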

Answer:

I think the magic-based detection in Apache Tika could help you:

http://tika.apache.org/

Answer:

How to:

  • .ini To check INI files, use the parse_ini_file function. It returns false if the file cannot be parsed.
  • .css First check whether you find something like body {, html { or body, html {. You can also look for CSS keywords like font-family, background, border, etc.
  • .sql You will likely find something like INSERT INTO, UPDATE (.*) SET, CREATE TABLE, etc. Again, look for keywords.
  • .js For Javascript, you will have to parse the content again and look for keywords…

I don’t know how to detect the others.
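The keyword checks above could be combined into one function, as in this rough sketch. The function name guessPlainTextType, the exact patterns and the order of the checks are all assumptions made for illustration, and "text/x-ini" is an unofficial label, since INI has no registered MIME type. It uses parse_ini_string rather than parse_ini_file because only the content is available, not a file.

```php
<?php
// Hypothetical helper: guess a more specific type for plain text using
// simple keyword patterns. Heuristic only; false positives are possible.
function guessPlainTextType(string $content): string
{
    // SQL: a typical statement keyword at the start of a line.
    if (preg_match('/^\s*(?:INSERT\s+INTO|UPDATE\s+\S+\s+SET|CREATE\s+TABLE)\b/im', $content)) {
        return 'application/sql';
    }

    // Javascript: function definitions or variable declarations.
    if (preg_match('/\bfunction\s*\w*\s*\(|\b(?:var|let|const)\s+\w+\s*=/', $content)) {
        return 'text/javascript';
    }

    // CSS: a selector followed by "{ property: value".
    if (preg_match('/[^{}\s][^{}]*\{\s*[a-z-]+\s*:\s*[^;{}]+[;}]/i', $content)) {
        return 'text/css';
    }

    // INI: require a [section] or key = value line before trusting the
    // parser, because parse_ini_string accepts a lot of unrelated text.
    $looksLikeIni = preg_match('/^\s*(?:\[[^\]\r\n]+\]|[\w.]+\s*=)/m', $content) === 1;
    if ($looksLikeIni && @parse_ini_string($content) !== false) {
        return 'text/x-ini'; // unofficial label, chosen for this sketch
    }

    return 'text/plain';
}
```

The INI check comes last on purpose: many Javascript or SQL lines contain "name = value" and would otherwise be misread as configuration.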

Answer:

I found this library: http://pear.php.net/package/MIME_Type/

According to its description it “Provides functionality for dealing with MIME types.” and gives the following features:

  • Parse MIME type.
  • Supports full RFC2045 specification.
  • Many utility functions for working with and determining info about types.
  • Most functions can be called statically.
  • Autodetect a file’s mime-type, either with fileinfo extension, mime_magic extension, the ‘file’ command or an in-built mapping list