I cannot decode the data from a stream like this:
56 0 obj
<< /Length 1242 /Filter /FlateDecode >>
stream
x]êΩnƒ Ñ{ûbÀKq¬æ\âê¢....(whole binary is omitted)
endstream
endobj
I tried isolating the binary content (x]êΩnƒ Ñ{ûbÀKq¬æ\âê¢....) both in a file and in a binary string. Decoding it with gzinflate($encrypted_data) returns a decoding error, and I think this happens because the encoded content is not “deflated” or something similar.
In the PDF Reference, version 1.7 (sixth edition), on page 67, I found this description of the /FlateDecode filter: “…Decompresses data encoded using the zlib/deflate compression method, reproducing the original text or binary data.”
I need a real, raw solution: a PHP function and/or an algorithm for handling this /FlateDecode stream.
Thank you!
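Since an algorithm rather than a tool is requested, here is a minimal Python sketch of the core steps, assuming the object is available as raw bytes, its /Length is a direct integer, and there is no /DecodeParms: read /Length, skip the end-of-line marker after the stream keyword, take exactly that many bytes, and hand them to a zlib decoder. The function name flate_decode_object is hypothetical.

```python
import re
import zlib


def flate_decode_object(obj: bytes) -> bytes:
    """Decompress the /FlateDecode stream of one PDF object given as raw bytes.

    Hypothetical helper; assumes a direct integer /Length and no /DecodeParms.
    """
    m = re.search(rb"/Length\s+(\d+)(\s+\d+\s+R)?", obj)
    if m is None:
        raise ValueError("no /Length entry found")
    if m.group(2):
        # "/Length 12 0 R" points at another object; resolving it needs the xref table.
        raise ValueError("indirect /Length is not handled in this sketch")
    length = int(m.group(1))

    # The data starts right after the EOL (CRLF or LF) that follows "stream"
    # and is exactly /Length bytes long.
    s = re.search(rb"stream\r?\n", obj)
    if s is None:
        raise ValueError("no stream keyword found")
    data = obj[s.end():s.end() + length]

    # /FlateDecode data is zlib-wrapped, so zlib.decompress() applies directly
    # (PHP equivalent: gzuncompress()).
    return zlib.decompress(data)


payload = zlib.compress(b"BT /F1 12 Tf (Hello) Tj ET")
obj = (b"56 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(payload)
       + payload + b"\nendstream\nendobj\n")
print(flate_decode_object(obj))  # b'BT /F1 12 Tf (Hello) Tj ET'
```

Relying on /Length instead of searching for “endstream” avoids trouble when the compressed bytes happen to contain that word.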
Answer:
Since you didn’t say whether you need to access only one decompressed stream or all of them, I’ll suggest a simple command-line tool that does it in one go for the complete PDF: Jay Berkenbilt’s qpdf.
Example command line:
qpdf --qdf --object-streams=disable in.pdf out.pdf
out.pdf can then be inspected in a text editor (only embedded ICC profiles, images, and fonts may still be binary).
qpdf will also automatically reorder the objects and write the PDF syntax in a normalized way, telling you in a comment what the original object ID of each decompressed object was.
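If you would rather stay in code than call qpdf, a very naive Python approximation of what it does for Flate streams is to scan the raw file for “N G obj … endobj” and run every /FlateDecode stream through zlib. This is only a sketch, not a PDF parser, and the name inflate_all_streams is hypothetical: it ignores encryption, /DecodeParms predictors, filter chains, and the cross-reference table, all of which a real tool such as qpdf takes into account.

```python
import re
import zlib
from typing import Dict

# "N G obj ... endobj"; re.S lets "." run across newlines and binary bytes.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj(.*?)endobj", re.S)


def inflate_all_streams(pdf: bytes) -> Dict[int, bytes]:
    """Naively decompress every /FlateDecode stream found in raw PDF bytes.

    Hypothetical helper, not a PDF parser. Returns {object number: data}.
    """
    result: Dict[int, bytes] = {}
    for m in OBJ_RE.finditer(pdf):
        body = m.group(3)
        if b"/FlateDecode" not in body:
            continue
        s = re.search(rb"stream\r?\n", body)
        if s is None:
            continue
        d = zlib.decompressobj()
        try:
            # Whatever follows the end of the zlib data (EOL, "endstream")
            # is left in d.unused_data instead of causing an error.
            result[int(m.group(1))] = d.decompress(body[s.end():])
        except zlib.error:
            continue  # not plain zlib data (e.g. encrypted); skip this object
    return result
```

Typical use would be inflate_all_streams(open("in.pdf", "rb").read()), which returns a dictionary keyed by object number.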
Should you need to recompress the file (perhaps after editing it), just run this command:
qpdf out-edited.pdf out-recompressed.pdf
(You may see some warning messages saying that the utility is attempting to repair a damaged file…)
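Editing by hand upsets two pieces of bookkeeping: a changed stream needs a /Length that matches its new compressed size, and every byte offset after the edit shifts, so the cross-reference table no longer points at the right places. The latter is most likely what the repair warning is about. The first part can be sketched in Python, assuming any other entries of the object's dictionary are carried over by hand; the helper name build_flate_stream_object is hypothetical, and the cross-reference table is left to qpdf, which writes a fresh one on output.

```python
import zlib


def build_flate_stream_object(num: int, gen: int, data: bytes) -> bytes:
    """Wrap data in a /FlateDecode stream object whose /Length matches.

    Hypothetical helper: other dictionary entries of the original object
    (e.g. /Type, /DecodeParms) would have to be added back by hand.
    """
    payload = zlib.compress(data)  # zlib format, as /FlateDecode requires
    header = b"%d %d obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % (
        num, gen, len(payload))
    # The EOL before "endstream" is not counted in /Length.
    return header + payload + b"\nendstream\nendobj\n"
```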
qpdf is multi-platform and available from SourceForge.
Answer:
<?php
$n = "binary_file.bin"; // file holding the raw bytes between "stream" and "endstream"
$c = file_get_contents($n); // read the whole file, binary-safe
if ($c === false) { exit("Cannot read $n"); }
$u = gzuncompress($c); // /FlateDecode data is in zlib format, which is what gzuncompress() expects
if ($u === false) { exit("Decompression failed: not valid zlib data"); }
header('Content-Type: application/octet-stream'); // offer the decoded result as a download
$out = fopen("php://output", "wb"); // write straight to the response body
fwrite($out, $u);
fclose($out);
Jingle bells! Jingle bells!…
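gzuncompress() is the solution: /FlateDecode streams are stored in the zlib format (RFC 1950: a two-byte header, deflate data, and an Adler-32 checksum), which is exactly what gzuncompress() expects. gzinflate() only accepts raw deflate data (RFC 1951) without that wrapper, which is why it reported a decoding error.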
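The reason gzuncompress() succeeds where gzinflate() failed can be reproduced with Python's zlib module, where wbits=-15 selects raw deflate (what gzinflate() decodes) and the default selects the zlib wrapper (what gzuncompress() decodes). A minimal sketch; the sample content string is made up:

```python
import zlib

original = b"BT /F1 12 Tf 72 712 Td (Hello) Tj ET"
flate = zlib.compress(original)  # zlib format (RFC 1950), as in a /FlateDecode stream

# Two-byte zlib header: CMF 0x78 means deflate with a 32 KiB window,
# and (CMF * 256 + FLG) must be a multiple of 31.
cmf, flg = flate[0], flate[1]
print(hex(cmf), (cmf * 256 + flg) % 31 == 0)  # 0x78 True

# Raw-deflate decoding (like PHP's gzinflate()) trips over the zlib header.
try:
    zlib.decompress(flate, -15)
except zlib.error as exc:
    print("raw deflate failed:", exc)

# zlib decoding (like PHP's gzuncompress()) succeeds.
print(zlib.decompress(flate) == original)  # True
```

Checking the first byte for 0x78 is a quick way to tell whether extracted stream bytes start where they should.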
Answer:
Long overdue, but someone might find it helpful. In this case (<< /Length 1242 /Filter /FlateDecode >>), all you need to do is pass the isolated binary string (basically everything between “stream” and “endstream”; strictly, the /Length bytes that start after the end-of-line following “stream”) to zlib.decompress:
import zlib
with open("stream.bin", "rb") as f:  # hypothetical file holding the raw bytes between "stream" and "endstream"
    stream = f.read()
data = zlib.decompress(stream)  # here you have your clean decompressed stream
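zlib.decompress() raises an error when a stream is truncated, which happens with damaged PDFs or when the bytes were cut off at the wrong place. If partial output is still useful, zlib.decompressobj() is more forgiving: it returns whatever it could decode, keeps any bytes after the end of the zlib data in its unused_data attribute, and its eof flag tells you whether the stream was complete. A minimal sketch with a hypothetical helper name:

```python
import zlib
from typing import Tuple


def inflate_lenient(stream: bytes) -> Tuple[bytes, bool]:
    """Decompress zlib data without failing on truncation (hypothetical helper).

    Returns (output, complete). Trailing bytes after the zlib data, such as
    the EOL before "endstream", are ignored rather than raising an error.
    """
    d = zlib.decompressobj()
    out = d.decompress(stream)
    # d.eof is True only if the end-of-stream marker and checksum were reached.
    return out, d.eof
```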
However, if your PDF object has /DecodeParms, things get more complicated: you will also need the /Predictor value and the number of columns (/Columns). It is better to use PyPDF2 for this.
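With /DecodeParms, the extra step after zlib decompression is undoing the predictor. /Predictor values 10 to 15 select PNG prediction, where every row starts with a one-byte filter type (0 None, 1 Sub, 2 Up, 3 Average, 4 Paeth); /Columns, /Colors and /BitsPerComponent default to 1, 1 and 8. This shows up, for example, in cross-reference streams. A minimal sketch of the reversal, assuming PNG predictors only (the TIFF predictor, value 2, is not covered); the function names are hypothetical:

```python
import zlib


def paeth(a: int, b: int, c: int) -> int:
    """PNG Paeth predictor: pick whichever of left, up, up-left is closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def undo_png_predictor(data: bytes, columns: int = 1, colors: int = 1, bpc: int = 8) -> bytes:
    """Reverse PNG row filters (/Predictor 10-15) on already-inflated data."""
    bpp = max(1, (colors * bpc + 7) // 8)       # bytes per pixel, at least 1
    row_len = (colors * bpc * columns + 7) // 8  # bytes per row, without the tag byte
    out = bytearray()
    prev = bytearray(row_len)                    # the row "above" the first row is all zeros
    for i in range(0, len(data), row_len + 1):
        ftype = data[i]
        if ftype > 4:
            raise ValueError(f"unknown PNG filter type {ftype}")
        row = bytearray(data[i + 1:i + 1 + row_len])
        for x in range(len(row)):
            left = row[x - bpp] if x >= bpp else 0
            up = prev[x]
            up_left = prev[x - bpp] if x >= bpp else 0
            if ftype == 1:    # Sub: add the reconstructed byte to the left
                row[x] = (row[x] + left) & 0xFF
            elif ftype == 2:  # Up: add the byte above
                row[x] = (row[x] + up) & 0xFF
            elif ftype == 3:  # Average of left and above
                row[x] = (row[x] + (left + up) // 2) & 0xFF
            elif ftype == 4:  # Paeth
                row[x] = (row[x] + paeth(left, up, up_left)) & 0xFF
            # ftype 0 (None): the byte is stored as-is
        out += row
        prev = row
    return bytes(out)


# Demo: two 3-byte rows, both tagged 2 ("Up"), as they come out of zlib.
encoded = zlib.compress(bytes([2, 1, 2, 3, 2, 3, 4, 5]))
print(undo_png_predictor(zlib.decompress(encoded), columns=3))  # b'\x01\x02\x03\x04\x06\x08'
```

The columns argument corresponds to /Columns in /DecodeParms; for cross-reference streams it is the total width of one entry in bytes.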
Answer:
I just used FlateFilter from jPod (on SourceForge), and it works well:
import de.intarsys.pdf.filter.FlateFilter;
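FlateFilter filter = new FlateFilter(null);
byte[] decoded = filter.decode(bytes, start, end - start); // decode (end - start) bytes beginning at offset start
The bytes are taken straight from the PDF file; start and end mark where the stream data begins and ends within them.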