I find reading binary files with Python particularly difficult. Can you give me a hand?
I need to read this file, which in Fortran 90 is easily read by
int*4 n_particles, n_groups
real*4 group_id(n_particles)
read (*) n_particles, n_groups
read (*) (group_id(j),j=1,n_particles)
In detail, the file format is:
Bytes 1-4 -- The integer 8.
Bytes 5-8 -- The number of particles, N.
Bytes 9-12 -- The number of groups.
Bytes 13-16 -- The integer 8.
Bytes 17-20 -- The integer 4*N.
Next many bytes -- The group ID numbers for all the particles.
Last 4 bytes -- The integer 4*N.
How can I read this with Python? I have tried everything, but nothing has worked. Alternatively, is there any way to call an f90 program from Python that reads this binary file and then saves the data I need?
Read the binary file content like this:
with open(fileName, mode='rb') as file:  # b is important -> binary
    fileContent = file.read()
then “unpack” binary data using struct.unpack:
The start bytes (five 4-byte integers, 20 bytes in total):
struct.unpack("iiiii", fileContent[:20])
The body: ignore the 20 header bytes and the 4 trailing bytes (24 in total). The remainder forms the body; an integer division of its length by 4 gives the number of 4-byte values it contains. Multiplying the string 'i' by that quotient creates the correct format for the unpack method:
struct.unpack("i" * ((len(fileContent) -24) // 4), fileContent[20:-4])
The end bytes (the last 4 bytes):
struct.unpack("i", fileContent[-4:])
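The header, body and trailer reads described above can be combined into one function. The following is only a minimal sketch: the name parse_groups is hypothetical, the "=" prefix (native byte order, no padding) is an assumption, and the body is decoded with 'i' as in this answer. If the group IDs are stored as real*4, as the Fortran declaration suggests, 'f' would be the matching format code.

```python
import struct


def parse_groups(file_content):
    """Split the raw bytes of the file into N, the group count and the IDs."""
    # Header: 8, N, n_groups, 8, 4*N -> five 4-byte integers (20 bytes).
    # "=" means native byte order with standard sizes and no padding (assumed).
    rec1_start, n_particles, n_groups, rec1_end, rec2_start = struct.unpack(
        "=5i", file_content[:20]
    )
    if rec1_start != 8 or rec1_end != 8 or rec2_start != 4 * n_particles:
        raise ValueError("unexpected record markers; check the byte order")

    # Body: N 4-byte values between the header and the trailing marker.
    group_ids = struct.unpack(f"={n_particles}i", file_content[20:-4])

    # Trailer: the last 4 bytes repeat the record length 4*N.
    (rec2_end,) = struct.unpack("=i", file_content[-4:])
    if rec2_end != rec2_start:
        raise ValueError("trailing record marker does not match")

    return n_particles, n_groups, group_ids
```

Checking the record markers costs almost nothing and tends to catch a wrong byte order or a truncated file early.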
In general, I would recommend that you look into using Python’s struct module for this. It’s standard with Python, and it should be easy to translate your question’s specification into a formatting string suitable for struct.unpack().
Do note that if there’s “invisible” padding between/around the fields, you will need to figure that out and include it in the unpack() call, or you will read the wrong bits.
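In the file described in the question, the extra bytes around the fields are the 4-byte record markers. One way to account for them, sketched here, is the struct pad code "x", which skips a byte without returning a value. The header bytes below are made up for illustration, and the "=" prefix (native byte order, no alignment padding) is an assumption.

```python
import struct

# Hypothetical first record: marker 8, N = 3, n_groups = 2, marker 8.
header = struct.pack("=4i", 8, 3, 2, 8)

# "4x" skips each 4-byte marker; "2i" reads the two integers in between.
n_particles, n_groups = struct.unpack("=4x2i4x", header)

# calcsize() reports how many bytes a format string covers.
header_size = struct.calcsize("=4x2i4x")
```

Comparing struct.calcsize() against the byte counts in the file specification is a quick way to confirm that the format string covers exactly the bytes intended.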
Reading the contents of the file in order to have something to unpack is pretty trivial:
import struct
data = open("from_fortran.bin", "rb").read()
# unpack() requires a buffer of exactly the format's size (8 bytes here)
(eight, N) = struct.unpack("@II", data[:8])
This unpacks the first two fields, assuming they start at the very beginning of the file (no padding or extraneous data), and also assuming native byte order (the @ symbol). Each I in the formatting string means “unsigned integer, 32 bits”.
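If the file was written on a machine with a different byte order, the native-order assumption fails. Because the first field is known to be the integer 8, it can double as a byte-order check. The helper below is a hypothetical sketch of that idea, not part of the struct module.

```python
import struct


def detect_byte_order(data):
    """Return "<" (little-endian) or ">" (big-endian) for this file's bytes."""
    for prefix in ("<", ">"):
        # Decode the leading record marker under each candidate byte order.
        (marker,) = struct.unpack(prefix + "I", data[:4])
        if marker == 8:
            return prefix
    raise ValueError("first field is not 8 in either byte order")
```

The returned prefix can then replace @ in the format strings for the remaining fields, for example struct.unpack(prefix + "II", data[:8]).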
You could use numpy.fromfile, which can read data from both text and binary files. You would first construct a data type, which represents your file format, using numpy.dtype, and then read this type from file using numpy.fromfile.
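As a rough sketch of that approach for the layout in the question: a structured dtype describes the 20-byte header, and the body is read with a plain integer dtype once N is known. The function name, the field names, and the little-endian "<i4" type are assumptions; use "<f4" for the body if the IDs are stored as real*4.

```python
import numpy as np


def read_group_ids(path):
    """Read N, the number of groups and the group IDs with numpy.fromfile."""
    # Structured dtype for the header: 8, N, n_groups, 8, 4*N.
    header_dtype = np.dtype([
        ("rec1_start", "<i4"),
        ("n_particles", "<i4"),
        ("n_groups", "<i4"),
        ("rec1_end", "<i4"),
        ("rec2_start", "<i4"),
    ])
    with open(path, "rb") as f:
        header = np.fromfile(f, dtype=header_dtype, count=1)[0]
        n_particles = int(header["n_particles"])
        # Body: N 4-byte values, assumed here to be integers.
        group_ids = np.fromfile(f, dtype="<i4", count=n_particles)
        # Trailer: the record length 4*N repeated.
        (rec2_end,) = np.fromfile(f, dtype="<i4", count=1)
    if rec2_end != 4 * n_particles:
        raise ValueError("trailing record marker does not match 4*N")
    return n_particles, int(header["n_groups"]), group_ids
```

Reading in three steps is needed because the length of the body depends on N, which is only known after the header has been parsed.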