
What Is The Fastest Way To Read A Specific Chunk Of Data From A Large Binary File In Python

I have a sensor unit which generates data in large binary files. File sizes can run into several tens of gigabytes. I need to: Read the data. Process it to extract the necessary i…

Solution 1:

Right now you are reading the whole file into memory when you do np.fromfile(fid, np.float32). If that fits in memory and you want to access a significant number of traces (i.e. you are calling your function with lots of different values of n), your only big speedup is to avoid reading the file multiple times. So you might want to read the whole file once and then have your function just index into it:

# just once:
with open(data_file, 'rb') as fid:
    alldata = np.fromfile(fid, np.float32)

# then use this function to pull out trace n
def get_data(alldata, n):
    return alldata[n*no_of_points_per_trace:(no_of_points_per_trace*(n+1))]
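
As a concrete usage sketch, assuming get_data as defined above; the file path and trace length below are hypothetical stand-ins (in the question, data_file and no_of_points_per_trace are defined elsewhere):

import numpy as np

data_file = 'sensor_traces.bin'    # hypothetical path to the recorded data
no_of_points_per_trace = 100000    # hypothetical number of samples per trace

# load everything once, then slice out individual traces as needed
with open(data_file, 'rb') as fid:
    alldata = np.fromfile(fid, np.float32)

trace_7 = get_data(alldata, 7)     # float32 array holding trace number 7

Because alldata is a NumPy array, the slice inside get_data returns a view, so extracting a trace this way copies nothing.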

Now, if you find yourself needing only one or two traces from the big file, you can seek into it and just read the part you want:

def get_data(n):
    dtype = np.float32
    with open(data_file, 'rb') as fid:
        # jump straight to the first byte of trace n, then read one trace
        fid.seek(dtype().itemsize*no_of_points_per_trace*n)
        data_array = np.fromfile(fid, dtype, count=no_of_points_per_trace)
    return data_array
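
The seek offset is simply the byte position of the first sample of trace n: itemsize × no_of_points_per_trace × n, with 4 bytes per float32 sample. A self-contained check of that arithmetic, using a small hypothetical test file written only for illustration, might look like this:

import os
import tempfile
import numpy as np

no_of_points_per_trace = 1000                       # hypothetical trace length
traces = np.arange(5 * no_of_points_per_trace, dtype=np.float32).reshape(5, -1)

# write a tiny stand-in for the real sensor file
fd, data_file = tempfile.mkstemp(suffix='.bin')
os.close(fd)
traces.tofile(data_file)

def get_data(n):
    dtype = np.float32
    with open(data_file, 'rb') as fid:
        # trace n starts at byte dtype().itemsize * no_of_points_per_trace * n
        fid.seek(dtype().itemsize * no_of_points_per_trace * n)
        return np.fromfile(fid, dtype, count=no_of_points_per_trace)

# only trace 3 (4000 bytes here) is read from disk, not the whole file
assert np.array_equal(get_data(3), traces[3])
os.remove(data_file)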

You will notice I have skipped converting the array to a Python list. That is a slow step and probably not required for your workflow.
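
To make that concrete, a rough timing sketch (the array size is hypothetical and absolute numbers depend on the machine):

import numpy as np
from timeit import timeit

arr = np.zeros(10_000_000, dtype=np.float32)    # hypothetical ~40 MB of samples

# slicing an ndarray returns a view: no per-element work
print(timeit(lambda: arr[1000:2000], number=1000))

# list() visits every element and wraps each one in a Python-level scalar object
print(timeit(lambda: list(arr), number=1))

The list conversion has to visit all ten million elements, while the slice does a constant amount of work regardless of array size.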
