What Is The Fastest Way To Read A Specific Chunk Of Data From A Large Binary File In Python
I have a sensor unit which generates data in large binary files. File sizes can run into several tens of gigabytes. I need to: read the data, process it to extract the necessary information, …
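The asker's original reading code is not reproduced here, but from the answer below it evidently reads the whole file with np.fromfile and converts it to a list before slicing out one trace. A minimal sketch of that presumed setup (data_file, no_of_points_per_trace and get_data are the placeholder names used in the answer; the path and trace length are hypothetical):

import numpy as np

data_file = 'sensor_dump.bin'        # hypothetical path to the binary dump
no_of_points_per_trace = 100_000     # hypothetical number of float32 samples per trace

def get_data(n):
    # presumed original approach: read the ENTIRE file on every call,
    # then slice out trace n -- very slow for files of tens of gigabytes
    with open(data_file, 'rb') as fid:
        alldata = list(np.fromfile(fid, np.float32))
    return alldata[n * no_of_points_per_trace:(n + 1) * no_of_points_per_trace]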
Solution 1:
Right now you are reading the whole file into memory when you do np.fromfile(fid, np.float32). If that fits in memory and you want to access a significant number of traces (i.e. you are calling your function with lots of different values of n), your only big speedup is to avoid reading the file multiple times. So perhaps you want to read the whole file once and then have your function just index into the in-memory copy:
import numpy as np
# just once:
with open(data_file, 'rb') as fid:
    alldata = list(np.fromfile(fid, np.float32))

# then use this function
def get_data(alldata, n):
    return alldata[n*no_of_points_per_trace:no_of_points_per_trace*(n+1)]
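As a usage sketch: once the one-time read above has populated alldata, every subsequent trace lookup is just a slice of in-memory data (the trace indices here are arbitrary examples):

trace_0 = get_data(alldata, 0)    # no file I/O here, just a slice of alldata
trace_7 = get_data(alldata, 7)    # each result has no_of_points_per_trace elements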
Now, if you find yourself needing only one or two traces from the big file, you can seek into it and just read the part you want:
def get_data(n):
    dtype = np.float32
    with open(data_file, 'rb') as fid:
        # skip the first n traces, then read exactly one trace
        fid.seek(dtype().itemsize * no_of_points_per_trace * n)
        data_array = np.fromfile(fid, dtype, count=no_of_points_per_trace)
    return data_array
You will notice I have skipped converting the array to a list here; that conversion is a slow step and probably not required for your workflow.
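As a rough usage sketch (values hypothetical): because each float32 occupies 4 bytes, fetching trace n seeks past 4 * no_of_points_per_trace * n bytes and then reads only one trace's worth of data, so the cost no longer depends on the total file size. Using the get_data just defined, with data_file and no_of_points_per_trace set as in the earlier sketches:

trace = get_data(7)              # seek skips 4 bytes * no_of_points_per_trace * 7, then one trace is read
print(trace.shape, trace.dtype)  # e.g. (100000,) float32 for a 100_000-point trace, if the file holds >= 8 traces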