Shaping Data In Python
I'm currently working with data generated by an EyeLink eye tracker. The CSV (converted from ASC) is basically one large sequential list, i.e. columns are not created, so for example a row will
Solution 1:
If we assume that all lines have the same delimiter, this problem isn't as bad as it looks.
The key is realizing that all of the frame lines start with the key 'MSG':
import csv

# Header values
FRAME_KEY = 'MSG'
FRAME_IDX = 0
TRIAL_ID_KEY = 'Trial ID'
TRIAL_ID_IDX = 3
FRAME_ID_KEY = 'Frame ID'
FRAME_ID_IDX = 2
# Data values
XCOR_KEY = 'X-CoOr'
XCOR_IDX = 1
YCOR_KEY = 'Y-CoOr'
YCOR_IDX = 2
TIME_KEY = 'Time'
TIME_IDX = 3
IN_DELIM = '\t'
OUT_DELIM = '\t'
OUT_HEADER = [TRIAL_ID_KEY, FRAME_ID_KEY, XCOR_KEY, YCOR_KEY, TIME_KEY]

# Python 3: open CSV files in text mode with newline=''
# (on Python 2, use 'rb' and 'wb' instead)
with open('P1E2E_Both_New_trial_data.csv', newline='') as in_file, \
        open('P1E2E_Long_Format.csv', 'w', newline='') as out_file:
    in_reader = csv.reader(in_file, delimiter=IN_DELIM)
    out_writer = csv.DictWriter(out_file, OUT_HEADER, delimiter=OUT_DELIM)
    out_writer.writeheader()
    current_frame = None
    current_trial = None
    for row in in_reader:
        if row[FRAME_IDX] == FRAME_KEY:
            # Means we're at the start of a new frame
            current_frame = row[FRAME_ID_IDX]
            current_trial = row[TRIAL_ID_IDX]
        else:
            # Means we're in a data row
            out_row = dict()
            out_row[FRAME_ID_KEY] = current_frame
            out_row[TRIAL_ID_KEY] = current_trial
            out_row[XCOR_KEY] = row[XCOR_IDX]
            out_row[YCOR_KEY] = row[YCOR_IDX]
            out_row[TIME_KEY] = row[TIME_IDX]
            out_writer.writerow(out_row)
Basically, when you hit a row with the 'MSG' key, you know you're starting a new frame. Otherwise you write out the data. DictWriter makes it easy to do this without having to worry about column order (the order is defined by OUT_HEADER).
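As a minimal sketch of the same carry-forward idea, the snippet below runs the loop over a small in-memory sample instead of the real file. The sample rows, their column positions, and the helper name to_long_format are assumptions made for illustration; a real EyeLink export may be laid out differently.

```python
import csv
import io

OUT_HEADER = ['Trial ID', 'Frame ID', 'X-CoOr', 'Y-CoOr', 'Time']


def to_long_format(in_file, out_file, delim='\t'):
    """Hypothetical helper: copy the latest 'MSG' header values onto each data row.

    Assumes header rows look like  MSG, <unused>, <frame id>, <trial id>
    and data rows look like        <unused>, <x>, <y>, <time>.
    """
    reader = csv.reader(in_file, delimiter=delim)
    writer = csv.DictWriter(out_file, OUT_HEADER, delimiter=delim,
                            lineterminator='\n')
    writer.writeheader()
    frame = trial = None
    for row in reader:
        if not row:
            continue  # skip blank lines
        if row[0] == 'MSG':
            # Header row: remember its values for the data rows that follow
            frame, trial = row[2], row[3]
        else:
            # Data row: combine it with the remembered header values
            writer.writerow({'Trial ID': trial, 'Frame ID': frame,
                             'X-CoOr': row[1], 'Y-CoOr': row[2],
                             'Time': row[3]})


# Made-up sample input, not real EyeLink output
sample = (
    "MSG\t100\tframe_1\ttrial_1\n"
    "S\t10.5\t20.5\t101\n"
    "S\t11.0\t21.0\t102\n"
    "MSG\t103\tframe_2\ttrial_1\n"
    "S\t12.0\t22.0\t104\n"
)
result = io.StringIO()
to_long_format(io.StringIO(sample), result)
print(result.getvalue())
```

Each data row in the printed output carries the Trial ID and Frame ID of the most recent 'MSG' row above it.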
Solution 2:
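I've adapted the answer submitted by @aruisdante, because the original code did not record every Frame ID. I noticed this when I counted the start_trial Frame IDs in the output and the count fell short of the known total. A likely reason is that the original code only writes a Frame ID as part of a data row, so an 'MSG' row with no data rows after it never reaches the output.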
Here is the amended code:
import csv

FRAME_KEY = 'MSG'
FRAME_IDX = 0
FRAME_ID_KEY = 'Frame ID'
FRAME_ID_IDX = 1
TRIAL_ID_KEY = 'Trial ID'
TRIAL_ID_IDX = 2
# Data values
XCOR_KEY = 'X-CoOr'
XCOR_IDX = 1
YCOR_KEY = 'Y-CoOr'
YCOR_IDX = 2
TIME_KEY = 'Time'
TIME_IDX = 3
IN_DELIM = '\t'
OUT_DELIM = '\t'
OUT_HEADER = [TRIAL_ID_KEY, FRAME_ID_KEY, XCOR_KEY, YCOR_KEY, TIME_KEY]

# Python 3: open CSV files in text mode with newline=''
# (on Python 2, use 'rb' and 'wb' instead)
with open('P1E2E_Both_New_trial_data.csv', newline='') as in_file, \
        open('P1E2E_Long_Format.csv', 'w', newline='') as out_file:
    in_reader = csv.reader(in_file, delimiter=IN_DELIM)
    out_writer = csv.DictWriter(out_file, OUT_HEADER, delimiter=OUT_DELIM)
    out_writer.writeheader()
    current_frame = None
    current_trial = None
    for row in in_reader:
        if row[FRAME_IDX] == FRAME_KEY:
            # Means we're at the start of a new frame
            current_frame = row[FRAME_ID_IDX]
            current_trial = row[TRIAL_ID_IDX]
            # Ensure that 'start_trial' labels are recorded even when no data
            # row follows. Only the Frame ID is set here; DictWriter leaves
            # the other columns blank instead of reusing stale values.
            if 'start_trial' in current_frame:
                out_writer.writerow({FRAME_ID_KEY: current_frame})
        else:
            # Means we're in a data row.
            # Write everything except rows under a 'start_trial' frame, so
            # that this label is not repeated. Data rows that appear before
            # the first 'MSG' row are skipped.
            if current_frame is not None and 'start_trial' not in current_frame:
                out_row = dict()
                out_row[FRAME_ID_KEY] = current_frame  # set by the most recent 'MSG' row
                out_row[TRIAL_ID_KEY] = current_trial
                out_row[XCOR_KEY] = row[XCOR_IDX]
                out_row[YCOR_KEY] = row[YCOR_IDX]
                out_row[TIME_KEY] = row[TIME_IDX]
                out_writer.writerow(out_row)
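The count check that revealed the missing labels can be sketched roughly as follows. The helper name count_label and both sample inputs are hypothetical; the idea is simply to compare how many 'start_trial' markers appear in the raw file with how many survive in the reshaped output.

```python
import csv
import io


def count_label(lines, column, label='start_trial', delim='\t'):
    """Hypothetical helper: count rows whose `column` field contains `label`."""
    reader = csv.reader(lines, delimiter=delim)
    return sum(1 for row in reader
               if len(row) > column and label in row[column])


# Made-up raw input containing two 'start_trial' markers
raw = io.StringIO(
    "MSG\tstart_trial_1\ttrial_1\n"
    "S\t10.5\t20.5\t101\n"
    "MSG\tstart_trial_2\ttrial_2\n"
)
# Made-up reshaped output in which only one marker row was written
reshaped = io.StringIO(
    "Trial ID\tFrame ID\tX-CoOr\tY-CoOr\tTime\n"
    "\tstart_trial_1\t\t\t\n"
)

expected = count_label(raw, column=1)     # Frame ID is field 1 in the input
actual = count_label(reshaped, column=1)  # and column 1 in the output
if expected != actual:
    print('missing markers:', expected - actual)
```

With real data, the two StringIO objects would be replaced by the opened input and output files, and a non-zero difference would indicate that some markers were dropped.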