Shaping Data In Python

August 20, 2022

I'm currently working with data generated by EyeLink. The CSV (converted from ASC) is basically one large sequential list, i.e. columns are not created, so for example a row will

Solution 1:

If we assume that all lines use the same delimiter, this problem isn't as bad as it looks.

The key is realizing that all of the frame lines start with the key 'MSG':

import csv
# Header values
FRAME_KEY = 'MSG'
FRAME_IDX = 0
TRIAL_ID_KEY = 'Trial ID'
TRIAL_ID_IDX = 3
FRAME_ID_KEY = 'Frame ID'
FRAME_ID_IDX = 2
# Data values
XCOR_KEY     = 'X-CoOr'
XCOR_IDX     = 1
YCOR_KEY     = 'Y-CoOr'
YCOR_IDX     = 2
TIME_KEY     = 'Time'
TIME_IDX     = 3

IN_DELIM = '\t'
OUT_DELIM= '\t'

OUT_HEADER = [TRIAL_ID_KEY, FRAME_ID_KEY, XCOR_KEY, YCOR_KEY, TIME_KEY]

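# newline='' lets the csv module handle line endings itself (Python 3);
# the output file has to be opened for writing
with open('P1E2E_Both_New_trial_data.csv', newline='') as in_file, open('P1E2E_Long_Format.csv', 'w', newline='') as out_file:
    in_reader = csv.reader(in_file, delimiter=IN_DELIM)
    out_writer = csv.DictWriter(out_file, OUT_HEADER, delimiter=OUT_DELIM)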
    out_writer.writeheader()
    current_frame = None
    current_trial = None
    for row in in_reader:
        if row[FRAME_IDX] == FRAME_KEY:
            # Means we're at the start of a new frame
            current_frame = row[FRAME_ID_IDX]
            current_trial = row[TRIAL_ID_IDX]
        else:
            # Means we're in a data row
            out_row = dict()
            out_row[FRAME_ID_KEY] = current_frame
            out_row[TRIAL_ID_KEY] = current_trial
            out_row[XCOR_KEY]     = row[XCOR_IDX]
            out_row[YCOR_KEY]     = row[YCOR_IDX]
            out_row[TIME_KEY]     = row[TIME_IDX]
            out_writer.writerow(out_row)

Basically, when you hit a row with the 'MSG' key, you know you're starting a new frame, so you remember its frame and trial IDs. Otherwise you're on a data row, and you write it out together with the IDs you remembered. DictWriter lets you fill each output row by column name without having to worry about column order (the order is defined by OUT_HEADER).
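To try the idea without touching the real file, here is a minimal sketch that runs the same logic on a few in-memory rows. The sample rows are made up (the real export may look different), and reshape is a hypothetical helper name. The dict keys are deliberately given out of order to show that DictWriter arranges the columns according to OUT_HEADER.

```python
import csv
import io

# Hypothetical sample: tab-separated; 'MSG' rows carry the frame ID (column 2)
# and trial ID (column 3); every other row carries x, y, time in columns 1-3.
SAMPLE = (
    "MSG\t1000\tframe_1\ttrial_1\n"
    "S\t10.5\t20.5\t1001\n"
    "S\t11.0\t21.0\t1002\n"
    "MSG\t2000\tframe_2\ttrial_1\n"
    "S\t12.5\t22.5\t2001\n"
)

OUT_HEADER = ['Trial ID', 'Frame ID', 'X-CoOr', 'Y-CoOr', 'Time']


def reshape(lines):
    """Yield one dict per data row, tagged with the latest 'MSG' header."""
    frame = trial = None
    for row in csv.reader(lines, delimiter='\t'):
        if not row:
            continue  # skip blank lines
        if row[0] == 'MSG':
            # Header row: remember the IDs, write nothing
            frame, trial = row[2], row[3]
        else:
            # Data row: keys are given in arbitrary order on purpose
            yield {'Time': row[3], 'Y-CoOr': row[2], 'X-CoOr': row[1],
                   'Frame ID': frame, 'Trial ID': trial}


out = io.StringIO()
writer = csv.DictWriter(out, OUT_HEADER, delimiter='\t', lineterminator='\n')
writer.writeheader()
writer.writerows(reshape(io.StringIO(SAMPLE)))
print(out.getvalue())
```

Each printed line has its columns in OUT_HEADER order, whatever order the keys were added to the dict.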

Solution 2:

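I've adapted the answer submitted by @aruisdante because the original code did not record every Frame ID. In that code a 'MSG' row only updates the current frame and trial, so a frame shows up in the output only if at least one data row follows it. I noticed the problem when I counted the start_trial Frame IDs in the output and the count fell short of the known total.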
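One way a count like that can come up short is sketched below. It compares the Frame IDs found on 'MSG' rows with the Frame IDs that actually get attached to a data row. The sample rows are invented, and the assumption that each start_trial message is followed directly by another 'MSG' row may or may not match the real file. header_frame_ids and emitted_frame_ids are hypothetical helper names.

```python
import csv
import io

# Hypothetical sample: 'MSG' rows hold the frame ID in column 1 and the
# trial ID in column 2. Each 'start_trial' message is followed directly by
# another 'MSG' row, with no data rows in between.
SAMPLE = (
    "MSG\tstart_trial_1\ttrial_1\n"
    "MSG\tframe_1\ttrial_1\n"
    "S\t10\t20\t1001\n"
    "MSG\tstart_trial_2\ttrial_2\n"
    "MSG\tframe_1\ttrial_2\n"
    "S\t11\t21\t2001\n"
)


def header_frame_ids(lines):
    """Frame IDs of every 'MSG' row in the input."""
    return [row[1] for row in csv.reader(lines, delimiter='\t')
            if row and row[0] == 'MSG']


def emitted_frame_ids(lines):
    """Frame IDs that reach the output when only data rows are written."""
    frame = None
    emitted = []
    for row in csv.reader(lines, delimiter='\t'):
        if not row:
            continue
        if row[0] == 'MSG':
            frame = row[1]          # header row: nothing is written
        else:
            emitted.append(frame)   # data row: carries the current frame ID
    return emitted


in_count = sum('start_trial' in f for f in header_frame_ids(io.StringIO(SAMPLE)))
out_count = sum('start_trial' in f for f in emitted_frame_ids(io.StringIO(SAMPLE)))
print(in_count, out_count)
```

On this sample the input holds two start_trial messages, but none of them is attached to a data row, so none would reach the output.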

Here is the amended code:

import csv

# Header values
FRAME_KEY = 'MSG'
FRAME_IDX = 0
FRAME_ID_KEY = 'Frame ID'
FRAME_ID_IDX = 1
TRIAL_ID_KEY = 'Trial ID'
TRIAL_ID_IDX = 2
# Data values
XCOR_KEY     = 'X-CoOr'
XCOR_IDX     = 1
YCOR_KEY     = 'Y-CoOr'
YCOR_IDX     = 2
TIME_KEY     = 'Time'
TIME_IDX     = 3

IN_DELIM = '\t'
OUT_DELIM = '\t'

OUT_HEADER = [TRIAL_ID_KEY, FRAME_ID_KEY, XCOR_KEY, YCOR_KEY, TIME_KEY]

# newline='' lets the csv module handle line endings itself (Python 3)
with open('P1E2E_Both_New_trial_data.csv', newline='') as in_file, open('P1E2E_Long_Format.csv', 'w', newline='') as out_file:
    in_reader = csv.reader(in_file, delimiter=IN_DELIM)
    out_writer = csv.DictWriter(out_file, OUT_HEADER, delimiter=OUT_DELIM)
    out_writer.writeheader()
    current_frame = None
    current_trial = None

    for row in in_reader:
        if row[FRAME_IDX] == FRAME_KEY:
            # Means we're at the start of a new frame
            current_frame = row[FRAME_ID_IDX]
            current_trial = row[TRIAL_ID_IDX]

            # Ensures that 'start_trial' labels are recorded even when no
            # data row follows them. The row holds only the label; DictWriter
            # leaves the other columns empty.
            if 'start_trial' in current_frame:
                out_writer.writerow({FRAME_ID_KEY: current_frame})

        elif current_frame is not None and 'start_trial' not in current_frame:
            # Means we're in a data row. Write everything except rows under a
            # 'start_trial' header, so that label is not repeated.
            # A fresh dict per row keeps values from leaking between rows.
            out_row = dict()
            out_row[FRAME_ID_KEY] = current_frame
            out_row[TRIAL_ID_KEY] = current_trial
            out_row[XCOR_KEY]     = row[XCOR_IDX]
            out_row[YCOR_KEY]     = row[YCOR_IDX]
            out_row[TIME_KEY]     = row[TIME_IDX]
            out_writer.writerow(out_row)
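To check this kind of logic on a small input, here is a rough sketch that applies the same two rules to in-memory rows: record every start_trial label once, and skip data rows that sit under a start_trial header. The sample rows and the reshape_keep_labels name are hypothetical. The sketch also assumes a fresh dict is built for each output row, so a label row carries nothing but its Frame ID.

```python
import csv
import io

# Hypothetical sample: 'MSG' rows hold the frame ID in column 1 and the
# trial ID in column 2; other rows hold x, y, time in columns 1-3.
SAMPLE = (
    "MSG\tstart_trial_1\ttrial_1\n"
    "MSG\tframe_1\ttrial_1\n"
    "S\t10\t20\t1001\n"
    "MSG\tstart_trial_2\ttrial_2\n"
    "S\t99\t99\t1999\n"              # data under a start_trial header: skipped
    "MSG\tframe_1\ttrial_2\n"
    "S\t11\t21\t2001\n"
)


def reshape_keep_labels(lines):
    """Return output rows: one per start_trial label, one per kept data row."""
    rows = []
    frame = trial = None
    for row in csv.reader(lines, delimiter='\t'):
        if not row:
            continue
        if row[0] == 'MSG':
            frame, trial = row[1], row[2]
            if 'start_trial' in frame:
                rows.append({'Frame ID': frame})   # label-only row
        elif frame is not None and 'start_trial' not in frame:
            rows.append({'Trial ID': trial, 'Frame ID': frame,
                         'X-CoOr': row[1], 'Y-CoOr': row[2], 'Time': row[3]})
    return rows


rows = reshape_keep_labels(io.StringIO(SAMPLE))
labels = [r['Frame ID'] for r in rows]
print(labels)
```

Counting the entries in labels that contain 'start_trial' gives a quick way to compare against the known total.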
