-2

I'm not really sure how to explain this one but I have a spreadsheet that i pulled into python with no headers, and the data pertaining to each unique id is grouped into rows of three so the columns at this point don't really mean anything when read until it's 'flattened' out. The formatting is consistent but I'm having a hard time figuring out what I need to search in order to get the output I'm looking for.

If this was in Excel I would basically be copying rows two and three to the right of row 1, then delete rows 2 and 3 and repeat until i get to the bottom of the sheet.

This is how i would receive the data

id1 criteria1   criteria2   criteria3
criteria4   criteria5   criteria6   criteria7
criteria8   criteria9   criteria10  criteria11
id2 criteria1   criteria2   criteria3
criteria4   criteria5   criteria6   criteria7
criteria8   criteria9   criteria10  criteria11
id3 criteria1   criteria2   criteria3
criteria4   criteria5   criteria6   criteria7
criteria8   criteria9   criteria10  criteria11

this is what i'm trying to achieve

id1 criteria1   criteria2   criteria3   criteria4   criteria5   criteria6   criteria7   criteria8   criteria9   criteria10  criteria11
id2 criteria1   criteria2   criteria3   criteria4   criteria5   criteria6   criteria7   criteria8   criteria9   criteria10  criteria11
id3 criteria1   criteria2   criteria3   criteria4   criteria5   criteria6   criteria7   criteria8   criteria9   criteria10  criteria11
raleta2
  • 13
  • 2
  • Is this coming from a CSV file? – S3DEV Aug 23 '20 at 21:43
  • It's coming from a heavily formatted excel file that i would receive daily. I've stripped what I could and now I'm stuck at this part. – raleta2 Aug 23 '20 at 21:45
  • 2
    How come you have twice the amount of criteria 5 to 7 than Id's? – Leander Aug 23 '20 at 21:45
  • 5
    Please edit the code of your attempt to solve this problem into your question. If you haven’t written any code then note that StackOverflow isn’t a code-writing service and if you don’t edit code into your question this may be downvoted/voted to close as off-topic. – DisappointedByUnaccountableMod Aug 23 '20 at 21:48
  • @Leander boxes labeled criteria1-criteria8 pertain to id1. the data's coming to me 'stacked' instead of having critera1-criteria8 laid out as their own individual columns. – raleta2 Aug 23 '20 at 21:48
  • 2
    @Leander has a valid point. In your source data, criteria5-criteria7 appear duplicated. Whereas in the output, they appear once. In other words, you have 12 input columns and 9 output. – S3DEV Aug 23 '20 at 21:51
  • @raleta2 I see you get your data stacked, however if I read from left to right top to bottom I see between id1 and id2 criteria 1 to 4 and criteria 8 occurring one time while is see criteria 5 to 7 occurring twice. This would mean that if you copy row 2 and 3 next to row 1 you get [id, 1,2,3,4,5,6,7,5,6,7,8] while you seem to want [id,1,2,3,4,5,6,7,8] – Leander Aug 23 '20 at 21:52
  • 1
    What data type are you reading the spreadsheet into? List of lists, list of dicts, dataframe, ...? What have you already tried to solve the problem? Please include these details in the question, something like a [mre]. BTW welcome to SO! Check out the [tour], and [ask] if you want other tips. – wjandrea Aug 23 '20 at 21:55

1 Answers1

0

See if this gets you going on the right track. Caveat: It's difficult to provide a full solution given the ambiguity of the source data (as mentioned in the comments) ... but give this a go.

Using your sample input, I converted it to a simple CSV file.

Input CSV:

id1,criteria1,criteria2,criteria3
criteria4,criteria5,criteria6,criteria7
criteria8,criteria9,criteria10,criteria11
id2,criteria1,criteria2,criteria3
criteria4,criteria5,criteria6,criteria7
criteria8,criteria9,criteria10,criteria11
id3,criteria1,criteria2,criteria3
criteria4,criteria5,criteria6,criteria7
criteria8,criteria9,criteria10,criteria11

Conversion:

Note: The reshape values of [3, 12] (or [rows, columns]) will need to be modified based on the shape of your input data.

import pandas as pd

array = pd.read_csv('folded.csv', header=None).to_numpy().reshape([3, 12])
pd.DataFrame(array).to_csv('unfolded.csv', index=False, header=False)

Output CSV:

id1,criteria1,criteria2,criteria3,criteria4,criteria5,criteria6,criteria7,criteria8,criteria9,criteria10,criteria11
id2,criteria1,criteria2,criteria3,criteria4,criteria5,criteria6,criteria7,criteria8,criteria9,criteria10,criteria11
id3,criteria1,criteria2,criteria3,criteria4,criteria5,criteria6,criteria7,criteria8,criteria9,criteria10,criteria11
S3DEV
  • 8,768
  • 3
  • 31
  • 42