I have this file for work (and 7000 others of the same format) that is very messy and not tidy in any way. I've been reading about tidying data using Pandas but feel I'm spinning my wheels at this point...
Here is the raw data viewed in Excel:
Here is some example text from the CSV:
Section 6. Reserve Summary
Ten Minute Reserve Requirement:, 1801
Ten Minute Reserve Estimate:, 1801
Thirty Minute Reserve Requirement:, 626
Thirty Minute Reserve Estimate:, 1926
Expected Actions of OP 4:, 0
Additional Capacity Available from OP 4 Actions:, 0
Section 7. Interchange Summary
Description, Import Limit MW, Export Limit MW, Scheduled, Contract
Highgate, -225, 0, -225
NB, -550, 200, -432
NYISO AC, -1400, 1200, 0
NYISO CSC, -346, 330, 330
NYISO NNC, -200, 200, 194
Phase 2 -2000 1200 -1501
Section 8. Weather Forecast Summary for the Peak Hour
City, Conditions, Wind, High Temperature (F)
Boston, Partly Cloudy, NE-10, 66
Hartford, Mostly Clear, N-12, 77
You can see column A is useless so I can remove. Column B mostly has variable names but also has Section names (rows 7, 9, 11...). Sometime column B has the value, but most of the time the value is listed in Column C-- also sometimes listed in Column D. Lines 44- 54 have some extra formatting going on where there are is a table of variable names and values...
Anyway, I absolutely do not have the skills to turn this into a tidy dataframe and will need to throw this to someone else. However, I'm hoping anyone can give advice on what to do. Is this even called 'data cleaning' or 'data structuring'?
I dropped Col A, then transposed the data, but that is far from setting this dataframe up correctly. What are other techniques to move data into the tidy structure needed?
Any resources shared would be great! I searched for too long on 'tidy data', 'data cleaning', 'data structuring' but all were too simplistic compared to this application.