0

I am completely new to Python (I started last week!), so while I looked at similar questions, I have difficulty understanding what's going on and even more difficulty adapting them to my situation.

I have a csv file where rows are dates and columns are different regions (see image 1). I would like to create a file that has 3 columns: Date, Region, and Indicator where for each date and region name the third column would have the correct indicator (see image 2).

I tried turning wide into long data, but I could not quite get it to work, as I said, I am completely new to Python. My second approach was to split it up by columns and then merge it again. I'd be grateful for any suggestions.

enter image description here enter image description here

Umar.H
  • 22,559
  • 7
  • 39
  • 74
  • are you using pure python to do this or are you using a library such as `numpy` or `pandas` also as a new user user welcome to SO, please have a read of [ask] and [mcve] pictures are generally not accepted here as they are not reproducible. Adding your own code also shows the users taking time to help you that you've put in some effort. – Umar.H Apr 07 '21 at 14:50
  • 1
    I recomend you to try with the pandas package. It has many options to deal with your problem, for example creating a multi-index or pivot tables. – Ivan Calderon Apr 07 '21 at 14:52
  • Thank you. I have pandas, just trying to figure out how to use it – Igor Zverev Apr 07 '21 at 15:38

2 Answers2

0

first, you're region column is currently 'one hot encoded'. What you are trying to do is to "reverse" one hot encode your region column. Maybe check if this link answers your question: Reversing 'one-hot' encoding in Pandas.

AlonB
  • 21
  • 2
0

This gives your solution using stack() in pandas:

import pandas as pd

# In your case, use pd.read_csv instead of this:
frame = pd.DataFrame({
    'Date': ['3/24/2020', '3/25/2020', '3/26/2020', '3/27/2020'],
    'Algoma': [None,0,0,0],
    'Brant': [None,1,0,0],
    'Chatham': [None,0,0,0],
})
solution = frame.set_index('Date').stack().reset_index(name='Indicator').rename(columns={'level_1':'Region'})
solution.to_csv('solution.csv')

This is the inverse of doing a pivot, as explained here: Doing the opposite of pivot in pandas Python. As you can see there, you could also consider using the melt function as an alternative.

sougonde
  • 3,438
  • 3
  • 26
  • 35
  • Thank you very much! This does exactly what I need. I really appreciate your help! – Igor Zverev Apr 08 '21 at 15:52
  • @IgorZverev Great! If this answers your question please press the "check" mark next to my question to mark the it as the answer. :) This helps people see that the question has been answered. – sougonde Apr 08 '21 at 15:57