I have a pandas dataframe in Python:

    pID     sID     time 
0   2133    152414  2018-06-16
1   1721    152912  2018-06-17
2   2264    152912  2018-06-18

I want to create a new table keyed by sID, with the pID values grouped under each sID:

        pID time
152414 2133 2018-06-16
152912 1721 2018-06-17
       2264 2018-06-18

What is the best way to do this without iterating over the whole dataframe? I tried:

df.pivot(index='sID', columns=['pID', 'time'])

But got:

ValueError: all arrays must be same length

This is for a table of only 3 rows. Thanks!

oren_isp
  • @mxmt it doesn't help: `df.set_index('sID')` returns a dataframe with 3 rows, meaning there are 2 rows with the index 152912, and I get a KeyError. I need a dataframe with only two rows – oren_isp Aug 05 '18 at 12:10

1 Answer

Try this:

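import io
import pandas as pd

# Sample data from the question: pID, sID, time (whitespace-separated)
f = io.StringIO('''
2133    152414  2018-06-16
1721    152912  2018-06-17
2264    152912  2018-06-18''')

# A raw string avoids an invalid-escape warning for \s
df = pd.read_csv(f, sep=r'\s+', header=None, names=['pID', 'sID', 'time'])
# set_index returns a new dataframe, so assign the result
df = df.set_index(['sID', 'pID'])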

Results
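As a minimal sketch of what this produces, assuming the three-row frame from the question (rebuilt inline here, with hypothetical variable names such as `indexed` and `pids_per_sid`): the MultiIndex groups rows under each sID, and `.loc` selects all rows for one sID. If a literal list of pID values per sID is wanted, `groupby` is one possible alternative; it is not part of the original answer.

```python
import pandas as pd

# Frame from the question, rebuilt inline for a self-contained example
df = pd.DataFrame({
    "pID": [2133, 1721, 2264],
    "sID": [152414, 152912, 152912],
    "time": ["2018-06-16", "2018-06-17", "2018-06-18"],
})

# Two-level index: rows are grouped visually under each sID
indexed = df.set_index(["sID", "pID"])
print(indexed)

# Selecting by the first index level returns the two rows for sID 152912,
# indexed by the remaining level (pID)
rows_for_sid = indexed.loc[152912]
print(rows_for_sid)

# Alternative (an assumption about the goal): one row per sID holding a list of pIDs
pids_per_sid = df.groupby("sID")["pID"].apply(list)
print(pids_per_sid)
```

The `groupby` variant yields exactly two rows, one per sID, which may be closer to the "only two rows" requirement than the MultiIndex display.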

  • I get this table, but now when I do `pIDs = df[152912]` I get `KeyError: 152912` instead of a table of 2 rows – oren_isp Aug 05 '18 at 12:57
  • @oren_ISP You should read the pandas manual, as you have no understanding of indexing at all. Use `df.loc[152912]` – Maksim Terpilowski Aug 05 '18 at 14:45
  • To my understanding, the difference between `df[i]` and `df.loc[i]` is that the latter supports editing. Isn't this the case? – oren_isp Aug 07 '18 at 05:47
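On the last point, a small sketch may help. On a dataframe, `df[key]` looks `key` up among the column labels, while `df.loc[key]` looks it up among the row (index) labels; both forms can be used for assignment. The example below assumes the question's data with a single-level index on sID for simplicity, and names such as `missing_column` and `checked` are made up for illustration.

```python
import pandas as pd

# Question's data with sID as a (non-unique) row index
df = pd.DataFrame({
    "pID": [2133, 1721, 2264],
    "sID": [152414, 152912, 152912],
    "time": ["2018-06-16", "2018-06-17", "2018-06-18"],
}).set_index("sID")

# df[...] selects a column by label
time_column = df["time"]

# 152912 is a row label, not a column label, so df[...] raises KeyError
try:
    df[152912]
    missing_column = False
except KeyError:
    missing_column = True

# df.loc[...] selects rows by index label; the duplicate label yields two rows
rows = df.loc[152912]

# Both forms support assignment
df.loc[152912, "time"] = "2018-07-01"  # edits the matching rows
df["checked"] = True                   # adds a new column
```

So the `KeyError` in the comment above comes from a column lookup, not from any read-only restriction on `df[...]`.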