
I am new to pandas and I have a data frame that looks like this:

9861:1.0    9620:1.0    9865:1.0    30260:1.0   30026:1.0   10770:1.0   
10772:1.0   10771:0.5   10774:0.5   10773:0.0   9750:1.0    9755:1.0    
9632:1.0    30255:1.0   30012:1.0   30015:1.0   30251:1.0   11639:1.0   

Each cell looks like a dictionary entry, but the entries are not aligned by column: the same id can sit in a different position in each row. The string before the colon is an id and the string after it is a score. I need a function that retrieves the scores for specific ids across all rows. The outcome should be a new data frame that:

1) Keeps the index of each row (it does not show in the snippet, but it is in my original data frame).

2) Creates one column per id that I specify, titled with that id, whose cells hold the corresponding score (for example, the column 9865 should contain the scores that currently follow "9865:").

Your help would be really amazing. Thank you.

    Could you post a code example of how your data is created, or be more precise on the format (also maybe use code formatting). Right now it seems that you have a dataframe with a dict with one entry in each column – gionni May 31 '17 at 16:24

1 Answer

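# Stack the cells into one long Series indexed by (row, column),
# then split each "id:score" string into two columns.
d1 = df.stack().str.split(':', expand=True)

# Re-index the scores by (original row label, id).
# Both the ids and the scores are still strings at this point.
s = pd.Series(
    d1.iloc[:, 1].values,
    [d1.index.get_level_values(0), d1.iloc[:, 0].values]
)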

s

0  9861     1.0
   9620     1.0
   9865     1.0
   30260    1.0
   30026    1.0
   10770    1.0
1  10772    1.0
   10771    0.5
   10774    0.5
   10773    0.0
   9750     1.0
   9755     1.0
2  9632     1.0
   30255    1.0
   30012    1.0
   30015    1.0
   30251    1.0
   11639    1.0
dtype: object

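The ids and scores come out of the split as strings, so look up '9865' rather than the integer 9865. You can reference your data as

s.loc[(0, '9865')]

'1.0'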

You can also unstack that result, which gives one column per id, and reference it as a dataframe (again with a string id)

s.unstack().loc[0, '9865']

'1.0'
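Putting the steps together, here is a minimal sketch of a function that returns one column per requested id while keeping the original row index. The sample frame, its row labels `a`, `b`, `c`, and the helper name `scores_for` are made up for illustration. It assumes every cell is an "id:score" string, that each id appears at most once per row, and that the row labels are unique.

```python
import pandas as pd

# Hypothetical stand-in for the frame in the question: every cell is an
# "id:score" string and the same id sits in a different column in each row.
df = pd.DataFrame(
    [
        ["9861:1.0", "9620:1.0", "9865:1.0"],
        ["9865:0.5", "9861:0.0", "9620:1.0"],
        ["9620:0.5", "9865:1.0", "9861:1.0"],
    ],
    index=["a", "b", "c"],
)


def scores_for(frame, ids):
    """Return one float column per requested id, keeping the row index."""
    # One "id:score" string per (row, position), split into two columns.
    parts = frame.stack().str.split(":", expand=True)
    parts.columns = ["id", "score"]
    # Drop the original column position; only the row label matters.
    long = parts.reset_index(level=1, drop=True)
    wide = (
        long.set_index("id", append=True)["score"]
        .astype(float)   # scores are strings after the split
        .unstack("id")   # ids become the column labels
    )
    # Restore the original row order and keep only the requested ids;
    # an id that never occurs comes back as a column of NaN.
    return wide.reindex(index=frame.index, columns=[str(i) for i in ids])


result = scores_for(df, [9865, 9620])
print(result)
```

The column labels stay strings here, so `result.loc["b", "9865"]` gives 0.5. If you would rather have integer column labels, converting `result.columns` with `astype(int)` afterwards should work as long as every id is numeric.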
    Ok, getting close. But each row in the data frame I posted has an index which I want to keep. Every row holds the same ids, but they are not in the same order. So 11639:1.0 is in the third row in the first column, then in the 50th row and the 45th column I have 11639:0.5, etc. What I want to do is create a new data frame that keeps the indexes and creates one column per unique id across all entries (i.e. a column 11639), where each cell holds the float after the colon (i.e. 1.0 in the third row and 0.5 in the 50th). – jondoff Jun 02 '17 at 14:18