2

I have a pandas dataframe with multiple column values that are the same. I'd like to select the series in the column where I pass it the column name, and then they have a unique identifier in one of the index/rows. My data frame looks like this:

                       TestPart     TestPart    OtherPart
attribute                                                                                  
Location                 Plant       Factory      Plant
Lead Time                25            56          30
Value1                   5             10          15
Value2                   A             B           C

There are many more parts, but these 2 serve as a good example. I'd like to select all the values in the column TestPart and Plant.

I'm able to isolate these to columns specifically with df.loc[:, "TestPart"], but I'd like to further isolate down by somehow selecting the column based on the Location value.

Is this possible? I've read through a lot of post trying iloc and other tricks, but can't seem to come up with the right recipe for this one yet.

EDIT:

Trying to add some more info for clarity. I have the dataframe shown above.

I would like to select the column with the Column = "TesPart" and the Location = "Plant".

The output I need is the entire series in that column like this:

attribute     
Location                         Plant
Lead Time                          25
Value1                             5
Value2                             A
  • For me it is not clear what you want. You still show us your input data. But your data frame output doesn't look well formated. It is not very clear. Do you have two columns with the name `TestPart`? And show use the desired output. How should the result look like? – buhtz Jun 09 '22 at 12:03
  • I think it is better if you could take a little more time to provide exactly what you need `input` and `output`. That would help everybody a lot – PTQuoc Jun 09 '22 at 12:05

2 Answers2

0

It sounds like what you’re looking for is df.loc[:, "TestPart"][df["Location"] == "Plant"]

Adel Hassan
  • 166
  • 1
  • 8
  • I was thinking along these lines, but every time I try this I'm getting a shortened error of `File "C:\User...`. Trying to get the rest of that error out now. – Daniel Mashburn Jun 09 '22 at 12:20
  • Additionally, everytime I try `df["Location"]` I get a `KeyError: 'Location'` – Daniel Mashburn Jun 09 '22 at 12:21
  • In that case, the other answer should probably work. You could also do it this way if you changed it to `df.loc[:, "TestPart"][df.loc["Location"] == "Plant"]`. Based on your edit, it seems that `Location` is a row index and not a column name, which would explain the KeyError – Adel Hassan Jun 09 '22 at 12:31
0

Here need compare row Location selected by DataFrame.loc and columns names, for filter columns use again DataFrame.loc with : for all rows and columns by condition:

df = df.loc[:, df.loc['Location'].eq('Plant') & (df.columns == 'TestPart')]
print (df)
          TestPart
Location     Plant
Lead Time       25
Value1           5
Value2           A

If create MultiIndex by original columns and first row of data solution is simplier - select by tuple:

df.columns = [df.columns, df.iloc[0]]
df1 = df.iloc[1:].reset_index(drop=True)
print (df1)
         TestPart         OtherPart
Location    Plant Factory     Plant
0              25      56        30
1               5      10        15
2               A       B         C

s = df1[('TestPart','Plant')]
print (s)
0    25
1     5
2     A
Name: (TestPart, Plant), dtype: object
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • This second method appears to work great, I'm getting the right column. However, it appears I've altered the series order so just trying to figure out how to get it back now. Thanks! – Daniel Mashburn Jun 09 '22 at 12:45