
I have seen this error discussed here, but my problem is different.

I am trying to extract some columns from a large DataFrame:

dfx = df1[["THRSP", "SERHL2", "TARP", "ADH1C", "KRT4", 
                 "SORD", "SERHL", 'C18orf17','UHRF1', "CEBPD",
                 'OLR1', 'TBC1D2', 'AXUD1',"TSC22D3",
                 "ADH1A", "VIPR1", "LRFN2", "ANKRD22"]]

It throws an error as follows:

KeyError: "['C18orf17', 'UHRF1', 'OLR1', 'TBC1D2', 'AXUD1'] not in index"

After removing those columns, it works fine:

dfx = df1[["THRSP", "SERHL2", "TARP", "ADH1C", "KRT4", 
                 "SORD", "SERHL", "TSC22D3",
                 "ADH1A", "VIPR1", "LRFN2", "ANKRD22"]]

However, I would like to avoid this error by ignoring the column names that are not present and selecting only those that overlap. Any help appreciated.

1 Answer


Use Index.intersection to select only the columns from the list that exist in the DataFrame:

L = ["THRSP", "SERHL2", "TARP", "ADH1C", "KRT4", 
      "SORD", "SERHL", 'C18orf17','UHRF1', "CEBPD",
      'OLR1', 'TBC1D2', 'AXUD1',"TSC22D3",
      "ADH1A", "VIPR1", "LRFN2", "ANKRD22"]

dfx = df1[df1.columns.intersection(L, sort=False)]

Or test the column names with Index.isin, then pass the resulting boolean mask to DataFrame.loc, with a leading : to select all rows:

dfx = df1.loc[:, df1.columns.isin(L)]
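
To make the two options concrete, here is a minimal sketch on a toy frame. The data and the EXTRA column are made up for illustration, and the last variant (a plain list comprehension that follows the order of L) is not one of the two options above, just a common alternative:

```python
import pandas as pd

# Toy stand-in for df1 (hypothetical data): only three of the
# requested names exist as columns, plus one unrelated column.
df1 = pd.DataFrame({
    "KRT4": [1, 2],
    "EXTRA": [0, 0],
    "THRSP": [3, 4],
    "SORD": [5, 6],
})
L = ["THRSP", "C18orf17", "KRT4", "UHRF1", "SORD"]

# Option 1: keep only the requested names that exist as columns.
dfx_a = df1[df1.columns.intersection(L, sort=False)]

# Option 2: boolean mask over the columns; the result keeps the
# column order of df1, not the order of L.
dfx_b = df1.loc[:, df1.columns.isin(L)]

# Alternative (assumption, not from the options above): filter L
# itself, so the result follows the order of L.
dfx_c = df1[[c for c in L if c in df1.columns]]

# Requested names that were silently skipped.
missing = [c for c in L if c not in df1.columns]

print(list(dfx_b.columns))  # ['KRT4', 'THRSP', 'SORD']
print(list(dfx_c.columns))  # ['THRSP', 'KRT4', 'SORD']
print(missing)              # ['C18orf17', 'UHRF1']
```

Listing the skipped names, as in the last step, can be useful because none of these approaches reports which requested columns were absent.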
jezrael
    I would choose either of the options in this answer, but you could also use `reindex`, though it isn't as good because it adds the missing columns. You could drop them with `dropna`, but you might also drop pre-existing columns that are entirely NaN. `df1.reindex(columns=L).dropna(axis=1, how='all')` – piRSquared Mar 12 '21 at 06:15
    @piRSquared - yes, but `reindex` also changes the order of the columns, which should not be necessary – jezrael Mar 12 '21 at 06:16
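
The caveats raised in these comments can be seen in a small sketch. The frame below is hypothetical, and the EMPTY column is an assumed example of an existing column that holds only NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: EMPTY exists in the data but holds no values.
df1 = pd.DataFrame({
    "KRT4": [1, 2],
    "THRSP": [3, 4],
    "EMPTY": [np.nan, np.nan],
})
L = ["THRSP", "UHRF1", "EMPTY", "KRT4"]

# reindex follows the order of L and creates UHRF1 filled with NaN.
reindexed = df1.reindex(columns=L)
print(list(reindexed.columns))  # ['THRSP', 'UHRF1', 'EMPTY', 'KRT4']

# dropna removes the newly created UHRF1, but also the pre-existing
# all-NaN column EMPTY.
cleaned = reindexed.dropna(axis=1, how="all")
print(list(cleaned.columns))    # ['THRSP', 'KRT4']
```

So the `reindex` route works only if no requested column that does exist is entirely NaN, and the result is ordered by L rather than by the original frame.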