
How can I return a list of tuples, where each tuple contains the values of a row that has a NaN value, from a pandas DataFrame?

I saw a suggestion to use this:

[odfscsv_df.iloc[i, j] for i, j in zip(*np.where(pd.isnull(odfscsv_df)))]

But this only returns the NaN values themselves. I want to return the entire row that contains the NaN value.

edo101

2 Answers


Try this:

list(map(tuple, odfscsv_df[odfscsv_df.isna().any(axis=1)].values))

Or using a list comprehension with the same logic:

[tuple(x) for x in odfscsv_df[odfscsv_df.isna().any(axis=1)].values]
Andy L.
  • That worked, but please can you explain this? I am new to Python – edo101 Jun 30 '20 at 22:14
  • I have never used map before with a list like that. Actually, that whole function structure is new to me. Is the map version faster than the easier-to-read tuple version you posted @Andy L – edo101 Jun 30 '20 at 22:17
  • @edo101: `isna` turns the `df` into boolean values, with `True` where there is a `NaN`. `any(axis=1)` checks whether any `True` exists in each row and returns a Series of `True`/`False` per row. That Series is then used to slice the `df`. `values` returns a 2d array. Finally, `map` or a list comprehension iterates over each row of this 2d array and converts it to a `tuple` – Andy L. Jun 30 '20 at 22:20
  • @edo101: it is just a matter of preference. In some cases, a list comprehension is marginally faster than `map`. I personally use them interchangeably – Andy L. Jun 30 '20 at 22:22
  • So they'd perform about the same. I still don't quite understand the first suggestion, so for now I am inclined to use the list comprehension I am used to. That being said, what if I want to check for another condition in that list comprehension? I also want to return the entire row if a value in a particular column doesn't contain a regex match. How would I do this? – edo101 Jun 30 '20 at 22:26
  • @edo101: just use a list comprehension. If you have another condition, evaluate that condition to get a boolean mask of `True`/`False` and put it inside the slicing `odfscsv_df[..]`, combined with `&` or `|` depending on your requirement – Andy L. Jun 30 '20 at 22:29
  • What do you mean by "slicing"? I'm sorry, still trying to learn – edo101 Jun 30 '20 at 22:30
  • When you select rows or columns through `[...]` or `.loc[...]`, it is called slicing. Read this for more info: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-slicing-with-labels – Andy L. Jun 30 '20 at 22:33
  • I just posted a new question to clarify what I now want to do with two different conditions: https://stackoverflow.com/questions/62667161/use-list-comprehension-to-create-a-list-of-tuples-for-two-different-conditionals I'd greatly appreciate your input @Andy L. – edo101 Jun 30 '20 at 22:42
  • @edo101 would suggest accepting an answer for this first... although more complicated, your new question is essentially the same, just now with one more condition – Derek Eden Jun 30 '20 at 23:00
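
A minimal sketch of the mask-combining approach described in the comments above. The DataFrame, its column names, and the regex are made up for illustration; only the `isna().any(axis=1)` / boolean-slicing pattern comes from the answer itself:

```python
import numpy as np
import pandas as pd

# Hypothetical sample DataFrame for illustration
df = pd.DataFrame({"name": ["foo", "bar", "baz"],
                   "val": [1.0, np.nan, 3.0]})

# Mask 1: True for rows that contain at least one NaN
nan_mask = df.isna().any(axis=1)

# Mask 2: True for rows where "name" does NOT match a regex
no_match = ~df["name"].str.contains(r"^ba", regex=True)

# Combine the masks with | (or) or & (and), slice, then build tuples
rows = [tuple(x) for x in df[nan_mask | no_match].values]
print(rows)  # → [('foo', 1.0), ('bar', nan)]
```

Here `"foo"` passes via the regex condition and `"bar"` passes via the NaN condition, so both rows are returned as tuples.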

How about something like this?

Sample df:

     1    2
0  0.0  0.0
1  NaN  2.0
2  1.0  NaN

This command checks for NaNs, then does a row-wise `any` to see whether any of the columns in each row is NaN. This returns a Series which is `True` wherever a row has at least one NaN, and `False` otherwise.

This Series is then used to mask the original df, and the result is converted to records (a NumPy record array) and then to a list.

df[df.isna().any(axis=1)].to_records(index=False).tolist()

output:

[(nan, 2.0), (1.0, nan)]
Derek Eden