My data is here: https://github.com/mayuripandey/Data-Analysis/blob/main/Topic.csv

For instance:

    Topic1_assignment                                   | Topic2_assignment
0   Int64Index([0, 1, 3, 7, 8, 11], dtype='int64')      | Int64Index([0, 4, 5, 9, 11, 14], dtype='int64')
1   NaN                                                 | Int64Index([0, 2, 5, 7, 10, 14], dtype='int64')
2   Int64Index([0, 1, 2, 210, 219, 221], dtype='int64') | Int64Index([256, 257, 258, 259, 260, 261], dtype='int64')

I am trying to count, row by row, how many elements the two columns have in common. Some cells contain NaN instead of a list, and those rows should give 0.

The code I used is :

df9['c'] = [len(set(a).intersection(b)) if all(pd.notna([a, b])) else 0
                for a, b in zip(df9.Topic1_assignment, df9.Topic2_assignment)]

But it gives an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-65-af99ddc88358> in <module>
----> 1 df9['c'] = [len(set(a).intersection(b)) if all(pd.notna([a, b])) else 0
      2                 for a, b in zip(df9.Topic1_assignment, df9.Topic2_assignment)]

<ipython-input-65-af99ddc88358> in <listcomp>(.0)
----> 1 df9['c'] = [len(set(a).intersection(b)) if all(pd.notna([a, b])) else 0
      2                 for a, b in zip(df9.Topic1_assignment, df9.Topic2_assignment)]

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().

What is the possible reason for this?

Noob Coder

1 Answer

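pd.notna([a, b]) returns an array of true/false values. The built-in all iterates over that array, but when a and b are both sequences of the same length the array is two-dimensional, so each iterated value is a full row, not a single value. It's the truth value of that row that is the problem. Let's say foo = pd.notna([a, b]). Then foo[0] is an array and bool(foo[0]) is ambiguous. You can solve the problem by calling the array's own all method, which reduces the whole array to a single boolean: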

pd.notna([a, b]).all()
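
To plug that into the list comprehension from the question, something like the sketch below may work. It rests on a few assumptions: the cells hold real index objects (or NaN), not their string representations as they might appear after reading the CSV; df9 is a small made-up stand-in built from the sample rows; pd.Index is used instead of Int64Index, which was removed in pandas 2.0; and overlap_count is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df9, built from the sample rows in the question.
# Assumption: each cell is an index object or NaN (not a string).
df9 = pd.DataFrame({
    "Topic1_assignment": [
        pd.Index([0, 1, 3, 7, 8, 11]),
        np.nan,
        pd.Index([0, 1, 2, 210, 219, 221]),
    ],
    "Topic2_assignment": [
        pd.Index([0, 4, 5, 9, 11, 14]),
        pd.Index([0, 2, 5, 7, 10, 14]),
        pd.Index([256, 257, 258, 259, 260, 261]),
    ],
})

# Why the original fails: for two equal-length indexes the result is 2-D,
# so the built-in all() ends up asking for the truth value of a whole row.
a = df9["Topic1_assignment"].iloc[0]
b = df9["Topic2_assignment"].iloc[0]
foo = pd.notna([a, b])
try:
    all(foo)
except ValueError as exc:
    print("built-in all() fails:", exc)


def overlap_count(a, b):
    """Return the number of shared labels, or 0 if either cell is missing."""
    # .all() collapses the boolean array to one unambiguous bool.
    if not pd.notna([a, b]).all():
        return 0
    return len(set(a).intersection(b))


df9["c"] = [
    overlap_count(a, b)
    for a, b in zip(df9.Topic1_assignment, df9.Topic2_assignment)
]
print(df9["c"].tolist())
```

With these sample rows, only row 0 has common labels (0 and 11); row 1 is skipped because of the NaN, and row 2 has no overlap. If the real column contains strings rather than index objects, they would need to be parsed first, which this sketch does not cover.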
tdelaney
  • Could you please tell me how to use that with my code to get the required output? I only need the total intersection count for each column. I am still getting an error. Thanks in advance, @tdelaney – Noob Coder Jan 20 '23 at 23:28