Easy Way to See if Two Columns are One-to-One in Pandas

Question

Working with data in Python 3+ with pandas. It seems like there should be an easy way to check if two columns have a one-to-one relationship (regardless of column type), but I'm struggling to think of the best way to do this.

Example of expected output:

A    B     C
0    'a'   'apple'
1    'b'   'banana'
2    'c'   'apple'

A & B are one-to-one? TRUE

A & C are one-to-one? FALSE

B & C are one-to-one? FALSE

what is A&B are one to one? what is the logic behind this, could you explain little more — Mohamed Thasin ah, Jun 01 '18 at 12:28
Can you please tell, is that A and B are values or column names ? Seems confusing — user96564, Jun 01 '18 at 12:28
One-to-one is a commonly used term to talk about the relationship between two data objects in a database. They are one-to-one if there is a unique relationship between values. In the above case, A and B are 1:1 because 0 always corresponds to 'a', 1 always corresponds to 'b', etc. And yes, A, B and C are column names. https://en.wikipedia.org/wiki/One-to-one_(data_model) — user1895076, Jun 01 '18 at 12:39
For my work it would also be useful to know this in a non-reciprocal way. For example, it is okay that different A values map to a particular C value, but I don't want any single A value to map to multiple C values. I don't need a 1:1 correspondence, but the direction is important. — Liz, Apr 15 '21 at 14:53

zipa · Accepted Answer · 2018-06-01T12:44:41.443

Well, you can create your own function to check it:

def isOneToOne(df, col1, col2):
    first = df.groupby(col1)[col2].count().max()
    second = df.groupby(col2)[col1].count().max()
    return first + second == 2

isOneToOne(df, 'A', 'B')
#True
isOneToOne(df, 'A', 'C')
#False
isOneToOne(df, 'B', 'C')
#False

In case you data is more like this:

df = pd.DataFrame({'A': [0, 1, 2, 0],
                   'C': ["'apple'", "'banana'", "'apple'", "'apple'"],
                   'B': ["'a'", "'b'", "'c'", "'a'"]})
df
#   A    B         C
#0  0  'a'   'apple'
#1  1  'b'  'banana'
#2  2  'c'   'apple'
#3  0  'a'   'apple'

Then you can use:

def isOneToOne(df, col1, col2):
    first = df.drop_duplicates([col1, col2]).groupby(col1)[col2].count().max()
    second = df.drop_duplicates([col1, col2]).groupby(col2)[col1].count().max()
    return first + second == 2

Thanks! This did the trick. You even caught an edge case I didn't think of. The second solution (with duplicates removed) works on my data. Thanks again! — user1895076, Jun 01 '18 at 12:57
Rather than adding drop_duplicates, I changed `.count()` to `.nunique()`. — JDenman6, Oct 23 '20 at 18:35

score 8 · Answer 2 · answered Nov 30 '18 at 04:34

df.groupby(col1)[col2]\
  .apply(lambda x: x.nunique() == 1)\
  .all()

should work fine if you want a true or false answer.

A nice way to visualize the relationship between two columns with discrete / categorical values (in case you are using Jupyter notebook) is :

df.groupby([col1, col2])\
  .apply(lambda x : x.count())\
  .iloc[:,0]\
  .unstack()\
  .fillna(0)

This matrix will tell you the correspondence between the column values in the two columns.

In case of a one-to-one relationship there will be only one non-zero value per row in the matrix.

score 4 · Answer 3 · answered Oct 12 '20 at 02:37

4

df.groupby('A').B.nunique().max()==1 #Output: True

df.groupby('B').C.nunique().max()==1 #Output: False

Within each value in [groupby column], count the number of unique values in [other column], then check that the maximum for all such counts is one

answered Oct 12 '20 at 02:37

Trevor B

41
2

this is a much nicer answer – Ollie Perkins Dec 12 '20 at 15:20

YaOzI · Answer 4 · 2019-07-03T13:15:32.567

Here is my solution (only two or three lines of codes) to check for any number of columns to see whether they are one to one match (duplicated matches are allowed, see the example bellow).

cols = ['A', 'B'] # or any number of columns ['A', 'B', 'C']
res = df.groupby(cols).count()
uniqueness = [res.index.get_level_values(i).is_unique 
              for i in range(res.index.nlevels)]
all(uniqueness)

Let's make it a function and add some docs:

def is_one_to_one(df, cols):
    """Check whether any number of columns are one-to-one match.

    df: a pandas.DataFrame
    cols: must be a list of columns names

    Duplicated matches are allowed:
        a - 1
        b - 2
        b - 2
        c - 3
    (This two cols will return True)
    """
    if len(cols) == 1:
        return True
        # You can define you own rules for 1 column check, Or forbid it

    # MAIN THINGs: for 2 or more columns check!
    res = df.groupby(cols).count()
    # The count number info is actually bootless.
    # What maters here is the grouped *MultiIndex*
    # and its uniqueness in each level
    uniqueness = [res.index.get_level_values(i).is_unique
                  for i in range(res.index.nlevels)]
    return all(uniqueness)

By using this function, you can do the one-to-one match check:

df = pd.DataFrame({'A': [0, 1, 2, 0],
                   'B': ["'a'", "'b'", "'c'", "'a'"],
                   'C': ["'apple'", "'banana'", "'apple'", "'apple'"],})

is_one_to_one(df, ['A', 'B'])
is_one_to_one(df, ['A', 'C'])
is_one_to_one(df, ['A', 'B', 'C'])
# Outputs:
# True
# False
# False

score 0 · Answer 5 · answered Jun 01 '18 at 12:51

one way to solve this ,

df['A to B']=df.groupby('B')['A'].transform(lambda x:x.nunique()==1)
df['A to C']=df.groupby('C')['A'].transform(lambda x:x.nunique()==1)
df['B to C']=df.groupby('C')['B'].transform(lambda x:x.nunique()==1)

Output:

   A  B       C  A to B  A to C  B to C
0  0  a   apple    True   False   False
1  1  b  banana    True    True    True
2  2  c   apple    True   False   False

To check column by column:

print (df['A to B']==True).all()
print (df['A to C']==True).all()
print (df['B to C']==True).all()

True
False
False

@user1895076 - did you try this one? – Mohamed Thasin ah Jun 01 '18 at 13:07 — Mohamed Thasin ah, Jun 01 '18 at 13:07

Easy Way to See if Two Columns are One-to-One in Pandas

5 Answers5