0

I have a dataframe with date of births as

pl.DataFrame({'idx':[1,2,3,4,5,6],
              'date_of_birth':['03/06/1990','3/06/1990','11/12/2000','01/02/2021','1/02/2021','3/06/1990']})

enter image description here

Here I would like to compare date of birth(Format: Month/Day/Year) of each row and tag yes if the months are equal such as 03 - 3, 01 -1.

There are dates as 03/06/1900, 3/06/1990, they are generally same. but here they are treated as different. How to figure out these kind scenarios ?

The expected output as:

enter image description here

myamulla_ciencia
  • 1,282
  • 1
  • 8
  • 30

3 Answers3

5

Unlike other answers that use pandas to compute the result, let me do your CPU a favor and give a polars only solution.

df = pl.DataFrame({"idx":[1,2,3,4,5,6],
              "date_of_birth":["03/06/1990","3/06/1990","11/12/2000","01/02/2021","1/02/2021","3/06/1990"]})


(df.with_columns([
    pl.when(
        pl.col("date_of_birth").str.strptime(pl.Date, "%m/%d/%Y").dt.month().is_in([1, 3])
    ).then("Yes")
    .otherwise("No").alias("match")
]))
shape: (6, 3)
┌─────┬───────────────┬─────────┐
│ idx ┆ date_of_birth ┆ match   │
│ --- ┆ ---           ┆ ---     │
│ i64 ┆ str           ┆ str     │
╞═════╪═══════════════╪═════════╡
│ 1   ┆ 03/06/1990    ┆ Yes     │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 2   ┆ 3/06/1990     ┆ Yes     │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 3   ┆ 11/12/2000    ┆ No      │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 4   ┆ 01/02/2021    ┆ Yes     │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 5   ┆ 1/02/2021     ┆ Yes     │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 6   ┆ 3/06/1990     ┆ Yes     │
└─────┴───────────────┴─────────┘

ritchie46
  • 10,405
  • 1
  • 24
  • 43
1

Let's convert date_of_birth to datetime then check if the month is satisfied

import numpy as np
import pandas as pd
import polars as pl

s = np.where(pd.to_datetime(df['date_of_birth']).month.isin([3, 1]), 'Yes', 'No')

out = df.with_columns([
    pl.lit(s).alias('flag')
])
print(out)

┌─────┬───────────────┬──────┐
│ idx ┆ date_of_birth ┆ flag │
│ --- ┆ ---           ┆ ---  │
│ i64 ┆ str           ┆ str  │
╞═════╪═══════════════╪══════╡
│ 1   ┆ 03/06/1990    ┆ Yes  │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2   ┆ 3/06/1990     ┆ Yes  │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3   ┆ 11/12/2000    ┆ No   │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 4   ┆ 01/02/2021    ┆ Yes  │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 5   ┆ 1/02/2021     ┆ Yes  │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 6   ┆ 3/06/1990     ┆ Yes  │
└─────┴───────────────┴──────┘
Ynjxsjmh
  • 28,441
  • 6
  • 34
  • 52
0

You can try out with transform function to filter in specific column of df to get suitable output

df['match']=df['date_of_birth'].transform(lambda x : 'yes' if x.split("/")[0] in ['03','3','1','01'] else 'no')

Output:

idx date_of_birth   match
0   1   03/06/1990  yes
1   2   3/06/1990   yes
2   3   11/12/2000  no
3   4   01/02/2021  yes
4   5   1/02/2021   yes
5   6   3/06/1990   yes
Bhavya Parikh
  • 3,304
  • 2
  • 9
  • 19