0

i am trying to convert my two date columns in my dataframe. however some of the date is converted using "%d/%m/%Y" and a few of the data is getting converted using "%m/%d/%Y" . the issue is happening only from 01-May-2023 to 12-May-2023. from 13 may it is again reverting to using "%d/%m/%Y".

I am using the following to convert my data:

columns_to_convert_to_date= ['Date','Value Dt']
regex_pattern = r'\d{2}/\d{2}/\d{4}'
for column in columns_to_convert_to_date:
        full_bank_df_hdfc[column] = full_bank_df_hdfc[column].apply(lambda x: pd.to_datetime(x, format='%d/%m/%Y', errors='coerce') if re.match(regex_pattern, str(x)) else pd.to_datetime(x, errors='coerce'))

screenshot data before transformation: enter image description here

data after transformation: enter image description here

i have tried to force it to use the format to no avail. when using only

pd.to_datetime(full_bank_df_hdfc['Date'],format='%d/%m/%y', errors='coerce')

am i getting the desired output in the column , however that is resulting in my already existing datetime format roes to be NaT

1043 2023-04-30 1044 2023-05-01 1045 2023-05-01 1046 2023-05-01 1047 2023-05-02 1048 2023-05-02 1049 2023-05-03 1050 2023-05-03 1051 2023-05-03 1052 2023-05-04 1053 2023-05-04 1054 2023-05-06 1055 2023-05-06 1056 2023-05-07 1057 2023-05-08

FObersteiner
  • 22,500
  • 8
  • 42
  • 72
  • 1
    I answered the same question yesterday: https://stackoverflow.com/q/76470553/15239951 – Corralien Jun 15 '23 at 21:15
  • @Corralien Thanks for answer. on a side note i have both datetime and "%d/%m/%Y" in my dataframe. how to handle that so i dont get error using your steps 0 2022-04-01 00:00:00 1 2022-04-01 00:00:00 2 2022-04-02 00:00:00 3 2022-04-02 00:00:00 4 2022-04-02 00:00:00 ... 1084 24/05/23 1085 24/05/23 1086 24/05/23 1087 24/05/23 1088 24/05/23 – Amirul Islam Jun 15 '23 at 21:57
  • Do you mean you have 3 different formats in your dataframe: `'%Y-%m-%d %H:%M:%S'`, `'%d/%m/%y'` and `'%m/%d/%y'`? – Corralien Jun 15 '23 at 22:06
  • i only have two `'%Y-%m-%d %H:%M:%S'` and `'%d/%m/%Y'` – Amirul Islam Jun 15 '23 at 22:17
  • Please trim your code to make it easier to find your problem. Follow these guidelines to create a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). – Community Jun 16 '23 at 10:10

1 Answers1

0

You try to match the year part on 4 digits but your screenshot show only 2 digits for year?

Try:

columns_to_convert_to_date= ['Date','Value Dt']
regex_pattern = r'(\d{2})/(\d{2})/(\d{2})'
for column in columns_to_convert_to_date:
    dt = df[column].str.replace(regex_pattern, r'20\3-\2-\1 00:00:00', regex=True)
    df[column] = pd.to_datetime(dt)

Output:

# Before
>>> df
                     Date             Value Dt
0     2022-04-01 00:00:00  2022-04-01 00:00:00
1     2022-04-01 00:00:00  2022-04-01 00:00:00
2     2022-04-02 00:00:00  2022-04-02 00:00:00
3     2022-04-02 00:00:00  2022-04-02 00:00:00
4     2022-04-02 00:00:00  2022-04-02 00:00:00
1084             24/05/23             24/05/23
1085             24/05/23             24/05/23

# After
>>> df
           Date   Value Dt
0    2022-04-01 2022-04-01
1    2022-04-01 2022-04-01
2    2022-04-02 2022-04-02
3    2022-04-02 2022-04-02
4    2022-04-02 2022-04-02
1084 2023-05-24 2023-05-24
1085 2023-05-24 2023-05-24
Corralien
  • 109,409
  • 8
  • 28
  • 52