0

I am working on a project where I import SQLite database in data frame and then convert string dates to dates using

toCorrect = ["dt_ocurred", "init_action_ship_dt","target_dt", "final_action_ship_dt", "done_dt",
                      "update_dt", "ext_dt", "PSC_picdt", "PSC_info2ownr_dt", "PSC_info2chrtr_dt", "PSC_info2rtshp_dt",
                      "PSC_info2oilmaj_dt", "PSC_info2mmstpmgmt_dt", "PSC_sndr_offimport_dt"]#
        for someCol in toCorrect:
            df_data[someCol] = pd.to_datetime(df_data[someCol],errors='coerce').apply(lambda x: x.date())

after the conversion I am trying to get data for the current year for which i have the following query

curr_year = datetime.datetime.now().year
    
    df_currDRS = df_rawData.query(
        f"ship_name == '{shipName}' and (dt_ocurred.str.contains('{curr_year}') or done_dt.str.contains"
        f"('{curr_year}') or status.str.contains('OPEN'))", engine='python')

For some reason which i cannot figure out i get this error

 AttributeError: Can only use .str accessor with string values!
Traceback:
File "c:\users\sshukla\documents\drs2022\venv\lib\site-packages\streamlit\scriptrunner\script_runner.py", line 443, in _run_script
    exec(code, module.__dict__)
File "C:\Users\sshukla\Documents\DRS2022\main.py", line 22, in <module>
    make_NewDRS()
File "C:\Users\sshukla\Documents\DRS2022\GetNewDRS.py", line 39, in make_NewDRS
    df_currDRS = df_rawData.query(
File "c:\users\sshukla\documents\drs2022\venv\lib\site-packages\pandas\core\frame.py",

I could figure out this much that if I do not convert the string dates to date object then the query runs fine. But I want to know how to construct the query if the dates have been converted from string to date.

Any help would be greatly appreciated.

  • Can you please share the types of dt_ocurred, done_dt, and status? Seems that at least one of them is not a string. Perhaps dates instead? Why are you using `str.contains` at all? – Markus Apr 26 '22 at 12:32
  • The frontend is excel where there are no rules for checking the correct format for dates. Some times dates have errors like instead of 2020 it is mistyped as 0202. the backend is SQLite where there is no defined datatype for date columns. can share the GitHub link for the entire project if you wish to explore further. – shashwat shukla Apr 26 '22 at 16:29
  • At least they are not string, so I would suggest not to use string operations on them. – Markus Apr 26 '22 at 19:44
  • @Markus thanks a lot for taking out the time to reply. Can you suggest some way to restructure the data frame query if string operations are not to be used. I am stuck on getting the year part from the date. when dates are not as string. – shashwat shukla Apr 27 '22 at 03:40

1 Answers1

0

If you want to filter a date column for a given year (e.g. 2022) you can check if it is between first and last day of the year:

df.query(f"20220101 <= my_date <= 20221231")

Applied to your code example this might look like:

curr_year = datetime.datetime.now().year
    
df_currDRS = df_rawData.query(
    f"ship_name == '{shipName}' and ("
    f"({curr_year}0101 <= dt_ocurred <= {curr_year}1231) or "
    f"({curr_year}0101 <= done_dt <= {curr_year}1231) or "
    f"status.str.contains('OPEN'))", engine='python')
Markus
  • 5,976
  • 5
  • 6
  • 21