0

I am reading dates in the formats "YYYY", "YYYY-MM" and "YYYY-MM-DD" from EXCEL to Pandas data frames and using pandas.Period to sort and compare them.

Most dates have years, months and days, so my script currently expects "days" as the default frequency.

When users search for dates via input, all three formats are converted as follows:

d1=pd.Period(exy[0], freq="D") # convert input to Period with day frequence

When matching these dates to data from the Excel table, I am trying to convert the divergent date information coming from the spreadsheet within a condition list, but this causes a TypeError ('Series' objects are mutable, thus they cannot be hashed'):

condlist = [f['pers_name'].str.contains(str(qn)) ^ f['pers_name'].str.contains('|'.join(qn)) ^ f['pers_name'].isin(qn), 
                                pd.Period(f['event_start'], freq="D").isin(exy), # matching one or several dates
                                f['inst_name'].isin(qi), 
                                f['pers_title'].isin(qt),
                                f['pers_function'].isin(qf),
                                f['rel_pers'].str.contains('|'.join(qr)) ^ f['rel_pers'].isin(qr)]

Is there a way to handle this within the condition list at all, or do I need to match the dates before applying the remaining conditions?

CURRENT SOLUTION WITH HANDLING DATES OUTSIDE CONDITION LIST:

Note: "@" is not part of the data and thus used to keep individual conditions from being triggered. If the user enters "@" in all fields but date, only the date search will be performed, the condition list will be ignored.


elif z=="3": #get dates before
    print("Searching for dates before", pd.Period(exy[0], freq="D"), "!")
    nf=pd.DataFrame(columns=column_names)
    for n in range(0, len(pers_f)):
        try: 
            if pd.Period(f['event_start'].iloc[n], freq="D") <= pd.Period(exy[0], freq="D"):
                nf=nf.append(f.iloc[n], ignore_index=False, sort=False)
            elif pd.Period(f['event_before-date'].iloc[n], freq="D") <= pd.Period(exy[0], freq="D"):
                nf=nf.append(f.iloc[[n]], ignore_index=False, sort=False)
            else:
                continue
        except ValueError:
            pass
    # define possible conditions and choices
    if "@" in qn and "@" in qi and "@" in qt and "@" in qf and "@" in qr:
        result_df1=nf
    else:
        condlist = [(nf['pers_name'].str.contains('|'.join(qn)) ^ nf['pers_name'].isin(qn)), 
                    nf['inst_name'].isin(qi), 
                    nf['pers_title'].isin(qt),
                    nf['pers_function'].isin(qf),
                    nf['rel_pers'].str.contains('|'.join(qr)) ^ nf['rel_pers'].isin(qr)]

        choicelist = [nf['pers_name'],
                    nf['inst_name'], 
                    nf['pers_title'],
                    nf['pers_function'],
                    nf['rel_pers']]

        output = np.select(condlist, choicelist)
        rows=np.where(output)
        new_array=nf.to_numpy()
        result_array=new_array[rows]
OnceUponATime
  • 450
  • 4
  • 12

0 Answers0