I have multiple tar files, each containing multiple CSV files. I want to open all the CSV files as vaex dataframes, add a new column to each with a lambda function, and concatenate them, but I get the error below. How can I do this?

import io
import os
import re
import tarfile

import vaex

def get_years_files(num_years):
    # Extract the first run of digits (the year) from each file name
    files = os.listdir("myfiles")
    years = [int(re.findall(r'\d+', file)[0]) for file in files]
    return years

def process(yearfiles):
    lst = []
    for yearfile in yearfiles:
        tar = tarfile.open("myfiles/" + yearfile, "r")
        for member in tar:
            if ".csv" in member.name:
                # Load each CSV member into its own vaex dataframe
                vx = vaex.from_csv(io.BytesIO(tar.extractfile(member).read()))
                vx['WMO'] = vx['STATION'].apply(lambda x: str(x)[:-5])
                lst.append(vx)

        tar.close()
    df_vx = vaex.concat(lst)  # raises the ValueError shown under "Error:"
    return df_vx

Error:

ValueError: Unequal function lambda_function in concatenated dataframes are not supported yet
HMadadi

1 Answer

The error message is pretty clear: you can't concatenate dataframes whose columns were built with lambdas. You can use a regular named function instead, defined once outside the loop:

def f(x):
    return str(x)[:-5]

...

vx['WMO'] = vx['STATION'].apply(f)
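
Before wiring the function into vaex, it can help to check it on its own against CSV data read from a tar archive. The following is only a sketch using the standard library: the helper names `station_to_wmo`, `build_sample_tar` and `read_wmo_codes` are hypothetical, and the station identifiers are made-up sample values, not data from the question.

```python
import csv
import io
import tarfile


def station_to_wmo(station):
    """Drop the last five characters of the station identifier."""
    return str(station)[:-5]


def build_sample_tar():
    """Build an in-memory tar archive with two small CSV members (made-up data)."""
    samples = [
        ("a.csv", ["01001099999"]),
        ("b.csv", ["01002099999", "01003099999"]),
    ]
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, stations in samples:
            payload = ("STATION\n" + "\n".join(stations) + "\n").encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


def read_wmo_codes(tar_buffer):
    """Apply the single named function to every STATION value in every CSV member."""
    codes = []
    with tarfile.open(fileobj=tar_buffer, mode="r") as tar:
        for member in tar:
            if member.name.endswith(".csv"):
                text = tar.extractfile(member).read().decode("utf-8")
                for row in csv.DictReader(io.StringIO(text)):
                    codes.append(station_to_wmo(row["STATION"]))
    return codes


wmo_codes = read_wmo_codes(build_sample_tar())
print(wmo_codes)
```

Note that `str(x)` on an integer has no leading zeros, so if the `STATION` column is parsed as a number, the result differs from slicing the original text.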
alec_djinn
  • Thanks, but I got an error: `error evaluating: WMO at rows 0-5 dataframe.py:4101 , Traceback (most recent call last): .....` – HMadadi Dec 08 '22 at 16:41
  • That is a different error. You need to share the full stack trace and a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example); otherwise I can't reproduce the error, which is the first step to fixing it. – alec_djinn Dec 09 '22 at 09:28