df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

Imagine this dataframe. With pandas it is easy for me to select values of one column based on another column's values, like this:

df.loc[df["B"] == "three", "A"]

With dask, however, the same code on my dataframe does not give me a usable result:

df.loc[df["ActionGeo_Lat"] == "42#.5", "SQLDATE"]

After executing this line I receive the following output, which doesn't really help me:

[screenshot: Output after executing my code]

The problem I'm having is that every time I try to execute df.compute() I receive:

ValueError: could not convert string to float: '42#.5'

After cutting out some columns I found that the error is caused somewhere in the ActionGeo_Lat column. I would now like to manually edit the CSV file to fix the error, but I cannot find out in which row (i.e. on which date) it occurs.

Thanks for the help in advance!

Val

1 Answer


Looks like your underlying problem is with the loading/typing of your data. Here's an example showing that the same pandas syntax works without problems on a dask dataframe:

import pandas as pd
import numpy as np
import dask.dataframe as dd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
ddf = dd.from_pandas(df, npartitions=2)

print(df.loc[df['B'] == "three", "A"])
print(ddf.loc[ddf['B'] == "three", "A"].compute())

dask.dataframe is not a good tool for debugging CSV files, so your best bet is to use shell utilities to locate the offending lines, e.g. (the -n flag prints the matching line numbers):

grep -ain "42#.5" your_file_name_here.csv
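Alternatively, staying in Python, here is a minimal sketch of how you could locate the bad rows with pandas before handing the data to dask. It uses pd.to_numeric with errors="coerce", which turns unparseable values into NaN; a small in-memory DataFrame (with made-up dates) stands in for your real CSV, so adapt the column loading to your file:

```python
import pandas as pd

# Small stand-in for the real CSV; in practice use
# pd.read_csv(..., dtype=str) so nothing is parsed prematurely.
df = pd.DataFrame({
    "SQLDATE": ["20200101", "20200102", "20200103"],
    "ActionGeo_Lat": ["42.5", "42#.5", "41.0"],
})

# Coerce the column to numeric; anything that fails to parse becomes NaN.
lat = pd.to_numeric(df["ActionGeo_Lat"], errors="coerce")

# Rows that were non-empty but failed to parse are the offenders.
bad = df.loc[lat.isna() & df["ActionGeo_Lat"].notna(),
             ["SQLDATE", "ActionGeo_Lat"]]
print(bad)
```

This prints the SQLDATE of every row whose latitude cannot be converted to a float, which tells you exactly where to edit the file.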
SultanOrazbayev