I am trying to create a DataFrame
by reading a csv file whose fields are separated by '#####' (five hashes).
The code is:
import dask.dataframe as dd
df = dd.read_csv(r'D:\temp.csv', sep='#####', engine='python')
res = df.compute()
Error is:
dask.async.ValueError:
Dask dataframe inspected the first 1,000 rows of your csv file to guess the
data types of your columns. These first 1,000 rows led us to an incorrect
guess.
For example a column may have had integers in the first 1000
rows followed by a float or missing value in the 1,001-st row.
You will need to specify some dtype information explicitly using the
``dtype=`` keyword argument for the right column names and dtypes.
df = dd.read_csv(..., dtype={'my-column': float})
Pandas has given us the following error when trying to parse the file:
"The 'dtype' option is not supported with the 'python' engine"
Traceback
---------
File "/home/ec2-user/anaconda3/lib/python3.4/site-packages/dask/async.py", line 263, in execute_task
result = _execute_task(task, data)
File "/home/ec2-user/anaconda3/lib/python3.4/site-packages/dask/async.py", line 245, in _execute_task
return func(*args2)
File "/home/ec2-user/anaconda3/lib/python3.4/site-packages/dask/dataframe/io.py", line 69, in _read_csv
raise ValueError(msg)
How can I get around this?
If I follow the error message, I would have to specify a dtype for every column, but with 100+ columns that is impractical.
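For what it's worth, even building the dtype mapping programmatically (a rough sketch; it assumes the header row parses cleanly with the same separator) would presumably still run into the engine limitation quoted above:

import pandas as pd
import dask.dataframe as dd

# Read only the header row with plain pandas to get the column names.
header = pd.read_csv(r'D:\temp.csv', sep='#####', engine='python', nrows=0)

# Force every column to object, which is roughly what the error asks for.
dtypes = {col: 'object' for col in header.columns}

# But per the quoted pandas error, dtype= is not supported together
# with engine='python', so this likely fails the same way.
df = dd.read_csv(r'D:\temp.csv', sep='#####', engine='python', dtype=dtypes)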
Alternatively, if I read the file without specifying the separator, everything parses fine, but there is ##### everywhere. After computing it to a pandas DataFrame, is there a way to get rid of that?
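Something like this is what I have in mind (a minimal sketch; it assumes the file contains no commas, so the default separator leaves each whole line, hashes included, in a single column):

import dask.dataframe as dd

# Read with the default comma separator; every line lands in one wide column.
df = dd.read_csv(r'D:\temp.csv')
res = df.compute()

# Split that single column on the five hashes to recover the real fields.
clean = res.iloc[:, 0].str.split('#####', expand=True)

# The original header line became the lone column name, so splitting it
# the same way should label the result (an assumption about the file).
clean.columns = res.columns[0].split('#####')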
Any help would be appreciated.