Extracting data from a large CSV file causes dtype warnings

Question

I work for a company and recently switched from a spreadsheet package to Python. Since I am very new to Python, there are a lot of things that I have difficulty grasping. Using Python, I am trying to extract data from a large CSV file (37791 rows and 316 columns). Here is a piece of code I wrote:

Solution 1

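import numpy as np
import pandas as pd
# Skip the first row of the file and let pandas infer the column types
df = pd.read_csv('C:\\Users\\Maxwell\\Desktop\\Test.data.csv', skiprows=1)
# Keep only the two columns of interest
data = df.loc[:, ['Steps', 'Parameter']]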

This command generates an error, i.e., it gives a DtypeWarning: Columns (0,1,2,3........81) have mixed types. Specify dtype option on import or set low_memory=False.

So, I found a workaround.

Solution 2

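import pandas as pd
import numpy as np
# Read every column as text; error_bad_lines was replaced by on_bad_lines='skip' in newer pandas releases
df = pd.read_csv('C:\\Users\\Maxwell\\Desktop\\Test.data.csv', skiprows=1, error_bad_lines=False, index_col=False, dtype='unicode')
data = df.loc[:, ['Steps', 'Parameter']]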

Two questions:

i) I was able to get around the error, but now the columns that I want (Steps and Parameter) have been converted to objects (probably due to the dtype='unicode' argument). How can I convert the Steps column to an integer type and the Parameter column to a float?

ii) Some people say that the dtype warning isn't really an error. However, when I read the CSV file using Solution 1, the Steps column contains some floats, even though the original CSV file doesn't have any floats in the Steps column. It looks as if some floats have been inserted by Python itself! Why does this happen?

(I am not able to upload the original csv file, because my company doesn't allow it!)
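
Since the original file can't be shared, here is a minimal sketch of both questions on made-up data. It assumes (this is only a guess about the real file) that the Steps column has at least one blank cell. The CSV text and all variable names below are hypothetical. With the default settings pandas reads a blank as NaN, and NaN forces an otherwise integer column to float64. Reading as text and converting with `pd.to_numeric` is one way to get numeric columns back.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: one blank cell in 'Steps'.
csv_text = "Steps,Parameter\n1,0.5\n,0.7\n3,0.9\n"

# Default parsing: the blank becomes NaN, and NaN forces the column to float.
default_df = pd.read_csv(io.StringIO(csv_text))
print(default_df["Steps"].dtype)  # float64

# Read everything as text (similar to Solution 2), then convert explicitly.
text_df = pd.read_csv(io.StringIO(csv_text), dtype=str)
steps = pd.to_numeric(text_df["Steps"], errors="coerce")      # unparseable -> NaN
params = pd.to_numeric(text_df["Parameter"], errors="coerce")

# 'Int64' (capital I) is pandas' nullable integer dtype, so it can hold the
# missing value that a plain NumPy integer column cannot.
steps_int = steps.astype("Int64")
print(steps_int.dtype, params.dtype)
```

If the real column turns out to have no blanks, the floats probably come from something else, such as stray text in a few cells, and `errors="coerce"` will turn those cells into missing values instead of raising.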

Try as suggested in the error message: add one more param `low_memory=False` — Sergey Bushmanov, Apr 23 '17 at 14:07
Might be a duplicate question: http://stackoverflow.com/questions/24251219/pandas-read-csv-low-memory-and-dtype-options — André C. Andersen, Apr 23 '17 at 15:10
Possible duplicate of [Pandas read\_csv low\_memory and dtype options](http://stackoverflow.com/questions/24251219/pandas-read-csv-low-memory-and-dtype-options) — André C. Andersen, Apr 23 '17 at 15:12
Introducing `low_memory=False` doesn't solve the problem. — AP85', Apr 23 '17 at 15:13
If you are only going to use 2 columns out of 316, it is good practice to load only those 2 columns with the parameter `usecols = ['Steps','Parameter'] ` . Also, specifying the datatype of the columns using `dtype={'Steps': int, 'Parameter': float }` should address your data type problem. — VinceP, Apr 23 '17 at 16:59
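
The `usecols` and `dtype` suggestion in the last comment could look roughly like the sketch below. It uses a small in-memory CSV in place of the real file; the CSV contents, the extra `Other` column, and the leading title row are assumptions made only so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical miniature of the file: one row to skip, then the header and data.
csv_text = (
    "exported by some tool\n"
    "Steps,Parameter,Other\n"
    "1,0.5,a\n"
    "2,0.7,b\n"
    "3,0.9,c\n"
)

data = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=1,                                 # same as in the question
    usecols=["Steps", "Parameter"],             # parse only the needed columns
    dtype={"Steps": int, "Parameter": float},   # declare the types up front
)
print(data.dtypes)
```

Note that `dtype={'Steps': int}` raises a `ValueError` if the Steps column contains blanks or non-numeric text. In that case the column has to be read as text or float first and cleaned before converting.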
