Pandas reads wrong column

Question

I have a CSV file with the columns sentence, length, category and 18 more columns. I am trying to filter out specific columns.

Assume I have x,y,a,b,c,d,e,f,g,h as the last 10 columns. I am trying to filter out length, category and the last eight columns.

When I do it for the last 8 columns alone, as

col_req = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
data = pd.read_csv('data.csv', names=col_req)

it works perfectly. But when I try

col_req = ['length','category','a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
data = pd.read_csv('data.csv', names=col_req)

the output is,

('g', 'h', 'x', 'y', 'a', 'b', 'c', 'd', 'e', 'f')

I don't know where I am going wrong.

You should read the [docs](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv) if you find the behaviour different to what you expect, it's reasonably clear what the params do in this case — EdChum, Jan 31 '19 at 11:35

score 2 · Accepted Answer · answered Jan 31 '19 at 11:33

You need to use the usecols argument to do that:

 col_req = ['length','category','a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
 data = pd.read_csv('data.csv', usecols=col_req)
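To see why names and usecols give different results, here is a minimal sketch. It assumes a small in-memory CSV in place of data.csv; the sample rows and the shortened column list are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv: a header row plus two data rows.
csv_text = (
    "sentence,length,category,x,y,a,b\n"
    "hello world,11,greeting,1,2,3,4\n"
    "good bye,8,farewell,5,6,7,8\n"
)

wanted = ["length", "category", "a", "b"]

# usecols selects columns by matching labels in the file's header row.
selected = pd.read_csv(io.StringIO(csv_text), usecols=wanted)

# names assigns labels by position instead; because no header argument
# is given, the first line of the file is then read as a data row.
relabelled = pd.read_csv(io.StringIO(csv_text), names=wanted)

print(selected.columns.tolist())
print(len(selected), len(relabelled))
```

The extra row in the names version is the original header line, which is one sign that names relabels columns rather than selecting them.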

Jeril

score 0 · Answer 2 · answered Jan 31 '19 at 11:35

Check this answer. The column names might not be correct, for example if the header has spaces after the commas:

df = pd.read_csv('data.csv', skipinitialspace=True, usecols=fields)
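A rough illustration of the case this targets, assuming the header has a space after each comma. The sample data and the contents of fields are hypothetical, since the original snippet does not define fields.

```python
import io

import pandas as pd

# Hypothetical file whose header and rows have a space after each comma.
csv_text = "sentence, length, category\nhello world, 11, greeting\n"
fields = ["length", "category"]  # assumed contents of `fields`

# Without skipinitialspace the labels are ' length' and ' category',
# so usecols cannot find the requested names and raises ValueError.
try:
    pd.read_csv(io.StringIO(csv_text), usecols=fields)
    matched_without_flag = True
except ValueError:
    matched_without_flag = False

# skipinitialspace=True drops the space after each delimiter,
# so the header labels match the requested names.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True, usecols=fields)

print(matched_without_flag)
print(df.columns.tolist())
```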

Venkatesh Garnepudi

score 0 · Answer 3 · answered Jan 31 '19 at 11:36

I am trying to filter out length, category and the last eight columns.

If you want to filter by a combination of label-based and integer positional indices, you can read your column labels first, calculate your required labels, and then use the result when you read your data:

# use nrows=0 to only read in column labels
cols_all = pd.read_csv('data.csv', nrows=0).columns
cols_req = ['length', 'category'] + cols_all[-8:].tolist()

# use the usecols parameter to filter by the specified labels
df = pd.read_csv('data.csv', usecols=cols_req)

This assumes, of course, your labels are unique.
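The same two-pass idea as a self-contained sketch, assuming an in-memory CSV in place of data.csv. The column layout here is invented and shorter than the 21 columns in the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv: 15 columns, the last 8 being a..h.
csv_text = (
    "sentence,length,category,p,q,x,y,a,b,c,d,e,f,g,h\n"
    "hello world,11,greeting,0,0,1,2,3,4,5,6,7,8,9,10\n"
)

# First pass: nrows=0 parses only the header, giving every column label.
cols_all = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

# Two labels chosen by name plus the last eight chosen by position.
cols_req = ["length", "category"] + cols_all[-8:].tolist()

# Second pass: read only the required columns.
df = pd.read_csv(io.StringIO(csv_text), usecols=cols_req)

print(df.columns.tolist())
```

Note that usecols keeps the columns in file order, not in the order of the list you pass, so reindex with df[cols_req] if a particular order matters.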

Thanks @jpp, it works, but the answers listed below are more precise :) — Arjun Sankarlal, Jan 31 '19 at 11:52
