1

I have a CSV file with the columns sentence, length, category and 18 more. I am trying to read only specific columns.

Assume the last 10 columns are x, y, a, b, c, d, e, f, g, h. I am trying to filter out length, category and the last eight columns.

When I do it for the last 8 columns alone, as

col_req = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
data = pd.read_csv('data.csv', names=col_req)

it works perfectly. But when I try

col_req = ['length','category','a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
data = pd.read_csv('data.csv', names=col_req) 

the output is:

('g', 'h', 'x', 'y', 'a', 'b', 'c', 'd', 'e', 'f')

I don't know where I am going wrong.

Arjun Sankarlal
    You should read the [docs](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv) if you find the behaviour different to what you expect, it's reasonably clear what the params do in this case – EdChum Jan 31 '19 at 11:35

3 Answers

2

You need to use the usecols argument to do that. The names argument only assigns labels to the columns; it does not select them.

 col_req = ['length','category','a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
 data = pd.read_csv('data.csv', usecols=col_req)
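
To make the difference between selecting and relabelling concrete, here is a minimal sketch. It assumes a tiny in-memory CSV (the csv_text string and its values are made up for illustration) instead of the real data.csv.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv, small enough to inspect by eye
csv_text = (
    "sentence,length,category,a,b\n"
    "hello world,11,greeting,1,2\n"
    "bye,3,farewell,3,4\n"
)

# usecols keeps only the columns whose header labels are listed
subset = pd.read_csv(io.StringIO(csv_text), usecols=["length", "category", "b"])
print(subset.columns.tolist())

# names selects nothing: it supplies replacement labels for every column,
# and header=0 tells pandas to discard the file's own header row
renamed = pd.read_csv(
    io.StringIO(csv_text), names=["s", "n", "c", "p", "q"], header=0
)
print(renamed.columns.tolist())
```

Note that usecols returns the columns in file order, not in the order of the list you pass.
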
Jeril
0

Check this answer. It might be that the column names aren't correct, for example because the header has spaces after the commas:

df = pd.read_csv('data.csv', skipinitialspace=True, usecols=fields)
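
As a rough illustration of why skipinitialspace can matter here, the sketch below assumes a header written with a space after each comma. The csv_text string and the fields list are invented for the example; whether the real data.csv looks like this is only a guess.

```python
import io
import pandas as pd

# Hypothetical CSV whose header and rows have a space after each comma
csv_text = "sentence, length, category\nhello, 5, greeting\n"

# Without skipinitialspace the labels keep their leading space,
# so a plain 'length' would not match any column
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns.tolist())

# With skipinitialspace=True the leading spaces are dropped,
# and usecols can match the clean labels
fields = ["length", "category"]
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True, usecols=fields)
print(df.columns.tolist())
```
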

0

I am trying to filter out length, category and the last eight columns.

If you want to filter by a combination of label-based and integer positional indices, you can read your column labels first, calculate your required labels, and then use the result when you read your data:

# use nrows=0 to read only the column labels
cols_all = pd.read_csv('data.csv', nrows=0).columns
cols_req = ['length', 'category'] + cols_all[-8:].tolist()

# use the usecols parameter to filter by the specified labels
df = pd.read_csv('data.csv', usecols=cols_req)

This assumes, of course, your labels are unique.
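
The two-pass approach above can be tried end to end with a small sketch. It assumes an in-memory CSV with a shortened header (13 columns rather than the 21 in the question) and a single made-up row; only the last ten column names come from the question.

```python
import io
import pandas as pd

# Hypothetical header: the three named columns plus the last ten from the question
header = ["sentence", "length", "category",
          "x", "y", "a", "b", "c", "d", "e", "f", "g", "h"]
row = ["hi", 2, "greeting"] + list(range(10))  # made-up values
csv_text = ",".join(header) + "\n" + ",".join(map(str, row)) + "\n"

# First pass: nrows=0 reads the labels without loading any data rows
cols_all = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

# Combine two labels with the last eight positions, then read only those
cols_req = ["length", "category"] + cols_all[-8:].tolist()
df = pd.read_csv(io.StringIO(csv_text), usecols=cols_req)
print(df.columns.tolist())
```

With a real file you would pass the path to both read_csv calls instead of the StringIO objects.
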

jpp