I have a huge CSV file and I want to load only a small subset of its columns with fread(). In pandas read_csv(), I'd use the usecols argument for this and pass a list of the desired columns.

How do I do this with datatable? The documentation hints at the columns argument to fread(), but when I try it with a list, it looks like that argument is for renaming the columns (similar to the pandas header=0, names=[] arguments). The fread examples also give the same hint.

Toby
  • 2
    The examples in `fread` show how to do that: you can select with a `set`. If you have columns a, b, and c and only care about a and b, use ``fread(data, columns={"a","b"})``, or pass a function that returns a list of booleans: ``fread(data, columns=lambda cols:[col.name in ("a","b") for col in cols])``. The fread examples you refer to include both of these. – sammywemmy Aug 10 '21 at 22:21
  • 1
    Also note that you can prune the file with the `cmd` option, so that even fewer lines are loaded – sammywemmy Aug 10 '21 at 22:26
  • Thanks! Now that you pointed my nose to it, I can see the example. I didn't expect a set to behave differently from a list, so I stopped looking after I had tried the list. And cmd is nice too! It lets me put all my grep, cut, and head commands right inside my code for better documentation. – Toby Aug 11 '21 at 06:51

0 Answers