I have a CSV file that looks like this:

a,b,c,d
1,2,3,4
5,6,7,8

I want to load it as a Kedro CSVLocalDataSet, but I don't want to read the entire file. I only want a few columns (say, a and b).

Is there any way for me to specify the list of columns to read/load?

Zain Patel

When asking homework questions, show your best good faith attempt to solve it and tell what problems you are having to give us a better understanding of your intentions, what you might be doing wrong, and your goal. Please go through the [tour](https://stackoverflow.com/tour), the [help](https://stackoverflow.com/help), and the [How to Ask](https://stackoverflow.com/how-to-ask) sections to see how this site works and to help you improve your current and future questions. Please also have a look at [How do I ask and answer Homework questions?](https://meta.stackoverflow.com/questions/334822) – FailingCoder Nov 08 '19 at 12:45

1 Answer

CSVLocalDataSet uses pandas.read_csv, which takes a "usecols" parameter. You can pass it through the dataset's load_args parameter (all datasets support passing additional parameters via load_args and save_args):

my_cool_data:
  type: CSVLocalDataSet
  filepath: data/path.csv
  load_args: 
    usecols: ['a', 'b']

Note that the same load_args mechanism works for any pandas-based dataset, although the arguments that are accepted depend on the underlying pandas reader.
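
To see what the usecols entry does on the pandas side, here is a minimal sketch that calls pandas.read_csv directly on the sample data from the question. It assumes that the dataset simply unpacks load_args into pandas.read_csv; the csv_text and load_args variables are illustrative names, and the in-memory buffer stands in for data/path.csv.

```python
import io

import pandas as pd

# Sample data from the question, held in memory so the sketch is self-contained.
csv_text = "a,b,c,d\n1,2,3,4\n5,6,7,8\n"

# Same key/value pair as the load_args entry in the catalog above
# (assumption: the dataset forwards these keyword arguments to pandas.read_csv).
load_args = {"usecols": ["a", "b"]}

# Only columns a and b are parsed; c and d are skipped at read time.
df = pd.read_csv(io.StringIO(csv_text), **load_args)

print(list(df.columns))  # ['a', 'b']
print(df.shape)          # (2, 2)
```

If you build the dataset in Python rather than in the catalog, the same dictionary should be passable as the load_args argument of the dataset constructor, for example CSVLocalDataSet(filepath="data/path.csv", load_args={"usecols": ["a", "b"]}) in Kedro versions that ship this class.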