-1

I hit an error while importing a CSV with many duplicated column names into R. Is there a way to ignore those columns? That is easy to do for a small file with few columns, but mine is large: about 3k columns and 10M rows.

Ayush
  • 3
    What code were you running exactly and what was the exact error you are getting? I wouldn't think there's a problem reading a file even if it does have duplicate columns. – MrFlick Mar 20 '17 at 20:20
  • 1
    readr::read_csv and data.table::fread are both big improvements over read.csv and read.table in base. Perhaps try them if the base functions are giving you sorrow. – russellpierce Mar 20 '17 at 21:41

2 Answers

2

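Alternatively, set the check.names argument (e.g. in read.csv) to FALSE, so that duplicated column names are kept as written instead of being adjusted to be unique.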
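A minimal sketch of what that looks like, assuming base R's read.csv and a small throwaway file standing in for the real one (the file contents and the names `path`, `df`, and `df_unique` are made up for illustration). The last step, dropping repeated names after the read, is an extra not mentioned in the answer:

```r
# Hypothetical sample file with a duplicated header name
path <- tempfile(fileext = ".csv")
writeLines(c("id,value,value", "1,10,20", "2,30,40"), path)

# With the default check.names = TRUE, repeated names are made unique
default_names <- names(read.csv(path))

# With check.names = FALSE, the header is kept exactly as written
df <- read.csv(path, check.names = FALSE)

# Optional: keep only the first occurrence of each column name
df_unique <- df[!duplicated(names(df))]
```

Note that this still reads every column into memory before dropping any, which may matter for a file with 10M rows.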

user2502338
1

Read in the first row (i.e. the column headers) with readLines, then use strsplit to parse it into a vector. Rename the duplicated elements, then call read.csv with that vector as the col.names argument.
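Those steps could be sketched roughly as follows. This assumes a plain comma-separated header with no quoted commas, uses make.unique as one possible way to do the renaming, and runs on a made-up sample file (`path`, `header`, `new_names`, and `col_classes` are illustrative names). The final call, which skips the duplicated columns entirely via colClasses = "NULL", is an addition beyond what the answer describes:

```r
# Hypothetical sample file with a duplicated header name
path <- tempfile(fileext = ".csv")
writeLines(c("id,value,value", "1,10,20", "2,30,40"), path)

# Read only the header line and split it into a character vector
header <- strsplit(readLines(path, n = 1), ",", fixed = TRUE)[[1]]

# Rename duplicates by appending .1, .2, ... to repeated names
new_names <- make.unique(header)

# Skip the original header row and supply the new names instead
df <- read.csv(path, header = FALSE, skip = 1, col.names = new_names)

# Assumption: to ignore duplicated columns at read time, mark them "NULL"
col_classes <- rep(NA_character_, length(header))
col_classes[duplicated(header)] <- "NULL"
df_skip <- read.csv(path, header = FALSE, skip = 1,
                    col.names = new_names, colClasses = col_classes)
```

Skipping columns at read time avoids loading them at all, which is likely the more practical route for a file of the size described in the question.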

russellpierce