Avoid collision in importing data in R

Question

I got an error trying to import a CSV with multiple duplicate columns into R. Is there a way I can ignore those columns? That's easy to do for a small file with a small number of columns, but mine is a big one: ~3k columns and 10M rows.

What code were you running exactly and what was the exact error you are getting? I wouldn't think there's a problem reading a file even if it does have duplicate columns. — MrFlick, Mar 20 '17 at 20:20
readr::read_csv and data.table::fread are both big improvements over read.csv and read.table in base. Perhaps try them if the base functions are giving you sorrow. — russellpierce, Mar 20 '17 at 21:41

score 2 · Answer 1 · answered Apr 01 '17 at 16:06

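Alternatively, set the check.names argument of read.csv to FALSE. The column names are then kept exactly as they appear in the file, duplicates included, instead of being rewritten to be unique.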
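A minimal sketch of how that might look, using a tiny temporary file as a stand-in for the real one. The file contents and the variable names (path, renamed, kept, deduped) are made up for illustration, and the last step assumes that "ignoring" the duplicates means keeping only the first column for each repeated name.

```r
# Small stand-in for the real file: two columns share the name "a".
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,a", "1,2,3", "4,5,6"), path)

# Default behaviour: names go through make.names(unique = TRUE),
# so the repeated name is rewritten.
renamed <- read.csv(path)
names(renamed)

# check.names = FALSE keeps the header as written, duplicates included.
kept <- read.csv(path, check.names = FALSE)
names(kept)

# Assumption: to ignore the repeats, keep the first occurrence of each name.
deduped <- kept[, !duplicated(names(kept)), drop = FALSE]
names(deduped)
```

Subsetting by position with a logical vector sidesteps the ambiguity of selecting a duplicated name with kept$a or kept[["a"]].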

user2502338

score 1 · Answer 2 · answered Apr 01 '17 at 15:46

Read in the first row, i.e. the column headers, with readLines. Use strsplit to parse it into a character vector. Rename the duplicated elements. Then you can call read.csv with the col.names argument set to the renamed vector.
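Those steps could be sketched roughly as follows. The sample file and the variable names are hypothetical. Using make.unique for the renaming is one possible choice rather than something this answer prescribes, and the strsplit call assumes no header field contains a quoted comma.

```r
# Small stand-in for the real file: "value" appears twice in the header.
path <- tempfile(fileext = ".csv")
writeLines(c("id,value,value", "1,10,11", "2,20,21"), path)

# Read only the header line, so the cost does not depend on the row count.
header <- readLines(path, n = 1)

# Split on literal commas (assumes no quoted commas in the header).
col_names <- strsplit(header, ",", fixed = TRUE)[[1]]

# Rename repeats: make.unique() appends .1, .2, ... to later duplicates.
new_names <- make.unique(col_names)

# header = TRUE (the read.csv default) still consumes the header line;
# col.names then overrides the names found there.
dat <- read.csv(path, col.names = new_names)
names(dat)
```

Once the names are unique, the unwanted columns can be dropped by name, for example with dat[, col_names[!duplicated(col_names)]] to keep only the first occurrence of each.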

russellpierce
