0

I am trying to read a CSV file into R which makes use of two different separators: the "," and the ";". Below is an short example of the CSV format:

"car_brand; car_model","total"
"Toyota; 9289","29781"
"Seat; 20981","1610"
"Volkswagen; 11140","904"
"Suzuki; 11640","658"
"Renault; 13075","647"
"Ford; 15855","553"

The CSV file should contain 3 columns, car_brand, car_model, and total. However, car_brand and car_model are separated by a ";" rather than a ",". Any guidance on how to import such a file would be really appreciated.

  • Have a look at: [How to read data with different separators?](https://stackoverflow.com/q/23568981/10488504) – GKi Aug 30 '21 at 12:29

4 Answers4

3

A double-tap:

x1 <- read.csv("quux.csv", check.names = FALSE)
x2 <- read.csv2(text = x1[[1]], header = FALSE)
names(x2) <- unlist(read.csv2(text = names(x1)[1], header = FALSE))
cbind(x2, x1[,-1,drop=FALSE])
#    car_brand  car_model total
# 1     Toyota       9289 29781
# 2       Seat      20981  1610
# 3 Volkswagen      11140   904
# 4     Suzuki      11640   658
# 5    Renault      13075   647
# 6       Ford      15855   553

The use of check.names=FALSE is required because otherwise names(x1)[1] looks like "car_brand..car_model". While it can be parsed like this, I thought it better to parse the original text.

r2evans
  • 141,215
  • 6
  • 77
  • 149
1

a tidyverse solution;

library(tidyverse)

read.csv('file.csv',header = T) %>%
separate(col='car_brand..car_model',into = c('car_brand','car_model'),sep = ';') %>%
 mutate(car_model=as.numeric(car_model)) 

output;

car_brand  car_model total
  <chr>          <dbl> <int>
1 Toyota          9289 29781
2 Seat           20981  1610
3 Volkswagen     11140   904
4 Suzuki         11640   658
5 Renault        13075   647
6 Ford           15855   553
Samet Sökel
  • 2,515
  • 6
  • 21
1

One option would be to use a combination of fread and gsub:

library(data.table)
fread(gsub(";", "", '"car_brand; car_model","total"
"Toyota; 9289","29781"
"Seat; 20981","1610"
"Volkswagen; 11140","904"
"Suzuki; 11640","658"
"Renault; 13075","647"
"Ford; 15855","553"
'))
   car_brand car_model total
1:         Toyota 9289 29781
2:          Seat 20981  1610
3:    Volkswagen 11140   904
4:        Suzuki 11640   658
5:       Renault 13075   647
6:          Ford 15855   553
user438383
  • 5,716
  • 8
  • 28
  • 43
  • (Frankly, this will work with `read.csv` as well, not just `fread`.) – r2evans Aug 30 '21 at 12:34
  • @r2evans Yup. I am basically in the habit of using ``fread`` for everything and never using the base-r versions since it is **much** faster and seems to also be a lot smarter with how it handles different delimiters and the like. AFAIK there's not really any reasont to not use ``fread``. – user438383 Aug 30 '21 at 12:35
  • `fread` does not deal with escaped double-quotes, https://github.com/Rdatatable/data.table/issues/5088. Compare `read.csv(text='col,col2\n"hi ""my friend""",2\n')` with `fread(text='col,col2\n"hi ""my friend""",2\n')`. – r2evans Aug 30 '21 at 12:41
  • (While most of the time I don't see the double-double-quotes in CSV files, RFC-4180 (section 2 bullet 7) clearly states the escaping of an embedded dquote with another dquote.) – r2evans Aug 30 '21 at 12:44
  • 1
    Ah I wasn’t aware of that. Fortunately I’m a strictly numbers person and have never had to suffer someone sending me data with embedded double quotes :) – user438383 Aug 30 '21 at 12:48
  • Yeah, it's a frustrating bug that has hit me a few times in the past; it's not too difficult to fix post-read (`gsub`), and while *that specific issue* is only a month old (it is a dupe), the problem has been around a while, see https://github.com/Rdatatable/data.table/issues?q=is%3Aissue+is%3Aopen+fread+quotes. – r2evans Aug 30 '21 at 12:52
0

If you write the csvImporter yourself, you simply have to change the separator dynamically (depending on the index) in the loop.

  • 1
    Please add further details to expand on your answer, such as working code or documentation citations. – Community Aug 30 '21 at 12:38