9

I need to process some data that are mostly CSV. The problem is that strsplit drops the empty field after a comma when the comma comes at the end of a line (e.g., the one that comes after 3 in the example below).

> strsplit("1,2,3,", ",")
[[1]]
[1] "1" "2" "3"

I'd like it to be read in as [1] "1" "2" "3" NA instead. How can I do this? Thanks.

smci
ceiling cat

3 Answers

9

Here are a couple of ideas:

scan(text="1,2,3,", sep=",", quiet=TRUE)
#[1]  1  2  3 NA

unlist(read.csv(text="1,2,3,", header=FALSE), use.names=FALSE)
#[1]  1  2  3 NA

Those both return numeric vectors (scan gives doubles by default, and the read.csv version gives integers). You can wrap as.character around either of them to get the exact output you show in the Question:

as.character(scan(text="1,2,3,", sep=",", quiet=TRUE))
#[1] "1" "2" "3" NA 

Or, you could specify what="character" in scan, or colClasses="character" in read.csv, for slightly different output (the trailing empty field comes back as "" rather than NA):

scan(text="1,2,3,", sep=",", quiet=TRUE, what="character")
#[1] "1" "2" "3" "" 

unlist(read.csv(text="1,2,3,", header=FALSE, colClasses="character"), use.names=FALSE)
#[1] "1" "2" "3" "" 

You could also specify na.strings="" along with colClasses="character" to turn that empty string into NA:

unlist(read.csv(text="1,2,3,", header=FALSE, colClasses="character", na.strings=""), 
       use.names=FALSE)
#[1] "1" "2" "3" NA 
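
To apply the same idea line by line, here is a minimal sketch that keeps the fields as character and turns empty fields into NA. The helper name split_keep_na and the sample lines are made up for illustration; they are not part of the question.

```r
# Hypothetical helper: split one delimited line into character fields,
# turning empty fields (including a trailing one) into NA.
split_keep_na <- function(line, sep = ",") {
  # scan() keeps the empty field after a trailing separator as ""
  fields <- scan(text = line, sep = sep, what = "character", quiet = TRUE)
  # replace zero-length strings with a character NA
  fields[!nzchar(fields)] <- NA_character_
  fields
}

# Made-up sample input
lines <- c("1,2,3,", "4,,6")
result <- lapply(lines, split_keep_na)
print(result)
```

This treats every empty field as missing, not only the trailing one, which may or may not be what you want for your data.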
GSee
7

Hadley Wickham's stringr package (now built on top of stringi) is a huge improvement on the base string functions (fully vectorized, consistent function interface):

require(stringr)
str_split("1,2,3,", ",")

[[1]]
[1] "1" "2" "3" ""

as.integer(unlist(str_split("1,2,3,", ",")))
[1]  1  2  3 NA
smci
  • 3
    `stringr` is slow, you should use `stringi` :) –  Apr 21 '15 at 00:52
  • 3
    @silvaran you are totally correct, I only became aware of `stringi` after I wrote this. (How on earth to stay on top of which-latest-greatest-package in R?) – smci Apr 21 '15 at 01:01
3

Using the stringi package:

require(stringi)
> stri_split_fixed("1,2,3,",",")
[[1]]
[1] "1" "2" "3" "" 
## you can directly specify if you want to omit these empty elements
> stri_split_fixed("1,2,3,",",",omit_empty = TRUE)
[[1]]
[1] "1" "2" "3"
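
As far as I recall from the stringi documentation, omit_empty also accepts NA, in which case empty tokens are replaced with NA instead of being dropped. That would give the output asked for in the question directly. Treat this as a sketch and check ?stri_split_fixed for your installed version:

```r
library(stringi)

# omit_empty = NA: replace empty tokens with NA rather than removing them
# (behaviour as described in the stringi docs; verify on your version)
res <- stri_split_fixed("1,2,3,", ",", omit_empty = NA)
print(res[[1]])
```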
bartektartanus