In R, I am reading a tab-separated file that contains comment lines, using read.csv:

read.data.raw = read.csv(inputfile, sep='\t', header=F, comment.char='')

The file looks like this:

#comment line 1
data 1<tab>x<tab>y
#comment line 2
data 2<tab>x<tab>y
data 3<tab>x<tab>y

Now I extract the uncommented lines using

comment_ind = grep( '^#.*', read.data.raw[[1]])
read.data = read.data.raw[-comment_ind,]

This leaves me with:

 data 1<tab>x<tab>y
 data 2<tab>x<tab>y
 data 3<tab>x<tab>y

I modify this data in a separate script that keeps the number of rows and columns unchanged. I would then like to put the modified rows back into the originally read data (alongside the user's comments) and return it to the user like this:

#comment line 1
modified data 1<tab>x<tab>y
#comment line 2
modified data 2<tab>x<tab>y
modified data 3<tab>x<tab>y

Since read.data keeps the original row names (row.names(read.data)), I tried assigning it back by row index:

original.read.data[as.numeric(row.names(read.data)),] = read.data

But that didn't work, and I got a bunch of NAs.

Any ideas?

Omar Wagih
  • How exactly did it change the data? If it turned factors into characters, or similar changes in data types, that would account for the NAs. – David Robinson Aug 27 '12 at 19:52
  • Also, you're going to get NAs after the comment line in any column if you force the column to be numeric. R wasn't really meant to read in comment data along with the data frame, though you could find ways around it. In any case, you'd have to be more specific about the type of data you read in and how you modified it – David Robinson Aug 27 '12 at 19:58
  • The data I'm reading in has 5 columns: columns 1-3 are numeric and columns 4-5 are character strings. In most cases I am replacing values in specific cells of the data frame (example data[5,8]=NA), and sometimes replacing a whole column (example data[[3]]=1:100). I forced R to read the comment lines because when I set comment.char to '#', I lost them. By reading the file this way, I can extract the uncommented lines and leave the commented lines behind. At least that was the logic behind my choices. – Omar Wagih Aug 27 '12 at 20:22
  • Why not edit your original question to include a fully reproducible example? – David Robinson Aug 27 '12 at 20:24
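
The factor explanation suggested in the comments above can be reproduced with a small sketch. The data frame and its values are hypothetical, and V1 is made a factor on purpose, which is what read.csv produces when stringsAsFactors = TRUE (the default before R 4.0). Whether this is what happened in the question is an assumption, since the modifying script is not shown.

```r
# Toy data (hypothetical values); V1 is deliberately a factor.
raw <- data.frame(V1 = factor(c("#comment line 1", "data 1", "data 2")))

# Assigning a string that is not an existing factor level produces NA
# (R also emits an "invalid factor level" warning here).
broken <- raw
broken[2, "V1"] <- "modified data 1"

# Converting the column to character first avoids the problem.
fixed <- raw
fixed$V1 <- as.character(fixed$V1)
fixed[2, "V1"] <- "modified data 1"
```

If the columns were read as factors, passing stringsAsFactors = FALSE to read.csv would likely have the same effect as the explicit conversion.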

1 Answer

Does this do what you want?

read.data.raw <- structure(list(V1 = structure(c(1L, 3L, 2L, 4L, 5L),
   .Label = c("#comment line 1", "#comment line 2", "data 1", "data 2", 
   "data 3"), class = "factor"), V2 = structure(c(1L, 2L, 1L, 2L, 2L), 
   .Label = c("", "x"), class = "factor"), V3 = structure(c(1L, 2L, 1L,
   2L, 2L), .Label = c("", "y"), class = "factor")), .Names = c("V1", 
   "V2", "V3"), class = "data.frame", row.names = c(NA, -5L))

comment_ind = grep( '^#.*', read.data.raw[[1]])
read.data <- read.data.raw[-comment_ind,]
# modify V1
read.data$V1 <- gsub("data", "DATA", read.data$V1)
# rbind() and then order() comments into original places
new.data <- rbind(read.data.raw[comment_ind,], read.data)
new.data <- new.data[order(as.numeric(rownames(new.data))),]
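
As an alternative to rbind() and order(), the modified rows can also be assigned straight back into the raw data frame with a logical index. This is only a minimal sketch under some assumptions: the input lines are a hypothetical copy of the file in the question, the columns are read as character, and the paste() call stands in for whatever the separate script actually does.

```r
# Hypothetical sample input mirroring the file shown in the question.
lines_in <- c("#comment line 1", "data 1\tx\ty", "#comment line 2",
              "data 2\tx\ty", "data 3\tx\ty")

# Read as character so that later assignments cannot hit invalid factor levels.
raw <- read.csv(text = paste(lines_in, collapse = "\n"), sep = "\t",
                header = FALSE, comment.char = "", stringsAsFactors = FALSE)

# A logical index stays valid even when the file has no comment lines.
is_comment <- grepl("^#", raw[[1]])
dat <- raw[!is_comment, ]

# Placeholder for the separate modification step (row/column counts unchanged).
dat$V1 <- paste("modified", dat$V1)

# Put the modified rows back in their original positions.
raw[!is_comment, ] <- dat

# Rebuild output lines: comments verbatim, data rows re-joined with tabs.
out <- ifelse(is_comment, raw[[1]], do.call(paste, c(raw, sep = "\t")))
# writeLines(out, outputfile)  # outputfile is a hypothetical path
```

One reason to prefer grepl() here: with grep(), a file without any comment lines gives an empty index, and read.data.raw[-comment_ind, ] then returns zero rows instead of all of them.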
dcarlson