How to remove rows starting with special character in R

Question

I have a data frame and I want to remove all the rows starting with `#`. Can anybody tell me how to do it? Thanks in advance.

#ID_REF = The name of the probe set, blank for control probes           
    #VALUE = The signal value calculated by MAS5, normalized            
    #ABS_CALL = The detection value calculated by the MAS5          
    #DETECTION P-VALUE = The detection p-value calculated by the MAS5           
    *ID_REF**   VALUE** ABS_CALL**  DETECTION P-VALUE*
    AFFX-BioB-5_at  757.7   P   0.00039
    AFFX-BioB-M_at  933.7   P   0.000095
    AFFX-BioB-3_at  525.6   P   0.000095
    AFFX-BioC-5_at  1999.5  P   0.000044
    AFFX-BioC-3_at  2339.5  P   0.000044
    AFFX-BioDn-5_at 4321.3  P   0.000044
    AFFX-BioDn-3_at 9229.4  P   0.00007
    AFFX-CreX-5_at  21949.9 P   0.000044
    AFFX-CreX-3_at  26022.8 P   0.000044
    AFFX-DapX-5_at  1171.1  P   0.00006
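If the data has already been loaded and the `#` lines ended up as rows, they can also be dropped afterwards by testing the first column. This is a minimal sketch that assumes a hypothetical data frame `df` with a character first column. The names `starts_hash` and `df_clean` are illustrative, and the values are copied from the sample above.

```r
# Hypothetical data frame: the '#' description lines ended up in the first column
df <- data.frame(
  V1 = c("#ID_REF = The name of the probe set",
         "    #VALUE = The signal value calculated by MAS5",
         "AFFX-BioB-5_at", "AFFX-BioB-M_at"),
  V2 = c(NA, NA, 757.7, 933.7),
  stringsAsFactors = FALSE
)

# TRUE where the first column, after trimming leading blanks, starts with '#';
# the !is.na() guard keeps rows whose first column is NA instead of producing NA rows
starts_hash <- !is.na(df$V1) & startsWith(trimws(df$V1), "#")

df_clean <- df[!starts_hash, , drop = FALSE]
rownames(df_clean) <- NULL   # renumber the remaining rows 1..n
df_clean
```

`startsWith()` tests only the beginning of the string, so this removes rows that *start* with `#` rather than any row that merely contains it.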

Possible duplicate of http://stackoverflow.com/questions/28433328/skip-comment-line-in-csv-file-using-r — akrun, Feb 10 '15 at 16:36
@akrun, it removes some rows with `#`, but merges all the data into one row — AwaitedOne, Feb 10 '15 at 16:38
@akrun, `read.table(file.choose(), skip=3, header=T)` gives the following error: `Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 1 did not have 2 elements` — AwaitedOne, Feb 10 '15 at 16:44
I tried `d1 <- read.table(file='awaited.txt', comment.char='#', fill=TRUE); d2 <- d1[-1,]` — akrun, Feb 10 '15 at 16:45
@akrun, I have a CSV file. I tried the above on the CSV, but it merges all the data into one column. — AwaitedOne, Feb 10 '15 at 16:55
I copy-pasted the data you showed and saved it as .txt. I couldn't reproduce the problem with the .txt file. — akrun, Feb 10 '15 at 16:57
Is there any chance you could share the file on Dropbox or something similar? — akrun, Feb 10 '15 at 17:58
Thanks for sharing. I changed `lines2 <-...` and was able to read it correctly. Please check the updated post. — akrun, Feb 11 '15 at 09:09
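Here is a self-contained sketch of the `read.table(comment.char='#', fill=TRUE)` suggestion from the comments. It assumes a whitespace-separated file like the pasted sample, and the inline `txt` vector stands in for `awaited.txt`. Because `DETECTION P-VALUE*` contains a space, the header line splits into five fields. It is therefore read as a data row and removed afterwards, which is why `d1[-1,]` appears in the comment. The clean column names are hypothetical.

```r
# Inline stand-in for the file (assumption: whitespace-separated like the pasted sample)
txt <- c(
  "#ID_REF = The name of the probe set, blank for control probes",
  "#VALUE = The signal value calculated by MAS5, normalized",
  "*ID_REF**   VALUE** ABS_CALL**  DETECTION P-VALUE*",
  "AFFX-BioB-5_at  757.7   P   0.00039",
  "AFFX-BioB-M_at  933.7   P   0.000095"
)

# comment.char = "#" turns the '#' lines into blank lines, which are skipped;
# fill = TRUE pads the 4-field data rows to match the 5-field header line
d1 <- read.table(text = txt, comment.char = "#", fill = TRUE,
                 stringsAsFactors = FALSE)

d2 <- d1[-1, 1:4]                               # drop the header row and the padded 5th column
d2[] <- lapply(d2, type.convert, as.is = TRUE)  # re-detect numeric columns
names(d2) <- c("ID_REF", "VALUE", "ABS_CALL", "DETECTION_P_VALUE")  # hypothetical names
rownames(d2) <- NULL
str(d2)
```

Because the header text was read as data, every column starts out as character. The `type.convert` step restores the numeric columns.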

akrun · Accepted Answer · 2015-02-11T09:17:50.713

In some lines the comment character (`#`) was not the first character. One way is to remove every line that contains `#` using `grepl` (`lines2`) and then read the remaining lines with `read.csv`:

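lines <- readLines('awaited.csv')              # read the raw file, one string per line
lines1 <- gsub('^ +| +$', '', lines)           # trim leading/trailing spaces
lines2 <- lines1[!grepl('^#|^.*#', lines1)]    # drop every line that contains '#'
d1 <- read.csv(text=lines2, check.names=FALSE, stringsAsFactors=FALSE)  # parse the rest as CSV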
str(d1)
#'data.frame':  54682 obs. of  4 variables:
# $ *ID_REF**         : chr  "AFFX-BioB-5_at" "AFFX-BioB-M_at" "AFFX-BioB-3_at" "AFFX-BioC-5_at" ...
# $ VALUE**           : num  758 934 526 2000 2340 ...
# $ ABS_CALL**        : chr  "P" "P" "P" "P" ...
# $ DETECTION P-VALUE*: num  3.9e-04 9.5e-05 9.5e-05 4.4e-05 4.4e-05 4.4e-05 7.0e-05 4.4e-05 4.4e-05 6.0e-05 ...
head(d1,3)
#       *ID_REF** VALUE** ABS_CALL** DETECTION P-VALUE*
#1 AFFX-BioB-5_at   757.7          P            3.9e-04
#2 AFFX-BioB-M_at   933.7          P            9.5e-05
#3 AFFX-BioB-3_at   525.6          P            9.5e-05
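Since `awaited.csv` is not reproduced here, the sketch below replays the same steps on an inline character vector that stands in for `readLines('awaited.csv')`. The layout is an assumption: comma-separated, with some comment lines indented so that `#` is not the first character. The sketch also swaps in the simpler test `grepl('#', ..., fixed = TRUE)`, which matches the same lines as `'^#|^.*#'`, namely any line containing `#`. Note that this also drops a data row if one of its values happens to contain `#`.

```r
# Stand-in for readLines('awaited.csv'); an assumed layout, not the real file
lines <- c(
  "#ID_REF = The name of the probe set, blank for control probes",
  "    #VALUE = The signal value calculated by MAS5, normalized",
  "*ID_REF**,VALUE**,ABS_CALL**,DETECTION P-VALUE*",
  "AFFX-BioB-5_at,757.7,P,0.00039",
  "AFFX-BioB-M_at,933.7,P,0.000095"
)

lines1 <- gsub("^ +| +$", "", lines)                # trim leading/trailing spaces
lines2 <- lines1[!grepl("#", lines1, fixed = TRUE)] # drop every line containing '#'
d1 <- read.csv(text = lines2, check.names = FALSE, stringsAsFactors = FALSE)
str(d1)
```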

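Alternatively, you could use the `comment.char='#'` argument of `read.csv` after removing everything before `#` on the lines that contain it (with `sub`). Without that step, the text before `#` on those lines would still be read in as data.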

# keep only the '#...' part of lines containing '#', so comment.char blanks the whole line
d2 <- read.csv(text=sub('.*(#.*)', '\\1', lines),
   check.names=FALSE, stringsAsFactors=FALSE, comment.char='#')
dim(d2)
#[1] 54682     4
head(d2,3)
#       *ID_REF** VALUE** ABS_CALL** DETECTION P-VALUE*
#1 AFFX-BioB-5_at   757.7          P            3.9e-04
#2 AFFX-BioB-M_at   933.7          P            9.5e-05
#3 AFFX-BioB-3_at   525.6          P            9.5e-05
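The same `sub` plus `comment.char` idea can be sketched on the assumed inline stand-in for `readLines('awaited.csv')` (comma-separated, some comment lines indented). The name `cleaned` is illustrative. One caveat: with `comment.char = "#"`, a `#` inside a data value would also cut that field short. The probe IDs in the sample contain none.

```r
# Assumed stand-in for readLines('awaited.csv'), not the real file
lines <- c(
  "#ID_REF = The name of the probe set, blank for control probes",
  "    #VALUE = The signal value calculated by MAS5, normalized",
  "*ID_REF**,VALUE**,ABS_CALL**,DETECTION P-VALUE*",
  "AFFX-BioB-5_at,757.7,P,0.00039",
  "AFFX-BioB-M_at,933.7,P,0.000095"
)

# Greedy '.*' keeps the text from the last '#' onward, so each such line now
# starts with '#'; comment.char = "#" then reduces it to a blank line, which is skipped
cleaned <- sub(".*(#.*)", "\\1", lines)
d2 <- read.csv(text = cleaned, check.names = FALSE, stringsAsFactors = FALSE,
               comment.char = "#")
dim(d2)
```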

I get the same error: `Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 2 did not have 2 elements` — AwaitedOne, Feb 10 '15 at 17:16
@AwaitedOne I was copy-pasting your dataset and it worked for me — akrun, Feb 10 '15 at 17:26
Maybe there is some mystery behind it. The data contains 54,000 rows, and I am not sure whether something is going wrong there. `read.table` always gives me the same error when loading the data, so I am more comfortable loading it with `read.csv`. — AwaitedOne, Feb 10 '15 at 17:30