read.csv counts from first column, read.table and read.delim count from second column?

Question

I'm totally new to R. I have a few scripts that read data from .csv files and plot them. They read the complete table and plot the data in a particular column (e.g. column 6):

tablecsv <- read.csv("myfile.csv", header=TRUE)
plot.values_csv <- tablecsv[,6]

I then changed them to read the data from tab-separated files instead (using read.delim and/or read.table):

tabletab <- read.delim("myfile.tab", header=TRUE, sep="\t")
plot.values_tab <- tabletab[,6]

The weird thing is that the numbering of the columns has now shifted. Column 6 in "tabletab" always corresponds to column 7 in "tablecsv", and column 1 in "tabletab" corresponds to column 2 in "tablecsv". It therefore seems that read.table and read.delim either ignore the first column of the input file or interpret it as a comment. I can't seem to turn this off with any parameter. I tried setting skip = 0, but that didn't change anything, and it is the default anyway. Also, the first column doesn't contain a # character, which is the default comment symbol, as far as I understand.

Has anybody got an explanation for this behaviour? (I know that it is not hard to work around by just changing the column-number in my script. It's just that this behaviour makes no sense to me).
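One way to narrow this down is to compare how many tab-separated fields R sees in the header line and in the data lines. The sketch below uses a small made-up file (the names `demo`, `row1` and `row2` are placeholders, not the real data) in which every data line has one more field than the header. According to the read.table documentation, that is the situation in which the first column is used as row names. Whether the original files look like this is only an assumption.

```r
# Made-up demo file: the header has 3 fields, each data line has 4.
demo <- tempfile(fileext = ".tab")
writeLines(c("name\tA\tB",
             "row1\t0.1\t0.2\t0.3",
             "row2\t0.4\t0.5\t0.6"), demo)

# Number of tab-separated fields on each line of the file.
n_fields <- count.fields(demo, sep = "\t", quote = "", comment.char = "")
header_fields <- n_fields[1]
data_fields <- unique(n_fields[-1])

# TRUE when every data line has exactly one field more than the header,
# i.e. when read.table()/read.delim() would take column 1 as row names.
shifted <- length(data_fields) == 1 && (data_fields - header_fields) == 1

print(n_fields)
print(shifted)
```

Running the same count.fields call on the real "myfile.tab" would show whether the header and the data lines disagree there too.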

Edit: Here are the first few lines of the .csv and the .tab input-files, respectively:

myfile.csv:

name,A,B,C,D,E,F,G,H,I,J,K
xxx_NODE_25653_yyy_272_zzz_2.529412_1_312_-,0.242718447,0.35483871,0.166666667,0.2,0.368932039,0.451612903,0.333333333,0.418604651,0.333333333,0.763109267,0.711142183
xxx_NODE_22738_yyy_415_zzz_2.453012_1_455_+,0.152317881,0.1875,0.214285714,0.120879121,0.231788079,0.25,0.464285714,0.35,0.153846154,0.635002306,0.612372436

myfile.tab:

name    A   B   C   D   E   F   G   H   I   J   K
xxx_NODE_25653_yyy_272_zzz_2.529412_1_312_- 0.242718447 0.35483871  0.166666667 0.2 0.368932039 0.451612903 0.333333333 0.418604651 0.333333333 0.763109267 0.711142183
xxx_NODE_22738_yyy_415_zzz_2.453012_1_455_+ 0.152317881 0.1875  0.214285714 0.120879121 0.231788079 0.25    0.464285714 0.35    0.153846154 0.635002306 0.612372436

Edit2: This is what my tabletab looks like now:

    > tabletab1[1:3,]
                                               name         A          B
  1     xxx_NODE_25653_yyy_272_zzz_2.529412_1_312_- 0.2427184 0.35483871
  2     xxx_NODE_22738_yyy_415_zzz_2.453012_1_455_+ 0.1523179 0.18750000
  3     xxx_NODE_52133_yyy_348_zzz_3.123563_1_388_- 0.1240310 0.06666667
            C         D         E         F         G         H         I
  1 0.1666667 0.2000000 0.3689320 0.4516129 0.3333333 0.4186047 0.3333333
  2 0.2142857 0.1208791 0.2317881 0.2500000 0.4642857 0.3500000 0.1538462
  3 0.1000000 0.1518987 0.2403101 0.1333333 0.3000000 0.2000000 0.2658228
            J         K
  1 0.7631093 0.7111422
  2 0.6350023 0.6123724
  3 0.7236342 0.5617433

>   tablecsv1[1:3,]
                                             name         A          B
1     xxx_NODE_25653_yyy_272_zzz_2.529412_1_312_- 0.2427184 0.35483871
2     xxx_NODE_22738_yyy_415_zzz_2.453012_1_455_+ 0.1523179 0.18750000
3     xxx_NODE_52133_yyy_348_zzz_3.123563_1_388_- 0.1240310 0.06666667
          C         D         E         F         G         H         I
1 0.1666667 0.2000000 0.3689320 0.4516129 0.3333333 0.4186047 0.3333333
2 0.2142857 0.1208791 0.2317881 0.2500000 0.4642857 0.3500000 0.1538462
3 0.1000000 0.1518987 0.2403101 0.1333333 0.3000000 0.2000000 0.2658228
          J         K
1 0.7631093 0.7111422
2 0.6350023 0.6123724
3 0.7236342 0.5617433

Seems to be fine now. These are, however, from input files that I re-saved with Excel after I obscured the sample names a little. The original files yielded a result that looked like this:

> tabletab1[1:3,]
                                                 name          A
xxx_NODE_25653_yyy_272_zzz_2.529412_1_312_- 0.2427184 0.35483871
xxx_NODE_22738_yyy_415_zzz_2.453012_1_455_+ 0.1523179 0.18750000
xxx_NODE_52133_yyy_348_zzz_3.123563_1_388_- 0.1240310 0.06666667
                                                    B
xxx_NODE_25653_yyy_272_zzz_2.529412_1_312_- 0.1666667
xxx_NODE_22738_yyy_415_zzz_2.453012_1_455_+ 0.2142857
xxx_NODE_52133_yyy_348_zzz_3.123563_1_388_- 0.1000000
                                                    C
xxx_NODE_25653_yyy_272_zzz_2.529412_1_312_- 0.2000000
xxx_NODE_22738_yyy_415_zzz_2.453012_1_455_+ 0.1208791
xxx_NODE_52133_yyy_348_zzz_3.123563_1_388_- 0.1518987

So the "name" column was included with every other column. These files were generated by a Java program run under Unix, which seems to use other metacharacters for "\t" and "\n" (I couldn't see this in a text editor).

So problem solved, I guess. But since I run the Java program on a Unix machine and also run R on a Unix machine, I find it strange that the tables work better after I re-save them using Excel on a Windows operating system. Also, when I run dos2unix on any of the tables, they get these metacharacters again, which causes the same problems.
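The shifted output above is what read.delim produces when the data lines contain one more field than the header line (a trailing tab at the end of each data line would be one possible source, but that is a guess about the original files). A minimal sketch with made-up data reproduces the symptom and shows that passing row.names = NULL keeps the first field as an ordinary column. The file and row names used here are placeholders.

```r
# Made-up demo file: the header has 3 fields, each data line has 4.
demo <- tempfile(fileext = ".tab")
writeLines(c("name\tA\tB",
             "row1\t0.1\t0.2\t0.3",
             "row2\t0.4\t0.5\t0.6"), demo)

# Default behaviour: the first field becomes the row names, so the
# column labelled "name" actually holds the first numeric column.
shifted_tab <- read.delim(demo, header = TRUE, sep = "\t")
print(shifted_tab)

# row.names = NULL forces automatic row numbering; the first field stays
# in the table as column 1 (R labels that column "row.names").
fixed_tab <- read.delim(demo, header = TRUE, sep = "\t", row.names = NULL)
print(fixed_tab)
```

With row.names = NULL the column positions match the csv version again, although the column labels are still shifted by one, so the underlying mismatch in the file is better fixed at the source.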

This is really hard to debug without the actual file. Could you post the first few lines of both types of files? Are you sure there isn't an extra tab in the tab-delimited file? — nograpes, Sep 16 '13 at 11:12
Please post a reproducible example of your source files. And in the meantime, try just typing `tablecsv[1:3,]` and `tabletab[1:3,]` to see what you actually did load. — Carl Witthoft, Sep 16 '13 at 11:17
So both `table1<-read.table("myfile.csv", header=TRUE)` and `table1<-read.table("myfile.tab", header=TRUE, sep="\t")` work the same for you? `table1[,6]` points to the same column in both cases? Could this then be a problem with the interpreter or the operating system? I'm running R version 3.0.1 on x86_64-suse-linux-gnu (64-bit) — ThisIsMe, Sep 16 '13 at 13:35
I don't think so. Why don't you construct some artificial, completely new tab- and comma-delimited files using a text editor and try to reproduce your problem with these? My guess is that your original input files have some formatting issue you are overlooking. — ROLO, Sep 16 '13 at 14:05
Again: please post similar samples of `tablecsv` and `tabletab` so we can see what is actually in each column! — Carl Witthoft, Sep 16 '13 at 14:33
Sorry, had misread you...posting the tablecsv and tabletab right now! — ThisIsMe, Sep 16 '13 at 15:56
