I'm totally new to R. I have a few scripts that read data from .csv files and plot it. They read the complete table and plot the data in a particular column (e.g. column 6):
tablecsv <- read.csv("myfile.csv", header=TRUE)
plot.values_csv <- tablecsv[,6]
I then changed them to read the data from tab-separated files instead (using read.delim and/or read.table).
tabletab <- read.delim("myfile.tab", header=TRUE, sep="\t")
plot.values_tab <- tabletab[,6]
The weird thing is that the numbering of the columns has now shifted. Column 6 in "tabletab" always corresponds to column 7 in "tablecsv", and column 1 in "tabletab" corresponds to column 2 in "tablecsv". It therefore seems that by using read.table or read.delim, the first column in the input file is ignored or interpreted as a comment. I can't seem to turn this off with any parameter. I tried setting skip = 0, but that didn't change anything and is the default anyway. Also, the first column doesn't contain a # character, which is the default comment symbol, as far as I understood.
Has anybody got an explanation for this behaviour? (I know it is not hard to work around by just changing the column number in my script; it's just that this behaviour makes no sense to me.)
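In case it helps, this is roughly the kind of check I mean when I say the numbering has shifted. It's just a sketch using the same files as above; the row.names = NULL part is only my guess at a workaround, not something I'm sure is the right fix:

# read both files again; comment.char = "" rules out anything being treated as a comment
tablecsv <- read.csv("myfile.csv", header = TRUE, comment.char = "")
tabletab <- read.delim("myfile.tab", header = TRUE, sep = "\t", comment.char = "")

# if the first column of the tab file ends up as row names,
# ncol() is one less for tabletab than for tablecsv
ncol(tablecsv)
ncol(tabletab)
names(tablecsv)
names(tabletab)

# the row names show whether the "name" column was swallowed
head(rownames(tabletab))

# row.names = NULL forces plain row numbering instead of taking
# row names from the file (my guess at a workaround)
tabletab2 <- read.delim("myfile.tab", header = TRUE, sep = "\t", row.names = NULL)
ncol(tabletab2)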
Edit: Here are the first few lines of the .csv and the .tab input files, respectively:
myfile.csv:
name,A,B,C,D,E,F,G,H,I,J,K
xxx_NODE_25653_yyy_272_zzz_2.529412_1_312_-,0.242718447,0.35483871,0.166666667,0.2,0.368932039,0.451612903,0.333333333,0.418604651,0.333333333,0.763109267,0.711142183
xxx_NODE_22738_yyy_415_zzz_2.453012_1_455_+,0.152317881,0.1875,0.214285714,0.120879121,0.231788079,0.25,0.464285714,0.35,0.153846154,0.635002306,0.612372436
myfile.tab:
name A B C D E F G H I J K
xxx_NODE_25653_yyy_272_zzz_2.529412_1_312_- 0.242718447 0.35483871 0.166666667 0.2 0.368932039 0.451612903 0.333333333 0.418604651 0.333333333 0.763109267 0.711142183
xxx_NODE_22738_yyy_415_zzz_2.453012_1_455_+ 0.152317881 0.1875 0.214285714 0.120879121 0.231788079 0.25 0.464285714 0.35 0.153846154 0.635002306 0.612372436
Edit 2: This is what my tabletab looks like now:
> tabletab1[1:3,]
name A B
1 xxx_NODE_25653_yyy_272_zzz_2.529412_1_312_- 0.2427184 0.35483871
2 xxx_NODE_22738_yyy_415_zzz_2.453012_1_455_+ 0.1523179 0.18750000
3 xxx_NODE_52133_yyy_348_zzz_3.123563_1_388_- 0.1240310 0.06666667
C D E F G H I
1 0.1666667 0.2000000 0.3689320 0.4516129 0.3333333 0.4186047 0.3333333
2 0.2142857 0.1208791 0.2317881 0.2500000 0.4642857 0.3500000 0.1538462
3 0.1000000 0.1518987 0.2403101 0.1333333 0.3000000 0.2000000 0.2658228
J K
1 0.7631093 0.7111422
2 0.6350023 0.6123724
3 0.7236342 0.5617433
> tablecsv1[1:3,]
name A B
1 xxx_NODE_25653_yyy_272_zzz_2.529412_1_312_- 0.2427184 0.35483871
2 xxx_NODE_22738_yyy_415_zzz_2.453012_1_455_+ 0.1523179 0.18750000
3 xxx_NODE_52133_yyy_348_zzz_3.123563_1_388_- 0.1240310 0.06666667
C D E F G H I
1 0.1666667 0.2000000 0.3689320 0.4516129 0.3333333 0.4186047 0.3333333
2 0.2142857 0.1208791 0.2317881 0.2500000 0.4642857 0.3500000 0.1538462
3 0.1000000 0.1518987 0.2403101 0.1333333 0.3000000 0.2000000 0.2658228
J K
1 0.7631093 0.7111422
2 0.6350023 0.6123724
3 0.7236342 0.5617433
Seems to be fine now. These are, however, from input files that I re-saved with Excel after I obscured the sample names a little. The original files yielded a result that looked like this:
> tabletab1[1:3,]
                                                 name          A
xxx_NODE_25653_yyy_272_zzz_2.529412_1_312_- 0.2427184 0.35483871
xxx_NODE_22738_yyy_415_zzz_2.453012_1_455_+ 0.1523179 0.18750000
xxx_NODE_52133_yyy_348_zzz_3.123563_1_388_- 0.1240310 0.06666667
                                                    B
xxx_NODE_25653_yyy_272_zzz_2.529412_1_312_- 0.1666667
xxx_NODE_22738_yyy_415_zzz_2.453012_1_455_+ 0.2142857
xxx_NODE_52133_yyy_348_zzz_3.123563_1_388_- 0.1000000
                                                    C
xxx_NODE_25653_yyy_272_zzz_2.529412_1_312_- 0.2000000
xxx_NODE_22738_yyy_415_zzz_2.453012_1_455_+ 0.1208791
xxx_NODE_52133_yyy_348_zzz_3.123563_1_388_- 0.1518987
So the "name" column was included with every other column. These files contained were generated with a java program run under unix which seems use other meta-characters for "\t" and "\n" (couldn't see this in a text editor) So problem solved, i guess, but since i run the java program on a unix machine, and run R also on a unix machine if find it strange that the tables work better after I re-save them using Excel on a windows operating system? Also when i Run Dos2Unix on any table they get these metacharacters again which cause these problems.