I am very new to R (and to computer programming in general) and am working on a bioinformatics project. I created a MySQL database and connected to it from R using RMySQL. I then issued queries that each select a single field from a table, fetched the results, and stored each result as a data frame in R, as shown below:

> rs = dbSendQuery(con, "select mastitis_no from experiment")
> data = fetch(rs, n=-1)
> data
   mastitis_no
1            5
2            2
3            8
4            6
5            2
....

> rt = dbSendQuery(con, "select BMSCC from experiment")
> datas = fetch(rt, n=-1)
> datas
   BMSCC
1  14536
2  10667
3  23455
4  17658
5  14999
....

> ru = dbSendQuery(con, "select cattle_hygiene_score_avg from experiment")
> dat = fetch(ru, n=-1)
> dat
   cattle_hygiene_score_avg
1                      1.89
2                      1.01
3                      1.21
4                      1.22
5                      1.93
....

My first two data frames hold integers and my third data frame holds decimal values. I am able to run a simple correlation on these data frames, but a more detailed test (or a plot) fails, as shown below.

> cor(data, datas)
                BMSCC
mastitis_no 0.8303017
> cor.test(data, datas)
Error in cor.test.default(data, datas) : 'x' must be a numeric vector

Therefore I accessed the data inside those data frames using the usual list indexing operator $; however, this did not work for the decimal data frame, as shown below.

> data$mastitis
 [1] 5 2 8 6 2 0 5 6 7 3 0 1 0 3 2 2 0 5 2 1

> datas$BMSCC
 [1] 14536 10667 23455 17658 14999  5789 18234 22390 19069 13677 13536 11667 13455
[14] 17678 14099 15789  8234 21390 16069 13597

> dat$hygiene
NULL

By doing this I am able to run a Spearman rank correlation test and draw a scatter plot for the first two data frames, but not for the decimal data frame. Any suggestions on what I need to do? I am sure the answer is quite simple, but I cannot find the code needed for this task. Any help would be much appreciated.

  • Have you tried coercing the vectors you are performing the tests on into numeric type using `as.numeric()`? – sriramn Apr 27 '14 at 20:57
  • I think you're just using the wrong column name; it looks like what you want is `dat$cattle_hygiene_score_avg`, not `dat$hygiene`. Also, you could get all your columns in a single call to the database: `select mastitis_no, BMSCC, cattle_hygiene_score_avg from experiment` – sebkopf Apr 27 '14 at 20:58
  • There are also several functions you can use to find out more about your data, such as head(dat) and str(dat). – James Trimble Apr 27 '14 at 21:15
  • The original comment from sebkopf solved the issue. It seems you cannot put just any word after the "$"; it has to at least match the start of the variable name. "dat$hygiene" results in NULL, while "dat$cattle" gives the desired output: [1] 1.89 1.01 1.21 1.22 1.93 2.23 1.04 1.19 2.56 2.34 1.19 1.81 1.31 2.22 1.89 1.23 [17] 2.67 1.01 2.86 2.34 – user3579186 Apr 28 '14 at 20:53
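
The behaviour described in the comments can be reproduced without a database. The following is only a minimal sketch: the two data frames are built by hand from the first five rows shown in the question (they stand in for the results of `fetch()`), and the variable names `hygiene`, `bmscc` and `res` are made up for illustration. It shows that `$` on a base R data frame matches a column name only from its start, that `[[ ]]` with the full name avoids relying on that, and that `cor.test()` accepts the extracted numeric vectors.

```r
# Stand-ins for the fetched results; values are the first five rows
# printed in the question, not the full table.
dat   <- data.frame(cattle_hygiene_score_avg = c(1.89, 1.01, 1.21, 1.22, 1.93))
datas <- data.frame(BMSCC = c(14536, 10667, 23455, 17658, 14999))

# `$` partially matches column names, but only as a prefix:
is.null(dat$hygiene)   # "hygiene" is not the start of the column name
dat$cattle             # "cattle" is a unique prefix, so this returns the column

# Exact, unambiguous extraction with the full column name.
hygiene <- dat[["cattle_hygiene_score_avg"]]
bmscc   <- datas[["BMSCC"]]

# cor.test() needs numeric vectors, not data frames.
res <- cor.test(hygiene, bmscc, method = "spearman")
print(res$estimate)
```

The same two vectors can be passed to `plot(hygiene, bmscc)` for a scatter plot. `names(dat)` or `str(dat)` lists the exact column names if in doubt.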
