-1

I have a series of numbers that are 0 or 1. The total length is 35115, which gives 35115/15 = 2341 blocks of 15. I want to step along my data frame in steps of 15 and ask whether each block of 15 values in the column of interest matches my vector tmp. What am I doing wrong? Can someone help me? Thank you all for teaching me something useful. Cheers

x;
        V1       V2 V3 V4  V5 V6 V7
 3R 11024348  A  G  A1  0 61
 3R 11024348  A  G  A2  1 30
 3R 11024348  A  G  A3  0 68
 3R 11024348  A  G  A4  0 57
 3R 11024348  A  G  A5  0 63
 3R 11024348  A  G  A6  0 49
 3R 11024348  A  G  A7  0 60
 3R 11024348  A  G  B1  0 63
 3R 11024348  A  G  B2  0 64
 3R 11024348  A  G  B3  0 71
 3R 11024348  A  G  B4  1 51
 3R 11024348  A  G  B5  0 37
 3R 11024348  A  G  B6  0 52
 3R 11024348  A  G  B7  0 47
 3R 11024348  A  G AB8  0 83
 3R 11024410  C  T  A1  0 45
 3R 11024410  C  T  A2  1 54
 3R 11024410  C  T  A3  0 76
 3R 11024410  C  T  A4  0 48
 3R 11024410  C  T  A5  0 49
 3R 11024410  C  T  A6  1 48
 3R 11024410  C  T  A7  0 45
 3R 11024410  C  T  B1  0 48
 3R 11024410  C  T  B2  0 81
 3R 11024410  C  T  B3  1 58
 3R 11024410  C  T  B4  1 50
 3R 11024410  C  T  B5  0 65
 3R 11024410  C  T  B6  1 45
 3R 11024410  C  T  B7  0 66
 3R 11024410  C  T AB8  0 58


tmp<-c(1,1,0,1,1,1,1,1,1,1,1,1,0,0,0)
for(i in seq(from=1, to=length(X$V6), by=15)){print(matchID<-match(tmp,X$V6[i]))}
Genetics
  • Please provide a minimal reproducible example. – Roland Mar 02 '16 at 15:53
  • Roland, any data would have worked, but here is a small slice of the real data. I want to match x$V6 to tmp in steps of 15. Here that would be two steps. – Genetics Mar 02 '16 at 15:57
  • Why is this getting negative votes? I describe a specific problem with specific needs and give an example of a loop that I can't get to work. – Genetics Mar 02 '16 at 16:02

2 Answers

1

I'm not entirely sure regarding the expected output, but maybe this:

First reproduce the data:

x <- read.table(text = "        V1       V2 V3 V4  V5 V6 V7
 3R 11024348  A  G  A1  0 61
                3R 11024348  A  G  A2  1 30
                3R 11024348  A  G  A3  0 68
                3R 11024348  A  G  A4  0 57
                3R 11024348  A  G  A5  0 63
                3R 11024348  A  G  A6  0 49
                3R 11024348  A  G  A7  0 60
                3R 11024348  A  G  B1  0 63
                3R 11024348  A  G  B2  0 64
                3R 11024348  A  G  B3  0 71
                3R 11024348  A  G  B4  1 51
                3R 11024348  A  G  B5  0 37
                3R 11024348  A  G  B6  0 52
                3R 11024348  A  G  B7  0 47
                3R 11024348  A  G AB8  0 83
                3R 11024410  C  T  A1  0 45
                3R 11024410  C  T  A2  1 54
                3R 11024410  C  T  A3  0 76
                3R 11024410  C  T  A4  0 48
                3R 11024410  C  T  A5  0 49
                3R 11024410  C  T  A6  1 48
                3R 11024410  C  T  A7  0 45
                3R 11024410  C  T  B1  0 48
                3R 11024410  C  T  B2  0 81
                3R 11024410  C  T  B3  1 58
                3R 11024410  C  T  B4  1 50
                3R 11024410  C  T  B5  0 65
                3R 11024410  C  T  B6  1 45
                3R 11024410  C  T  B7  0 66
                3R 11024410  C  T AB8  0 58", header = TRUE)

tmp<-c(1,1,0,1,1,1,1,1,1,1,1,1,0,0,0)

Now use integer division to define the blocks and then use aggregate or the "split-apply-combine" function of your choice:

aggregate(x$V6, list(block = (seq_len(nrow(x)) - 1) %/% 15), FUN = function(v) all(v == tmp))
#  block     x
#1     0 FALSE
#2     1 FALSE

A probably faster alternative would be to transform the column of your data.frame into a matrix and do this:

colSums(matrix(x$V6, nrow = 15) == tmp) == 15L
#[1] FALSE FALSE
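
To see why the matrix comparison works, here is a minimal sketch on made-up data with a block size of 3 instead of 15. The names `v`, `pattern`, `m`, and `block_matches` are illustrative and not from the question. `matrix()` fills column by column, so each column holds one block, and `==` recycles the pattern down each column. This assumes the vector length is a multiple of the block size, as it is for 35115 / 15 = 2341.

```r
# Toy data: three blocks of length 3; only the second block equals the pattern.
v <- c(0, 1, 0,  1, 1, 0,  0, 0, 0)
pattern <- c(1, 1, 0)

# matrix() fills column-wise, so column j holds block j.
m <- matrix(v, nrow = length(pattern))

# The pattern is recycled down each column; a block matches when all
# of its rows compare equal to the pattern.
block_matches <- colSums(m == pattern) == length(pattern)
print(block_matches)
```

Wrapping the result in `which()` would give the numbers of the matching blocks.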
Roland
  • Hi Roland, this is a great way of doing what I am asking that I didn't even consider. What about a for loop? I ask because it bothers me that I couldn't figure out the for loop. – Genetics Mar 02 '16 at 16:19
  • I don't understand what you do there with `print` and `match`, but the basic idea is that you need to calculate the indices, i.e., `i + 0:14`. – Roland Mar 02 '16 at 16:22
  • Hi Roland, I am not very used to the aggregate function. If you do find a match, is there an easy way to go back and pull the matching rows out of the data frame? – Genetics Mar 02 '16 at 17:04
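
The two follow-up points in the comments (a for loop built on `i + 0:14`, and pulling matches back out of the data frame) could be sketched roughly as follows. This is only an illustration on made-up data: `df` is a hypothetical data frame whose second block is constructed to equal `tmp`, and `block_size`, `starts`, `is_match`, and `matched_rows` are names chosen here, not from the thread.

```r
tmp <- c(1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0)
block_size <- length(tmp)

# Made-up data: two blocks of 15 rows; the second block is set equal to tmp.
df <- data.frame(id = seq_len(2 * block_size),
                 V6 = c(rep(0, block_size), tmp))

starts <- seq(from = 1, to = nrow(df), by = block_size)
is_match <- logical(length(starts))  # one result per block, preallocated

for (k in seq_along(starts)) {
  idx <- starts[k] + 0:(block_size - 1)  # all 15 row indices of block k
  is_match[k] <- all(df$V6[idx] == tmp)
}

# Expand the per-block result to one flag per row and subset the data frame.
matched_rows <- df[rep(is_match, each = block_size), ]
```

The same `rep(..., each = 15)` expansion should also work on the logical vector returned by the `colSums` approach, assuming the row count is a multiple of 15.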
1

If you really want to use a for loop, you need to define a data structure to store the loop results (a vector of numbers or strings, a list, a matrix, etc.).

Something like matchID<-vector()

Let's look at your code:

for(i in seq(from=1, to=length(X$V6), by=15)){print(matchID<-match(tmp,X$V6[i]))}

Your for loop runs from 1 to 30 (the length of V6) in steps of 15 (the length of tmp), so i takes only these values:

>for(i in seq(1,30,15)) print(i)
[1] 1
[1] 16

So if you index your V6 vector by i, the loop only ever sees the single values at positions 1 and 16, not the whole blocks.
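
A short sketch of what the original loop body evaluates may make this clearer. With a single index, `V6[i]` is one number, so `match(tmp, V6[i])` looks up each element of `tmp` in a one-element table and returns 1 where they are equal and NA otherwise; it never compares a whole block. The vector `v6` below is the first block of V6 from the question's sample; the other variable names are illustrative.

```r
tmp <- c(1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0)
v6 <- c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)  # first block of V6

i <- 1
single <- v6[i]                  # one value, here 0
looked_up <- match(tmp, single)  # NA wherever tmp is 1, 1 wherever tmp is 0

# Slicing the whole block instead gives 15 values to compare with tmp.
block <- v6[i + 0:14]
whole_block_equal <- all(block == tmp)
```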

Here is my solution:

matchID<-vector() # stores the loop results in a vector
for(i in seq_along(x[, "V6"])){
  # tmp (length 15) is recycled along V6 (length 30) in the comparison
  matchID[i]<- as.numeric(tmp == x[, "V6"])[i]
}

You can see that the ith element of matchID always equals the ith element of the vector comparing tmp and "V6".

However, you really don't need a loop in this case:

matchID<-as.numeric(tmp == x[, "V6"])  
fhlgood