I have a vector v, created with c(), that holds this data:

 v = c(a, b, d, z, e, f); it must stay unordered

And I have a txt file with the form:

     label      1            2          3       ....
      b        100        2000          15
      z        123          14          12
      a         55         565          55
     .....

I have split the contents of the txt file, which is tab-delimited, with strsplit:

      ext_data<-strsplit(file,"\t") 

What I want to do is check whether each element of the vector v matches one of the elements of label (it may or may not be there), and then extract the corresponding elements of column 1 of the txt file, then the elements of column 2, and so on.

I have done the matching with a for loop, but it takes too much time because the txt file contains a lot of data. In pseudocode, it looks like this:

      for i = 1 to length(v)
             for pos = 2 to length(ext_data)      # pos starts at 2 to skip the header row
                  if v[i] matches ext_data(pos, 1)
                       retrieve data from column C

Any suggestions?

Roughly, what I want to know is whether there is a way to use match, but on columns, maybe by transforming the label column into a row?

Layla
  • Can you post a bigger selection of your data? – TARehman Oct 08 '12 at 13:08
  • I don't understand why you would use something like `strsplit` to read a delimited file rather than, say, `read.delim`...? – joran Oct 08 '12 at 13:08
  • Well, I used it because afterwards I can index into the values like a vector – Layla Oct 08 '12 at 13:12
  • @TARehman, sorry the original file is quite big and messy – Layla Oct 08 '12 at 13:13
  • @Manolo No prob. I suspect that you can use %in% to do this, but give me a little while and I'll post some code. – TARehman Oct 08 '12 at 13:15
  • But if you're only looking for matches in the label column, then having the whole file as a single vector will vastly increase the time spent looking for matches. – joran Oct 08 '12 at 13:18
  • @joran Yes, you're absolutely right; I didn't notice this before in the description. You can read the data into a data.frame and then treat the first column like a vector (in fact, that's what I do in my answer). – TARehman Oct 08 '12 at 13:20
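
As a rough illustration of the suggestion in these comments, here is a minimal sketch of reading tab-delimited text into a data frame with `read.delim` and filtering on the label column. The inline `raw_text` string is a hypothetical stand-in for the real file, and the names `dat`, `keep`, and `matched` are made up for this example; with an actual file you would pass its path as the first argument instead of `text =`.

```r
# Hypothetical stand-in for the tab-delimited file from the question.
raw_text <- "label\t1\t2\t3\nb\t100\t2000\t15\nz\t123\t14\t12\na\t55\t565\t55"

# read.delim() defaults to sep = "\t" and header = TRUE.
# check.names = FALSE keeps the column names "1", "2", "3" unchanged.
dat <- read.delim(text = raw_text, check.names = FALSE,
                  stringsAsFactors = FALSE)

v <- c("a", "b", "d", "z", "e", "f")

# Logical vector: TRUE for rows whose label appears in v.
keep <- dat$label %in% v

# All columns for the matching rows, in file order.
matched <- dat[keep, ]
```

Because only the label column is compared against `v`, the other columns are never scanned for matches, which is the point raised above about searching the whole file as one vector.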

1 Answer

Just creating some test data to illustrate my solution:

      testdata <- data.frame(namecol=c("b","r","a","j","z","l","s","n","t"),
                             v1=sample(1:1000,9),
                             v2=sample(1:1000,9),
                             v3=sample(1:1000,9))
      vecfind <- c("a","b","d","z","e","f")

Using [[]] or $, you can select the first column of a data frame as a vector. Then, using the which and %in% functions, you can get the numeric row indices of the matching labels and extract the corresponding elements of each column, like this:

      v1_elements <- testdata[which(testdata[[1]] %in% vecfind),2]
      v2_elements <- testdata[which(testdata[[1]] %in% vecfind),3]
      v3_elements <- testdata[which(testdata[[1]] %in% vecfind),4]
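
The same lookup can also be written as a single subset, and `match` (which the question asks about) can be used when the result should follow the order of the search vector. The following is only a sketch: it uses small fixed values instead of `sample()` so the outcome is reproducible, and `elements`, `idx`, and `in_vec_order` are names invented for this example.

```r
# Fixed values instead of sample(), so the result is reproducible.
testdata <- data.frame(namecol = c("b", "r", "a", "j", "z"),
                       v1 = c(100, 7, 55, 3, 123),
                       v2 = c(2000, 8, 565, 4, 14),
                       stringsAsFactors = FALSE)
vecfind <- c("a", "b", "d", "z", "e", "f")

# Subset once; rows come back in the order they appear in testdata.
elements <- testdata[testdata$namecol %in% vecfind, ]

# match() gives, for each entry of vecfind, its row position in namecol
# (NA when the label is absent), so the result follows vecfind's order.
idx <- match(vecfind, testdata$namecol)
in_vec_order <- testdata[idx[!is.na(idx)], ]
```

Individual columns can then be taken from the subset, for example `elements$v1` or `in_vec_order$v2`. Labels in `vecfind` that do not occur in the data ("d", "e", "f" here) are simply dropped.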
TARehman
  • You don't need `which` here. Also, there is no reason to subset three times if once is sufficient, e.g., `elements <- testdata[testdata[,1] %in% vecfind,]; elements[,2]; elements[,3]; elements[,4]`, which should be more efficient. – Roland Oct 08 '12 at 14:02
  • @Roland (+1) All around, good points. I put the `which()` in there mostly for the conceptual clarity it provides on how the subset works. Good point on the elements - that's much more efficient. – TARehman Oct 08 '12 at 14:12
  • Thanks, but I get an error in testdata[testdata[,1]...; how can I do it if I can only access my elements like testdata[[1]][[1]]? – Layla Oct 08 '12 at 15:19
  • You can select rows as vectors by using [n,]. – Roland Oct 09 '12 at 07:00
  • Deleted my last comment to amend; I forgot that you can get rows as vectors using [n,]. I thought that returned a data frame of n elements. – TARehman Oct 09 '12 at 14:48
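
If the data is only available as the output of `strsplit`, one possible way to get to a data frame is sketched below. It rests on an assumption that the thread does not confirm: that each line of the file was split separately, so `ext_data` is a list with one character vector per line and the header comes first. The names `lines`, `rows_mat`, `dat`, and `hits` are hypothetical.

```r
# Assumption: one character vector per line of the file, header first.
lines <- c("label\t1\t2\t3",
           "b\t100\t2000\t15",
           "z\t123\t14\t12",
           "a\t55\t565\t55")
ext_data <- strsplit(lines, "\t")

# Stack the data rows into a character matrix, then make a data frame.
rows_mat <- do.call(rbind, ext_data[-1])
dat <- as.data.frame(rows_mat, stringsAsFactors = FALSE)
names(dat) <- ext_data[[1]]

# strsplit() yields text, so convert the value columns to numbers.
dat[-1] <- lapply(dat[-1], as.numeric)

v <- c("a", "b", "d", "z", "e", "f")

# Same %in% subset as in the answer, applied to the rebuilt data frame.
hits <- dat[dat$label %in% v, ]
```

If the lines have differing numbers of fields, `rbind` will recycle values with a warning, so this sketch only applies to a regular table; reading the file directly with `read.delim` avoids the conversion altogether.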