1

I'm new to R and not sure how to get the right output from my data below:

My data:

row1    101 woody   5
row2    101 woody   0
row3    111 kiln    23
row4    200 weez    2
row5    315 rowt    0

For example, in row 3, element in column 3 is greater than 0, and the its column 1 value is between 101 (row1 ) and 111 (row3). So the condition is, for any row, if value in column3 is greater than 0, and also if its column 1 value is between that of the column above and below.

Required output:

        col1 col2   col3
row1    101 woody   After_none
row2    101 woody   0
row3    111 kiln    Between_woody_weez
row4    200 weez    Between_Kiln_rowt
row5    315 rowt    0

I would be pleased if someone could help me. Thanks

Added more data to run Akru's code:

col1    col2    col3
255 mwu 21
77031   netw    0
77031   netw    0
77031   netw    0
82513   cuu 91
88206   cxum    0
88206   cxum    0
88206   cxum    0
188450  xaii    25
188450  xaii    0
188450  xaii    0
188450  xaii    0
188450  xaii    0
199800  aau 0

Code runs with this data sample but, the output not quite right:

col1 col2 col3              colN
255  mwu   21        After_none
77031 netw    0              <NA>
77031 netw    0              <NA>
77031 netw    0              <NA>
82513  cuu   91  Between_mwu_netw
88206 cxum    0              <NA>
88206 cxum    0              <NA>
88206 cxum    0              <NA>
188450 xaii   25 Between_netw_cxum
188450 xaii    0              <NA>
188450 xaii    0              <NA>
188450 xaii    0              <NA>
188450 xaii    0              <NA>
199800  aau 0                 <NA>

But the expected output is:

col1 col2 col3              
255  mwu   21        
77031 netw    0              
77031 netw    0              
77031 netw    0              
82513  Between_mwu_cxum   91
88206 cxum    0              
88206 cxum    0              
88206 cxum    0              
188450 Between_cxum_aau   25 
188450 xaii    0              
188450 xaii    0              
188450 xaii    0              
188450 xaii    0   
199800  aau 0           

OR with the extra column "colN" will be fine

Expected output:

col1 col2 col3              
255  mwu   21        
77031 netw    0              
77031 netw    0              
77031 netw    0              
82513  Between_mwu_cxum   91
88206 cxum    0              
88206 cxum    0              
88206 cxum    0              
188450 Between_cxum_aau   25 
188450 xaii    0              
88450 xaii    0              
188450 xaii    0              
188450 xaii    0   
199800  aau 0   
Rob John
  • 13
  • 4
  • Your expected output is confusing. Why do you have `Between_mwu_cxum" for the row `82513`? Wouldn't that be `Between_netw_cxum`? Or the new rule is that we have to select only from the rows that are >0? – akrun Oct 23 '14 at 08:20
  • Sorry for the confusion. You're right akrun. it should be Between_netw_cxum. – Rob John Oct 23 '14 at 08:23
  • I didn't understand the comment `col2 did not include the replacement strings "Between..."`. Could you clarify? – akrun Oct 28 '14 at 03:45
  • What do you want it to be there? You showed an expected output without any `Between` for those rows. Anyway, I spent some time on this. – akrun Oct 28 '14 at 08:26
  • Sorry, I was only showing examples of where there are no replacement. I will now show full example. Thanks – Rob John Oct 28 '14 at 09:07
  • Have you updated the post (seems like an edit made now) – akrun Oct 28 '14 at 09:12
  • I updated. Hope it helps. – akrun Oct 28 '14 at 10:29
  • @akrun, I really appreciate your efforts. Many thanks!!! – Rob John Oct 28 '14 at 14:22

1 Answers1

0

One way would be:

  indx <- df$col3 >0
  df$colN <- df$col3
  df$colN[indx] <- sapply(which(indx), function(i) {
      i1 <- 1:(i - 1)
      i2 <- (i + 1):nrow(df)
      indx1 <- with(df, col1[i] > col1[i1])
      indx2 <- with(df, col1[i] < col1[i2])
      if (any(indx1) & any(indx2)) 
       paste("Between", df$col2[i1][max(which(indx1))], df$col2[i2][min(which(indx2))], 
             sep = "_") else df$col3[i]
   })

  df
  #     col1 col2 col3              colN
  #1     255  mwu   21                21
  #2   77031 netw    0                 0
  #3   77031 netw    0                 0
  #4   77031 netw    0                 0
  #5   82513  cuu   91 Between_netw_cxum
  #6   88206 cxum    0                 0
  #7   88206 cxum    0                 0
  #8   88206 cxum    0                 0
  #9  188450 xaii   25  Between_cxum_aau
  #10 188450 xaii    0                 0
  #11 188450 xaii    0                 0
  #12 188450 xaii    0                 0
  #13 188450 xaii    0                 0
  #14 199800  aau    0                 0

Update

If you want to change the col2, just do:

 df$col2[indx] <-sapply(which(indx), function(i) {
     i1 <- 1:(i - 1)
     i2 <- (i + 1):nrow(df)
     indx1 <- with(df, col1[i] > col1[i1])
     indx2 <- with(df, col1[i] < col1[i2])
     if (any(indx1) & any(indx2)) 
     paste("Between", df$col2[i1][max(which(indx1))], df$col2[i2][min(which(indx2))], 
         sep = "_") else df$col2[i] #replaced here
   })

 df
 #    col1              col2 col3
 #1     255               mwu   21
 #2   77031              netw    0
 #3   77031              netw    0
 #4   77031              netw    0
 #5   82513 Between_netw_cxum   91
 #6   88206              cxum    0
 #7   88206              cxum    0
 #8   88206              cxum    0
 #9  188450  Between_cxum_aau   25
 #10 188450              xaii    0
 #11 188450              xaii    0
 #12 188450              xaii    0
 #13 188450              xaii    0
 #14 199800               aau    0

data

 df <-  structure(list(col1 = c(255L, 77031L, 77031L, 77031L, 82513L, 
 88206L, 88206L, 88206L, 188450L, 188450L, 188450L, 188450L, 188450L, 
 199800L), col2 = c("mwu", "netw", "netw", "netw", "cuu", "cxum", 
 "cxum", "cxum", "xaii", "xaii", "xaii", "xaii", "xaii", "aau"
 ), col3 = c(21L, 0L, 0L, 0L, 91L, 0L, 0L, 0L, 25L, 0L, 0L, 0L, 
 0L, 0L)), .Names = c("col1", "col2", "col3"), class = "data.frame", 
 row.names = c(NA,-14L))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Could I ask what is the condition for `after_none`? Is it for the first entry of the duplicated number (101)? – jazzurro Oct 22 '14 at 10:46
  • @jazzurro yes, it is for the first entry. I am guessing that there is no row above it, it would be named `after_none`. But, I am a bit confused by the question. So far, I haven't heard any reply from the OP. – akrun Oct 22 '14 at 10:48
  • @jazzurro I guess `between` from `dplyr` may be another option? – akrun Oct 22 '14 at 10:54
  • I was approaching with `getMyRows` in the SOfun package, which allows me to do similar processes. My code is not compact like yours; I think I would not post it. How would you approach with `between` here? – jazzurro Oct 22 '14 at 15:03
  • @jazzurro I didn't get time to think about the process. Also, I didn't get any feedback from the OP. So, I will wait. – akrun Oct 22 '14 at 15:11
  • Good idea. I will find out more tomorrow. I will also see how/if between would work. Thanks for sharing the idea. – jazzurro Oct 22 '14 at 15:15
  • Thanks so much akrun and jazzuro. I apologise for my late reply. I tried akrun's code, and for some reason threw the following error: In df$colN[indx] <- c("After_none"[indx2], : number of items to replace is not a multiple of replacement length. Thinking about it more now, I guess that that I should have decided to have "before whatever the topmost item is", e.g.,"before_woody" as the output instead of "After none". Really apreciate your help guys. – Rob John Oct 22 '14 at 19:36
  • Regarding akrun'previous questions. The table is ordered according to col1, but two or more rows may have the same number. For example you may have the rows of col1 as follows: 101 101 111 200 315 315 315 600 600 600 600 600. – Rob John Oct 22 '14 at 19:42
  • @Rob John Could you post a small example that gives the error and also the new info about the rows with same number. It will make it easier to debug. Today it's too late. I will check tomorrow for any update. – akrun Oct 22 '14 at 19:44
  • Thanks akrun, I posted more data above. Although it runs with this sample data, but not giving the expected results. Thank you. – Rob John Oct 22 '14 at 23:05
  • @RobJohn Thanks for more data. I can't really try this till late today. But, your new data isn't necessarily clear to me. You showed "wrong" output using @akrun's code. Would it be possible for you to upload expected outcome? One more, could you describe the rule for `After_none`? – jazzurro Oct 23 '14 at 01:48
  • Thanks jazzuro. The expected output of the second data sample is added. The "After_none" rule indicates that there is no row above. But I later reviewed this in one of my posts above. Thanks – Rob John Oct 23 '14 at 07:31
  • @akrun I've tried some. As you suggested, this pattern gives me the impression that there is some space for dplyr. But I could not come up with any solution. I will invest more time tomorrow. Thanks for your encouragement. I really appreciate it. – jazzurro Oct 25 '14 at 15:40
  • @akrun I think you can definitely solve [this](http://stackoverflow.com/questions/26550558/how-to-calculate-time-weighted-average-and-create-lags/26559006#26559006). I am sure you had a look of it. – jazzurro Oct 25 '14 at 15:46
  • @jazzurro I didn't looked at it. It is a bit late today, anyway. – akrun Oct 25 '14 at 19:33
  • Apologies for my late reply. Many thanks for your help akrun. It works fantastically! I really appreciate your time and help. But the only thing left is that col2 did not include the replacement strings "Between..." as in the example output. Thanks – Rob John Oct 28 '14 at 01:10
  • @Rob John Can you show a small example where it would not include the replacement strings? – akrun Oct 28 '14 at 03:36