3

I have a data frame DF, with three columns and n rows shown below:

Month Year  Default
1   2015    T
2   2015    T
3   2015    F
4   2015    T
5   2015    T
6   2015    T
7   2015    F

I would like to find every run of three or more consecutive T values in Default and write the starting month and year of each run into a new data frame.

For the data above, the output should look like:

Month   Year
4   2015
David Arenburg

3 Answers

2

Here's an attempt using the data.table devel version on GitHub and its new rleid function:

library(data.table) # v 1.9.5+
setDT(df)[, indx := rleid(Default)]
df[(Default), if(.N > 2) .SD[1L], by = indx]
#    indx Month Year Default
# 1:    3     4 2015    TRUE

What we are basically doing here is assigning a unique index to each run of consecutive identical values in Default. Then, looking only at the rows where Default == TRUE, we check whether each group has more than 2 rows and, if so, select the first row of that group.
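To see what the run index looks like without installing the devel version, the indexing step can be approximated in base R. This is only a sketch: `run_id` is a hypothetical helper written for illustration, not data.table's implementation of rleid, and it assumes a logical vector with no NA values.

```r
# Hypothetical base R stand-in for rleid():
# a new id starts whenever the value differs from the previous one
run_id <- function(x) cumsum(c(TRUE, x[-1L] != x[-length(x)]))

Default <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
indx <- run_id(Default)
indx
# [1] 1 1 2 3 3 3 4

# Size of each run; only run 3 is TRUE and longer than 2
sizes <- tapply(Default, indx, length)
```

Run 3 starts at row 4, which is the row the data.table call returns.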


A shorter version (proposed by @Arun) would be

setDT(df)[, if(Default && .N > 2L) .SD[1L], by = .(indx = rleid(Default), Default)]
David Arenburg
  • just another way: `setDT(df)[, if (Default && .N > 2L) .SD[1L], by = .(indx = rleid(Default), Default)]` – Arun Jul 23 '15 at 05:26
  • @Arun ow that was my initial thought but couldn't figure out how to get `Default` into there and it kept returning an error cause it couldn't find `Default` (didn't think to add it to the `by` statement too) – David Arenburg Jul 23 '15 at 06:25
1

This might not be the best solution, but my first try would be:

- paste together the third column into a string
- use a regular expression to find all occurrences of "TTT" in that string, which will give you a vector of starting positions
- use this vector to subset your original data frame by row, omitting the last column

EDIT

Now with code:

# collapse the logical column into a string of 0s and 1s
def_str <- paste(as.integer(DF$Default), collapse = "")
# starting position of every run of three or more 1s
indices <- unlist(gregexpr("111+", def_str))
# if there is no match, gregexpr returns -1
if (indices[1] != -1) {
  DF[indices, -3]
} else {
  print("something dramatic about no 3 months rolling T's")
}
liesb
  • It's a very good idea, but we usually answer with a solution complete with code – Rich Scriven Jul 22 '15 at 21:35
  • I just wanted to comment but I'm not allowed! So I had to write an answer. I'm sure there is a good reason for this particular stackoverflow rule, but I don't see it... – liesb Jul 22 '15 at 22:15
1

Here is a way of doing it with rle in base R, without data.table. data.table is a very sweet package, but sometimes people just want to use base R without other dependencies.

dt <- data.frame(Month = c(1, 2, 3, 4, 5, 6, 7), Year = 2015, Default = c(T, T, F, T, T, T, F))

runData <- rle(dt$Default)

# TRUE runs of length three or more
whichThree <- which(runData$lengths >= 3 & runData$values)

# first row of every run: 1, then one past the end of each preceding run
starts <- cumsum(c(1, head(runData$lengths, -1)))
idx <- starts[whichThree]

dt[idx, 1:2]
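
The same rle idea can be wrapped in a small function and checked against a vector with several runs, including one that starts on the first row. This is a sketch only: `run_starts` and its `min_len` argument are hypothetical names, and the input is assumed to be a logical vector without NA values.

```r
# Hypothetical helper: first index of every TRUE run of at least min_len
run_starts <- function(x, min_len = 3L) {
  r <- rle(x)
  # first index of each run: 1, then one past the end of each preceding run
  starts <- cumsum(c(1L, head(r$lengths, -1L)))
  starts[r$values & r$lengths >= min_len]
}

x <- c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)
run_starts(x)
# [1] 1 9
```

With the data frame from this answer, `dt[run_starts(dt$Default), 1:2]` would then pick out the starting Month and Year of each qualifying run.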
SJWard