-2

Looking to add a column based on the values of other columns, but over more than one row.

DF:
Studios: A, B, C ...
Year:2000-2020
Id:r1,r2,r3,r4...
Top1:0,1,0,0,0

Starting data set looks like this :

id Year Studio Top1
r1 2020 A 0
r2 2001 B 0
r3 2019 A 1
r45 2005 D 1
r34 2005 E 1

Ifelse Statemets (probably): What I am trying to achieve is "create column PreviousTop1.If column Studio1=studioX where year=year-1 and Top1=1. *For clarification: studio1 is referring to the studio1 my id is currently at. While studioX is the same studio as studio1 in different id.

Desired Output:

id Year Studio Top1 PreviousTop1
r1 2020 A 0 1
r2 2001 B 0 0
r3 2019 A 1 0
r45 2005 D 1 0
r34 2005 E 1 0

Reasoning for proper Answer PreviousTop=1,in row 1 because r3 exists. So I need to check the whole column for 1 instance that conditions are true and then break.

2 Answers2

0

I would suggest to solve your problem differently than with an ifelse:

Create a table with only the Top1 per year. Join this table with your first table just on a year=year-1 basis:

top1 <- table %>% filter(Top1=1) %>% mutate(Year = Year +1)
table %>% left_join(top1, by="Year")
c0bra
  • 1,031
  • 5
  • 22
  • I don't understand really how this helps, by ```top1```I have all id's that was top1 and just add 1 in the year. i dont follow how this help tbh. – user15560482 Apr 06 '21 at 12:57
  • If you find Studio A beeing top 1 in 2019, you add 1 to receive 2020. Now in the join, the StudioA will be added to the 2020 line – c0bra Apr 06 '21 at 13:14
0

You can try this base R approach :

#Initialise PreviousTop1 column to 0
df$PreviousTop1 <- 0
#Find whose previous year data is present in the dataframe
inds <- which((df$Year - 1) %in% df$Year)
#Assign 1 to PreviousTop1 if Top1 = 1
df$PreviousTop1[inds] <- +(df$Top1[match(df$Year[inds] - 1, df$Year)] == 1)
df

#   id Year Studio Top1 PreviousTop1
#1  r1 2020      A    0            1
#2  r2 2001      B    0            0
#3  r3 2019      A    1            0
#4 r45 2005      D    1            0
#5 r34 2005      E    1            0

data

df <- structure(list(id = c("r1", "r2", "r3", "r45", "r34"), Year = c(2020L, 
2001L, 2019L, 2005L, 2005L), Studio = c("A", "B", "A", "D", "E"), 
Top1 = c(0L, 0L, 1L, 1L, 1L)), row.names = c(NA, -5L), class = "data.frame")
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Guys, thanks for the recommendation and sorry for ghosting but I had focused on my task at hand. the code that worked for me is : SQLtop10 <-sqldf( 'select s.id, s.year, s.production, "1" as PreviousTop10 from DF s join (SELECT DISTINCT production as production, year as year FROM DF WHERE InTop10 = 1 and production is not null) as temp on production = temp.production and year = temp.year + 1; ') where production is studio but in my df. – user15560482 Apr 30 '21 at 14:23