0

I have a dataset of portfolio components that gives information about the 10 stocks held in my portfolio at each rebalancing date(monthly).

An example dataframe with just 4 dates and 4 stocks:

Date <- c(rep(as.Date("2010/12/14"), 4), rep(as.Date("2011/01/13"), 4), rep(as.Date("2011/02/10"), 4),rep(as.Date("2011/03/10"), 4))

Name <- c("A","B","C","D","A","C","F","G","A","B","F","H","A","F","H","I")

df <- data.frame(Date, Name)

The stocks vary between the dates, so i need to compute direct transaction costs based on wether i buy, hold or sell each stock. What i would like is to add a column with values of 0,1,2 depending on this, so that:

  • value of 0 if the stock is held at time t-1 and time t
  • value of 1 if the stock is held at time t-1, but sold at time t
  • value of 1 if the stock is bought at time t-1, and held(not sold) at time t
  • value of 2 if if the stock is bought at time t-1, and sold at time t

With the values being assigned to the stock at t-1.

example of what this would look like:

Date          Stock      Status
2010-12-14    A          1
2010-12-14    B          2
2010-12-14    C          1
2010-12-14    D          2

2011-01-13    A          0
2011-01-13    C          1       
2011-01-13    F          1
2011-01-13    G          2

2011-02-10    A          0
2011-02-10    B          1
2011-02-10    F          0
2011-02-10    H          1

2011-03-10    A
2011-03-10    F
2011-03-10    H
2011-03-10    I
signe
  • 37
  • 4

1 Answers1

0

I think your example data is not enough to show all cases, but this should generally do:

 Date <- c(rep(as.Date("2010/12/14"), 4), rep(as.Date("2011/01/13"), 4), rep(as.Date("2011/02/10"), 4),rep(as.Date("2011/03/10"), 4))
    dummy <- rep(1, 16)
    Name <- c("A","B","C","D","A","C","F","G","A","B","F","H","A","F","H","I")
    df <- data.frame(Date, Name, dummy)

    le = LETTERS[1:9]
    temp = CJ(Name=le, Date=unique(Date))

    df = data.table(df)
    setkey(df, Name, Date)
    setkey(temp, Name, Date)
    df = df[temp]


    df[,value:=case_when(dummy==1 & shift(dummy, type = "lag", n = 1)==1 & shift(dummy, type = "lag", n = 2)==1 ~ 0,
                         dummy==0 & shift(dummy, type = "lag", n = 1)==1 & shift(dummy, type = "lag", n = 2)==1 ~ 1,
                         dummy==1 & shift(dummy, type = "lag", n = 1)==1 & shift(dummy, type = "lag", n = 2)==0 ~ 1,
                         dummy==0 & shift(dummy, type = "lag", n = 1)==1 & shift(dummy, type = "lag", n = 2)==0 ~ 2,
                         TRUE~88), by=Name][dummy==1]
otwtm
  • 1,779
  • 1
  • 16
  • 27
  • Thank you for your answer @otwtm! I am unfamiliar with the data.table, but think I understand the logic and it looks as if it should work. I am trying to make it fit to my actual data, but I end up getting 1 for all the stocks, which I know is not correct. In the last part of your solution, what does the _TRUE ~ 88_ refer to? – signe Apr 29 '20 at 13:58
  • Hi again, could you provide an example with more data that covers all 4 cases? Then I can figure something out. About the TRUE~88, this means in all other cases choose 88. Was meant as a dummy to see whether a row is not true for any of the 4 cases. – otwtm Apr 30 '20 at 11:18
  • Thank you! I actually got it to work, so it should be fine! Thanks for your help. – signe May 02 '20 at 07:09