I have a pretty bad way to convert my input logs to the input dataset. I have an SFrame sf with the following format:
user_id int
timestamp datetime.datetime
action int
reasoncode str
action column takes up 9 values ranging from 1 to 9.
So, every user_id can perform more than 1 action, more than once.
I am trying to obtain all unique user_id from sf and create an op_sf in the following manner:
y = 225
def calc_class(a,x):
diffd = a['timestamp'].apply(lambda x: (dte - x).days)
g = 0
b = 0
for i in diffd:
if i > y:
g += 1
else:
b += 1
if b>= x:
return 4
elif b!= 0:
return 3
elif g>= 0:
return 2
else:
return 1
l1 = []
ids = z['user_id'].unique()
for idd in ids:
temp = sf[sf['user_id']== idd]
zero1 = temp[temp['action'] == 1]
zero2 = temp[temp['action'] == 2]
zero3 = temp[temp['action'] == 3]
zero4 = temp[temp['action'] == 4]
zero5 = temp[temp['action'] == 5]
zero6 = temp[temp['action'] == 6]
zero7 = temp[temp['action'] == 7]
zeroh8 = temp[temp['reasoncode'] == 'xyz']
zero9 = temp[temp['reasoncode'] == 'abc']
/* I'm getting clas1 to clas9 from function calc_class for each action
clas1 to clas9 are 4 integers ranging from 1 to 4
*/
clas1 = calc_class(zero1,2)
clas2 = calc_class(zero2,2)
clas3 = calc_class(zero3,2)
clas4 = calc_class(zero4,2)
clas5 = calc_class(zero5,2)
clas6 = calc_class(zero6,2)
clas7 = calc_class(zero7,2)
clas8 = calc_class(zero8,2)
clas9 = calc_class(zero9,2)
l1.append([idd,clas1,clas2,clas3,clas4,clas5*(-1),clas6*(-1),clas7*(-1),clas8*(-1),clas9])
I wanted to know if this is the fastest way of doing this. Specifically if it is possible to do the same thing without generating the zero1 to zero9 SFrames.
An example sf:
user_id timestamp action reasoncode
574 23/09/15 12:43 1 None
574 23/09/15 11:15 2 None
574 06/10/15 11:20 2 None
574 06/10/15 11:21 3 None
588 04/11/15 10:00 1 None
588 05/11/15 10:00 1 None
555 15/12/15 13:00 1 None
585 22/12/15 17:30 1 None
585 15/01/16 07:44 7 xyz
588 06/01/16 08:10 7 abc
l1 corresponding to the above sf:
574 1 2 2 0 0 0 0 0 0
588 3 0 0 0 0 0 0 0 3
555 3 0 0 0 0 0 0 0 0
585 3 0 0 0 0 0 0 3 0