I have a dataframe, shown below, with these columns:
user_id : identifies the user
question_id : the question number
user_answer : the option (A, B, C, or D) the user chose for that question
correct_answer : the correct option for that question
correct : 1.0 means the user's answer is right
elapsed_time : time, in minutes, the user took to answer the question
timestamp : Unix timestamp (in milliseconds) of each interaction
real_date : a column I added by converting timestamp to a human-readable date and time
user_id | question_id | user_answer | correct_answer | correct | elapsed_time | solving_id | bundle_id | timestamp | real_date |
---|---|---|---|---|---|---|---|---|---|
1 | 1 | A | A | 1.0 | 5.00 | 1 | b1 | 1547794902000 | Friday, January 18, 2019 7:01:42 AM |
1 | 2 | D | D | 1.0 | 3.00 | 2 | b2 | 1547795130000 | Friday, January 18, 2019 7:05:30 AM |
1 | 5 | C | C | 1.0 | 7.00 | 5 | b5 | 1547795370000 | Friday, January 18, 2019 7:09:30 AM |
2 | 10 | C | C | 1.0 | 5.00 | 10 | b10 | 1547806170000 | Friday, January 18, 2019 10:09:30 AM |
2 | 1 | B | B | 1.0 | 15.0 | 1 | b1 | 1547802150000 | Friday, January 18, 2019 9:02:30 AM |
2 | 15 | A | A | 1.0 | 2.00 | 15 | b15 | 1547803230000 | Friday, January 18, 2019 9:20:30 AM |
2 | 7 | C | C | 1.0 | 5.00 | 7 | b7 | 1547802730000 | Friday, January 18, 2019 9:12:10 AM |
3 | 12 | A | A | 1.0 | 1.00 | 25 | b12 | 1547771110000 | Friday, January 18, 2019 12:25:10 AM |
3 | 10 | C | C | 1.0 | 2.00 | 10 | b10 | 1547770810000 | Friday, January 18, 2019 12:20:10 AM |
3 | 3 | D | D | 1.0 | 5.00 | 3 | b3 | 1547770390000 | Friday, January 18, 2019 12:13:10 AM |
104 | 6 | C | C | 1.0 | 6.00 | 6 | b6 | 1553040610000 | Wednesday, March 20, 2019 12:10:10 AM |
104 | 4 | A | A | 1.0 | 5.00 | 4 | b4 | 1553040547000 | Wednesday, March 20, 2019 12:09:07 AM |
104 | 1 | A | A | 1.0 | 2.00 | 1 | b1 | 1553040285000 | Wednesday, March 20, 2019 12:04:45 AM |
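
For reference, here is a minimal sketch of how a column like real_date can be derived from timestamp. It assumes pandas, that the timestamps are in milliseconds, and that the displayed times are UTC. The name `sample` is only illustrative.

```python
import pandas as pd

# Three timestamps copied from the table above (milliseconds since the epoch).
sample = pd.DataFrame({"timestamp": [1547794902000, 1547795130000, 1547795370000]})

# unit="ms" tells pandas the integers are milliseconds, not seconds or nanoseconds.
sample["real_date"] = pd.to_datetime(sample["timestamp"], unit="ms")

print(sample)
```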
I need to encode this data, but I don't know which encoding to use or how to apply it.
I need the resulting dataframe to look like this:
user_id | b1 | b2 | b3 | b4 | b5 | b6 | b7 | b8 | b9 | b10 | b11 | b12 | b13 | b14 | b15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 2 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 3 |
3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 3 | 0 | 0 | 0 |
104 | 1 | 0 | 0 | 2 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
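As the timestamp and real_date columns show, each user's rows are not sorted by time. The new dataframe should have one row per user and one column per bundle: each value is the position (1, 2, 3, ...) of that bundle in the user's interactions once they are sorted by time, and 0 means the user never interacted with that bundle.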
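
To make the target concrete, here is a rough sketch of one way this encoding could be produced with pandas: rank each user's interactions by timestamp, then pivot to one column per bundle. It assumes the full set of bundles is b1 to b15, as in the desired table, and that a repeated bundle should keep its earliest position. The names `order`, `all_bundles`, and `encoded` are illustrative only.

```python
import pandas as pd

# Rows copied from the sample table (only the columns needed here).
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 104, 104, 104],
    "bundle_id": ["b1", "b2", "b5", "b10", "b1", "b15", "b7",
                  "b12", "b10", "b3", "b6", "b4", "b1"],
    "timestamp": [1547794902000, 1547795130000, 1547795370000,
                  1547806170000, 1547802150000, 1547803230000, 1547802730000,
                  1547771110000, 1547770810000, 1547770390000,
                  1553040610000, 1553040547000, 1553040285000],
})

# Position of each interaction within its user, ordered by time (1 = earliest).
df["order"] = df.groupby("user_id")["timestamp"].rank(method="first").astype(int)

# Assumption: the complete bundle list is b1..b15, as in the desired output.
all_bundles = [f"b{i}" for i in range(1, 16)]

# One row per user, one column per bundle; 0 marks "no interaction".
encoded = (
    df.pivot_table(index="user_id", columns="bundle_id",
                   values="order", aggfunc="min", fill_value=0)
      .reindex(columns=all_bundles, fill_value=0)
      .astype(int)
      .reset_index()
)
encoded.columns.name = None  # drop the leftover "bundle_id" axis label

print(encoded)
```

With the sample rows, users 1, 3, and 104 come out as in the desired table. User 2 has four interactions in the sample, so this sketch gives b1 = 1, b7 = 2, b15 = 3, b10 = 4, which differs from the user 2 row shown above.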