Here is my problem: I have a table containing people's behavior information over a month (multiple features). Each person has a unique ID and a single label (0 or 1). I want to use these features to predict whether a customer belongs to group 0 or group 1.
However, the features for each ID are collected and recorded multiple times, which means I have multiple rows belonging to the same ID. How can I structure my data and build a feature matrix where each ID corresponds to one row of features and one label?
Feature
ID   feature1   feature2   feature3   ...
1    2          1.5        1          ...
2    1          3          0          ...
3    1          2          1          ...
1    2.5        1          1          ...
3    0.8        1          0          ...
...
Label
ID label
1 0
2 1
3 0
...
Sample: two dataframes
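For reference, the sample can be reproduced roughly like this, assuming pandas. Only the rows shown are included (the elided rows and columns are left out), and the variable names `features`, `labels` and `rows_per_id` are just placeholders.

```python
import pandas as pd

# Only the rows shown in the sample; the "..." rows and columns are omitted.
features = pd.DataFrame({
    "ID": [1, 2, 3, 1, 3],
    "feature1": [2, 1, 1, 2.5, 0.8],
    "feature2": [1.5, 3, 2, 1, 1],
    "feature3": [1, 0, 1, 1, 0],
})

# One label per ID.
labels = pd.DataFrame({"ID": [1, 2, 3], "label": [0, 1, 0]})

# Count how many feature rows were recorded for each ID.
rows_per_id = features.groupby("ID").size()
print(rows_per_id)
```

This makes the mismatch visible: `features` has several rows for some IDs, while `labels` has exactly one row per ID.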
Is there a way to take these multiple rows of features into account as much as possible and build a feature matrix with exactly one row per ID, matching the labels one to one?
My idea so far: first, count the number of times each ID appears and use that count as a new feature. Second, cluster the rows of each ID into two clusters and use the center of the majority cluster as the feature vector for that ID.
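To make the idea concrete, here is a rough sketch, assuming pandas and scikit-learn. The helper names `majority_center` and `build_feature_matrix` and the column name `n_rows` are made up for illustration. Falling back to the plain mean when an ID has fewer than three distinct rows is an extra assumption, since two clusters have no meaningful majority in that case.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def majority_center(rows: np.ndarray, random_state: int = 0) -> np.ndarray:
    """Center of the larger of two k-means clusters (hypothetical helper)."""
    if len(np.unique(rows, axis=0)) < 3:
        # Assumption: with fewer than 3 distinct rows, just use the mean.
        return rows.mean(axis=0)
    km = KMeans(n_clusters=2, n_init=10, random_state=random_state).fit(rows)
    majority = np.bincount(km.labels_, minlength=2).argmax()
    return km.cluster_centers_[majority]


def build_feature_matrix(features: pd.DataFrame, id_col: str = "ID") -> pd.DataFrame:
    """One row per ID: majority-cluster center plus the row count."""
    feature_cols = [c for c in features.columns if c != id_col]
    records = {}
    for id_value, group in features.groupby(id_col):
        center = majority_center(group[feature_cols].to_numpy(dtype=float))
        # n_rows is the "number of times the ID appears" feature.
        records[id_value] = dict(zip(feature_cols, center), n_rows=len(group))
    matrix = pd.DataFrame.from_dict(records, orient="index")
    matrix.index.name = id_col
    return matrix


# Only the rows shown in the sample; the elided rows are omitted.
features = pd.DataFrame({
    "ID": [1, 2, 3, 1, 3],
    "feature1": [2, 1, 1, 2.5, 0.8],
    "feature2": [1.5, 3, 2, 1, 1],
    "feature3": [1, 0, 1, 1, 0],
})
labels = pd.DataFrame({"ID": [1, 2, 3], "label": [0, 1, 0]})

X = build_feature_matrix(features)
dataset = X.join(labels.set_index("ID"))  # one row of features and one label per ID
print(dataset)
```

With only the sample rows, every ID has fewer than three rows, so the sketch ends up using the per-ID mean everywhere. The clustering step only kicks in for IDs with more recorded rows.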
Can anyone help me? Thanks a lot!