-2

I have a DataFrame loaded from a CSV of sales KPIs (quantity, article number, and the corresponding date). I need to split it into multiple DataFrames, each containing the data for one article number (e.g. frame1 = 123, frame2 = 345, and so on).

How can I split it dynamically like this for further use in scikit-learn's KMeans (to match different article numbers by their sales KPIs)? Thanks a lot.

Markus
Seppl98
  • Please *share a concrete example* of your input and required output; as is, your question is too vague and unclear – desertnaut Jul 04 '18 at 18:17

1 Answer

0

You can group by the article number using groupby.

grouped = df.groupby('article_number')

You can then inspect the individual groups (a dict mapping each article number to its row labels) using

grouped.groups

or directly apply aggregation functions like grouped['quantity'].sum() to get the respective value for each group.
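As a minimal sketch of the split itself: iterating over the groupby object yields one (key, DataFrame) pair per article number, which can be collected into a dict. The sample data and the column names `article_number`, `date`, and `quantity` are assumptions for illustration, not taken from the question's actual CSV.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumed, not from the real CSV
df = pd.DataFrame({
    "article_number": [123, 123, 345, 345, 345],
    "date": ["2018-07-01", "2018-07-02", "2018-07-01", "2018-07-02", "2018-07-03"],
    "quantity": [5, 3, 2, 7, 1],
})

# Iterating over a groupby yields (key, sub-frame) pairs:
# build one DataFrame per article number, keyed by that number
frames = {article: group for article, group in df.groupby("article_number")}

# Aggregate a single column per group: total quantity per article
totals = df.groupby("article_number")["quantity"].sum()
```

With this, `frames[123]` holds only the rows for article 123, and `totals` is a Series indexed by article number.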

Also refer to the pandas User Guide section on group by operations.

Markus
  • Thanks, I found that out too. The problem is that the DataFrames are not equal in length, so KMeans does not accept the array of split frames (grouped in your answer, which means kmeans.fit(grouped) does not work). So should I fill the split frames with 0 until they have the same length? And how would I do this? – Seppl98 Jul 04 '18 at 11:46
  • I do not understand what you are trying to achieve by applying KMeans to the grouped DataFrame. Maybe you can edit your question and describe your expected outcome in more detail. – Markus Jul 04 '18 at 11:54
  • I just want to run KMeans on all article numbers to find similarities among them – Seppl98 Jul 04 '18 at 11:55
  • KMeans clusters (or in other words "groups") entries. It seems counter-intuitive to try to cluster pre-existing groups all together. Do you want to (i) apply the clustering to each individual group, or (ii) apply the clustering to the whole dataset and then see which clusters entries from each group fall into? Again, I advise you to rethink your expected outcome and describe it in detail in the original question. – Markus Jul 04 '18 at 12:09
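
One possible reading of the comments above is that each article should become a single sample, with its quantities per date as features. Under that assumption, a pivot table with `fill_value=0` gives every article a row of the same length, which addresses the unequal-length problem without padding each frame by hand. This is only a sketch: the sample data, the column names, and `n_clusters=2` are illustrative assumptions, and whether zero-filling missing dates is meaningful depends on the data.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical sample data; the column names are assumed, not from the real CSV
df = pd.DataFrame({
    "article_number": [123, 123, 123, 345, 345, 567, 567, 567, 789],
    "date": ["2018-07-01", "2018-07-02", "2018-07-03",
             "2018-07-01", "2018-07-03",
             "2018-07-01", "2018-07-02", "2018-07-03",
             "2018-07-02"],
    "quantity": [5, 3, 4, 2, 7, 6, 2, 5, 1],
})

# One row per article, one column per date; dates without sales become 0,
# so every article ends up with a feature vector of the same length
features = df.pivot_table(
    index="article_number",
    columns="date",
    values="quantity",
    aggfunc="sum",
    fill_value=0,
)

# Cluster the articles; n_clusters=2 is an arbitrary choice for this sketch
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(features.to_numpy())

# Map each article number to its assigned cluster label
clusters = pd.Series(labels, index=features.index, name="cluster")
```

Here `clusters` shows which articles were grouped together. Scaling the features first (for example per article) may be worth considering, since KMeans is sensitive to differences in magnitude.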