
I imported a CSV into Python with pandas, and I would like to use one of the columns as a transaction ID so that I can build association rules.

(link: https://github.com/antonio1695/Python/blob/master/nearBPO/facturas.csv)

I hope someone can help me to:

Use UUID as a transaction ID for me to have a dataframe like the following:

UUID     Desc
123ex    Meat,Beer

From that I want to get association rules like: {Meat} => {Beer}.

Also, a recommendation on a library to do so in a simple way would be appreciated.

Thank you for your time.

Antonio López Ruiz
  • Sorry, are you after `df.loc[df['UUID'] == some_id, 'Desc']`? Or something like `df.groupby('UUID')['Desc'].apply(list)`? – EdChum Jun 29 '16 at 18:40
  • The second one worked perfectly! However, the type it gives me back is **pandas.core.series.Series**. Is there a way to keep it as a dataframe? If it can be edited and exported like any dataframe, I guess you just answered my question, so you can post it as an answer and I will upvote it and mark my question as answered. :) @EdChum – Antonio López Ruiz Jun 29 '16 at 18:51
  • I also thought of something like this: ``pd.pivot_table(df_du,index=["UUID"], values=["Desc"])`` but it isn't working. @EdChum – Antonio López Ruiz Jun 29 '16 at 18:59
  • Not sure what your aversion is to `Series`; you can use them pretty much the same as a df. You can also just call `reset_index` on the result of the groupby – EdChum Jun 29 '16 at 19:03
  • I need it to be a dataframe in order to export it to R. Still, calling `reset_index()` worked excellently. Can you post your comment as an answer? – Antonio López Ruiz Jun 29 '16 at 19:10
  • is there some kind of special format for `R`, are you exporting to csv? you can still do that with a series – EdChum Jun 29 '16 at 19:10
  • I am exporting to csv. R takes dataframes, and I don't know Series very well. – Antonio López Ruiz Jun 29 '16 at 19:12

1 Answer


You can aggregate values into a list by doing the following:

df.groupby('UUID')['Desc'].apply(list)

This returns a Series indexed by UUID, with one list of Desc values per transaction. If you want the UUID back as a column, you can call reset_index on the result, which gives you a DataFrame:

df.groupby('UUID')['Desc'].apply(list).reset_index()
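As a minimal sketch of the two calls above, here is how they behave on a small made-up frame. The sample rows are an assumption for illustration only; they are not taken from the linked facturas.csv.

```python
import pandas as pd

# Hypothetical sample data: one row per purchased item, repeated UUID per transaction.
df = pd.DataFrame({
    "UUID": ["123ex", "123ex", "456ab"],
    "Desc": ["Meat", "Beer", "Bread"],
})

# One list of items per transaction; the result is a Series indexed by UUID.
grouped = df.groupby("UUID")["Desc"].apply(list)

# reset_index turns the UUID index back into a column, giving a DataFrame.
transactions = grouped.reset_index()

print(type(grouped).__name__)
print(type(transactions).__name__)
print(transactions)
```

The first result is a Series and the second a DataFrame with the columns UUID and Desc, which is usually the more convenient shape if you plan to hand the data to another tool.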

A Series can also be exported to CSV the same way as a DataFrame:

df.groupby('UUID')['Desc'].apply(list).to_csv(your_path)

You may need to name your index before exporting. If you find it easier, just call reset_index to restore the index as a column and then call to_csv.
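A rough sketch of the reset_index-then-export route follows. It assumes you want each transaction written as a single comma-separated string (like `Meat,Beer` in the question) rather than as a Python list; that choice, the sample rows, and the in-memory buffer used in place of a real file path are all assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data: one row per purchased item.
df = pd.DataFrame({
    "UUID": ["123ex", "123ex", "456ab"],
    "Desc": ["Meat", "Beer", "Bread"],
})

# Join the items of each transaction into one string, then restore UUID as a column.
transactions = df.groupby("UUID")["Desc"].apply(",".join).reset_index()

# Write to an in-memory buffer here; pass a file path instead to write to disk.
buffer = io.StringIO()
transactions.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

print(csv_text)
```

Passing `index=False` keeps the default integer index out of the file, so the CSV contains only the UUID and Desc columns. Fields that contain a comma are quoted by the CSV writer, so a standard CSV reader should still see two columns.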

EdChum