Column to Transaction ID for association rules on dataframes from Pandas Python.

Question

I imported a CSV into Python with Pandas, and I would like to use one of the columns as a transaction ID so that I can build association rules.

(link: https://github.com/antonio1695/Python/blob/master/nearBPO/facturas.csv)

I hope someone can help me to:

Use UUID as a transaction ID for me to have a dataframe like the following:

UUID     Desc
123ex    Meat,Beer

From that I want to get association rules like: {Meat} => {Beer}.

Also, a recommendation on a library to do so in a simple way would be appreciated.

Thank you for your time.

Sorry, are you after `df.loc[df['UUID'] == some_id, 'Desc']`? Or something like `df.groupby('UUID')['Desc'].apply(list)`? — EdChum, Jun 29 '16 at 18:40
The second one worked perfectly! However, the type it gives me back is **pandas.core.series.Series**. Is there a way to keep it as a dataframe? If it can be edited and exported like any dataframe, I guess you just answered my question, so you can post it as an answer and I can give you +1 and mark my question as answered. :) @EdChum — Antonio López Ruiz, Jun 29 '16 at 18:51
I also thought of something like this: `pd.pivot_table(df_du, index=["UUID"], values=["Desc"])` but it isn't working. @EdChum — Antonio López Ruiz, Jun 29 '16 at 18:59
Not sure what your aversion to `Series` is; you can use them pretty much the same as a df. You can also just call `reset_index` on the groupby result. — EdChum, Jun 29 '16 at 19:03
I need it to be a dataframe in order to export it to R. Still, `reset_index()` worked excellently. Can you post your comment as an answer? — Antonio López Ruiz, Jun 29 '16 at 19:10
Is there some kind of special format for `R`? Are you exporting to csv? You can still do that with a series. — EdChum, Jun 29 '16 at 19:10
I am exporting to csv. R takes dataframes, and I don't know series very well. — Antonio López Ruiz, Jun 29 '16 at 19:12

score 2 · Accepted Answer · answered Jun 29 '16 at 19:12

You can aggregate values into a list by doing the following:

df.groupby('UUID')['Desc'].apply(list)

This returns a Series indexed by UUID, where each value is the list of Desc entries for that UUID. If you want the UUID back as a column, call reset_index on the result, which gives you a DataFrame:

df.groupby('UUID')['Desc'].apply(list).reset_index()
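To make the difference between the two results concrete, here is a minimal sketch on made-up data. The three sample rows are an assumption for illustration only; the real CSV linked in the question has its own values and more columns.

```python
import pandas as pd

# Hypothetical sample rows (not taken from facturas.csv).
df = pd.DataFrame({
    "UUID": ["123ex", "123ex", "456ab"],
    "Desc": ["Meat", "Beer", "Bread"],
})

# Series indexed by UUID; each value is the list of items in that transaction.
items = df.groupby("UUID")["Desc"].apply(list)

# DataFrame with UUID restored as a regular column next to Desc.
transactions = items.reset_index()

print(type(items).__name__)         # Series
print(type(transactions).__name__)  # DataFrame
print(transactions)
```

The lists keep the order in which the rows appear in the original dataframe within each UUID.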

A Series can also be exported to csv in the same way as a df:

df.groupby('UUID')['Desc'].apply(list).to_csv(your_path)

You may need to name your index before exporting so that the UUID column gets a header. If you find it easier, call reset_index first to turn the index back into a column, and then call to_csv.
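As a rough sketch of the export step, the snippet below writes to an in-memory buffer instead of a real path so it can run anywhere. The sample rows are hypothetical. The second variant, which joins the items into a single comma-separated string, is not part of the answer above; it is an assumption about what may be easier to read from another tool, since a Python list is written to csv as its text representation (for example `['Meat', 'Beer']`). It also assumes every Desc value is a string.

```python
import io
import pandas as pd

# Hypothetical sample rows (not taken from facturas.csv).
df = pd.DataFrame({
    "UUID": ["123ex", "123ex", "456ab"],
    "Desc": ["Meat", "Beer", "Bread"],
})

# Variant 1: lists per UUID, index restored as a column, then exported.
as_lists = df.groupby("UUID")["Desc"].apply(list).reset_index()
list_buffer = io.StringIO()          # stand-in for a file path
as_lists.to_csv(list_buffer, index=False)
list_csv = list_buffer.getvalue()

# Variant 2 (assumption, not from the answer): join items into one string.
as_text = df.groupby("UUID")["Desc"].agg(",".join).reset_index()
text_buffer = io.StringIO()
as_text.to_csv(text_buffer, index=False)
text_csv = text_buffer.getvalue()

print(list_csv)
print(text_csv)
```

Passing `index=False` avoids writing the new 0, 1, 2, ... row index that reset_index creates as an extra csv column.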
