In Pandas, is it possible to have a dataframe with a column that contains a varying number of subcolumns?
For example, suppose I have this CSV file:
transactionId, userName, date, itemList, totalCost
where the itemList contains a variable number of itemId;itemPrice pairs, with the pairs separated by a pipe (|). There is no upper bound on the number of itemId;itemPrice pairs in the list.
itemId ; itemPrice | itemId ; itemPrice
Here are some examples of rows:
transactionId, userName, date, itemList, totalCost
123, Bob , 7/29/2017, ABC;10|XYZ;20, 30
234, Alice, 7/31/2017, CDE;20|QRS;15|KLM;10, 45
The first row has two itemId;itemPrice pairs, while the second row has three pairs.
How can I create a dataframe to contain this information? Would I need a dataframe inside a dataframe?
There are other Stack Overflow posts on a variable number of columns, but they all assume a known maximum number of columns.
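
To make the target concrete, one shape that avoids a dataframe inside a dataframe is a long format with one row per item. This is only a minimal sketch, assuming pandas 1.1 or later (for ignore_index in explode) and that the sample rows above are representative of the file. The names raw, long_df, and pairs are illustrative, not part of any existing code.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the sketch is self-contained
raw = """transactionId, userName, date, itemList, totalCost
123, Bob , 7/29/2017, ABC;10|XYZ;20, 30
234, Alice, 7/31/2017, CDE;20|QRS;15|KLM;10, 45
"""

# skipinitialspace drops the blank that follows each comma
df = pd.read_csv(io.StringIO(raw), skipinitialspace=True)
df.columns = df.columns.str.strip()
df["userName"] = df["userName"].str.strip()  # "Bob " has a trailing space

# Split itemList on "|" into a list per row, then give each pair its own row
long_df = df.assign(itemList=df["itemList"].str.split("|")).explode(
    "itemList", ignore_index=True
)

# Split each "itemId;itemPrice" pair into two ordinary columns
pairs = long_df["itemList"].str.split(";", expand=True)
long_df["itemId"] = pairs[0].str.strip()
long_df["itemPrice"] = pairs[1].astype(int)
long_df = long_df.drop(columns="itemList")

print(long_df)
```

In this layout the number of items per transaction is unbounded, because each extra item becomes another row rather than another column, and per-transaction values can be recovered with a groupby on transactionId.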