How to split dataframe with multiple types of information into separate dataframes based on string?

Question

I have the following dataframe where there are multiple data sets in one CSV file. In this case "Cash Balance", "Account Order History", and "Equities" (information left blank intentionally). I want to create the Cash Balance information into one dataframe and Account Order History into another. My thinking here was to look at the index of the first column and see if it equals "Cash Balance" and then read each row until the index = "Account Order History" and so on and so forth but not sure if this is the correct approach.

How do I code this out using python? Please help thanks!

Cash Balance
Date        Time     Type    ID#    Commission  Amount   Balance
11/9/20     9:30am   Single  1234   2%          $200     $2500
11/9/20     9:40am   Single  1234   2%          $200     $2500
11/9/20     9:45am   Single  2234   2%          $200     $2500
Account Order History
Notes                Time    Spread Side        Qty      Price   Symbol Order
                     9:30am  STOCK  BUY         10       $42.87  NIO    Filled
                     9:30am  STOCK  Sell        10       $43.87  NIO    Filled
Equities

Does this answer your question? [Pandas split DataFrame by column value](https://stackoverflow.com/questions/33742588/pandas-split-dataframe-by-column-value) — Cainã Max Couto-Silva, Dec 16 '20 at 03:20
No it does not @CainãMaxCouto-Silva because I want to make a separate Cash Balance Df and a Account Order History Dataframe. I think it will need to be an index related answer. — python_learner, Dec 16 '20 at 03:25
Account Order History dataframe will consist of notes and all rows below it until Equities. Equities will be a separate dataframe — python_learner, Dec 16 '20 at 03:40

score 0 · Answer 1 · answered Dec 16 '20 at 04:04

Is this what you are looking for ?

import pandas as pd
df = pd.read_csv("new.csv",header=None)
df

0   1   2   3   4   5   6   7
0   Cash Balance    NaN NaN NaN NaN NaN NaN NaN
1   Date    Time    Type    ID# Commission  Amount  Balance NaN
2   11/09/2020  9:30am  Single  1234    2%  $200    $2,500  NaN
3   11/09/2020  9:40am  Single  1234    2%  $200    $2,500  NaN
4   11/09/2020  9:45am  Single  2234    2%  $200    $2,500  NaN
5   Account Order History   NaN NaN NaN NaN NaN NaN NaN
6   Notes   Time    Spread  Side    Qty Price   Symbol  Order
7   NaN 9:30am  STOCK   BUY 10  $42.87  NIO Filled
8   NaN 9:30am  STOCK   Sell    10  $43.87  NIO Filled

table_names = ["Cash Balance", "Account Order History"]
groups = df[0].isin(table_names).cumsum()
df_combined = {g.iloc[0,0]: g.iloc[1:] for k,g in df.groupby(groups)}


cash_balance = df_combined['Cash Balance'].reset_index(drop=True)
cash_balance.columns = cash_balance.iloc[0]
cash_balance.drop(cash_balance.index[0], inplace = True)

cash_balance

Date    Time    Type    ID# Commission  Amount  Balance NaN
1   11/09/2020  9:30am  Single  1234    2%  $200    $2,500  NaN
2   11/09/2020  9:40am  Single  1234    2%  $200    $2,500  NaN
3   11/09/2020  9:45am  Single  2234    2%  $200    $2,500  NaN

acct_order_hist = df_combined['Account Order History'].reset_index(drop=True)
acct_order_hist.columns = acct_order_hist.iloc[0]
acct_order_hist.drop(acct_order_hist.index[0], inplace = True)
acct_order_hist

Notes   Time    Spread  Side    Qty Price   Symbol  Order
1   NaN 9:30am  STOCK   BUY 10  $42.87  NIO Filled
2   NaN 9:30am  STOCK   Sell    10  $43.87  NIO Filled

Cainã Max Couto-Silva · Accepted Answer · 2020-12-16T04:18:12.843

There are multiple ways to get it done.

Using numpy.split():

idx = [(df.index=='Account Order History').tolist().index(True)]
idx.append((df.index=='Equities').tolist().index(True))
np.split(df, idx)

or with list comprehension:

idx = [0] + idx + [df.shape[0]] # needed to use the code chunk above
[df.iloc[idx[i]:idx[i+1]] for i in range(len(idx)-1)]

Using pandas category and groupby:

cat = pd.Categorical(df.index, categories=['Cash Balance','Account Order History','Equities'])
cat = cat.fillna('Account Order History')
[v for k, v in df.groupby(cat)]

As an alternative to more complex situations, you can first split the dataframe by indexes into a dictionary of dataframes, as presented in this question:

dict(tuple(df.groupby(level=0)))

where the keys are the indexes, and the values the dataframes. This alone should not solve your problem, but may help you to manage the dataframes as intended.

Thank you so much Caina! this works perfectly – python_learner Dec 16 '20 at 05:57 — python_learner, Dec 16 '20 at 05:57

How to split dataframe with multiple types of information into separate dataframes based on string?

2 Answers2