0

I have a table shown below that I would like to separate into two. The catch is the number of rows varies and I need the table to separate at the third row that contains the string 'CPT'. This row is not always the 10th row. The table is represented by the dataframe dfm. I want to be able to create two new dataframes, one containing the rows above the third row containing 'CPT' and one containing the rows below the third row containing 'CPT'

Dataframe dfm

I have the following code so far to identify all the rows and specifically the third row containing 'CPT' but I'm not sure if I'm on the right track.

dfm.columns
i= dfm.index[dfm['LABEL(click to hide)'] == 'CPT'].tolist()
i
i[-1]
Filter_dfm  = dfm[dfm.index.isin(i)]
Filter_dfm

Edit: I used the following code to split the dataframe, dfm, into multiple dataframes at the rows containing 'CPT'

dfm.rename(columns = {'LABEL(click to hide)' : 'ExSD', '(I)' : 'Risk'}, inplace = True)

m = dfm.ExSD.str.contains('CPT').cumsum()
d = {f'dfm{i}': g for i, g in dfm.groupby(m)}

for k, v in d.items():
    print(k)
    print(v, end='\n\n')

Then to return the data frames I wanted I used Input:

d['dfm2']

Output:

ExSD    Risk
1   CPT     % Utilization Cumulative (With No New Charge) ...
2   05/1622:00  115.97 %
3   05/1700:45  37.3 %
4   05/1705:00  34.21 %
5   05/1712:00  22.27 %
6   05/1714:00  30.01 %
7   05/1716:00  33.93 %
8   05/1717:00  42.04 %
9   05/1718:00  46.48 %

Input:

d['dfm3']

Output:

    ExSD    Risk
10  CPT     % Utilization Cumulative (With No New Charge) ...
11  05/1622:00  8.24 %
12  05/1700:45  53.05 %
13  05/1705:00  36.04 %
14  05/1712:00  21.21 %
15  05/1714:00  20.65 %
16  05/1716:00  21.13 %
17  05/1717:00  23.09 %
18  05/1718:00  23.46 %
Jean123
  • 129
  • 9
  • Please [don't post images of textual data](https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-or-errors). – tripleee May 11 '21 at 04:57

1 Answers1

0

The answer has been revised because the requirement has been added for splitting with irregular number of lines. The modification is to find the index, which is supposed to be the header of the data frame, and split it using the numpy split function. Then use that split information to split in the original data frame, delete the rows that appear to be headers, and update the column names.

mask = df[df['value1'] == 'CPT'].index.tolist()
mask
[0, 1, 10]
mask2 = np.array_split(df.index, np.array(mask))
mask2
[Int64Index([], dtype='int64'),
 Int64Index([0], dtype='int64'),
 Int64Index([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64'),
 Int64Index([10, 11, 12, 13, 14, 15, 16, 17, 18], dtype='int64')]

df1 = df[df.index.isin(mask2[2])]
df2 = df[df.index.isin(mask2[3])]

df1.drop(df.index[1], inplace=True)
df2.drop(df.index[10], inplace=True)

df1.columns = df.iloc[0]
df2.columns = df.iloc[0]

df1

CPT % Utilization Cumulative(With No New Change)
2 04/2905:00 173.31%
3 04/2912:00 29.19%
4 04/2914:00 32.62%
5 04/2916:00 35.4%
6 04/2917:00 45.3%
7 04/2918:00 50.12%
8 04/2922:00 44.16%
9 04/3000:00 41.58%
r-beginners
  • 31,170
  • 3
  • 14
  • 32
  • 1
    If the data had been provided as text instead of images, we would have gotten a quicker response. If my answer helped you, please consider accepting it as the correct answer – r-beginners May 11 '21 at 04:54
  • Thank you for your response. When I run this I get a key error: 'value1'. Also say if 'CPT' was in the 9th column or 11th column would this still work? As the key issue is splitting the table but the number of columns is not fixed. If it stills applies I do apologize, I'm by no means an expert. – Jean123 May 12 '21 at 16:09
  • The assumption of my answer is that the code succeeds when the number of rows to be split is the same. Since the data provided had the same number of rows, I used reshape() to handle it. – r-beginners May 13 '21 at 02:32
  • That's the issue I'm having. The third line containing 'CPT', in this instance the 10th row, can potentially be in another row. Hence I'm trying to find a solution where I locate the 3rd row containing 'CPT' and split it at that point. In the code I shared I managed to create an array that returned the index of just the rows with 'CPT' but got stuck with progressing further. – Jean123 May 13 '21 at 16:28
  • Added support for irregular line splitting. – r-beginners May 15 '21 at 05:23
  • I've managed to find a solution using str.contains('CPT'). I've made the edit above in case it can be of help to anyone else. – Jean123 May 16 '21 at 22:03
  • We have identified another approach. String search is also a valid approach. If my answer was helpful, please accept my response. – r-beginners May 17 '21 at 01:18