I am new to using regex.
I have a string of the form:
Waco, Texas
Unit Dose 13 and
SECTION 011100 SUMMARY OF WORK
INDEX PAGE
PART 1. - GENERAL 1
1.1. RELATED DOCUMENTS 1
1.2. PROJECT DESCRIPTION 1
1.3. OWNER 1
1.4. ARCHITECT/ENGINEER 2
1.5. PURCHASE CONTRACTS 2
1.6. OWNER-FURNISHED ITEMS 2
1.7. CONTRACTOR-FURNISHED ITEMS 3
1.8. CONTRACTOR USE OF PREMISES 3
1.9. OWNER OCCUPANCY 3
1.10. WORK RESTRICTIONS 4
PART 2. - PRODUCTS - NOT APPLICABLE 4
PART 3. - EXECUTION - NOT APPLICABLE 4
I apologize for the extra whitespace, but this is the form of the Word document I parsed to obtain the string.
I need to capture all of the headings under PART 1, PART 2, and PART 3 and store each part's headings in a separate list. So far I have:
matchedtext = re.findall('(?<=PART) (.*?) (?=PART)', text, re.DOTALL)
If I understand correctly, these lookarounds should use PART as a sort of anchor point and grab the text in between. However, matchedtext is an empty list when I run the code.
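My best guess at the cause is the literal spaces in the pattern above: it requires a space directly before the next PART, whereas in my string each PART seems to follow a newline. Below is a minimal sketch of an alternative, assuming every PART heading starts its own line. The abbreviated `text` and the names `part_re` and `sections` are made up for illustration and are not from my real code.

```python
import re

# Abbreviated sample of the parsed string (assumed to be newline-separated).
text = """SECTION 011100 SUMMARY OF WORK
INDEX PAGE
PART 1. - GENERAL 1
1.1. RELATED DOCUMENTS 1
1.2. PROJECT DESCRIPTION 1
PART 2. - PRODUCTS - NOT APPLICABLE 4
PART 3. - EXECUTION - NOT APPLICABLE 4
"""

# The pattern from my attempt: the space before (?=PART) never matches here,
# because a newline (not a space) precedes every PART.
original = re.findall('(?<=PART) (.*?) (?=PART)', text, re.DOTALL)

# Hypothetical alternative: start at a line beginning with "PART <n>." and
# lazily consume everything up to the next such line or the end of the string.
part_re = re.compile(r'^PART \d+\..*?(?=^PART \d+\.|\Z)',
                     re.DOTALL | re.MULTILINE)

# One string per PART, with surrounding whitespace trimmed.
sections = [chunk.strip() for chunk in part_re.findall(text)]
```

Each element of `sections` would then hold one PART line plus the lines under it, while `original` stays empty for this sample.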
The second part of my problem: once I have the text between the different occurrences of PART, how can I save just the capitalized headings in a list, with one string per heading? Some of the strings from my Word documents contain lowercase words, but I only want the headings that are entirely in caps.
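To illustrate the second part, here is a rough sketch of the kind of filter I have in mind. It assumes every heading sits on its own line, starts with a number such as 1.1., and may end with a page number. The sample `section` (including the mixed-case line) and the names `heading_re` and `headings` are hypothetical.

```python
import re

# Hypothetical text of a single PART, one heading per line.
section = """PART 1. - GENERAL 1
1.1. RELATED DOCUMENTS 1
1.2. PROJECT DESCRIPTION 1
1.4. ARCHITECT/ENGINEER 2
1.6. OWNER-FURNISHED ITEMS 2
1.11. Some Mixed case line 5
"""

# A "1.1." style number, then a heading made only of capitals, spaces,
# slashes and hyphens, then an optional trailing page number.
heading_re = re.compile(
    r'^\d+\.\d+\.[ \t]+([A-Z][A-Z/\- ]*[A-Z])(?:[ \t]+\d+)?[ \t]*$',
    re.MULTILINE,
)

# findall returns only the captured group, i.e. the heading text itself.
headings = heading_re.findall(section)
print(headings)
```

With this sample, the PART line and the mixed-case line are skipped because they do not fit the pattern, which is the behavior I am hoping for.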
So, to summarize: how can I grab the text between specific words in a string, and once I have it, how can I save the headings as individual strings in a list?
Thanks for the help! :D