I'm struggling with some Python code. I've browsed many similar questions, but I couldn't find anything that solves my problem.

What I want to do is delete a whole specific paragraph with arbitrary contents (in the example below, everything from 'paragraph_a' to the closing parenthesis ')'), identifying it in the code by the name of the following paragraph, 'paragraph_b'.

Here is the format of the input text file:

some random texts (100+ lines)

...

paragraph_a A_story(    
...
some random texts
...
)

paragraph_b different_story(
...
some random texts
...
)

And below is the desired output:

some random texts (100+ lines)

...

story "A is deleted"

paragraph_b different_story(
...
some random texts
...
)

To sum up, here is what I want to do:

  1. Delete paragraph_a by using the name of the next paragraph (paragraph_b) in the code. (I think I need to define a clear range for this.)
  2. Then add some text, such as: story "A is deleted", in place of the deleted part.

I tried opening the input file in read mode and the output file in write mode, reading the input with readlines(), and using a flag that is '1' only while the line being read is outside paragraph_a.

But it only deletes the first line of the paragraph.

Below is the code I've tried so far:

def erase(file_name: str, start_key: str, stop_key: str):
    try: 
        # read the file lines
        with open('input.txt', 'r+') as fr: 
            lines = fr.readlines()
        # write the file lines except the start_key until the stop_key
        with open('output.txt', 'w+') as fw:

            delete = False

            for line in lines:

                if line.strip('\n') == start_key:
                    delete = True

                elif line.strip('\n') == stop_key:
                    delete = False

                if not delete:
                    fw.write(line)
    except RuntimeError as ex: 
        print(f"erase error:\n\t{ex}")

def main():
    erase('input.txt','paragraph_a','paragraph_b')

if __name__ == "__main__":
    main()

But the output is the same as the input.

How can I deal with this? Any answer or clue would be greatly appreciated.

Thanks.

Parine

1 Answer

You can apply a multiline regex to the file content as a whole,

r"^(\w+ \w+\((?:(.|\n)*)\))\s*^paragraph_b"

and then replace the matching group.

See the regex in action here: https://regex101.com/r/pwGVbe/1

Python's re module provides this functionality.
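
A minimal sketch of how this could look in Python, assuming the whole file fits in memory. The function name replace_paragraph_before and its parameters are hypothetical, and the pattern is the one above with the literal paragraph_b swapped for an escaped parameter. re.MULTILINE is needed so that ^ matches at the start of every line, and only group 1 is replaced so that the paragraph_b line itself survives. One caveat: because the pattern is greedy and the search takes the first match, everything from the first line that looks like a paragraph header up to the last ")" before paragraph_b is replaced, so this only fits inputs shaped like the example.

```python
import re


def replace_paragraph_before(text, next_name, replacement):
    """Replace the paragraph that precedes `next_name` with `replacement`.

    `text` must be the complete file content, not a single line.
    """
    # Same pattern as in the answer, with the next paragraph's name escaped.
    pattern = re.compile(
        rf"^(\w+ \w+\((?:(.|\n)*)\))\s*^{re.escape(next_name)}",
        re.MULTILINE,
    )
    match = pattern.search(text)
    if match is None:
        # Nothing matched: return the content unchanged.
        return text
    # Swap out only group 1 (the paragraph), keeping everything after it.
    return text[:match.start(1)] + replacement + text[match.end(1):]
```

To use it on files, read the input with open('input.txt').read(), pass the resulting string through the function, and write the returned string to the output file.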

ypnos
  • Thanks!! Do I use that like newline = re.sub(r'^(\w+ \w+\((?:(.|\n)*)\))\s*^paragraph_b', '', line) and write the newline to the output? – Parine Aug 21 '22 at 11:18
  • Can you please give me a detailed example? It seems to work, but when I tried it, it didn't actually replace anything. – Parine Aug 21 '22 at 12:18
  • As I wrote, you need to apply the regex to the content as a whole, not to single lines, because it is a multiline regex that spans several lines. – ypnos Aug 21 '22 at 17:19
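
For comparison, the line-by-line approach from the question can also be made to work. This is a different technique from the regex answer. The comparison line.strip('\n') == start_key is never true for the sample input, because the header line is "paragraph_a A_story(" followed by trailing spaces rather than just "paragraph_a". Testing the start of the line instead is one way around that. The sketch below uses a hypothetical helper, erase_lines, and assumes each key appears at the very start of its paragraph's first line.

```python
def erase_lines(lines, start_key, stop_key, replacement):
    """Drop the lines from the one starting with start_key up to, but not
    including, the one starting with stop_key; emit replacement once."""
    kept = []
    deleting = False
    for line in lines:
        if not deleting and line.startswith(start_key):
            deleting = True
            # The blank line before stop_key is dropped too, so add one back.
            kept.append(replacement + "\n\n")
            continue
        if deleting and line.startswith(stop_key):
            deleting = False
        if not deleting:
            kept.append(line)
    return kept
```

The input would be the list returned by readlines(), and the result can be written out with writelines().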