
I have a large .txt file produced by parsing a C file. It contains various blocks of data, but about 90% of them are useless to me. I'm trying to get rid of them and save the result to another file, but I'm having a hard time doing so. At first I tried deleting all the useless information from the unparsed file, but then it won't parse. My .txt file is built like this:

//Update: The files I'm trying to work on come from the pycparser module, which I found on GitHub.

The file before being parsed looks like this:

enter image description here

And after using pycparser

file_to_parse = pycparser.parse_file(current_directory + r"\D_Out_Clean\file.d_prec")

enter image description here

I want to delete all blocks that start with the word Typedef. The module stores them in one big list that I can access via its ext attribute.

enter image description here

Currently my code looks like this:

len_of_ext_list = len(file_to_parse.ext)
i = 0
while i < len_of_ext_list:
    if 'TypeDecl' not in file_to_parse.ext[i]:
        print("NOT A TYPEDECL")
        print(file_to_parse.ext[i], type(file_to_parse.ext[i]))
        parsed_file_2 = open(current_directory + r"\Zadanie\D_Out_Clean_Parsed\clean_file.d_prec", "w+")
        parsed_file_2.write("%s%s\n" %("", file_to_parse.ext[i]))
        parsed_file_2.close
        #file_to_parse_2 = file_to_parse.ext[i]
    i+=1

But the code above only saves the last FuncDef from the unparsed file, and I don't know how to change it. So now I'm trying to get rid of all typedefs in the parsed file, as they don't hold any valuable information for me. I want to know which function definitions and declarations are in the file, and what types of global variables are stored in the parsed file. I hope this is clearer now.
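The "only the last FuncDef is saved" symptom is consistent with the output file being reopened in "w+" mode on every loop iteration, since that mode truncates the file each time. A minimal sketch of one way around it, opening the output once and filtering on the node's class name, might look like the following. The helper name write_non_typedefs and the stand-in Typedef and FuncDef classes are hypothetical and exist only so the sketch runs without pycparser installed; with real data the list would presumably be file_to_parse.ext, and isinstance(node, pycparser.c_ast.Typedef) would be the more usual check.

```python
import os
import tempfile


def write_non_typedefs(nodes, out_path):
    """Write every node whose class is not named 'Typedef' to out_path."""
    kept = [node for node in nodes if type(node).__name__ != "Typedef"]
    # Open the output once, outside the loop: reopening it with "w" or "w+"
    # for each node truncates the file, so only the last write survives.
    with open(out_path, "w") as out:
        for node in kept:
            out.write(f"{node}\n")
    return len(kept)


# Hypothetical stand-ins for parsed nodes (assumption: not pycparser classes).
class Typedef:
    def __repr__(self):
        return "Typedef(...)"


class FuncDef:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"FuncDef({self.name})"


demo_nodes = [Typedef(), FuncDef("init"), Typedef(), FuncDef("main")]
demo_path = os.path.join(tempfile.mkdtemp(), "clean_file.txt")
kept_count = write_non_typedefs(demo_nodes, demo_path)
```

Here both FuncDef entries end up in the output file, one per line, because the file is opened a single time for the whole loop.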

1 Answer

I suggest reading the entire input file into a string, and then doing a regex replacement:

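import re

# Read the whole input file into one string.
with open(current_directory + r"\D_Out\file.txt", "r") as file:
    data = file.read()

# Remove each "type" block: either a braced body or a single statement ending in ";".
data = re.sub(r'type(?:\n\{.*?\}|[^;]*?;)\n?', '', data, flags=re.S)

with open(current_directory + r"\D_Out_Clean\clean_file.txt", "w") as output:
    output.write(data)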

Here is a regex demo showing that the replacement logic is working.
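As a rough illustration of what the pattern removes, here is a small self-contained run on made-up input. The sample text is an assumption, since the original data is not shown here; it only needs to contain a "type" line followed by a braced block, and a one-line "type ...;" statement.

```python
import re

PATTERN = r'type(?:\n\{.*?\}|[^;]*?;)\n?'

# Hypothetical sample input (assumption: the real file may look different).
sample = (
    "type\n"
    "{\n"
    "    uint16 n;\n"
    "    uint16 k;\n"
    "}\n"
    "type uint8 flag;\n"
    "int main(void);\n"
)

# re.S lets "." match newlines, so ".*?" can span the whole braced block.
cleaned = re.sub(PATTERN, '', sample, flags=re.S)
print(cleaned)
```

Note that the pattern is not anchored to a line start or word boundary, so it would also match "type" inside a longer identifier; adding an anchor may be worth considering for real data.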

Tim Biegeleisen
  • Thanks @Tim Biegeleisen for the answer. It works OK, as it deletes almost all of the content. But I have a follow-up question. Sometimes there is something after the **}**, e.g. { uint16 n; uint16 k; }std_VersionType, and right now this line is left untouched. What should be added to delete it? – Redal_Snake Nov 15 '22 at 09:50
  • I can only reply to you by seeing the real data. If you want me to edit my answer, then update your question. – Tim Biegeleisen Nov 15 '22 at 09:54
  • OK, I updated my question with all the details. I'm sorry if it was not clear at first, as English is not my first language and I was in a rush. – Redal_Snake Nov 16 '22 at 08:20
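For the trailing-name case raised in the comments, one possible extension (not something confirmed in this thread) is to let the braced alternative also consume whatever follows the closing brace on the same line. The sketch below assumes the name, such as std_VersionType, sits directly after the brace and that nothing else worth keeping shares that line; the sample input is made up.

```python
import re

ORIGINAL = r'type(?:\n\{.*?\}|[^;]*?;)\n?'
# Variant (assumption): "[^\n]*" after the closing brace also removes a
# trailing name on the same line, along with anything else on that line.
WITH_TRAILING_NAME = r'type(?:\n\{.*?\}[^\n]*|[^;]*?;)\n?'

# Hypothetical sample modelled on the comment's example.
sample = "type\n{ uint16 n; uint16 k; }std_VersionType\nint main(void);\n"

before = re.sub(ORIGINAL, '', sample, flags=re.S)
after = re.sub(WITH_TRAILING_NAME, '', sample, flags=re.S)
print(repr(before))
print(repr(after))
```

With this sample, the original pattern leaves the std_VersionType line behind, while the variant removes it together with the block.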