I'm working on a Python (3.6) project in which I need to parse some text files from a directory structure.

The directory structure is:

--easy (root dir)
----sub_dir
-------another_sub_dir
-----------description (another sub dir)
------------------description.txt (file)

I need to iterate through all of the description.txt files in the subdirectories and then parse them into the database.

Each description.txt file follows a standard format:

It starts with a text paragraph, followed by the headings Input, Output, Constraints, Example (with its own Input and Output) and Explanation. We need to save each description.txt file in the database, with each heading becoming a column of a DB table.

I have tried to iterate through the directory structure to find all description.txt files like this:

import os
for root, dirs, files in os.walk(os.path.join('easy')):
    for file in files:
        if file.endswith('description.txt'):
            print(os.path.join(root, file))

This way we can find all the description.txt files, but how can we parse them using the headings inside each text file and save them into the database?

How can we accomplish that? Help me, please!

Thanks in advance!

Abdul Rehman
  • Is the description.txt delimited by commas? I would need more details but usually my go to tool would be pandas and its read_csv function. See http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html . – Kent Shikama Dec 23 '17 at 07:25
  • It's delimited by blank lines! – Abdul Rehman Dec 23 '17 at 07:26
  • Here you can take a look at a sample file: https://mega.nz/#!4MU2wKgC!-YcaMXRAFi-cqUzDVABvYzOMKj8015Q1XDGwc1FK0lI – Abdul Rehman Dec 23 '17 at 07:57

1 Answer

You can collect the text under each heading and then split it up later:

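with open('description.txt', encoding='utf-8') as desc_file:
    # Assumes every heading is preceded by two blank lines in a row,
    # which show up as three consecutive newline characters.
    sections = [s.strip() for s in desc_file.read().split('\n\n\n') if s.strip()]

# sections[0] is the opening paragraph; each later item starts with its heading line.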
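If relying on blank lines turns out to be fragile, another option is to detect the heading lines themselves. The sketch below is only an illustration: it assumes each heading sits alone on its own line (optionally followed by a colon), and the names `HEADINGS`, `parse_description` and `SAMPLE` are made up for this example, as is the sample text.

```python
# Heading names taken from the question; their exact spelling in the real
# files is an assumption.
HEADINGS = ("input", "output", "constraints", "example", "explanation")


def parse_description(text):
    """Split description text into {section_name: body_text}."""
    sections = {"description": []}   # text before the first heading
    current = "description"
    in_example = False
    for line in text.splitlines():
        key = line.strip().rstrip(":").strip().lower()
        if key in HEADINGS:
            if key == "example":
                in_example = True
            elif key == "explanation":
                in_example = False
            # Input/Output inside Example get their own keys.
            if in_example and key in ("input", "output"):
                key = "example_" + key
            current = key
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


# Made-up sample text in the layout described in the question.
SAMPLE = """Add two numbers.


Input
Two integers a and b.


Output
Their sum.


Constraints
1 <= a, b <= 100


Example
Input
2 3
Output
5


Explanation
2 + 3 = 5
"""

parsed = parse_description(SAMPLE)
for name, body in parsed.items():
    print(name, "->", repr(body))
```

A body line that consists only of one of the heading words would be mistaken for a heading, so check this against the real files before trusting it.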

Now you can map those headers to your table columns.
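The question does not say which database is used, so the following is a rough sketch with the standard-library `sqlite3` module. The table name `problems`, the column list and the `save_sections` helper are all assumptions made for illustration, and the dictionary passed in stands for whatever your parsing step produces.

```python
import sqlite3

# Hypothetical column names, one per heading mentioned in the question.
COLUMNS = ("description", "input", "output", "constraints",
           "example_input", "example_output", "explanation")


def save_sections(conn, path, sections):
    """Store one parsed file as one row; missing sections become ''."""
    column_defs = ", ".join(f'"{c}" TEXT' for c in COLUMNS)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS problems (path TEXT PRIMARY KEY, {column_defs})"
    )
    placeholders = ", ".join("?" for _ in range(len(COLUMNS) + 1))
    values = [path] + [sections.get(c, "") for c in COLUMNS]
    # Parameter substitution keeps the file contents out of the SQL string.
    conn.execute(f"INSERT OR REPLACE INTO problems VALUES ({placeholders})", values)
    conn.commit()


# Usage with an in-memory database and made-up section text.
conn = sqlite3.connect(":memory:")
example = {"description": "Add two numbers.",
           "input": "Two integers.",
           "output": "Their sum."}
save_sections(conn, "easy/sub_dir/another_sub_dir/description/description.txt", example)
row = conn.execute('SELECT "input", "explanation" FROM problems').fetchone()
print(row)
```

Using the file path as the primary key means re-running the import replaces a row instead of duplicating it.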

EDIT: When calling open you might want to specify the file's encoding, because the default differs from system to system.
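As a small sketch of that point, assuming the files are UTF-8 (that is a guess, so check your own files), you can pass the encoding explicitly. The `read_text` helper and the temporary file are only there to make the example self-contained.

```python
import os
import tempfile


def read_text(path):
    """Read a file as UTF-8; undecodable bytes are replaced instead of raising."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read()


# Demo: write a file containing a curly apostrophe (non-ASCII) and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "description.txt")
    with open(path, "wb") as handle:
        handle.write("It\u2019s a test".encode("utf-8"))
    text = read_text(path)

print(text)
```

With `errors="replace"` a wrongly encoded file is read with replacement characters rather than stopping the whole run; drop that argument if you would rather get an error.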

Muku
  • It returns this error: `UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1667: ordinal not in range(128)` – Abdul Rehman Dec 23 '17 at 07:59
  • @AbdulRehman, that's an encoding error; you might want to check what encoding your file has. It differs from system to system. Try using `open('description.txt', encoding='utf-8')` – Muku Dec 23 '17 at 08:07
  • I just printed the `desc_header_list` and it returns the first lines from all description.txt files. I need to parse the files based on headings like Input and Output; you can see an example file here: mega.nz/#!4MU2wKgC!-YcaMXRAFi-cqUzDVABvYzOMKj8015Q1XDGwc1FK0lI – Abdul Rehman Dec 23 '17 at 08:14
  • @AbdulRehman I see, the order of the headings is consistent and will be the same for all the files. Also note that whenever a heading comes, it has 2 blank lines before it. You can just keep storing lines until 2 blank lines come and then store the following lines under the next heading – Muku Dec 23 '17 at 08:22
  • @AbdulRehman just try it yourself, I am sure you will be able to do it. – Muku Dec 23 '17 at 08:25