I'm taking an introductory undergraduate Python class and I'm working with a text file.
Here is an example of its contents:

Special Type A Sunflower 
2017-10-19 18:20:30
Asteraceae
Brought to the USA by Europeans
Ingredient for Sunflower Oil
Needs full sun
Moist Soil, with heavy mulch
Water only when top 2 inches of soil is dry

Tropical Sealion
2020-04-25 12:10:05
Pinnipeds 
Mostly found in zoos
Likes Fish
Likes Balls
Likes Zookeepers

Honey Badger
2018-06-06 16:15:25
Mustelidae
Eats anything

Currently, I'm trying to convert each text block into a dictionary that has only 3 keys.

The first key is "Name"; its value is the first line of each text block.
The second key is "Date"; its value is the second line of each text block.
The third key is "Information"; its value is the third line and beyond of each text block, stopping at the blank line between blocks. I believe this should be a list of values too.
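To make the target shape concrete, here is a rough sketch for a single block, assuming its lines have already been split into a list (the names `block_lines` and `record` are hypothetical). The three keys map onto simple list slices:

```python
# Hypothetical example: one text block, already split into its lines
block_lines = [
    "Honey Badger",
    "2018-06-06 16:15:25",
    "Mustelidae",
    "Eats anything",
]

# First line -> "Name", second line -> "Date", everything after -> "Information"
record = {
    "Name": block_lines[0],
    "Date": block_lines[1],
    "Information": block_lines[2:],  # a list, however many lines remain
}
print(record)
```

Because `block_lines[2:]` is a slice, it works for blocks of any length.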

Here is my progress so far:

import itertools
import os

MyFilePath = os.getcwd() # current working directory (assumed to contain the file)
ActualFile = "myplants.txt"
FinalFilePath = os.path.join(MyFilePath, ActualFile)

def TextFileToDictionary():

    dictionary_1 = {}

    textfile = open(FinalFilePath, 'r')
    first_line = textfile.readline()
    second_line = textfile.readline()
    third_line = textfile.readline()
    for line in textfile:
        dictionary_1["name"] = first_line
        dictionary_1["date"] = second_line
        dictionary_1["information"] = third_line
    print(dictionary_1)
    textfile.close()

TextFileToDictionary()

Although I have stored the lines as values in a dictionary,
I can't work out how to repeat this for every text block so that all the blocks end up as dictionary values.
I also can't work out how to turn the third line and beyond into a list of values.

Note that the text blocks have different lengths.
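Since the blocks have different lengths, one option (not from the original post) is to let `itertools.groupby`, which the code above already imports, split the lines wherever a blank line appears, so no block size has to be assumed. A minimal sketch, assuming the file contents are already in a string called `text`:

```python
import itertools

# Assumed sample input: a shortened version of the file from the question
text = """Tropical Sealion
2020-04-25 12:10:05
Pinnipeds
Likes Fish

Honey Badger
2018-06-06 16:15:25
Mustelidae
Eats anything
"""

records = []
# groupby yields runs of consecutive lines that are either all blank or all non-blank
for is_blank, group in itertools.groupby(
    text.splitlines(), key=lambda line: not line.strip()
):
    if is_blank:
        continue  # skip the separator run between blocks
    lines = list(group)
    records.append({"Name": lines[0], "Date": lines[1], "Information": lines[2:]})

for record in records:
    print(record)
```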

So the end result should resemble:

dictionary_1 = {'Name' : "Special Type A Sunflower", 'Date' : "2017-10-19 18:20:30", 'Information' : ["Asteraceae, Brought to the USA by Europeans, Ingredient for Sunflower Oil, Needs full sun, Moist Soil, with heavy mulch, Water only when top 2 inches of soil is dry"]}

dictionary_2 = {'Name' : "Tropical Sealion", "Date" : "2020-04-25 12:10:05", "Information" : ["Pinnipeds, Mostly found in zoos, Likes Fish, Likes Balls, Likes Zookeepers"]}

And so on.

Does anyone know how to change the code to produce the desired end result?
Many thanks!

TropicalMagic
  • Are you sure it should be `"Asteraceae, Brought to the USA by Europeans, Ingredient for Sunflower Oil, Needs full sun, Moist Soil, with heavy mulch, Water only when top 2 inches of soil is dry"` rather than `"Asteraceae", "Brought to the USA by Europeans", "Ingredient for Sunflower Oil", "Needs full sun", "Moist Soil", "with heavy mulch", "Water only when top 2 inches of soil is dry"`? – Acccumulation Aug 14 '20 at 03:50
  • For `third_line`, you need to keep reading until you find an empty line or the end of the file – deadshot Aug 14 '20 at 03:57
  • @Acccumulation would it be possible for the first option? – TropicalMagic Aug 14 '20 at 05:10
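Following the comment about reading until an empty line or the end of the file, here is a rough sketch of how the original loop could collect each block before building its dictionary. It reads from `io.StringIO` instead of `myplants.txt` so it runs on its own; a file object from `open(...)` can be iterated line by line in the same way. The function name `parse_blocks` is hypothetical, and it assumes every block has at least a name line and a date line.

```python
import io

def parse_blocks(lines):
    """Group lines into blocks separated by blank lines; return one dict per block."""
    records = []
    current = []  # lines of the block currently being read

    def flush():
        # Build a dict from the collected block (assumes >= 2 lines), then reset
        if current:
            records.append(
                {"Name": current[0], "Date": current[1], "Information": current[2:]}
            )
            current.clear()

    for raw in lines:
        line = raw.rstrip("\n")
        if line.strip():
            current.append(line)  # still inside a block
        else:
            flush()  # blank line: the block has ended
    flush()  # end of file: close the final block
    return records

# Stand-in for open("myplants.txt"); a file object yields lines the same way
sample = io.StringIO(
    "Honey Badger\n2018-06-06 16:15:25\nMustelidae\nEats anything\n"
    "\n"
    "Tropical Sealion\n2020-04-25 12:10:05\nPinnipeds\n"
)
for record in parse_blocks(sample):
    print(record)
```

The final `flush()` call is what handles a last block that is not followed by a blank line.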

3 Answers


My solution for the TextFileToDictionary() function is as follows:

def TextFileToDictionary():
    data = []  # Blank list
    with open(FinalFilePath, "r") as file:  # Open file
        # Strip the newline at the end of the file, then split by double linebreaks
        sections = file.read().strip().split("\n\n")
        for section in sections:  # Iterate through sections
            lines = section.split("\n")  # Split sections by linebreaks
            if len(lines) < 3:  # Make sure that there is the correct amount of lines
                return "ERROR!"
            data.append({  # Add a dictionary to the data with:
                "Name": lines[0],  # First line: name
                "Date": lines[1],  # Second line: date
                "Information": lines[2:],  # Third line and onwards: info
            })
    return data  # Returns a list of dictionaries containing the data about each species

If you ran the function on your sample file, it should return the following:

[
  {
    "Name": "Special Type A Sunflower",
    "Date": "2017-10-19 18:20:30",
    "Information": ["Asteraceae", "Brought to the USA by Europeans" etc... ]
  },
  {
    "Name": "Tropical Sealion",
    "Date": "2020-04-25 12:10:05",
    "Information": ["Pinnipeds", "Mostly found in zoos" etc... ]
  } #and so on.
]
  • Many thanks! It was what I was looking for! I was also wondering if the list of values for the key : "Information" could be combined into a single element as well. – TropicalMagic Aug 14 '20 at 05:01
  • @TropicalMagic How would you want them combined? You can use `"SEPARATOR".join(lines[2:])` to join them with a separator, or use the 'pythony' way of list comprehensions: `"".join([(info) for info in lines[2:]])`, and edit the first bit in brackets. – ObsoleteAwareProduce Aug 14 '20 at 18:53
  • Nice! Thanks for the tip! – TropicalMagic Aug 16 '20 at 06:09
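To get the single combined string asked about in the comments (matching the expected output in the question), the `join` tip can be applied to the slice. A small sketch, assuming `", "` as the separator and a hypothetical list of lines for one block:

```python
# Hypothetical lines for one block, as produced by section.split("\n")
lines = ["Tropical Sealion", "2020-04-25 12:10:05", "Pinnipeds", "Mostly found in zoos", "Likes Fish"]

record = {
    "Name": lines[0],
    "Date": lines[1],
    # Join the third line onwards into one string, wrapped in a one-element list
    # to match the shape shown in the question's expected output
    "Information": [", ".join(lines[2:])],
}
print(record["Information"])  # ['Pinnipeds, Mostly found in zoos, Likes Fish']
```

Dropping the surrounding brackets gives a plain string instead of a one-element list.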

Simpler Version:

def dicter(file):
    with open(file, 'r') as f:
        dics = []
        blocks = [x.split('\n') for x in f.read().strip().split('\n\n')]
        for block in blocks:
            dics.append(dict(Name=block[0], Date=block[1], Information=block[2:]))
        return dics

print(dicter('your/path/to/file'))
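One detail worth checking with the split-on-'\n\n' approach: text files usually end with a newline, and that newline stays attached to the last block, which leaves an empty string at the end of its Information list. A quick demonstration, using an in-memory string as a stand-in for `f.read()`:

```python
# Stand-in for f.read(): two blocks, ending with a newline as most text files do
content = "Tropical Sealion\n2020-04-25 12:10:05\nPinnipeds\n\nHoney Badger\n2018-06-06 16:15:25\nEats anything\n"

raw_last = content.split("\n\n")[-1].split("\n")
print(raw_last)  # ['Honey Badger', '2018-06-06 16:15:25', 'Eats anything', '']

# Stripping the whole text first removes the stray empty string
clean_last = content.strip().split("\n\n")[-1].split("\n")
print(clean_last)  # ['Honey Badger', '2018-06-06 16:15:25', 'Eats anything']
```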
omdo

I would use a regex, split, and destructuring assignment.
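The destructuring assignment here is Python's extended iterable unpacking: a starred name collects whatever is left over as a list, which suits blocks of different lengths. A tiny illustration with made-up lines:

```python
lines = ["Honey Badger", "2018-06-06 16:15:25", "Mustelidae", "Eats anything"]

# name and date_value take the first two items; *information takes the rest as a list
name, date_value, *information = lines
print(name)         # Honey Badger
print(information)  # ['Mustelidae', 'Eats anything']

# With only two lines, the starred target is simply an empty list
name2, date2, *info2 = ["Sample", "2020-01-01"]
print(info2)  # []
```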

I suggest reading your file with a `with` statement; that way you don't have to close the file explicitly.

with open('myplants.txt') as file:
    text = file.read()

Suppose you have already read your file and `text` holds its contents:

import re

text = """
Special Type A Sunflower 
2017-10-19 18:20:30
Asteraceae
Brought to the USA by Europeans
Ingredient for Sunflower Oil
Needs full sun
Moist Soil, with heavy mulch
Water only when top 2 inches of soil is dry

Tropical Sealion
2020-04-25 12:10:05
Pinnipeds 
Mostly found in zoos
Likes Fish
Likes Balls
Likes Zookeepers

Honey Badger
2018-06-06 16:15:25
Mustelidae
Eats anything
"""

regex = re.compile(r'(?:[^\n]+\n?)+')  # runs of non-empty lines; final newline optional

def parse(section):
  name, date_value, *information = section.strip().split('\n')
  return {
    'Name': name,
    'Date': date_value,
    'Information': information
  }

sections = regex.findall(text)
parsed_sections = [parse(section) for section in sections]

for parsed in parsed_sections:
  print(parsed)
  print()

Output

{'Name': 'Special Type A Sunflower ', 'Date': '2017-10-19 18:20:30', 'Information': ['Asteraceae', 'Brought to the USA by Europeans', 'Ingredient for Sunflower Oil', 'Needs full sun', 'Moist Soil, with heavy mulch', 'Water only when top 2 inches of soil is dry']}

{'Name': 'Tropical Sealion', 'Date': '2020-04-25 12:10:05', 'Information': ['Pinnipeds ', 'Mostly found in zoos', 'Likes Fish', 'Likes Balls', 'Likes Zookeepers']}

{'Name': 'Honey Badger', 'Date': '2018-06-06 16:15:25', 'Information': ['Mustelidae', 'Eats anything']}
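A related alternative (not part of the answer above) is to split on blank lines directly with `re.split`. The pattern `\n\s*\n` also treats a separator line containing only spaces as blank, which the plain '\n\n' split would miss. A sketch under that assumption:

```python
import re

# Sample text whose separator line contains two spaces instead of being truly empty
text = "Tropical Sealion\n2020-04-25 12:10:05\nPinnipeds\n  \nHoney Badger\n2018-06-06 16:15:25\nEats anything\n"

# Split wherever a line break is followed by an optional whitespace-only line and another break
blocks = re.split(r"\n\s*\n", text.strip())

parsed = []
for block in blocks:
    name, date_value, *information = block.split("\n")
    parsed.append({"Name": name, "Date": date_value, "Information": information})

for record in parsed:
    print(record)
```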
bertdida