Python - How to make every first 3 lines of text blocks into dictionary values?

Question

I'm in an introductory Python undergraduate class and I'm working on a text file.
An example of its contents can be seen below:

Special Type A Sunflower 
2017-10-19 18:20:30
Asteraceae
Brought to the USA by Europeans
Ingredient for Sunflower Oil
Needs full sun
Moist Soil, with heavy mulch
Water only when top 2 inches of soil is dry

Tropical Sealion
2020-04-25 12:10:05
Pinnipeds 
Mostly found in zoos
Likes Fish
Likes Balls
Likes Zookeepers

Honey Badger
2018-06-06 16:15:25
Mustelidae
Eats anything

I'm trying to convert these lines into the values of a dictionary that has only 3 keys.

The first key is "Name"; its value is the first line of each text block.
The second key is "Date"; its value is the second line of each text block.
The third key is "Information"; its value is every line from the third onwards, stopping at the blank line between text blocks. I believe this should be a list of values.

My progress is here:

import itertools
import os

MyFilePath = os.getcwd() # absolute directory the file is in
ActualFile = "myplants.txt"
FinalFilePath = os.path.join(MyFilePath, ActualFile)

def TextFileToDictionary():

    dictionary_1 = {}

    textfile = open(FinalFilePath, 'r')
    first_line = textfile.readline()
    second_line = textfile.readline()
    third_line = textfile.readline()
    for line in textfile:
        dictionary_1["name"] = first_line
        dictionary_1["date"] = second_line
        dictionary_1["information"] = third_line
    print(dictionary_1)
    textfile.close()

TextFileToDictionary()

Although I have stored the first three lines as values in a dictionary,
I am unable to repeat this for every text block so that all text blocks become dictionary values.
I am also unable to collect the third line and everything after it into a list of values.

Do note that the text blocks are of uneven lengths.

So the end result should resemble:

dictionary_1 = {'Name' : "Special Type A Sunflower", 'Date' : "2017-10-19 18:20:30", 'Information' : ["Asteraceae, Brought to the USA by Europeans, Ingredient for Sunflower Oil, Needs full sun, Moist Soil, with heavy mulch, Water only when top 2 inches of soil is dry"]}

dictionary_2 = {'Name' : "Tropical Sealion", "Date" : "2020-04-25 12:10:05", "Information" : ["Pinnipeds, Mostly found in zoos, Likes Fish, Likes Balls, Likes Zookeepers"]}

And so on.

Does anyone know how to change the code to resemble the desired end result?
Many thanks!

Are you sure it should be `"Asteraceae, Brought to the USA by Europeans, Ingredient for Sunflower Oil, Needs full sun, Moist Soil, with heavy mulch, Water only when top 2 inches of soil is dry"` rather than `"Asteraceae", "Brought to the USA by Europeans", "Ingredient for Sunflower Oil", "Needs full sun", "Moist Soil", "with heavy mulch", "Water only when top 2 inches of soil is dry"`? — Acccumulation, Aug 14 '20 at 03:50
For `third_line`, you need to keep reading until you find an empty line or the end of the file — deadshot, Aug 14 '20 at 03:57
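
The approach suggested in the comment above, reading until a blank line or the end of the file, could be sketched roughly as follows. The function name parse_blocks and the use of io.StringIO as a stand-in for the real file are assumptions made for illustration; they are not part of the original post.

```python
import io


def parse_blocks(lines):
    """Group non-blank lines into records; a blank line ends a record.

    `lines` is any iterable of strings, such as an open text file.
    (Hypothetical helper, named here only for illustration.)
    """
    records = []
    block = []
    for raw in lines:
        line = raw.strip()
        if line:
            block.append(line)
        elif block:
            # Blank line: the current block is complete.
            records.append(block)
            block = []
    if block:
        # End of input without a trailing blank line.
        records.append(block)
    return [
        {"Name": b[0], "Date": b[1], "Information": b[2:]}
        for b in records
        if len(b) >= 2  # skip blocks too short to hold a name and a date
    ]


# Stand-in for open("myplants.txt"): two short blocks from the question.
sample = io.StringIO(
    "Honey Badger\n"
    "2018-06-06 16:15:25\n"
    "Mustelidae\n"
    "Eats anything\n"
    "\n"
    "Tropical Sealion\n"
    "2020-04-25 12:10:05\n"
    "Pinnipeds \n"
)
result = parse_blocks(sample)
print(result)
```

Because the file is consumed one line at a time, this variant never holds the whole file in memory, which may matter for large inputs.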

ObsoleteAwareProduce · Accepted Answer · 2020-08-14T03:44:19.587

My solution for the TextFileToDictionary() function is as follows:

def TextFileToDictionary():
    data = []                                # One dictionary per text block
    with open(FinalFilePath, "r") as file:   # The file is closed automatically
        # strip() drops the newline at the end of the file, which would
        # otherwise leave an empty string at the end of the last block
        sections = file.read().strip().split("\n\n")  # Split on blank lines
    for section in sections:                 # Iterate through sections
        lines = section.split("\n")          # Split each section into lines
        if len(lines) < 3:                   # Make sure there are enough lines
            return "ERROR!"
        data.append({                        # Add a dictionary to the data with:
            "Name": lines[0],                # First line: name
            "Date": lines[1],                # Second line: date
            "Information": lines[2:]         # Third line and onwards: info
        })
    return data                              # A list of dictionaries, one per species

If you run the function on your sample file, it should return the following:

[
  {
    "Name": "Special Type A Sunflower",
    "Date": "2017-10-19 18:20:30",
    "Information": ["Asteraceae", "Brought to the USA by Europeans" etc... ]
  },
  {
    "Name": "Tropical Sealion",
    "Date": "2020-04-25 12:10:05",
    "Information": ["Pinnipeds", "Mostly found in zoos" etc... ]
  } #and so on.
]

Many thanks! It was what I was looking for! I was also wondering if the list of values for the key : "Information" could be combined into a single element as well. — TropicalMagic, Aug 14 '20 at 05:01
@TropicalMagic How would you want them combined? You can use `"SEPARATOR".join(lines[2:])` to join them with a separator, or use the 'pythony' way of list comprehensions: `"".join([(info) for info in lines[2:]])`, and edit the first bit in brackets. — ObsoleteAwareProduce, Aug 14 '20 at 18:53
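
To get the shape shown in the question, where "Information" is a list holding a single comma-separated string, the join suggestion above might be applied like this. The variable names and the ", " separator are assumptions chosen to match the desired output in the question.

```python
# Lines of one text block, as produced by section.split("\n").
lines = [
    "Honey Badger",
    "2018-06-06 16:15:25",
    "Mustelidae",
    "Eats anything",
]

# Join everything from the third line onwards into one string.
joined = ", ".join(lines[2:])

record = {
    "Name": lines[0],
    "Date": lines[1],
    # Wrap the joined string in a list to keep a single-element list.
    "Information": [joined],
}
print(record)
```

Dropping the square brackets around joined would store a plain string instead of a one-element list.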

omdo · Answer 2 · score 1 · answered Aug 14 '20 at 04:03

Simpler Version:

def dicter(file):
    with open(file, 'r') as f:
        # Split the text into blocks on blank lines, then each block into lines.
        # strip() removes the trailing newline at the end of the file.
        blocks = [x.split('\n') for x in f.read().strip().split('\n\n')]
    dics = []
    for block in blocks:
        dics.append(dict(Name=block[0], Date=block[1], Information=block[2:]))
    return dics

print(dicter('your/path/to/file'))

Thank you! This looks great too! I was also wondering if there were a way to separate the dictionaries into new lines. – TropicalMagic Aug 14 '20 at 04:58
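
One way to get each dictionary on its own line is to loop over the returned list and print the items one at a time; the standard library's pprint module is another option. The list below is a hand-written stand-in for the return value of dicter, so the snippet runs without the input file.

```python
from pprint import pformat

# Hand-written stand-in for the list returned by dicter().
dics = [
    {"Name": "Honey Badger", "Date": "2018-06-06 16:15:25",
     "Information": ["Mustelidae", "Eats anything"]},
    {"Name": "Tropical Sealion", "Date": "2020-04-25 12:10:05",
     "Information": ["Pinnipeds", "Mostly found in zoos"]},
]

# Option 1: print one dictionary per line.
for dic in dics:
    print(dic)

# Option 2: build the same one-per-line text as a single string.
text = "\n".join(str(dic) for dic in dics)

# Option 3: pformat wraps long structures across several lines.
# sort_dicts=False (Python 3.8+) keeps the keys in insertion order.
pretty = pformat(dics, width=60, sort_dicts=False)
print(pretty)
```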

bertdida · Answer 3 · 2020-08-14T04:43:18.990

I would use a regex, split and destructuring assignment.

I suggest reading the file inside a with statement, so that you don't have to close the file explicitly.

with open('myplants.txt') as file:
    text = file.read()

Suppose you have already read your file and text holds its content.

import re

text = """
Special Type A Sunflower 
2017-10-19 18:20:30
Asteraceae
Brought to the USA by Europeans
Ingredient for Sunflower Oil
Needs full sun
Moist Soil, with heavy mulch
Water only when top 2 inches of soil is dry

Tropical Sealion
2020-04-25 12:10:05
Pinnipeds 
Mostly found in zoos
Likes Fish
Likes Balls
Likes Zookeepers

Honey Badger
2018-06-06 16:15:25
Mustelidae
Eats anything
"""

regex = re.compile('(?:[^\n]+\n)+', re.MULTILINE)

def parse(section):
  name, date_value, *information = section.strip().split('\n')
  return {
    'Name': name,
    'Date': date_value,
    'Information': information
  }

sections = regex.findall(text)
parsed_sections = [parse(section) for section in sections]

for parsed in parsed_sections:
  print(parsed)
  print()

Output

{'Name': 'Special Type A Sunflower ', 'Date': '2017-10-19 18:20:30', 'Information': ['Asteraceae', 'Brought to the USA by Europeans', 'Ingredient for Sunflower Oil', 'Needs full sun', 'Moist Soil, with heavy mulch', 'Water only when top 2 inches of soil is dry']}

{'Name': 'Tropical Sealion', 'Date': '2020-04-25 12:10:05', 'Information': ['Pinnipeds ', 'Mostly found in zoos', 'Likes Fish', 'Likes Balls', 'Likes Zookeepers']}

{'Name': 'Honey Badger', 'Date': '2018-06-06 16:15:25', 'Information': ['Mustelidae', 'Eats anything']}
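
Note that the output above keeps the trailing spaces present in the input ('Special Type A Sunflower ' and 'Pinnipeds '), because strip() is applied to the section as a whole rather than to each line. If that is unwanted, a variant of parse that strips every line might look like the sketch below; parse_stripped is a hypothetical name used only for this illustration.

```python
def parse_stripped(section):
    # Strip each line individually so trailing spaces do not survive.
    stripped = [line.strip() for line in section.strip().split('\n')]
    name, date_value, *information = stripped
    return {
        'Name': name,
        'Date': date_value,
        'Information': information,
    }


# One block from the sample text, including the trailing space after "Pinnipeds".
section = "Tropical Sealion\n2020-04-25 12:10:05\nPinnipeds \nLikes Fish\n"
parsed = parse_stripped(section)
print(parsed)
```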

Thanks!! I am unfamiliar with Regex but this looks great! The dictionaries are neatly stacked! I was also wondering if the elements of the list for the key : "Information" could be combined into a single element. — TropicalMagic, Aug 14 '20 at 05:02
Nope, remain a list but everything in the list is within a single " " — TropicalMagic, Aug 14 '20 at 05:09
Not sure, but you can try `[f'"{i}"' for i in information]`. — bertdida, Aug 14 '20 at 05:11
