
I have a text like the one below that I want to store in a dictionary, storing the values of the same parameters (keys) in sublists.

file = """./path/to/Inventory2020_1.txt
fileType                           = Inventory
StoreCode
    number:1145C
numId                              = 905895
ValuesOfProducts
    prodsTypeA:150
    prodsTypeB:189
    UpdateTime:2020-03-05 14:45:38
InventoryTime                         = 2020-03-05 14:45:29
userName
    number:123

./path/to/Inventory2020_2.txt   
fileType                           = Inventory
StoreCode
    number:7201B
numId                              = 54272
ValuesOfProducts
    prodsTypeA:75
    prodsTypeB:231
    UpdateTime:2020-03-06 09:12:22
InventoryTime                         = 2020-03-06 09:11:47
userName
    number:3901 
"""

My current code successfully stores the text in a nested list, using this line:

import re

a = [ re.sub(r' += +', ':', line).replace(":", "=", 1).strip().split("=") for line in file.splitlines() ]
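To make the intermediate structure concrete, here is a small sketch that applies the same expression to three sample lines. The `parse_line` helper and the `sample` string are names made up for this illustration only.

```python
import re

def parse_line(line):
    # Turn "key   = value" into "key:value", then replace the first ":" with "="
    # so that both line styles split into [key, value] on "=".
    return re.sub(r' += +', ':', line).replace(":", "=", 1).strip().split("=")

# Hypothetical three-line excerpt of the text above
sample = """fileType                           = Inventory
StoreCode
    number:1145C"""

rows = [parse_line(line) for line in sample.splitlines()]
print(rows)
# [['fileType', 'Inventory'], ['StoreCode'], ['number', '1145C']]
```

Header lines such as StoreCode come out as one-element sublists, which is why the link between a header and the indented lines under it is lost at this stage. Note also that a value containing "=" would be split into more than two pieces, since split("=") has no limit.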

Now, to store it in a dictionary using the parameters as keys, I'm applying some conditions like this:

d = dict()

for lst in a:
    if len(lst) > 1:
        d.setdefault(lst[0], []).append(lst[1])
    else:
        if "path" in lst[0]:
            d.setdefault("File", []).append(re.sub(r'.+/', '', lst[0]))

>>> d
{
'File': ['Inventory2020_1.txt', 'Inventory2020_2.txt'], 
'fileType': ['Inventory', 'Inventory'], 
'number': ['1145C', '123', '7201B', '3901'], 
'numId': ['905895', '54272'], 
'prodsTypeA': ['150', '75'], 
'prodsTypeB': ['189', '231'], 
'UpdateTime': ['2020-03-05 14:45:38', '2020-03-06 09:12:22'], 
'InventoryTime': ['2020-03-05 14:45:29', '2020-03-06 09:11:47']
}
>>>

As you can see, for some parameters the value is separated from the key by an = sign on the same line, so I can store the key, value pair in the same sublist directly using split("="). But some of the key, value pairs I'm interested in span different lines, for example:

StoreCode
    number:1145C
    

In this case the key, value pair I'm interested in is key=StoreCode and value=1145C

For this one:

ValuesOfProducts
    prodsTypeA:75
    prodsTypeB:231
    UpdateTime:2020-03-06 09:12:22

The key, value pairs I'm interested in are:

  • key=prodsTypeA and value=75
  • key=prodsTypeB and value=231
  • key=UpdateTime and value=2020-03-06 09:12:22

So, the final dictionary would have this structure:

{
'File': ['Inventory2020_1.txt', 'Inventory2020_2.txt'], 
'fileType': ['Inventory', 'Inventory'], 
'StoreCode': ['1145C', '7201B'], 
'numId': ['905895', '54272'], 
'prodsTypeA': ['150', '75'], 
'prodsTypeB': ['189', '231'], 
'UpdateTime': ['2020-03-05 14:45:38', '2020-03-06 09:12:22'], 
'InventoryTime': ['2020-03-05 14:45:29', '2020-03-06 09:11:47'], 
'userName': ['123', '3901']
}

The main issue is that in my current output, the values I'm interested in for the parameters StoreCode and userName are both attached to the word number. As a result, those values are appended under a single number key and get mixed together, even though some of them belong to the key StoreCode and the others belong to the key userName.

Could someone help me get my expected output, please? Thanks in advance.

Ger Cas

1 Answer

This is not exactly what you specified, but assuming the structure stays constant in all respects, the following (or something like it), which avoids regex, will probably work for you:

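subfiles = file.split('./path/to/')
# Positions, within each block after the file name line, of the lines that hold a value
locs = [0, 2, 3, 5, 6, 7, 8, 10]
vals = []
for s in subfiles[1:]:  # subfiles[0] is the empty text before the first path
    target = s.strip().splitlines()[1:]
    row = [s.split('fileType')[0].strip()]  # the file name precedes 'fileType'
    for loc in locs:
        if "=" in target[loc]:
            entry = target[loc].split('=', 1)[1].strip()
        elif ":" in target[loc]:
            entry = target[loc].split(':', 1)[1].strip()
        else:
            entry = None  # no separator on this line, so there is no value to keep
        row.append(entry)
    vals.append(row)

key_names = ['File', 'fileType', 'StoreCodenumber', 'numId', 'ValueOfProdsTypeA', 'ValueOfProdsTypeB', 'ProdsUpdateTime', 'InventoryTime', 'userName']
# Transpose the rows so that each key collects one value per file
d = {k: list(v) for k, v in zip(key_names, zip(*vals))}
d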

Output:

{'File': ['Inventory2020_1.txt', 'Inventory2020_2.txt'],
 'fileType': ['Inventory', 'Inventory'],
 'StoreCodenumber': ['1145C', '7201B'],
 'numId': ['905895', '54272'],
 'ValueOfProdsTypeA': ['150', '75'],
 'ValueOfProdsTypeB': ['189', '231'],
 'ProdsUpdateTime': ['2020-03-05 14:45:38', '2020-03-06 09:12:22'],
 'InventoryTime': ['2020-03-05 14:45:29', '2020-03-06 09:11:47'],
 'userName': ['123', '3901']}

Obviously, you can modify it to suit your actual needs.
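One possible modification, as a rough sketch rather than a definitive solution: instead of relying on fixed line positions, remember the most recent unindented header line and use it as the key whenever an indented line is named number. The function name `parse_inventory`, the `SAMPLE` text and the rules "paths start with ./" and "child lines are indented" are assumptions made for this illustration.

```python
from collections import defaultdict

def parse_inventory(text):
    result = defaultdict(list)
    parent = None  # most recent unindented line that has no "=" (e.g. StoreCode)
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("./"):
            # Assumed: every block starts with a path; keep only the file name
            result["File"].append(line.rsplit("/", 1)[-1])
            parent = None
        elif not line[0].isspace():
            if "=" in line:
                key, value = line.split("=", 1)
                result[key.strip()].append(value.strip())
                parent = None
            else:
                parent = line.strip()  # header whose values follow on indented lines
        elif ":" in line:
            key, value = line.strip().split(":", 1)
            # "number" is ambiguous, so store it under its header instead
            if key == "number" and parent:
                key = parent
            result[key].append(value.strip())
    return dict(result)

# Hypothetical shortened version of the text in the question
SAMPLE = """./path/to/Inventory2020_1.txt
fileType                           = Inventory
StoreCode
    number:1145C
ValuesOfProducts
    prodsTypeA:150
    UpdateTime:2020-03-05 14:45:38
userName
    number:123

./path/to/Inventory2020_2.txt
fileType                           = Inventory
StoreCode
    number:7201B
ValuesOfProducts
    prodsTypeA:75
    UpdateTime:2020-03-06 09:12:22
userName
    number:3901
"""

parsed = parse_inventory(SAMPLE)
print(parsed["StoreCode"])  # ['1145C', '7201B']
print(parsed["userName"])   # ['123', '3901']
```

Because each line is classified on its own, blocks with extra or missing lines do not shift the positions of the others. The lists can still end up with different lengths if a key is absent from some block.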

Jack Fleeting
  • Thanks Jack for your help. In my actual data, empty sublists or sublists with unneeded values sometimes appear. So, for this to work, I think I first need a fixed `subfiles` structure. Thanks so much for your help and for showing several techniques to manipulate this kind of text structure. – Ger Cas Jul 10 '20 at 19:18