Python CSV to dictionary with multiple row entries per 1 item

Question

I am using Python to turn a CSV file into a dictionary, but the CSV file has multiple values in the same column for a single item.

The following code uses the CSV headers (first line) as the keys and works for turning a simple CSV, without multiple values, into a dictionary:

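```python
import pandas as pd


def main():
    content = csvArray(".../Csv.csv")
    print(content)


def csvArray(path):
    # Read the CSV; the first line supplies the column names
    df = pd.read_csv(path)
    # One dict per row, keyed by column name
    records = df.to_dict(orient='records')
    return records


if __name__ == "__main__":
    main()
```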
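Note that `to_dict(orient='records')` returns a list with one dict per row, not a single dict. For comparison, here is a minimal sketch of the same conversion using the standard-library `csv` module instead of pandas; the helper name `csv_records` is hypothetical:

```python
import csv
import io


def csv_records(lines, delimiter=","):
    """Hypothetical helper: one dict per data row, keyed by the header line."""
    # DictReader takes its field names from the first row of the input
    return list(csv.DictReader(lines, delimiter=delimiter))


if __name__ == "__main__":
    sample = io.StringIO("SKU,ImageData\n12345,1st Image Data\n12346,1st Image Data\n")
    print(csv_records(sample))
```

Unlike `pd.read_csv`, which infers numeric column types, `csv.DictReader` leaves every value as a string, so `SKU` stays `'12345'` rather than becoming an integer.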

However, I now have an issue. There is an Image column in the CSV, and in many cases there are multiple entries in that column for one item, spread over several rows, like this:

| SKU   | ImageData      |
|-------|----------------|
| 12345 | 1st Image Data |
|       | 2nd Image Data |
|       | 3rd Image Data |
| 12346 | 1st Image Data |
|       | 2nd Image Data |

etc.

There can be up to 8 images for one SKU.

My csvArray function does not work with a CSV formatted like this, and changing the format of the export is not possible.

How could I concatenate all the image data into the first row of each item? Or is there an alternative way of turning the CSV into a dictionary?
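One way to get one dict per item is to treat any row whose first column is empty as a continuation of the previous item. The sketch below assumes the semicolon-delimited layout shown in the raw data in the comments, with the image in the last column; `group_continuation_rows` is a hypothetical name, and it collects the images into a list rather than concatenating them:

```python
import csv
import io


def group_continuation_rows(lines, delimiter=";"):
    """Hypothetical helper: merge rows with an empty first column into the item above."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    image_col = header[-1]  # assumption: the image data is in the last column
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        image = row[-1].strip()
        if row[0].strip():
            # A non-empty first column (e.g. the SKU) starts a new item
            record = dict(zip(header[:-1], row[:-1]))
            record[image_col] = [image] if image else []
            records.append(record)
        elif records and image:
            # An empty first column is a continuation: attach its image to the last item
            records[-1][image_col].append(image)
    return records


if __name__ == "__main__":
    sample = (
        "Internal Reference;Name;Extra Product Media/Image\n"
        "TGTLI20018;20V Grass Trimmer - Body only;1st Image base64 data\n"
        ";;2nd Image base64 data\n"
        "TGTLI20019;25V Grass Trimmer;1st Image base64 data\n"
    )
    for rec in group_continuation_rows(io.StringIO(sample)):
        print(rec["Internal Reference"], len(rec["Extra Product Media/Image"]))
```

If a single concatenated string is really needed, calling `str.join` on the list (for example with a newline separator) turns it back into one value. With a real file, pass the open file object (opened with `newline=""`) instead of the `StringIO`.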

I have replaced the actual base64 data with a placeholder because of the string size. The CSV can be inferred from the table above, but here it is raw: `Internal Reference;Name;Extra Product Media/Image TGTLI20018;20V Grass Trimmer - Body only;1st Image base64 data ;;2nd Image base64 data ;;3rd Image base64 data ;;4th Image base64 data ;;5th Image base64 data TGTLI20019;25V Grass Trimmer;1st Image base64 data ;;2nd Image base64 data` — Ben Dodson, May 24 '22 at 14:20
`...5th Image base64 data TGTLI20019...` Is there a delimiter of some sort between `5th Image base64 data` and `TGTLI20019`? Do all product names start with `'TGT'`? You're going to have to do something like split on semicolons, then look for and *partition* by product names, and construct the dict manually. — wwii, May 24 '22 at 16:03
Maybe show what you have in the CSV file, and show it in the question, not in comments. It will be more readable and more people will see it. Maybe you could read it in a different way to get it as a list instead of separate rows. — furas, May 24 '22 at 17:27

wwii · Answer 1 · 2022-05-24T16:49:39.547

Data from your comment to your question:

```python
s = '''Internal Reference;Name;Extra Product Media/Image TGTLI20018;20V Grass Trimmer - Body only;1st Image base64 data ;;2nd Image base64 data ;;3rd Image base64 data ;;4th Image base64 data ;;5th Image base64 data TGTLI20019;25V Grass Trimmer;1st Image base64 data ;;2nd Image base64 data'''
```

If you can determine a pattern that delineates records and will not occur in the base64 image data, such as

```python
pattern = ' TGTLI'
```

1. Find all the indices of this pattern in the data: (49, 208) in this case.
2. Iterate over the indices in overlapping pairs, using `len(s)` as the end of the last record, and use each pair to slice the data:
   ```
   record = s[49:208]
   ```
3. Split the record on semicolons:
   ```
   >>> s[49:208].split(';')
   [' TGTLI20018', '20V Grass Trimmer - Body only', '1st Image base64 data ', '', '2nd Image base64 data ', '', '3rd Image base64 data ', '', '4th Image base64 data ', '', '5th Image base64 data']
   ```
4. Extract the fields and make the dictionary.
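Putting these steps together, a minimal sketch might look like the following. It assumes, as above, that every record starts with `' TGTLI'`, that this pattern never occurs inside the image data, and that the header line precedes the first record; `parse_flattened` is a hypothetical name, and `re.finditer` is just one of several ways to find the occurrences:

```python
import re


def parse_flattened(s, pattern=" TGTLI"):
    """Hypothetical helper: split the flattened export into one dict per product."""
    # Start index of every record, e.g. [49, 208] for the sample string
    starts = [m.start() for m in re.finditer(re.escape(pattern), s)]
    if not starts:
        return []
    # Everything before the first record is the header line (assumed 3 fields)
    header = s[:starts[0]].split(";")
    # Pair each start with the next one; the last record runs to the end of s
    ends = starts[1:] + [len(s)]
    records = []
    for start, end in zip(starts, ends):
        fields = s[start:end].split(";")
        # Image fields are separated by ';;', so drop the empty strings
        images = [f.strip() for f in fields[2:] if f.strip()]
        records.append({
            header[0]: fields[0].strip(),
            header[1]: fields[1].strip(),
            header[2]: images,
        })
    return records
```

Calling `parse_flattened(s)` on the string above should give two dicts, one per SKU, each holding its images as a list.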

- How to find all occurrences of a substring?
- Iterate a list as pair (current, next) in Python

There are many more examples and Q&As like these to be found by searching here on SO.
