
I am using Python to turn a CSV file into a dictionary, where the CSV file has multiple values for the same column.

The following uses the CSV header row (first line) as the dictionary keys, and works for a simple CSV without multiple values:

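import pandas as pd


def main():
    content = csvArray(".../Csv.csv")
    print(content)


def csvArray(path):
    # Read the CSV; the header row supplies the column names.
    df = pd.read_csv(path)
    # Build one dictionary per row, keyed by column name.
    records = df.to_dict(orient='records')
    return records


if __name__ == "__main__":
    main()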

However, I now have an issue. There is an Image column in the CSV, and in many cases a single item has several rows of image data, formatted like:

SKU    ImageData
12345  1st Image Data
       2nd Image Data
       3rd Image Data
12346  1st Image Data
       2nd Image Data

etc...

There can be up to 8 images for one SKU.

My csvArray function does not work with the CSV formatted as such, and changing the format of the CSV is not possible from the export.

How could I concatenate all the image data into the first row? Or is there an alternative way to turn this CSV into a dictionary?

Ben Dodson
  • What does the csv file look like? – ImranD May 24 '22 at 14:07
  • I have replaced the actual base64 data with a placeholder due to the string size, the CSV can be inferred from the table above but here it is raw: Internal Reference;Name;Extra Product Media/Image TGTLI20018;20V Grass Trimmer - Body only;1st Image base64 data ;;2nd Image base64 data ;;3rd Image base64 data ;;4th Image base64 data ;;5th Image base64 data TGTLI20019;25V Grass Trimmer;1st Image base64 data ;;2nd Image base64 data – Ben Dodson May 24 '22 at 14:20
  • `...5th Image base64 data TGTLI20019...` is there a delimiter of some sort between `5th Image base64 data` and `TGTLI20019`? Do all product names start with `'TGT'`? You're going to have to do something like split on semicolons, then look for and *partition* by product names, and construct the dict manually. – wwii May 24 '22 at 16:03
  • Maybe show what you have in the CSV file, and show it in the question, not in the comments. It will be more readable and more people will see it. Maybe you could read it in a different way to get it as a list instead of separate rows. – furas May 24 '22 at 17:27

1 Answer


Data from your comment to your question:

s = '''Internal Reference;Name;Extra Product Media/Image TGTLI20018;20V Grass Trimmer - Body only;1st Image base64 data ;;2nd Image base64 data ;;3rd Image base64 data ;;4th Image base64 data ;;5th Image base64 data TGTLI20019;25V Grass Trimmer;1st Image base64 data ;;2nd Image base64 data'''

If you can determine a pattern that delineates records and will not occur in the base64 image data, such as:

pattern = ' TGTLI'
  • find all the indices of this pattern in the data: (49, 208) in this case

  • iterate over the indices in overlapping pairs (current, next) and use them to slice the data; the last record runs from the final index to the end of the string

    record = s[49:208]
    
  • split the record on semicolons

>>> s[49:208].split(';')
[' TGTLI20018', '20V Grass Trimmer - Body only', '1st Image base64 data ', '', '2nd Image base64 data ', '', '3rd Image base64 data ', '', '4th Image base64 data ', '', '5th Image base64 data']
  • extract the fields and make the dictionary.
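
The steps above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the helper names `find_all` and `parse_records` are made up for this example, and it assumes that the text before the first match is the header row, that the last column is the one holding the images, and that the pattern never appears inside the image data (standard base64 contains no spaces or semicolons, so a pattern with a leading space should be safe).

```python
# Sample data from the question's comment (placeholders instead of base64).
s = (
    "Internal Reference;Name;Extra Product Media/Image"
    " TGTLI20018;20V Grass Trimmer - Body only;"
    "1st Image base64 data ;;2nd Image base64 data ;;"
    "3rd Image base64 data ;;4th Image base64 data ;;"
    "5th Image base64 data"
    " TGTLI20019;25V Grass Trimmer;"
    "1st Image base64 data ;;2nd Image base64 data"
)


def find_all(text, pattern):
    """Return every index at which pattern starts in text (hypothetical helper)."""
    indices = []
    start = text.find(pattern)
    while start != -1:
        indices.append(start)
        start = text.find(pattern, start + 1)
    return indices


def parse_records(text, pattern):
    """Slice text into records at each pattern match and build dictionaries.

    Assumes the header row precedes the first match and that the last
    header names the multi-valued (image) column.
    """
    starts = find_all(text, pattern)
    if not starts:
        return []
    headers = [h.strip() for h in text[:starts[0]].split(';')]
    # Append len(text) so the last record gets an end index too.
    bounds = starts + [len(text)]
    records = []
    for begin, end in zip(bounds, bounds[1:]):  # overlapping (current, next) pairs
        fields = [f.strip() for f in text[begin:end].split(';')]
        # Single-valued columns map one to one onto the leading fields.
        record = dict(zip(headers[:-1], fields))
        # Remaining non-empty fields are the images; ';;' leaves empty strings.
        record[headers[-1]] = [f for f in fields[len(headers) - 1:] if f]
        records.append(record)
    return records


records = parse_records(s, ' TGTLI')
for record in records:
    print(record)
```

Each dictionary then holds the images as a list under the image column's header, so a SKU with up to 8 images needs no special handling.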

How to find all occurrences of a substring?
Iterate a list as pair (current, next) in Python

Searching here on SO turns up many more examples and Q&As like those.

wwii