
To make Amazon Personalize function properly, we need:

  • Users.csv
  • Items.csv
  • Interactions.csv

The goal is to import the historical (Interactions) data and then keep it updated with real-time events. All fine and understandable.

How do we go about interactions.csv for the initial historical upload when there is a huge amount of data, i.e. one huge CSV?

Ideally, I would like to split this monster into several chunks and feed them all to Personalize.

I saw there is talk of incremental uploads, but I don't see how that's possible. How did you go about it?


1 Answer


You can split your bulk data into multiple CSVs, point your Personalize import job to the S3 "folder" containing your CSVs, and Personalize will import all files.

According to the docs:

If your CSV files are in a folder in your S3 bucket and you want to upload multiple CSV files to a dataset with one dataset import job, use this syntax without the CSV file name.

Just be sure to split your CSVs for each dataset type into separate "folders" in your bucket. Also, the import process from a folder is not recursive, so place your CSVs directly in the folder rather than in sub-folders.

For example:

interactions/
interactions/file1.csv
interactions/file2.csv
interactions/file3.csv
items/
items/file1.csv
items/file2.csv
items/file3.csv
users/
users/file1.csv
users/file2.csv
users/file3.csv
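
If it helps, here is a rough Python sketch of producing those chunks from one large interactions.csv; the file names, output directory, and chunk size are placeholders, not anything prescribed by Personalize:

import csv
import os

def split_csv(src_path, out_dir, rows_per_chunk=1_000_000):
    # Split one large CSV into numbered chunks, repeating the header row
    # so each chunk is a valid standalone CSV.
    os.makedirs(out_dir, exist_ok=True)
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        out, writer, rows, chunk = None, None, 0, 0
        for row in reader:
            if writer is None or rows >= rows_per_chunk:
                if out:
                    out.close()
                chunk += 1
                out = open(os.path.join(out_dir, f"file{chunk}.csv"), "w", newline="")
                writer = csv.writer(out)
                writer.writerow(header)
                rows = 0
            writer.writerow(row)
            rows += 1
        if out:
            out.close()

# Example: chunk the historical export before uploading the files to the interactions/ prefix
split_csv("interactions.csv", "interactions", rows_per_chunk=1_000_000)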

Then to import all interactions CSVs, use the interactions/ folder as the data location (e.g., s3://bucket-name/interactions/).
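
And a hedged boto3 sketch of kicking off the import job against that folder; the job name, dataset ARN, role ARN, and bucket name are placeholders you'd replace with your own resources:

import boto3

personalize = boto3.client("personalize")

# Point dataLocation at the folder (no file name) so every CSV in it is imported.
response = personalize.create_dataset_import_job(
    jobName="interactions-bulk-import",
    datasetArn="arn:aws:personalize:us-east-1:123456789012:dataset/my-dataset-group/INTERACTIONS",
    dataSource={"dataLocation": "s3://bucket-name/interactions/"},
    roleArn="arn:aws:iam::123456789012:role/PersonalizeS3Access",
)
print(response["datasetImportJobArn"])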
