
I see that AWS Personalize supports GENRES for items, but I can't find anything about preferred GENRES for the users dataset.

  • Is it supported?
  • Does it make sense to add preferred GENRES to the users dataset?

General question about the GENRES field:

  • The docs show genres as strings like action|comedy.
  • Is it OK to send ids instead of string values, e.g. 1|42,

and still define the GENRES field as a categorical string field?

{
  "name": "GENRES",
  "type": "string",
  "categorical": true
}

Isn't it just math behind the scenes, so that it doesn't really matter whether a genre is a meaningful name or just a number?

Capacytron

1 Answer


The GENRES column is required when you create a Video On Demand domain dataset group and include an items dataset. For the e-commerce domain dataset group and custom dataset groups, GENRES is not required.

The GENRES field must be marked as categorical in your items dataset schema.

{
  "type": "record",
  "name": "Items",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    {
      "name": "ITEM_ID",
      "type": "string"
    },
    {
      "name": "GENRES",
      "type": "string",
      "categorical": true
    },
    {
      "name": "CREATION_TIMESTAMP",
      "type": "long"
    }
  ],
  "version": "1.0"
}

Categorical fields allow you to specify one or more values for each item, where multiple values are separated by |. For example, Action|Adventure. The values you use for genres depend on your data. You can use string keywords or numbers. Just make sure that you format the GENRES column as a string and use consistent genre values across your items. Personalize will encode the values you specify when the model is trained.
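
As a rough illustration of the formatting described above, here is a minimal sketch that joins genre values (names or numeric ids) into a single pipe-separated string and writes rows matching the items schema. The helper names `format_genres` and `build_items_csv` and the sample rows are hypothetical, not part of any Personalize API; the stripping and de-duplication are just one possible way to keep values consistent.

```python
import csv
import io


def format_genres(values):
    """Join genre values (names or numeric ids) into one pipe-separated string."""
    seen = []
    for value in values:
        # Everything becomes a string, since the GENRES column is typed as string.
        text = str(value).strip()
        if text and text not in seen:
            seen.append(text)
    return "|".join(seen)


def build_items_csv(items):
    """Render (item_id, genres, creation_timestamp) rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Column names follow the items schema shown above.
    writer.writerow(["ITEM_ID", "GENRES", "CREATION_TIMESTAMP"])
    for item_id, genres, created in items:
        writer.writerow([item_id, format_genres(genres), created])
    return buffer.getvalue()


# Hypothetical sample data: one item with numeric ids, one with names.
sample = [
    ("item-1", [1, 42], 1651132800),
    ("item-2", ["Action", "Adventure"], 1651219200),
]
print(build_items_csv(sample))
```

Whichever form you pick (ids or names), the point is to use the same representation for the same genre across all items in the dataset.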

James J
  • I have a sort of "GENRES" (in AWS terms) and TAGS (users add tags to content, a.k.a. ITEM). Will AWS consider another categorical field and use it for similarity? My current ITEM schema has the GENRES and TAGS fields defined as categorical with the | separator. – Capacytron Apr 28 '22 at 07:40
  • Yes, you can add additional fields, such as TAGS, beyond the required and reserved fields defined by the VOD recommender. Alternatively, if the VOD domain recommender is not a good fit for your use case, you can use a custom dataset group where you have complete control over the schema/fields. For user-generated fields like TAGS, be mindful of data quality/cleanliness. – James J Apr 29 '22 at 12:33
  • I'm using a custom dataset; I used Video On Demand as a reference. – Capacytron May 06 '22 at 11:02