I'm building recommender system using AWS Personalize. User-personalization recipe has 3 dataset inputs: interactions, user_metadata and item_metadata. I am having trouble importing user metadata which contains boolean
field.
I created the following schema:
user_schema = {
"type": "record",
"name": "Users",
"namespace": "com.amazonaws.personalize.schema",
"fields": [
{
"name": "USER_ID",
"type": "string"
},
{
"name": "type",
"type": [
"null",
"string"
],
"categorical": True
},
{
"name": "lang",
"type": [
"null",
"string"
],
"categorical": True
},
{
"name": "is_active",
"type": "boolean"
}
],
"version": "1.0"
}
dataset csv file content looks like:
USER_ID,type,lang,is_active
1234@gmail.com ,,geo,True
01027061015@mail.ru ,facebook,eng,True
03dadahda@gmail.com ,facebook,geo,True
040168fadw@gmail.com ,facebook,geo,False
I uploaded given csv file on s3 bucket. When I am trying create dataset import job it gives me the following exception:
InvalidInputException: An error occurred (InvalidInputException) when calling the CreateDatasetImportJob operation: Input csv has rows that do not conform to the dataset schema. Please ensure all required data fields are present and that they are of the type specified in the schema.
I tested and it works without boolean
field is_active
. There are no NaN values in given column!
It'd be nice to have an ability to directly test if your pandas dataframe or csv file conforms given schema and possibly get more detailed error message.
Does anybody know how to format boolean field to fix that issue?