2

I have a json file in S3 with the following strucure

   {
  status: "Success",
  created_at: "19 AUG 2019",
  employees:[
     {"name":"name1", "id":"1"},
     {"name":"name2", "id":"2"},
     {"name":"name3", "id":"3"}
  ],
  contacts: [] 
}

The following SQL functions works fine to find number of contacts

SELECT count(*) FROM S3Object[*].contacts[*]

But sometimes, the json file doesn't have that contacs key itself like,

       {
  status: "Success",
  created_at: "19 AUG 2019",
  employees:[
     {"name":"name1", "id":"1"},
     {"name":"name2", "id":"2"},
     {"name":"name3", "id":"3"}
  ] 
}

In this case the above sql returns contacts count as 1 but I am expecting it to return "zero".

How to re-write a sql dynamically to handle both the file content?

Achaius
  • 5,904
  • 21
  • 65
  • 122

1 Answers1

1

Try:

SELECT count(*) FROM S3Object[*].contacts[*] as item WHERE item IS NOT MISSING

Explanation

All the things refer to SELECT Command.

Consider this example:

{
  {
    status: "Success",
    created_at: "19 AUG 2019",
    employees:[
      {"name":"name1", "id":"1"},
      {"name":"name2", "id":"2"},
      {"name":"name3", "id":"3"}
    ],
    contacts: [
      {"a": "123"},
      {"b": "456"}
    ]
  },
  {
    status: "Success",
    created_at: "19 AUG 2019",
    employees:[
      {"name":"name1", "id":"1"},
      {"name":"name2", "id":"2"},
      {"name":"name3", "id":"3"}
    ]
  }
}

If you run SELECT * FROM S3Object[*].contacts[*], the result is

{"a": "123"}
{"b": "456"}
{}

Since the second object has no contacts, so refer to SELECT Command,

Amazon S3 Select emitted MISSING, which was then changed to an empty record during output serialization and returned.

So SELECT count(*) FROM S3Object[*].contacts[*] returns the count of the items, which is 3.

If you run SELECT * FROM S3Object[*].contacts[*] as item WHERE item IS NOT MISSING, the result is

{"a": "123"}
{"b": "456"}

All the MISSING item will be dropped. So SELECT count(*) FROM S3Object[*].contacts[*] as item WHERE item IS NOT MISSING will return the count of the items, which is 2.

Community
  • 1
  • 1
LiuChang
  • 739
  • 6
  • 13