I'm running a Glue crawler over a folder containing several files with different schemas, so I expect to get one table per file.

In the Glue Data Catalog I can indeed see a table for each file, each with its own schema. But when I try to query one via Redshift Spectrum (after creating the external schema etc.) I get this exception:

[XX000][500310] [Amazon](500310) Invalid operation: Parsed manifest is not a valid JSON object.

How can I fix it?

Vzzarr

2 Answers

For Googlers:

The crawler sets the Location of a Glue table to a single file if it cannot build a table out of the file's containing folder.

This happens when the file:

  • is not in a folder but sits directly in the root path of a bucket
  • has a file format, compression method or schema that is not compatible with its sibling files in the same folder

A Location pointing directly at a file is not supported by Redshift Spectrum or Athena, hence this error.

To solve this problem, put the file in a containing folder and make sure all of its siblings in that folder have the same format. Then run the crawler again.

You should then see the table's Location pointing to a prefix in the bucket rather than to a file.
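
To check which tables are affected, something like the following sketch may help. It uses boto3's Glue `get_tables` paginator and reads each table's `StorageDescriptor.Location`. The function names are made up for illustration, and the "looks like a file" test is only a heuristic that I am assuming is good enough here (last path segment contains a dot and there is no trailing slash), so a folder named like `v1.2` would be flagged by mistake.

```python
from urllib.parse import urlparse


def location_points_at_file(location: str) -> bool:
    """Heuristic (an assumption, not an AWS rule): the Location looks like a
    single object if its last path segment contains a dot and the path has
    no trailing slash."""
    path = urlparse(location).path
    if path in ("", "/") or path.endswith("/"):
        return False
    last_segment = path.rsplit("/", 1)[-1]
    return "." in last_segment


def find_file_backed_tables(database_name: str) -> list:
    """Return (table name, location) pairs whose Location looks like a file."""
    import boto3  # imported lazily so the heuristic above can be used offline

    glue = boto3.client("glue")
    paginator = glue.get_paginator("get_tables")
    suspicious = []
    for page in paginator.paginate(DatabaseName=database_name):
        for table in page["TableList"]:
            location = table.get("StorageDescriptor", {}).get("Location", "")
            if location_points_at_file(location):
                suspicious.append((table["Name"], location))
    return suspicious
```

Calling `find_file_backed_tables("my_database")` (the database name is a placeholder) before and after re-running the crawler should show the list shrinking once the files sit in proper folders.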

dz902

As reported in this AWS forum thread: https://forums.aws.amazon.com/thread.jspa?threadID=266510

every file should be in its own folder/sub-bucket

So for me, putting each file in its own folder and pointing the Glue crawler at the top-level folder resolved the exception.

I'm now able to query it without any problem.
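
If there are many files, the move can be scripted. Below is a rough sketch using boto3, not exactly what I ran. `plan_moves` and `apply_moves` are hypothetical helper names, and I am assuming the folder should be named after the file without its extension. S3 has no real move operation, so each object is copied and then the original is deleted.

```python
import posixpath


def plan_moves(keys: list, prefix: str) -> dict:
    """Map each object directly under `prefix` to a key inside its own folder,
    for example 'data/users.csv' -> 'data/users/users.csv'."""
    prefix = prefix.rstrip("/") + "/" if prefix else ""
    moves = {}
    for key in keys:
        if not key.startswith(prefix):
            continue
        relative = key[len(prefix):]
        if not relative or "/" in relative:
            continue  # skip the prefix marker and anything already in a subfolder
        stem, _ext = posixpath.splitext(relative)
        moves[key] = f"{prefix}{stem}/{relative}"
    return moves


def apply_moves(bucket: str, moves: dict) -> None:
    """Copy each object to its new key, then delete the original."""
    import boto3  # imported lazily so plan_moves can be checked without AWS access

    s3 = boto3.client("s3")
    for old_key, new_key in moves.items():
        # copy_object handles objects up to 5 GB; larger ones need a multipart copy
        s3.copy_object(
            Bucket=bucket,
            Key=new_key,
            CopySource={"Bucket": bucket, "Key": old_key},
        )
        s3.delete_object(Bucket=bucket, Key=old_key)
```

Note that two files sharing a name but not an extension (say `a.csv` and `a.json`) would end up in the same folder with this naming, which brings the original problem back, so check the output of `plan_moves` before applying it.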

Vzzarr