I am trying to migrate BigQuery event data into a Redshift database. I am using the following command.

COPY events_20180627 from 's3://big-query-to-rs/big-query-data/events_20180627' CREDENTIALS 'aws_access_key_id=XXXXXXXXX;aws_secret_access_key=XXXXXXXXXXX' format as json 'auto' ;

It gives me the following error for rows larger than 4 MB.

Amazon Invalid operation: The total size of the json object exceeds the max limit of 4194304 bytes
Details:
-----------------------------------------------
error:   The total size of the json object exceeds the max limit of 4194304 bytes
code:    8001
context:

I went through various blogs and answers with no luck.

Can anybody tell me a workaround for this? Thanks!

iamabhaykmr
  • There is no solution, you need to make the JSON smaller as Redshift only lets you use 4MB. – Pentium10 Nov 29 '19 at 15:14
  • to add to @Pentium10 reply - [COPY command's reference](https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html) mentions that. – botchniaque Dec 04 '19 at 08:17
  • But I have JSONs much larger than 4 MB that go through just fine... 8-10 MB JSONs copied into Redshift thousands of times a day, no problem. – Hovanes Gasparian Jan 27 '22 at 07:28

1 Answer

Maybe you guys already knew this, but it wasn't clear to me at first.

The 4 MB limit applies to a single row being inserted, not to the total size of all the rows. I got this error because my JSON was formatted as:

[
  {"field1": "a", "field2": "b"},
  {"field1": "c", "field2": "d"},
  ...
]

I think Redshift treated the entire array as a single row, which is why it hit the 4 MB limit.

Redshift expects this format:

{"field1": "a", "field2": "b"}
{"field1": "c", "field2": "d"}
...

That is, the objects are not nested inside an array; they are just plain objects adjacent to each other, separated by whitespace (I only tested newline separators, but I'm pretty sure any whitespace works).
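In case a concrete example helps, here is a minimal sketch of one way to flatten an exported array into that newline-delimited layout before uploading to S3. The file names are made up, and it assumes the whole export fits in memory:

import json

# Rewrite an array-of-objects export as newline-delimited JSON, the
# layout that COPY ... format as json 'auto' expects (one object per line).
# File names are hypothetical; json.load reads the whole export into memory.
with open("events_20180627.json") as src, open("events_20180627.jsonl", "w") as dst:
    for record in json.load(src):       # iterate the outer [...] array
        dst.write(json.dumps(record))   # write one object...
        dst.write("\n")                 # ...per line

After that, each line counts as its own row against the 4 MB limit, so the COPY from the question should only fail if a single object is itself larger than 4 MB.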

Docs:

Hopefully this helps!

matt0089