1

I have data in json format.

  1. Correct

[{"text":"foo0","number":123},{"text":"foo1","number":345},{"text":"foo2","number":678},{"text":"foo3","number":901}]

  1. Incorrect

{"text":"foo0","number":123}{"text":"foo1","number":345} {"text":"foo2","number":678}{"text":"foo3","number":901}

Creating external table

create external table js_test_3
(
  text string,
  number string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/Serde'

Then I launch

select *
from js_test_3

As a result I get

  1. Correct for json (retrieves as many elements as there are rows)

text number

{"number":"123","text":"foo0"} {"number":"345","text":"foo1"}

  1. For incorrect json-(retrieves only first element)

text number

foo0 123

How should I write script that creates list for which the result is correct?

text number

foo0 123

foo1 345

foo2 678

foo3 901

Thanks

dtolnay
  • 9,621
  • 5
  • 41
  • 62
Kobra
  • 313
  • 1
  • 15
  • Are you asking how to get the json serde to correctly parse the incorrect json? You can't - you need to correct the json. – highlycaffeinated Aug 15 '17 at 11:38
  • No, how to make json serde correctly parse the correct json? Wrong I will correct. By the way, serde correctly parses the non-correct json if it's written like this: {"text":"foo0","number":123}/n/r {"text":"foo1","number":345}/n/r {"text":"foo2","number":678}/n/r {"text":"foo3","number":901} – Kobra Aug 15 '17 at 11:45
  • JsonSerDe isn't expecting the entire file to be one json document, it's expecting a file with one complete json structure per line separated by CR/LF (which is why the example in your comment works correctly) – highlycaffeinated Aug 15 '17 at 11:51
  • How can I solve my problem? Can eat another option solution? SERDE XML documents reads perfectly, i need the tool for JSON. – Kobra Aug 15 '17 at 12:28

1 Answers1

0

The JsonSerDe you are using expects as input a file with one complete JSON per line. This is desirable because it is able to split the input (and divides works amongst the worker nodes) on CR/LF.

If you have a well-formed JSON list like you show in your "correct JSON" example, you could use a tool like jq to transform your input file into the expected format. For example:

jq -c .[] correct.json > /serde-input.txt
highlycaffeinated
  • 19,729
  • 9
  • 60
  • 91