I plan to use DL4J on a dataset in the following format:

{"articles": [{"abstractText":"text..", "journal":"journal..", "meshMajor":["mesh1",...,"meshN"], "pmid":"PMID", "title":"title..", "year":"YYYY"},..., {..}]}

The field meshMajor contains the class labels and the rest are the input for the model. The input features are textual data.

Are there any built-in JSON dataset iterators, like the one for CSV? I looked through the examples posted on GitHub but couldn't find one. If none is available, could someone please give me some pointers on implementing one?

Thanks!

2 Answers

This looks like a promising start:

https://deeplearning4j.org/docs/latest/datavec-serialization

You should then be able to use the examples here:

https://github.com/deeplearning4j/dl4j-examples/tree/master/datavec-examples/src/main/java/org/datavec/transform/basic

reden

I asked this question in DL4J's Gitter channel, and the suggested solution is to use the Jackson record readers. Their source is at https://github.com/deeplearning4j/DataVec/tree/master/datavec-api/src/main/java/org/datavec/api/records/reader/impl/jackson and examples of reading JSON are available at

  1. https://github.com/deeplearning4j/DataVec/blob/master/datavec-api/src/test/java/org/datavec/api/records/reader/impl/JacksonLineRecordReaderTest.java and,

  2. https://github.com/deeplearning4j/DataVec/blob/master/datavec-api/src/test/java/org/datavec/api/records/reader/impl/JacksonRecordReaderTest.java

Note the difference between JacksonLineRecordReader and JacksonRecordReader: the former requires each JSON record to span exactly one line, while the latter requires one file per JSON record.
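
Because the dataset in the question is a single document with all records inside one "articles" array, it would presumably have to be split into one record per line before JacksonLineRecordReader can read it. Below is a minimal sketch of such a preprocessing step using only the JDK. The class and method names (ArticleLineSplitter, splitObjects) are hypothetical and not part of DataVec, and the code assumes well-formed JSON in which the first array encountered holds the records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of DL4J/DataVec.
final class ArticleLineSplitter {

    /**
     * Returns every top-level object of the first array found in the input,
     * each as a single-line string. Assumes well-formed JSON.
     */
    static List<String> splitObjects(String json) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;             // nesting depth of { } inside the array
        boolean inArray = false;   // true once the first '[' has been seen
        boolean inString = false;  // true while inside a JSON string literal
        boolean escaped = false;   // true right after a backslash in a string

        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);

            if (inString) {
                // Copy string contents verbatim so braces inside text are ignored.
                if (depth > 0) current.append(c);
                if (escaped) escaped = false;
                else if (c == '\\') escaped = true;
                else if (c == '"') inString = false;
                continue;
            }
            if (c == '"') {
                inString = true;
                if (depth > 0) current.append(c);
                continue;
            }
            if (!inArray) {
                if (c == '[') inArray = true;
                continue;
            }
            if (c == ']' && depth == 0) break;  // end of the records array
            if (c == '{') depth++;
            // Drop line breaks between tokens so each record stays on one line.
            if (depth > 0 && c != '\n' && c != '\r') current.append(c);
            if (c == '}' && depth > 0) {
                depth--;
                if (depth == 0) {
                    records.add(current.toString());
                    current.setLength(0);
                }
            }
        }
        return records;
    }
}
```

Writing the returned strings to a file, one per line (for example with java.nio.file.Files.write), gives the line-delimited layout described above. For very large files a streaming JSON parser would be a better fit than loading the whole document into a string.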