
I have some text documents that contain sections, paragraphs, and so on. I know (or can extract) the name of each paragraph and its content.

I would like to index the content of the documents in Elasticsearch hierarchically. The documents have no fixed nesting depth: sections and sub-sections can be nested in different ways from one document to the next.

How can I do this? Parent-child relationships and nested objects do not work for me because I do not know in advance how deeply the paragraphs are nested.

P.S. The implementation language is Java. Thanks!

SimLine
  • Really depends on what you're trying to do with the hierarchy in the end. Elasticsearch is fairly bad at storing relational data. What about a plain list of items (= paragraphs?), keeping a representation of each item's location in the hierarchy, like "2.2.5.1", in some field? – zapl Aug 29 '18 at 22:38
  • I need only the leaves because I have to do some textual processing on them, and at the end I will aggregate the information. I will probably fix the depth of the tree from the start. – SimLine Aug 30 '18 at 12:01
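
The flat-list idea from the comments could be sketched in Java roughly as follows. This is only an illustration under assumptions: the `Section` class, the `flattenLeaves` helper, and the field names `path`, `depth`, `title`, and `content` are all hypothetical, and the sketch stops at building plain maps rather than calling any Elasticsearch client. Each leaf becomes one flat document whose `path` field (for example "2.2.1") records where it sat in the tree, so the nesting depth does not need to be known up front.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SectionFlattener {

    /** Hypothetical in-memory node: a title, optional text, and any number of children. */
    static final class Section {
        final String title;
        final String content;
        final List<Section> children = new ArrayList<>();

        Section(String title, String content) {
            this.title = title;
            this.content = content;
        }

        /** Appends a child and returns this node so calls can be chained. */
        Section add(Section child) {
            children.add(child);
            return this;
        }
    }

    /** Walks the tree depth-first and returns one flat document per leaf. */
    static List<Map<String, Object>> flattenLeaves(List<Section> roots) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (int i = 0; i < roots.size(); i++) {
            collect(roots.get(i), String.valueOf(i + 1), 1, docs);
        }
        return docs;
    }

    private static void collect(Section node, String path, int depth,
                                List<Map<String, Object>> out) {
        if (node.children.isEmpty()) {
            // Leaf: emit a flat document carrying its position in the hierarchy.
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("path", path);    // assumed field name, e.g. "2.2.1"
            doc.put("depth", depth);  // assumed field name, 1 for top level
            doc.put("title", node.title);
            doc.put("content", node.content);
            out.add(doc);
            return;
        }
        // Inner node: recurse, extending the dotted path with the 1-based child index.
        for (int i = 0; i < node.children.size(); i++) {
            collect(node.children.get(i), path + "." + (i + 1), depth + 1, out);
        }
    }

    public static void main(String[] args) {
        Section intro = new Section("Introduction", "Intro text");
        Section methods = new Section("Methods", null)
                .add(new Section("Setup", "Setup text"))
                .add(new Section("Procedure", null)
                        .add(new Section("Step A", "Step A text")));
        // Emits leaves with paths "1", "2.1" and "2.2.1".
        for (Map<String, Object> doc : flattenLeaves(Arrays.asList(intro, methods))) {
            System.out.println(doc);
        }
    }
}
```

Each resulting map could then be indexed as an ordinary document. If `path` is mapped as a `keyword` field, a prefix query on a value such as "2.2." should select every leaf under that section, which may be enough for aggregating per subtree later.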

0 Answers