
I need to store data that can be represented in JSON as follows:

Article{
    Id: 1,
    Category: "History",
    Title: "War stories",

    //Comments could be pretty long and also be changed frequently
    Comments: "Nice narration, Reminds me of the difficult Times, Tough Decisions",

    Tags: "truth, reality, history", //Might change frequently

    UserSpecifiedNotes:[
    //The array may contain different users for different articles
    {
        userid: 20,
        note: "Good for work"
    },
    {
        userid: 22,
        note: "Homework is due for work"
    }
    ]
}

After reading several articles, I understand that denormalizing the data is one way to handle it. But since the common fields could be pretty long and may change frequently, I would like to avoid repeating them. What other, better ways are there to represent and search this data? Parent-child? Inner object?


Currently, I am dealing with a lot of inserts and updates and only a few searches. But whenever a search is done, it has to be very fast. I am using NEST (the .NET client) for Elasticsearch. The search query is expected to work as follows:

  1. Input: searchString and a userID
  2. Behavior: Return the articles containing searchString in the title, comments, tags, or the note for the given userID, sorted in order of relevance
labyrinth

1 Answer


In a normal scenario the main contents of the article change very rarely, whereas the "UserSpecifiedNotes"/comments against an article are generated/added more frequently. This is an ideal use case for a parent-child relation.

With an inner object you still have to reindex the whole "main article" together with all of its "UserSpecifiedNotes"/comments every time a new note comes in. With a parent-child relation you just add the new note.
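To make the difference concrete, here is a minimal sketch in plain Python (no Elasticsearch client involved) of what would have to be sent in each case. The function names and the `parent`/`body` keys are made up for illustration; they are not part of NEST or of the Elasticsearch API.

```python
import copy

def reindex_with_inner_object(article, new_note):
    """Inner-object model: notes live inside the article document,
    so the whole article has to be sent again (hypothetical helper)."""
    updated = copy.deepcopy(article)
    updated["UserSpecifiedNotes"].append(new_note)
    return updated  # full document body to re-index

def index_child_note(article_id, new_note):
    """Parent-child model: only the new note is sent, together with
    the id of the parent it belongs to (hypothetical helper)."""
    return {"parent": article_id, "body": dict(new_note)}

article = {
    "Id": 1,
    "Title": "War stories",
    "Comments": "Nice narration, Reminds me of the difficult Times, Tough Decisions",
    "UserSpecifiedNotes": [{"userid": 20, "note": "Good for work"}],
}
note = {"userid": 22, "note": "Homework is due for work"}

full_doc = reindex_with_inner_object(article, note)   # carries title, comments, all notes
child_doc = index_child_note(article["Id"], note)     # carries just the one note
```

The longer the comments and the more notes an article has, the bigger the gap between the two payloads becomes.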

With the details you have specified you can take the approach of 4 indices

  • Main Article (id, category, title, description etc)
  • Comments (commented by, comment text etc)
  • Tags (tags, any other meta tag)
  • UserSpecifiedNotes (userId, notes)
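As a rough sketch of that split, the sample article from the question could be broken into one parent document and several child documents like this. The type names (`article`, `comment`, `tag`, `note`) and field names are assumptions chosen for the example, and the code only builds plain dictionaries; how they are actually indexed depends on your Elasticsearch and NEST versions.

```python
def split_article(article):
    """Split one denormalized article into a parent document and a list
    of (child_type, child_document) pairs. All names are illustrative."""
    parent = {
        "id": article["Id"],
        "category": article["Category"],
        "title": article["Title"],
    }
    children = [
        ("comment", {"text": article["Comments"]}),
        ("tag", {"tags": [t.strip() for t in article["Tags"].split(",")]}),
    ]
    # One child document per user note, so a single note can be
    # added or changed without touching anything else.
    for n in article["UserSpecifiedNotes"]:
        children.append(("note", {"userid": n["userid"], "note": n["note"]}))
    return parent, children

article = {
    "Id": 1,
    "Category": "History",
    "Title": "War stories",
    "Comments": "Nice narration, Reminds me of the difficult Times, Tough Decisions",
    "Tags": "truth, reality, history",
    "UserSpecifiedNotes": [
        {"userid": 20, "note": "Good for work"},
        {"userid": 22, "note": "Homework is due for work"},
    ],
}

parent, children = split_article(article)
```

Each child would then be indexed with the parent's id so that it is stored alongside its article.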

Having said that, what needs to be kept in mind is your actual requirement. A parent-child relation will need more memory and may slow down search performance a tiny bit, but indexing will be faster.

On the other hand, a nested object will increase your indexing time significantly, as you need to collect all the data related to an article before indexing. You can of course store everything and just send each change as an update. For simpler maintenance and ease of implementation, I would suggest using parent-child.
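For the search described in the question (a searchString plus a userID), a query against the parent-child layout might look roughly like the sketch below. It builds the request body as a plain Python dictionary using the `bool`, `match`, `term` and `has_child` queries. The child type names (`comment`, `tag`, `note`) and field names are assumptions, and `score_mode` is included on the assumption that your Elasticsearch version supports it; without it, child matches may not contribute to the relevance score, so check the documentation for your version.

```python
def build_search(search_string, user_id):
    """Build a query body matching articles whose title, comments, tags,
    or the given user's note contain search_string (illustrative names)."""
    def child(type_name, query):
        # Match parents that have at least one child of this type
        # satisfying the inner query.
        return {"has_child": {"type": type_name,
                              "score_mode": "max",  # assumption: supported by your version
                              "query": query}}

    note_query = {"bool": {"must": [
        {"match": {"note": search_string}},
        {"term": {"userid": user_id}},  # only notes written by this user
    ]}}

    return {"query": {"bool": {
        "should": [
            {"match": {"title": search_string}},
            child("comment", {"match": {"text": search_string}}),
            child("tag", {"match": {"tags": search_string}}),
            child("note", note_query),
        ],
        "minimum_should_match": 1,
    }}}

body = build_search("history", 20)
```

The same structure can be expressed through NEST; the dictionary here is only meant to show the shape of the query.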

Prabin Meitei
  • Sorry, I might be pretty naive, but what exactly do you mean by 4 indices? Do you suggest Main Article as a parent having 3 types of children: comment, tag, notes? And could there be multiple instances of each type of child? – labyrinth Sep 09 '15 at 12:41
  • Exactly. Main article as parent and other three as child types. Each article can have zero or more children of each type. You can construct a query like give me all articles where title has Java, and has a comment containing j2ee and having note from user1. – Prabin Meitei Sep 09 '15 at 13:45