1

I have question and answers entities represented in a SQL Server database as 2 tables Questions and Answers (see below). The relationship between them is OneToMany.

Questions table

Id      Title
-------------------
 1      Question 1
 2      Question 2

Answers table:

Id    Answer        Question_Id
-------------------------------
1     answer 1      1
2     answer 2      1
3     answer 3      1
4     answer 4      2
5     answer 5      2

I would like to get the ES document with a structure provided below, after moving the data through the Logstash pipeline:

{
  “questionId": 1,
  "questionTitle": "Question 1",
  "questionAnswers": [
    {
      “answerId": 1,
      "answer": "answer 1"
    },
    {
      "answerId": 2,
      "answer": "answer 2"
    },
    {
      "answerId": 3,
      "answer": "answer 3"
    }
  ]
}

{
  "questionId": 2,
  "questionTitle": "Question 2",
  "questionAnswers": [
    {
      "answerId": 4,
      "answer": "answer 4"
    },
    {
      "answerId": 5,
      "answer": "answer 5"
    }
  ]
}

The logstash jdbc input plugin setup uses the Question_Answers view to retrieve the data.

{
  jdbc {
    type => “Test_1”
    jdbc_connection_string => "jdbc:sqlserver://myinstance:1433"
    jdbc_user => “root”
    jdbc_password => “root”
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_driver_library => "/home/abury/enu/mssql-jdbc-6.2.2.jre8.jar"
    schedule => "*/3 * * * *"
    statement => "SELECT * from Question_Answers"
  }
}

The result set returned by view looks like this:

questionId  questionTitle   answerId    answer
1           Question 1      1           answer 1
1           Question 1      2           answer 2
1           Question 1      3           answer 3
2           Question 2      4           answer 4
2           Question 2      5           answer 5

The Elasticsearch output plugin setup looks like this:

output {
    elasticsearch {
    hosts => "http://localhost:9200"
    index => "question"
    document_id => "%{questionId}"
  }
}

Question: How can I setup Logstash to identify records related to the same question and build ES document with desired structure provided above? Is it possible to add some aggregation logic into the output.conf file to achieve desired behavior? Or I need to re-write my DB view to return single record for each question:

questionId  questionTitle   answerId    answer
---------------------------------------------------------------------
1           Question 1      1, 2, 3     answer 1, answer 2, answer 3

UPDATED: fix typo in column names

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Hutsul
  • 1,535
  • 4
  • 31
  • 51

2 Answers2

0
SELECT 
questionId,
questionTitle,
GROUP_CONCAT(answereId) answerIDs,
GROUP_CONCAT(answer) answers
FROM Question_Answers
GROUP BY questionId, questionTitle

By the way you have typo in your column name answereId I guess you want answerId.

Alex
  • 16,739
  • 1
  • 28
  • 51
  • Thanks for you response. I assume, you suggest to rewrite my view to return single record for each `question` entity. In this case, should I also explicitly define mapping for `answerIDs` to `questionAnswers.aswerId` and for `answers` to `questionAnswers.answer` fields in the `output.conf` file? – Hutsul May 15 '20 at 19:00
0

I was able to get the desired structure of Elasticsearch document by using the logstash aggregate filter plugin (see, Example 4):

filter {
    aggregate {
        task_id => "%{questionId}"
        code => "
               map['questionId'] ||= event.get('questionid')
               map['questionTitle'] ||= event.get('questiontitle')
               
               map['questionAnswers'] ||= []
               map['questionAnswers'] << {'answerId' => event.get('answerid'), 'answer' => event.get('answer')}

               event.cancel()
             "
        push_previous_map_as_event => true
        timeout => 3
    }
}
Hutsul
  • 1,535
  • 4
  • 31
  • 51