I've been using FOSElasticaBundle to index my documents (entities from a Symfony project, stored in a database through Doctrine) into Elasticsearch. FOSElastica creates the mapping automatically and then indexes all the documents.

The problem is that there are some actions I want to apply to every document (both those already indexed and those that will be indexed later), so pipelines and Painless seem to be a good solution.

However, I can't figure out how to apply a pipeline to documents that are already indexed. Do you have any idea how?

I've seen that you can add "pipeline=my_pipeline_name" to an ES request, but that only works for a single document, whereas I want it to affect all the documents.

Malcom HAMELIN
  • Take a look at this: https://www.shubho.dev/devops/elastic-search-update-by-query-with-ingest-pipeline. It might be helpful. – David Kong Aug 13 '20 at 10:56
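
For reference, the approach that link's title points to (Update By Query with an ingest pipeline) can be sketched roughly as follows. This is only a sketch: my_index is a hypothetical index name, and it assumes a pipeline called my_pipeline_name has already been created. The Update By Query API accepts a pipeline parameter and runs every matching document through that pipeline in place, without moving data to another index.

```
# Assumption: "my_index" is a placeholder for the index FOSElastica populated.
# Every document matched by the query is re-indexed in place through the pipeline.
POST my_index/_update_by_query?pipeline=my_pipeline_name
{
  "query": {
    "match_all": {}
  }
}
```

Documents indexed later are not covered by this request; they would still need the pipeline applied at index time.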

2 Answers

You can apply a pipeline while moving your data from one index to another.

To do so, use the Reindex API, which runs the pipeline on each document as it is copied from one index to the other.

Note: this is an index-level operation, meaning it affects all the documents in the index.

Below is a summary of the steps:

  • Create a temporary_index.
  • Reindex from source_index to temporary_index using the Reindex API, specifying the pipeline (sample query provided below).
  • Delete and re-create the source_index. Make sure the mappings are included when creating the index.
  • Run the same reindex query again, this time with temporary_index as the source and source_index as the destination, and without the pipeline.

Below is how to use the Reindex API with a pipeline:

POST _reindex
{
  "source": {
    "index": "source_index_name"
  },
  "dest": {
    "index": "temporary_index",
    "pipeline": "some_ingest_pipeline"
  }
}
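
The request above assumes that some_ingest_pipeline already exists. As a rough sketch, the pipeline definition and the final copy-back step could look like the following. The processor shown is purely illustrative (the field name processed_by_pipeline is made up), so replace it with whatever your documents actually need.

```
# Hypothetical pipeline: a single Painless script processor that sets a flag.
# Create it before running the first reindex.
PUT _ingest/pipeline/some_ingest_pipeline
{
  "description": "Illustrative pipeline applied while reindexing",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx.processed_by_pipeline = true"
      }
    }
  ]
}

# Last step: copy the transformed documents back, this time without a pipeline.
# Run it only after source_index_name has been deleted and re-created with its mappings.
POST _reindex
{
  "source": {
    "index": "temporary_index"
  },
  "dest": {
    "index": "source_index_name"
  }
}
```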

Let me know if this helps!

Kamal Kunjapur
  • On an added note, if your index is very large and you want to check the status of the reindex operation while it is still running under the hood, refer to the `Task API`, which tells you how many docs have been processed and how many are still left during the reindex operation. https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#docs-reindex-task-api – Kamal Kunjapur May 24 '19 at 11:58
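
A minimal sketch of that workflow, assuming the same index and pipeline names as in the answer: start the reindex with wait_for_completion=false so it runs as a background task, then query the Task API. The task id in the second request is a placeholder for whatever value the first request returns.

```
# Start the reindex in the background; the response contains a "task" id.
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "source_index_name"
  },
  "dest": {
    "index": "temporary_index",
    "pipeline": "some_ingest_pipeline"
  }
}

# List running reindex tasks with their progress counters.
GET _tasks?detailed=true&actions=*reindex

# Or look up one task directly (replace <task_id> with the id returned above).
GET _tasks/<task_id>
```

The status section of the task response reports counters such as total and created, from which the number of remaining documents can be worked out.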

So, after some time I found a much more efficient solution to my problem: dynamic templates and index templates.

I was actually having trouble with Elasticsearch not recognizing some field types (like date or geo_point), so I forced those types for specifically named fields with the help of templates.

Here is an example of my configuration in FOSElastica (see the FOSElasticaBundle documentation):

fos_elastica:
    serializer: 
        serializer: jms_serializer
    clients:
        default: 
            host: localhost 
            port: 9200
    index_templates: # https://www.elastic.co/guide/en/elasticsearch/reference/6.8/indices-templates.html
        base_template: # this is a custom name for the index template
            client: default
            template: "*" # this is where you define which indices will use this template
            types:
                _doc: # this is where you define which types will use this (_doc stands for every type/documents)
                    dynamic_templates: # https://www.elastic.co/guide/en/elasticsearch/reference/6.8/dynamic-templates.html
                        dynamic_date_template: # this is a custom name for the dynamic field template
                            match_pattern: regex
                            match: created|updated|tpq_date|taq_date
                            mapping:
                                type: date
                        dynamic_location_template:
                            match: location
                            mapping:
                                type: geo_point
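
For comparison, a roughly equivalent index template sent straight to Elasticsearch might look like the sketch below. It assumes Elasticsearch 6.8 (the version the links in the configuration point to), where mappings are still nested under a type name; on 7.x and later the _doc level is dropped. Whether FOSElastica generates exactly this request is an assumption on my part.

```
# Sketch for Elasticsearch 6.8; "base_template" mirrors the name used in the YAML above.
PUT _template/base_template
{
  "index_patterns": ["*"],
  "mappings": {
    "_doc": {
      "dynamic_templates": [
        {
          "dynamic_date_template": {
            "match_pattern": "regex",
            "match": "created|updated|tpq_date|taq_date",
            "mapping": {
              "type": "date"
            }
          }
        },
        {
          "dynamic_location_template": {
            "match": "location",
            "mapping": {
              "type": "geo_point"
            }
          }
        }
      ]
    }
  }
}
```

Note that templates only take effect when an index is created, so an existing index has to be re-created (or reindexed) before the forced field types apply.
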
Malcom HAMELIN