I have installed Elasticsearch version 6.2.3 with Docker.

I get the following error when trying to install this Elasticsearch plugin:

org.wikimedia.search:extra

Exception in thread "main" java.lang.IllegalArgumentException: plugin [extra] is incompatible with version [6.2.3]; was designed for version [5.5.2]

I tried to install the plugin using the following command:

RUN /usr/share/elasticsearch/bin/elasticsearch-plugin install org.wikimedia.search:extra:5.5.2.3

I was trying to install this plugin to load the Wikipedia dictionary into Elasticsearch, but the latest version of the plugin is 5.5.2.

Ahmed Abbas
  • Then you need to install ES 5.5.2 in order to use that plugin... or contribute to it in order to make it 6.2.3 compatible. – Val Apr 19 '18 at 09:20
  • I will try this solution, but it's only suitable for a proof of concept, not for a production environment. – Ahmed Abbas Apr 20 '18 at 04:39
  • It worked very well, especially with Docker. I only had to disable security, since I am just testing, by adding -e 'xpack.security.enabled=false' to the docker run command. – Ahmed Abbas Apr 20 '18 at 04:52

1 Answer

Two years ago Wikimedia made dumps of its production Elasticsearch indices available, so loading Wikipedia (or Wiktionary) into Elasticsearch is now very simple.

The indices are exported every week, and for each wiki there are two exports:

  • the content index, called content, which contains only article pages;
  • the general index, called general, which contains all pages, including talk pages, templates, etc.

You can find them here: http://dumps.wikimedia.org/other/cirrussearch/current/

  • Create a mapping according to your needs. For example:

    {
      "mappings": {
         "page": {
            "properties": {
               "auxiliary_text": {
                  "type": "text"
               },
               "category": {
                  "type": "text"
               },
               "coordinates": {
                  "properties": {
                     "coord": {
                        "properties": {
                           "lat": {
                              "type": "double"
                           },
                           "lon": {
                              "type": "double"
                           }
                        }
                     },
                     "country": {
                        "type": "text"
                     },
                     "dim": {
                        "type": "long"
                     },
                     "globe": {
                        "type": "text"
                     },
                     "name": {
                        "type": "text"
                     },
                     "primary": {
                        "type": "boolean"
                     },
                     "region": {
                        "type": "text"
                     },
                     "type": {
                        "type": "text"
                     }
                  }
               },
               "defaultsort": {
                  "type": "boolean"
               },
               "external_link": {
                  "type": "text"
               },
               "heading": {
                  "type": "text"
               },
               "incoming_links": {
                  "type": "long"
               },
               "language": {
                  "type": "text"
               },
               "namespace": {
                  "type": "long"
               },
               "namespace_text": {
                  "type": "text"
               },
               "opening_text": {
                  "type": "text"
               },
               "outgoing_link": {
                  "type": "text"
               },
               "popularity_score": {
                  "type": "double"
               },
               "redirect": {
                  "properties": {
                     "namespace": {
                        "type": "long"
                     },
                     "title": {
                        "type": "text"
                     }
                  }
               },
               "score": {
                  "type": "double"
               },
               "source_text": {
                  "type": "text"
               },
               "template": {
                  "type": "text"
               },
               "text": {
                  "type": "text"
               },
               "text_bytes": {
                  "type": "long"
               },
               "timestamp": {
                  "type": "date",
                  "format": "strict_date_optional_time||epoch_millis"
               },
               "title": {
                  "type": "text"
               },
               "version": {
                  "type": "long"
               },
               "version_type": {
                  "type": "text"
               },
               "wiki": {
                  "type": "text"
               },
               "wikibase_item": {
                  "type": "text"
               }
            }
         }
      }
    }
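
As a minimal sketch of how such a mapping could be sent to Elasticsearch, the snippet below builds a PUT request for a new index using only the Python standard library. The index name enwiki and the address http://localhost:9200 are taken from the load command in this answer; the function names and the abbreviated MAPPING are illustrative assumptions, not part of any Elasticsearch client. The typed mapping ("page") is kept as in the example above, so this only applies to Elasticsearch versions that still accept mapping types.

```python
import json
import urllib.request

# Abbreviated mapping: a few fields from the full example, for illustration only.
MAPPING = {
    "mappings": {
        "page": {
            "properties": {
                "title": {"type": "text"},
                "text": {"type": "text"},
                "incoming_links": {"type": "long"},
                "timestamp": {
                    "type": "date",
                    "format": "strict_date_optional_time||epoch_millis",
                },
            }
        }
    }
}


def build_create_index_request(mapping, base_url="http://localhost:9200", index="enwiki"):
    """Build (without sending) the PUT request that creates `index` with `mapping`."""
    return urllib.request.Request(
        f"{base_url}/{index}",
        data=json.dumps(mapping).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def create_index(mapping, base_url="http://localhost:9200", index="enwiki"):
    """Send the request and return the parsed JSON response from Elasticsearch."""
    request = build_create_index_request(mapping, base_url, index)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling create_index(MAPPING) against a running node would create the index; in practice you would pass the full mapping rather than the shortened one.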
    

Once you have created the index, just run:

zcat enwiki-current-cirrussearch-general.json.gz | parallel --pipe -L 2 -N 2000 -j3 'curl -s -H "Content-Type: application/x-ndjson" http://localhost:9200/enwiki/_bulk --data-binary @- > /dev/null'
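
For readers without GNU parallel, here is a rough Python sketch of what that pipeline does: it reads the gzipped dump, groups lines into chunks of 2000 action/source pairs (the -L 2 -N 2000 part), and posts each chunk to the _bulk endpoint. The function names are hypothetical, the URL and file name are the ones from the command, and unlike -j3 this version sends chunks one after another. It sets the Content-Type header explicitly, which Elasticsearch 6.x requires on requests with a body.

```python
import gzip
import itertools
import urllib.request


def bulk_chunks(lines, pairs_per_chunk=2000):
    """Yield bulk request bodies holding up to `pairs_per_chunk` action/source line pairs."""
    lines_per_chunk = 2 * pairs_per_chunk  # each document is an action line plus a source line
    iterator = iter(lines)
    while True:
        chunk = list(itertools.islice(iterator, lines_per_chunk))
        if not chunk:
            return
        # The bulk API expects newline-delimited JSON that ends with a newline.
        yield b"".join(line if line.endswith(b"\n") else line + b"\n" for line in chunk)


def post_bulk(body, url="http://localhost:9200/enwiki/_bulk"):
    """Send one chunk to the _bulk endpoint and return the raw response bytes."""
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-ndjson"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def load_dump(path, url="http://localhost:9200/enwiki/_bulk"):
    """Stream a gzipped dump into Elasticsearch one chunk at a time (sequentially)."""
    with gzip.open(path, "rb") as dump:
        for body in bulk_chunks(dump):
            post_bulk(body, url)
```

With a node running locally, load_dump("enwiki-current-cirrussearch-general.json.gz") would index the whole file. The sketch discards the bulk responses, as the curl command does; inspecting them for an "errors" flag would be a sensible addition.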

Enjoy!

Lupanoide
  • I haven't tried your answer yet, but with the link you provided I still need to create the index, and when I try to create the index with the analyzer, I get the error above. So I don't think it's a solution. – Ahmed Abbas Apr 20 '18 at 04:55
  • You are wrong, you don't need to create the index! The index sample in my answer is for Wikipedia, not for Wiktionary, and it suits my purposes. But you can also index Wiktionary without an index mapping: Elasticsearch will generate a mapping automatically. You should then check that automatic mapping to see whether your searches would be better served by changing some fields from text to keyword, and find a mapping that fulfills your needs. Save the new mapping and then reindex with it. But you can also just use the automatic mapping provided by ES. – Lupanoide Apr 20 '18 at 08:19