Questions tagged [kuromoji]

Kuromoji is a self-contained and easy to use Japanese morphological analyzer designed for search http://atilika.org/

17 questions
3
votes
0 answers

Partial search query in kuromoji

I have an issue when trying to do a partial search using the kuromoji plugin. When I index a full sentence, like ホワイトソックス, with an analyzer like: { "tokenizer": { "type": "kuromoji_tokenizer", "mode": "search" }, "filter": ["lowercase"], …
sdooo
3
votes
1 answer

How to split Japanese text?

What is the best way of splitting Japanese text using Java? For example, for the text below: こんにちは。私の名前はオバマです。私はアメリカに行く。 I need the following output: こんにちは 私の名前はオバマです 私はアメリカに行く Is it possible using Kuromoji?
din_oops
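
A minimal sketch for the sentence-splitting question above, assuming every sentence ends with the Japanese full stop 。 and ignoring other terminators such as ！ or ？. Kuromoji is a morphological analyzer that produces word-level tokens, so this sketch uses plain `String.split` instead of Kuromoji. The class and method names (`SentenceSplitter`, `splitSentences`) are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class SentenceSplitter {

    // Splits text on the Japanese full stop and drops empty pieces.
    // Assumption: "。" is the only sentence terminator in the input.
    public static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String part : text.split("。")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                sentences.add(trimmed);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "こんにちは。私の名前はオバマです。私はアメリカに行く。";
        // Prints one sentence per line, without the trailing "。".
        for (String sentence : splitSentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

Text that mixes terminators or contains quoted speech would need a more careful approach than a single delimiter.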
2
votes
1 answer

Elasticsearch: Cannot search using Kuromoji reading form filter

I'm using Elasticsearch 0.90.1 with Kuromoji plugin 1.4.0. $ curl localhost:9200 { "ok" : true, "status" : 200, "name" : "Agent Zero", "version" : { "number" : "0.90.1", "snapshot_build" : false, "lucene_version" : "4.3" }, …
Chris B
1
vote
1 answer

Struggling to understand user dictionary format in Elasticsearch Kuromoji tokenizer

I wanted to use the Elasticsearch Kuromoji plugin for the Japanese language. However, I'm struggling to understand the user_dictionary format of the file in the tokenizer. It's explained in elastic doc…
czeczek
1
vote
2 answers

Elasticsearch/Kuromoji: How to Use Kuromoji with Unidic

Elasticsearch 1.7. We would like to test Kuromoji with Unidic on Elasticsearch. Compiling kuromoji gives me a few jars with different dictionaries. Is there a simple way to replace the ipadic-based-kuromoji with the unidic-based-kuromoji? Thanks.
tokosh
1
vote
1 answer

Elasticsearch error when creating index possibly due to module Kuromoji not installed properly

I am trying to set up my local environment for a Rails application I just gained access to that uses Elasticsearch 1.3 along with two modules (kuromoji and smartcn). I have followed the instructions to install Elasticsearch along with the modules and when…
0
votes
0 answers

Elasticsearch kuromoji plugin output issue

What is the expected output when we run the Elasticsearch kuromoji plugin? When the number and reading form filters are used together, the code does not work as it should, but when they are used separately it works properly. PUT test { "settings":…
Aman
0
votes
0 answers

How can I fix the ElasticSearch kuromoji-readingform filter for Japanese fulltext search?

The Elasticsearch kuromoji-readingform filter is not working for Japanese full-text search. I have settings and a query like this. My documents are "Medical development system" in three forms of Japanese. When I search for "医療用システム開発", the result must contain 3 documents but i…
0
votes
1 answer

How to Use Kuromoji.js in Javascript

I recently installed a package with Bower from here: https://github.com/takuyaa/kuromoji.js/ Following the installation instructions on GitHub, I basically copied and pasted from the guide: kuromoji.builder({ dicPath: "../bower_components/kuromoji/dict/"…
Steak
0
votes
1 answer

How to return the values in the order of the data passed to a promise in a loop?

To learn about fetch, promises, and other JS stuff, I'm trying to write a small script that suggests words to learn (based on their difficulty) from a given Japanese text. It utilizes a Japanese parser called Kuromojin. What a parser like Kuromojin does…
aanhlle
0
votes
1 answer

Getting NoClassDefFoundError on using JapaneseTokenizer of Apache Lucene 7.1.0

I am trying to use the JapaneseTokenizer from Apache Lucene 7.1.0. It's giving me java.lang.NoClassDefFoundError: org/apache/lucene/analysis/ja/JapaneseTokenizer and java.lang.ClassNotFoundException: org.apache.lucene.analysis.ja.JapaneseTokenizer.…
Shaw
0
votes
0 answers

elasticsearch user dictionary

I want to use the symbol '#' in a user dictionary with Elasticsearch. However, setting the string "C #" in the user dictionary results in an error. ES version 5.6. I am using the Kuromoji plugin user…
0
votes
1 answer

Hibernate Search | Lucene Kuromoji Analyzer depends on method name

I have my entity class FeatureMeta annotated with two analyzers, English and Japanese. In my repository class, I have named the method to search for FeatureMeta entities "findFeatures". But when I try to access the "findFeatures" method in the…
Ashika Umanga Umagiliya
0
votes
1 answer

Elasticsearch 2.4.1 and Kuromoji plugin with a specified field in the search query

I started using Elasticsearch (version 2.4.1) in my project 2 weeks ago, and I have a problem when I specify a field in the query string. I want to use the Kuromoji plugin and an n-gram tokenizer to search Japanese data. In my query, if I don't specify the…
0
votes
2 answers

How to get analyzed word count by Elasticsearch?

I would like to count each analyzed token. First, I tried the following code: mapping: { "docs": { "mappings": { "doc": { "dynamic": "false", "properties": { "text": { "type": "string", …