I'm trying to create a Lucene 4.10 index. I just want the index to store the exact strings that I put into the document, without tokenization.
I'm using the StandardAnalyzer.
Directory dir = FSDirectory.open(new File("myDire"));
Analyzer analyzer = new StandardAnalyzer();
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_10_0, analyzer);
iwc.setOpenMode(OpenMode.CREATE);
IndexWriter writer = new IndexWriter(dir, iwc);
Document doc = new Document();
StringField field1 = new StringField("1", content1, Store.YES);
StringField field2 = new StringField("2", content2, Store.YES);
StringField field3 = new StringField("3", content3, Store.YES);
doc.add(field1);
doc.add(field2);
doc.add(field3);
writer.addDocument(doc, analyzer);
writer.close();
If I print the index's content, I can see that my data is stored. For example, my document contains this "field 3":
stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<3:"Fuel Tank Capacity"@en>
I'm trying to query the index in order to get it back:
IndexReader reader = DirectoryReader.open(FSDirectory.open(new File("myDire")));
IndexSearcher searcher = new IndexSearcher(reader);
Analyzer analyzer = new StandardAnalyzer();
QueryParser parser = new QueryParser("3", analyzer);
String queryString = "\"\"Fuel Tank Capacity\"@en\"";
Query query = parser.createPhraseQuery("3", QueryParser.escape(queryString));
TopDocs docs = searcher.search(query, null, 20);
I'm trying to search for the term "Fuel Tank Capacity"@en (quotation marks included), so I escaped the quotes and wrapped the whole string in another pair of quotes, so that Lucene would treat it as one complete text rather than as separate words.
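Java string escaping is easy to get wrong here, so the following plain-Java sketch (no Lucene needed) shows the two candidate strings: the exact stored value and the same value wrapped in an extra pair of quotes. The class name and the escapeQuotes helper are hypothetical; escapeQuotes only mimics the part of QueryParser.escape that matters for this value (backslash before `\` and `"`), while the real method escapes a larger set of query-syntax characters. Note that escaping only adds backslashes: the spaces and the `@` remain, and createPhraseQuery runs the analyzer over the text directly rather than parsing query syntax, so the analyzer still sees and splits them.

```java
public class QueryStringLiterals {

    // The exact value stored in field "3". In a Java literal every embedded
    // double quote must be written as \" ; '@' needs no escaping.
    static final String EXACT_VALUE = "\"Fuel Tank Capacity\"@en";

    // The same value wrapped in one more pair of quotes, i.e. the string
    // ""Fuel Tank Capacity"@en".
    static final String WRAPPED = "\"" + EXACT_VALUE + "\"";

    // Hypothetical helper: backslash-escapes only backslashes and double quotes.
    // QueryParser.escape handles more special characters; this covers just
    // the ones that occur in this value.
    static String escapeQuotes(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '\\' || c == '"') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(EXACT_VALUE);               // "Fuel Tank Capacity"@en
        System.out.println(WRAPPED);                   // ""Fuel Tank Capacity"@en"
        System.out.println(escapeQuotes(EXACT_VALUE)); // \"Fuel Tank Capacity\"@en
    }
}
```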
If I print the query, I get 3:"fuel tank capacity en", but I don't want the text to be split at the @ symbol.
I think my first problem is the StandardAnalyzer because, if I'm not mistaken, it tokenizes the text. However, I can't figure out how to query the index so that it matches exactly "Fuel Tank Capacity"@en (quotation marks included).
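To illustrate why the phrase query finds nothing, here is a deliberately simplified model in plain Java (no Lucene dependency). It assumes, as the StringField documentation states, that the whole value is indexed as a single untouched term, and it approximates StandardAnalyzer by splitting on non-alphanumeric characters and lowercasing. Lucene's real StandardTokenizer follows Unicode word-break rules and StandardAnalyzer also drops English stop words, so this is only a sketch; the names analyzed and notAnalyzed are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalysisModel {

    // Rough stand-in for StandardAnalyzer (approximation, not Lucene's code):
    // split on runs of non-letter/non-digit characters and lowercase each piece.
    static List<String> analyzed(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) { // a leading '"' yields an empty first piece
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // What a StringField indexes: the whole value as one term, unchanged.
    static List<String> notAnalyzed(String text) {
        List<String> tokens = new ArrayList<>();
        tokens.add(text);
        return tokens;
    }

    public static void main(String[] args) {
        String value = "\"Fuel Tank Capacity\"@en";
        List<String> indexedTerms = notAnalyzed(value);
        List<String> queryTerms = analyzed(value);
        System.out.println("indexed: " + indexedTerms);
        System.out.println("query:   " + queryTerms);
        // No analyzed query term equals the single indexed term,
        // so a phrase built from the query terms cannot match it.
        System.out.println("any match: " + queryTerms.contains(indexedTerms.get(0)));
    }
}
```

In this model the index holds one term, "Fuel Tank Capacity"@en, while the query side produces [fuel, tank, capacity, en], mirroring the printed query above. This suggests the query side would also need to skip analysis, for example via a TermQuery on field "3" with the exact value or a KeywordAnalyzer, both of which exist in Lucene 4.10; that part is not modeled here.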