I am trying to batch index data from a text file using Cloudera Search batch indexing, developing on the Cloudera QuickStart VM. I believe the problem is in my schema or my morphline: the job completes and appears to work, but when I open the Solr dashboard the core is listed with zero documents. I am confident that the command itself and Cloudera Search work, because when I run the same batch indexing with the example input file, schema, and morphline file, the documents are indexed and added to the core as expected. The command I am using is:
hadoop --config /etc/hadoop/conf.cloudera.yarn jar \
/usr/lib/solr/contrib/mr/search-mr-*-job.jar \
org.apache.solr.hadoop.MapReduceIndexerTool -D \
'mapred.child.java.opts=-Xmx500m' \
--log4j '/usr/share/doc/search-1.0.0+cdh5.4.0+0/examples/solr-nrt/log4j.properties' \
--morphline-file /usr/share/doc/search-1.0.0+cdh5.4.0+0/examples/solr-nrt/test-morphlines/readMultiLine.conf \
--output-dir hdfs://quickstart.cloudera:8020/user/outdir --verbose --go-live \
--zk-host 127.0.0.1:2181/solr --collection collection1 \
hdfs://quickstart.cloudera:8020/user/indir
My schema is:
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="sentences" version="1.5">
  <fields>
    <field name="id" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="sentence" type="text_general" indexed="true" stored="false"/>
    <field name="_version_" type="long" indexed="true" stored="true"/>
    <dynamicField name="ignored_*" type="ignored"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    <fieldType name="random" class="solr.RandomSortField" indexed="true" />
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
</schema>
For my morphline file I am using one I found in the examples, which just reads the input one line at a time:
morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      {
        readLine {
          ignoreFirstLine : true
          commentPrefix : "#"
          charset : UTF-8
        }
      }
      { logDebug { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]
My sample input is below. Each line is a DocID, then a tab, then the sentence:
1 For evening wear at the North Pole, girls could dress up in handsome Nordic sweaters and full iridescent taffeta skirts, or top one of the full striped skirts with a terrific short beige trench coat.
2 But working to change the communist-run system is illegal, and the party relentlessly punishes dissent.
3 Word of the latest document first came on Sept. 1, 1987, during a meeting between the pope and Jewish leaders in Castel Gandolfo, the pontiff's summer residence in the hills southeast of Rome.
4 Anita Moen-Guidon of Norway was third, 2:28.6 behind Lazutina, and Russia's Julia Chepalova fourth, 2:53.5 behind.
5 We have been beaten, we have shed blood, we have purchased the right to meet here today with our blood,'' said John Munuve, an assembly leader.
6 The folklore Nordic knits were handsome, in sweaters, or knee-length pants, and might have been topped by something like a super taffeta full coat.
7 Several politicians have charged that the high taxes Kenyans already pay go into the pockets of government officials or wasteful projects, and not into providing essential services and repairing crumbling infrastructure.
8 independence.
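
To make the intended mapping explicit, here is a minimal Python sketch of how I expect each input line to end up as a record with the schema's `id` and `sentence` fields. It only illustrates the target record shape and is not part of the indexing job; the function name `parse_line` is made up for this illustration, and it assumes the first tab on each line separates the DocID from the sentence.

```python
def parse_line(line):
    """Split one 'DocID<TAB>Sentence' line into the two schema fields."""
    # Assumption: the first tab separates the document ID from the sentence;
    # any later tabs are treated as part of the sentence text.
    doc_id, sentence = line.rstrip("\n").split("\t", 1)
    return {"id": doc_id.strip(), "sentence": sentence.strip()}


if __name__ == "__main__":
    sample = (
        "2\tBut working to change the communist-run system is illegal, "
        "and the party relentlessly punishes dissent.\n"
    )
    # Prints a dict with the keys "id" and "sentence".
    print(parse_line(sample))
```

So for every input line I would expect one Solr document whose `id` is the DocID and whose `sentence` is the rest of the line.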