How do I get empty fields in SOLR indexed? I am using solr 7.2.0

I am using schemaless SOLR to try to index everything as string, but for files with empty fields, those fields do not get indexed. Is there a way to get them to show up?

col1,col2,col3
a,,1
d,e,
g,h,3

For example, the first data row shows up as

{
"col1":"a",
"col3":"1"
}

I'm also trying to get col2 to show up. In my solrconfig.xml I have this:

  <dynamicField name="*" type="text_general" indexed="true" stored="true" required="true" default="" />

I have also removed every trace of the remove-blank processor from my config. I've reloaded, and deleted and recreated my collection multiple times. Is there a solution for this?

2 Answers

Maybe preprocess your CSV file like this:

s/,,/, ,/g

That is, add a space between the two commas. An empty value at the end of a line needs separate handling, though; a second regex can cover that case.

Then try again. Right now Solr reads the value as nonexistent. Making it a single space gives it a better chance of making it through, and it should not change search results (unless you have some unusual analysis chains).
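If a regex gets awkward, the same preprocessing can be sketched with Python's `csv` module. Note that `s/,,/, ,/g` on its own also misses runs of empty values such as `a,,,b`, because the matches cannot overlap. The function name `fill_empty_fields` and the single-space filler are my own choices for illustration, not anything Solr requires:

```python
import csv
import io


def fill_empty_fields(csv_text: str, filler: str = " ") -> str:
    """Return csv_text with every empty field replaced by `filler`.

    Handles empty values anywhere in a row, including the last column
    and several empty values in a row.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Keep non-empty values untouched; swap empty ones for the filler.
        writer.writerow([field if field != "" else filler for field in row])
    return out.getvalue()


if __name__ == "__main__":
    sample = "col1,col2,col3\na,,1\nd,e,\ng,h,3\n"
    print(fill_empty_fields(sample), end="")
```

Write the result to a new file and index that file instead of the original.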

Persimmonium

The CSV import module has its own option to keep empty fields - f.<field name>.keepEmpty=true.

If you don't give that option, the CSV handler will never give the empty field value to the next step in your indexing process.

Passing f.col2.keepEmpty=true as a URL parameter should at least give you a better starting point.
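As a rough sketch, the update URL with the per-field option could be assembled like this. The host, port, and the collection name `mycollection` are placeholders I made up, and `csv_update_url` is just a helper for illustration; only the `f.<field name>.keepEmpty` and `commit` parameters come from Solr itself:

```python
from urllib.parse import urlencode


def csv_update_url(base_url: str, collection: str, keep_empty_fields: list) -> str:
    """Build an /update URL that asks the CSV handler to keep empty values
    for each field listed in keep_empty_fields."""
    params = [("commit", "true")]
    params += [(f"f.{name}.keepEmpty", "true") for name in keep_empty_fields]
    return f"{base_url}/{collection}/update?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder host and collection name; adjust to your setup.
    print(csv_update_url("http://localhost:8983/solr", "mycollection", ["col2", "col3"]))
```

The CSV file is then POSTed to that URL with a CSV content type such as `application/csv`.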

MatsLindh
  • Is there a way to apply this for all columns instead of just selected columns since my collection is schemaless? – ProgrammingUnicorn Aug 05 '18 at 17:59
  • Try just `f.keepEmpty` or just `keepEmpty`. Usually the field part is optional, and the docs say that as well -- _This parameter can be global or per-field._ – MatsLindh Aug 05 '18 at 21:21
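
For a schemaless collection, the two variants can be sketched as follows: the global `keepEmpty` parameter the docs describe, and, as a fallback if the global form does not behave as expected on your version, one per-field parameter for every column named in the CSV header. The helper name `keep_empty_params` and its `use_global` flag are hypothetical, introduced only for this example:

```python
import csv
import io
from urllib.parse import urlencode


def keep_empty_params(csv_text: str, use_global: bool = True) -> list:
    """Return query parameters that ask the CSV handler to keep empty values.

    use_global=True  -> a single global keepEmpty parameter.
    use_global=False -> one f.<field name>.keepEmpty parameter per header column.
    """
    if use_global:
        return [("keepEmpty", "true")]
    # Read only the header row to discover the column names.
    header = next(csv.reader(io.StringIO(csv_text)))
    return [(f"f.{name}.keepEmpty", "true") for name in header]


if __name__ == "__main__":
    sample = "col1,col2,col3\na,,1\nd,e,\ng,h,3\n"
    print(urlencode(keep_empty_params(sample)))
    print(urlencode(keep_empty_params(sample, use_global=False)))
```

Append the resulting query string to the update URL you already use for the CSV upload.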