
I'm trying to update a specific field in a Solr document. For testing purposes I'm using the author field; afterwards I will try to update the date field. I'm using curl in a Cygwin terminal. This is the command I'm entering:

curl http://localhost:8983/solr/MaharaPortfolioA/update -d '[{"url":"https://www.moopaed.de/mahara/view/view.php?id=6920","author":{"set":"Herbert"}}]'

To check whether the update succeeded, I'm using the following command and getting this response:

$ curl http://localhost:8983/solr/MaharaPortfolioA/get?id="https://www.moopaed.de/mahara/view/view.php?id=6920"
{
  "doc":
  {
    "url":"https://www.moopaed.de/mahara/view/view.php?id=6920",
    "portfolio_title":"IT 2 Portfolio - View 2",
    "title":"Themenschwerpunkt Informationssysteme  - moopaed mahara",
    "author":"Herbert",
    "indexDate":"2017-04-05T22:04:10Z",
    "nrImages":8,
    "nrWords":7474,
    "nrUploadedImages":6,
    "nrLinks":0,
    "cohort":"IT3 WS 2013/2014",
    "lecture":"OOP",
    "nrWikipediaImages":0,
    "nrWikipediaLinks":0,
    "_version_":1564023239370342400}}

According to the response everything seems fine: the value of author changed from "Louisa" to "Herbert". But if I use a query to search for "Herbert", I get no result (http://localhost:8983/solr/MaharaPortfolioA/select?q=Herbert). While searching for a solution I came up with several possible reasons, but I have no further ideas why my search for "Herbert" returns nothing:

  • Is it because my unique key is a URL and not an integer value?
  • Or is it because I'm using curl via Cygwin? Furthermore, there's a difference between cURL (Client for URLs) and Curl (a programming
    language). When tutorials use the term, do they mean cURL?
  • Or is it because "author" gets filtered and tokenized during indexing, and my update doesn't go through these steps?

Thanks in advance

Alexander
  • What do logs say? – Oyeme Apr 07 '17 at 13:47
  • @Oyeme When I tried to change "nrWords", the log was: '2017-04-07 14:02:28.745 INFO (qtp870698190-14) [ x:MaharaPortfolioA] o.a.s.u.p.LogUpdateProcessorFactory [MaharaPortfolioA] webapp=/solr path=/update params={}{add=[https://www.moopaed.de/mahara/view/view.php?id=6093 (1564028435152502784)]} 0 15' – Alexander Apr 07 '17 at 14:07
  • Have you tried adding commit=true to your link? curl http://localhost:8983/solr/MaharaPortfolioA/update?commit=true -d '[{"url":"https://www.moopaed.de/mahara/view/view.php?id=6920","author":{"set":"Herbert"}}]' (The commit=true parameter tells Solr to commit the update sent in this request.) – Oyeme Apr 07 '17 at 14:14
  • I added commit=true to my link but the problem stays the same. The strange thing is that "Herbert" appears in the author facet while he is not searchable. In contrast "Louisa" and all other original authors are searchable. – Alexander Apr 07 '17 at 14:39
  • Try restarting the Solr service; sometimes indexes can get messed up... – Oyeme Apr 07 '17 at 15:09
  • I did this several times. Is it possible that my Solr server isn't behaving correctly? I got my configuration files from a university project and have already corrected some mistakes in schema.xml, so there may be more mistakes that I can't find since I'm a Solr newbie. Does solrconfig.xml contain update-relevant settings? – Alexander Apr 07 '17 at 15:49
  • Could be, it's hard to say without seeing solrconfig.xml. Try downloading the default one and comparing your solrconfig with it. https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig – Oyeme Apr 07 '17 at 15:57
  • Are you actually searching the `author` field? `q=author:Herbert` would be the standard Lucene syntax for querying `author`. When you're using `q=Herbert`, the search goes to the default search field, which probably isn't `author`. – MatsLindh Apr 07 '17 at 18:13
  • @MatsLindh Ouch! You are right! I always thought that I was searching all fields, but it seems that I only search the text field. So my primary problem is solved. Thank you MatsLindh and Oyeme. But another problem appears: as soon as I change a field, the value of 'text' disappears. It is already missing from my curl response above. – Alexander Apr 08 '17 at 07:35

2 Answers


You're not actually searching the author field: q=author:Herbert would be the standard Lucene syntax for querying the author field. When you're using q=Herbert, the search goes to the default search field, which probably isn't author (but usually text).
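A minimal sketch of the fielded query with curl, in the same terminal setup the question uses. It assumes Solr listens on localhost:8983 with the core MaharaPortfolioA from the question; the variable SOLR_CORE_URL and the function name search_field are hypothetical. curl's -G flag sends the --data-urlencode pairs as an encoded query string on a GET request, so characters such as the colon are encoded for you.

```sh
#!/bin/sh
# Assumption: Solr listens on localhost:8983 and the core is MaharaPortfolioA.
SOLR_CORE_URL="${SOLR_CORE_URL:-http://localhost:8983/solr/MaharaPortfolioA}"

# Query a single field using standard Lucene syntax (field:value).
# Without the "field:" prefix, the term is searched in the default
# search field instead, which is usually not author.
search_field() {
  local field="$1" value="$2"
  # -G appends the URL-encoded pairs to the URL and sends a GET request.
  curl -s -G "$SOLR_CORE_URL/select" \
    --data-urlencode "q=${field}:${value}" \
    --data-urlencode "wt=json"
}
```

For example, search_field author Herbert queries only the author field, which is what the question's q=Herbert was expected to do.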

If you're using the edismax or dismax query parsers, you can use qf=author text to search both the text and the author field, and you can use qf=author^5 text to give more relevancy weight to hits in the author field.
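A sketch of the edismax variant, under the same assumptions (local Solr, core MaharaPortfolioA; SOLR_CORE_URL and search_author_and_text are hypothetical names). It only shows how defType, qf and the ^5 boost are passed as request parameters; requesting score in fl is optional and just makes the effect of the boost visible.

```sh
#!/bin/sh
# Assumption: Solr listens on localhost:8983 and the core is MaharaPortfolioA.
SOLR_CORE_URL="${SOLR_CORE_URL:-http://localhost:8983/solr/MaharaPortfolioA}"

# Search the author and text fields at once with the edismax parser.
# qf lists the fields to search; the ^5 boost makes matches in author
# count more towards the score than matches in text.
search_author_and_text() {
  local terms="$1"
  curl -s -G "$SOLR_CORE_URL/select" \
    --data-urlencode "defType=edismax" \
    --data-urlencode "q=${terms}" \
    --data-urlencode "qf=author^5 text" \
    --data-urlencode "fl=url,author,score" \
    --data-urlencode "wt=json"
}
```

With this, search_author_and_text Herbert finds the document through author even though q itself has no field prefix.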

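The text field is probably not configured with stored="true" by default, which means its actual content is discarded and only the indexed terms are kept for searching. Atomic updates rebuild the whole document from its stored fields, so when you update another field, the content of a non-stored field like text is lost unless it is refilled through a copyField.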
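To check whether text (or author) is stored and indexed without reading through schema.xml, one option is to ask Solr for the field definition. This sketch assumes the Schema API's read-only GET endpoint is available in your Solr version and uses the same hypothetical SOLR_CORE_URL variable; show_field is a hypothetical helper name.

```sh
#!/bin/sh
# Assumption: Solr listens on localhost:8983 and the core is MaharaPortfolioA.
SOLR_CORE_URL="${SOLR_CORE_URL:-http://localhost:8983/solr/MaharaPortfolioA}"

# Print the schema definition of one field via the Schema API.
# showDefaults=true also includes properties inherited from the field type,
# so "stored" and "indexed" show up even if the field itself does not set them.
show_field() {
  curl -s -G "$SOLR_CORE_URL/schema/fields/$1" \
    --data-urlencode "showDefaults=true" \
    --data-urlencode "wt=json"
}
```

Running show_field text and looking for "stored":false in the response would confirm why the content disappears after an atomic update.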

If you're indexing content using the ExtractingRequestHandler (Apache Tika / Solr Cell), Tika adds all the extracted text to a field called content. If that field doesn't exist, the content is dropped.

You can use fmap.content=<fieldname> to map the content to a different field name.
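A hedged sketch of sending a file through Solr Cell with fmap.content, assuming the ExtractingRequestHandler is mapped to /update/extract, that url is the uniqueKey as in the question, and that the target field text exists. SOLR_CORE_URL, index_file_into_text and the form field name myfile are hypothetical. literal.url is sent with --form-string so that its ? and = need no URL encoding; this relies on Solr reading non-file multipart fields as request parameters, and if your version does not, put literal.url in the query string with the value URL-encoded instead.

```sh
#!/bin/sh
# Assumption: Solr listens on localhost:8983 and the core is MaharaPortfolioA.
SOLR_CORE_URL="${SOLR_CORE_URL:-http://localhost:8983/solr/MaharaPortfolioA}"

# Send a file to Solr Cell and store Tika's extracted text in "text"
# instead of "content".
# Assumptions: handler at /update/extract, url is the uniqueKey,
# and the text field exists in the schema.
index_file_into_text() {
  local file="$1" doc_url="$2"
  # fmap.content=text renames Tika's content field;
  # commit=true makes the document searchable right away.
  # --form-string sends literal.url as a plain form field (no @ or <
  # file handling), so its ? and = characters need no URL encoding.
  curl -s "$SOLR_CORE_URL/update/extract?fmap.content=text&commit=true" \
    --form-string "literal.url=$doc_url" \
    -F "myfile=@$file"
}
```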

MatsLindh
  1. First, make sure the field 'author' is set to indexed="true" in the schema and has the type you expect. You can use the Analysis screen and the Schema Browser of the Solr Admin UI to validate your assumptions.

  2. You used the real-time get to verify that the update worked, which is fine, but bear in mind that the real-time get works even if no commit has happened (it reads from the transaction log). So make sure a soft commit or a hard commit (with openSearcher=true) is triggered before you search.

  3. Make sure your query makes sense for your use case. If you want to search specifically by author, "author:Herbert" is the correct query.
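Checks 2 and 3 from the list above can be run in order for one document with a small script. This is a sketch under the same assumptions as before (local Solr, core MaharaPortfolioA); SOLR_CORE_URL and check_update_visible are hypothetical names. The /get handler takes the document key in its id parameter, and --data-urlencode takes care of the ? and = inside the URL-valued key.

```sh
#!/bin/sh
# Assumption: Solr listens on localhost:8983 and the core is MaharaPortfolioA.
SOLR_CORE_URL="${SOLR_CORE_URL:-http://localhost:8983/solr/MaharaPortfolioA}"

check_update_visible() {
  local doc_url="$1" author="$2"
  # Real-time get reads through the transaction log, so it already
  # shows the new value before any commit.
  curl -s -G "$SOLR_CORE_URL/get" \
    --data-urlencode "id=$doc_url" \
    --data-urlencode "fl=url,author"
  # Hard commit that opens a new searcher; /select only sees the
  # update after a commit like this (or a soft commit).
  curl -s -G "$SOLR_CORE_URL/update" \
    --data-urlencode "commit=true" \
    --data-urlencode "openSearcher=true"
  # Fielded query on author, not on the default search field.
  curl -s -G "$SOLR_CORE_URL/select" \
    --data-urlencode "q=author:$author" \
    --data-urlencode "fl=url,author"
}
```

For the question's document this would be check_update_visible 'https://www.moopaed.de/mahara/view/view.php?id=6920' Herbert; if the first response shows Herbert but the last one finds nothing, the problem lies in the commit or in the field configuration rather than in the update itself.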