
I'm new to Solr and have been doing some research on it. I have a question about the delta-import function, so I looked on SO and found this: Solr DataImportHandler delta import. The answer there mentions a field, [date_update], which seems to be a timestamp on the record.

My question is: is [date_update] a timestamp stored in the table when the record is created? If so, couldn't this cause problems if the clock on the database server is not exactly in sync with the clock on the server where Solr is installed? It could possibly leave out some records if the Solr server's time is ahead of the SQL Server time.

mrd3650

2 Answers

1

This solution might leave some records behind (if the servers are not configured properly).

I'm using a similar solution, with some modifications. Items in the DB have a timestamp field that is updated whenever the item changes in any way.

Before updating the index, I get the last timestamp from Solr (this field is stored), then I pass this timestamp in the index query to Solr (/?command=full-import&clean=false&timestamp=...).

Using query attribute for both full and delta import
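To make the two steps above concrete, here is a minimal sketch, not my exact script. It assumes a hypothetical Solr base URL, a stored field named `timestamp`, and the DataImportHandler registered at `/dataimport`; it also asks Solr for JSON (`wt=json`) instead of going through an XSL transformation, which is a substitution made only to keep the example short.

```python
import json
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical; adjust to your setup


def latest_timestamp_url(base=SOLR_BASE):
    """Build the query that asks Solr for the newest stored timestamp."""
    params = {"q": "*:*", "fl": "timestamp", "rows": 1,
              "sort": "timestamp desc", "wt": "json"}
    return base + "/select?" + urlencode(params)


def extract_timestamp(response_text):
    """Pull the timestamp out of a Solr JSON response, or None if the index is empty."""
    docs = json.loads(response_text)["response"]["docs"]
    return docs[0]["timestamp"] if docs else None


def solr_to_db_timestamp(solr_ts):
    """Turn Solr's 'YYYY-MM-DDThh:mm:ssZ' form into 'YYYY-MM-DD hh:mm:ss'."""
    return solr_ts.replace("T", " ").replace("Z", "")


def import_url(db_ts, base=SOLR_BASE):
    """Build the import request that carries the timestamp as a request parameter."""
    params = {"command": "full-import", "clean": "false", "timestamp": db_ts}
    return base + "/dataimport?" + urlencode(params)
```

You would fetch `latest_timestamp_url()`, feed the body to `extract_timestamp`, convert it, and then request `import_url(...)`. As far as I know, the value is then available inside the DIH query as `${dataimporter.request.timestamp}`. Whether the converted string matches what your DB expects (time zone included) depends on the DB.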

That way, the time on the Solr machine has nothing to do with the time on the DB machine. However, in my case, after indexing completes I run a quick verification against the DB (checking whether anything is missing for some reason, or whether something has to be deleted).

You can also use that kind of verification when you use dataimporter.last_index_time.
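The verification step can be as simple as comparing the set of IDs in the DB with the set of IDs in the index. A rough sketch, assuming you have already fetched both ID lists (how you fetch them is up to you and is not shown; `diff_ids` is just an illustrative name):

```python
def diff_ids(db_ids, solr_ids):
    """Return (missing_from_solr, stale_in_solr) as sorted lists."""
    db, solr = set(db_ids), set(solr_ids)
    missing_from_solr = sorted(db - solr)   # in the DB but not indexed yet
    stale_in_solr = sorted(solr - db)       # indexed but no longer in the DB
    return missing_from_solr, stale_in_solr
```

Anything in the first list needs to be (re)imported, and anything in the second list needs to be deleted from the index.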

Fuxi
  • Happy new year and thanks for your answer. In the answer above, I think the problem still exists, since the Solr timestamp "timestamp from Solr (this field is stored)" is still being compared to the DB table timestamp, no? I think it's just the process you mentioned afterwards that verifies everything is imported. Have you ever encountered instances where some data was not imported? – mrd3650 Jan 02 '12 at 09:10
  • Yes, that timestamp is compared with the timestamp in the DB, but it was indexed from the DB as well, so the time on Solr's machine is not important (it is not used). – Fuxi Jan 03 '12 at 21:26
  • Thanks for your reply. However, I'm still not understanding this properly. What is confusing me is this sentence: "_Before updating index I'm getting last timestamp from Solr (this field is stored)_ **is this dataimporter.last_index_time?** _, then I'm passing this timestamp in index query to Solr (/?command=full-import&clean=false&timestamp=...)._". Also, I am assuming that the timestamp from Solr is then being passed as a parameter in the import query, no? – mrd3650 Jan 04 '12 at 08:43
  • I have a timestamp field in the DB as well as in Solr (to indicate when each record was last modified). This timestamp is not dataimporter.last_index_time, but the result of "/select/?q=*:*&fl=timestamp&rows=1&sort=timestamp+desc&wt=xslt&tr=timestamp.xsl". The XSL transformation is used to get a format that the DB understands. And as you said, I'm using that timestamp (as a parameter) in the import query. – Fuxi Jan 04 '12 at 12:45
  • Oh, so in effect you are passing to the database the same (latest) timestamp that the database provided earlier, is that so? Also, could you please give me an example of the xsl file you are using, or point me to a good source? – mrd3650 Jan 04 '12 at 15:42
  • It depends on which DB you are using, but it should be enough to replace the 'T' and 'Z' in Solr's date format with a space :) You can do that in any script; I'm using XSL because it's the fastest solution for me. Example of transforming dates in Solr's xsl: [XsltResponseWriter](http://wiki.apache.org/solr/XsltResponseWriter); more about xsl: [w3schools xslt](http://www.w3schools.com/xsl/default.asp) and [w3schools xslt functions](http://www.w3schools.com/Xpath/xpath_functions.asp). Let me know if you need more info. – Fuxi Jan 05 '12 at 09:04
0

You could use FlexCDC, which monitors the MySQL binary log for table data changes:

http://www.mysqlperformanceblog.com/2011/03/25/using-flexviews-part-two-change-data-capture/

Neil McGuigan