0

I have a value in FieldA and another value in FieldB that is stored for comparison purposes by a downstream function. I need to compare if FieldA=FieldB which I do with Mysql:

Select TableA T1
Inner Join TableB T2
On T1.ID=T2.ID
and T1.FIeldA=T2.FIeldB
Set Matched='Y'

However in order to make sure FieldA and FieldB are identical since they come from different sources I need to run a complex Standardization mysql update first e.g.

Update TableA set Field1 Set Field1=Replace(Field1,'Gato','Cat) where Field1 like '%Gato%

(I actually use Regex to make sure the term is whole word anywhere but wanted to keep the question simple, just in case anyone felt compelled to point out where the above might fail).

The issue is I have around 2000 terms that need to be updated so I have to run all 2000 queries on both fields and every time I do the comparison.

So it seemed to me an ideal solution would be Sphinx where I could wordforms or regexp e.g.

Gato>Cat
Perro>Dog

to index the table and then use Sphinx to compare them so that Gato and Cat would match.

However I can't figure out a way to structure q sphinxQL query that does a Match between two fields let alone one that will affect an update as well. Is there any such solution?

user3649739
  • 1,829
  • 2
  • 18
  • 28

1 Answers1

0

In theory should be able to use SphinxSE

http://sphinxsearch.com/docs/current/sphinxse-using.html

https://mariadb.com/kb/en/mariadb/sphinx-storage-engine/

This allows you to run sphinx queries 'inside' mysql, as such. So could form a mysql query (using subqueries) that runs an UPDATE based on the as sphinx (sub)query.

If have a sphinx index on TableB (with wordforms enabled), called SphIndexB in theory something like

UPDATE TableA INNER JOIN SphIndexB 
SET Matched = 'Y' 
WHERE SphIndexB.query = CONCAT('@FIeldB "^',TableA.Field1,'$" ;mode=extended;limit 1')

... it will be quite slow, as it has to run a full sphinx query over each and every row of Table A - the only benefit is it all happens in Mysql engine process, rather than coded in application :)

barryhunter
  • 20,886
  • 3
  • 30
  • 43
  • btw, started out saying to use subquery, but then thought can probably use a straight join (as long as sphinxse table is on the right). But untested, might still need a subquery. But the concept is the same. – barryhunter Nov 28 '16 at 16:05
  • I will try that, the slowness of the process should be offset by having to run standardization on all the fields and tables I want over and over vs having a single sphinx index for those tables. I will report back – user3649739 Nov 29 '16 at 16:22
  • I tried that just as is before changing a thing to reflect my tables I got this `[Err] 1064 - sphinxql: syntax error, unexpected IDENT, expecting SET or ',' near 'INNER JOIN SphIndexB SET Matched = 'Y' WHERE SphIndexB.query = CONCAT('@FIeldB "^',TableA.Field1,'$" ;mode=extended;limit 1')'` – user3649739 Nov 29 '16 at 22:45
  • You seem to be running that command against searchd via sphinxQL. Thats a mysql command you would run in mysql server. If you had SphinxSE setup, and created a SphIndexB **mysql** table (which is a virtual table pointing back at a index) – barryhunter Nov 30 '16 at 10:05
  • Oh I see. I tried running in Mysql too but it failed but I see the issue is I need to in fact install SphinxSE and then create the 'SphinxdexB'. Before i actually do all this can I confirm the following; the SphindexB virtual table will be based on the index of the mysql table so that any changes I make to the mysql will reflect in the SphinxexB (upon rotation)? – user3649739 Nov 30 '16 at 12:08
  • Not automatically. But if you rebuild the index (using indexer) then the index will be updated. Its typical to call indexer from cron to keep the index updated. – barryhunter Nov 30 '16 at 14:07
  • Yes that would be my workflow; update the mysql table, call indexer to rotate the index to keep it in sync then run the update with sphinxSE. I will try and report back here if it works we can add as solution :) – user3649739 Nov 30 '16 at 16:13