0

Lists (both lists fields is equal) example:

name(rt_field): Aaaaaa
attr1(rt_attr_uint): 3
attr2(rt_attr_uint): 0
attr3(rt_attr_multi): (a,b,c,d...)

My goal is to find equal data in both lists.

The way i do it now - I get one list from MySQL db and run it in a cycle with SphinxQL step-by-step. It runs for almost 2 seconds with 500 rows in first list and with 400 000 rows in second list. My query for each row is like:

SELECT id FROM list2 WHERE MATCH('Name') AND attr1=9 AND attr2=0 AND attr3 IN (a,b,x)

What is the best way to find equal data between two lists?

P.S.: Prefer SphinxQL.

o139
  • 854
  • 2
  • 8
  • 20

1 Answers1

1

Well if its was just scaler attributes, it would be easy...

SELECT *,COUNT(*) AS cnt FROM list1, list2 GROUP BY attr1, attr2 HAVING cnt > 1 

(can also use GROUP_CONCAT() to get all the ids that are duplicate)

But MVA attributes are more complicated, because GROUP BY'ing on a MVA column is handled specially, rather than grouping by the whole column, it groups by each value seperately. You could perhaps duplicate the mva into a string attribute (so grouping on it would take all the values, and hence find duplicates)

... and FIELDS are even more complicated. You can't group on fields at all. You can use the same trick and duplicate the field into a string attribute - which then again allows you to group on it.

(but string attribute grouping is 'exact' ie finds where the two strings are identical in every way. If you want more 'fuzzy' matching, ie using stemming or other similar features of full-text matching its more complicated. Maybe could use a two phase approach, use GROUP BY to get rows where the attributes match, then run a more directed query to find if the fields are similar enough.

barryhunter
  • 20,886
  • 3
  • 30
  • 43
  • Thank you, Barry, we solve half of problem. But, i understand now, what i need more conditions in my query. How can i use something like that: ... WHERE list1.user=11 AND list2.user=99? – o139 Nov 12 '15 at 22:28
  • Well you could do `WHERE user IN (11,99)`, remember both tables are UNIONed before being GROUPed. – barryhunter Nov 13 '15 at 11:23
  • is it possible to get data as `LIST1.USER=11 AND LIST2.USER=99`? I don't need data for `LIST1.USER=99` and don't need `LIST2.USER=11` – o139 Nov 18 '15 at 19:18
  • THe only way I think would be to offset the ids, so they dont overlap. eg, when inserting, insert 1000011 for list one, and 2000099 for list two (the first digit being unique to the index), if instered id 11 into list two would inserr 2000011) Then can do `WHERE user IN (1000011, 2000099)` – barryhunter Nov 18 '15 at 20:28