Handle multiple tables of MySQL in with Solr 5.1.0?

Question

I have more than 30 tables in my MySQL database. Recently I have import data from my 1 table to Solr 5.1.0 using DataImporthandler and in my data-config.xml file, fire query of,

select * from table-name

But in my search I have to integrate more than 10 tables to give proper search result.

The ways are to do this is

1) To import data by using JOIN query in MySQL database and import it

OR

2) JOIN solr cores by importing full data separate tables.

What shoud I do make it optimize?? and which is a good way?

do you have multiple cores? – userMadhav May 11 '15 at 10:31 — userMadhav, May 11 '15 at 10:31

userMadhav · Answer 1 · 2015-05-11T12:24:19.190

If you have a single core then i would recommend importing tables into one single core and using joins.That is what i have done on my solr 4.9 with cake php and solrphpclient. But for this you will have to define table structure and data types in data-config.xml and schema.xml.Which i assume you must have done. In you data-config file you write queries or define a structure which will import all the data from your ten tables accordingly

See my example for two tables

 <entity name="type_masters" pk="type_id" query="SELECT delete_status as   
 type_masters_delete_status,type_updated,type_id,category_id,type_name FROM   
 type_masters
where type_id='${businessmasters.Business_Type}'"
deltaQuery="select type_id from type_masters where type_updated > 
'${dih.last_index_time}'"
parentDeltaQuery="select business_id from businessmasters where 
Business_Type=${type_masters.type_id}"> 
 <field column="type_id" name="id"/>   
 <field column="category_id" name="category_id" indexed="true" stored="true"   
/>
  <field column="type_name" name="type_name" indexed="true" stored="true" />

       <field column="type_updated" name="type_updated" indexed="true" 
stored="true" />
<field column="type_masters_delete_status" name="type_masters_delete_status" 
indexed="true" stored="true" />


<entity name="category_masters" query="SELECT delete_status as 
category_masters_delete_status,category_updated,category_id,category_name 
FROM category_masters where category_id='${type_masters.category_id}'"

   deltaQuery="select category_id from category_masters where category_updated > '${dih.last_index_time}'"

  parentDeltaQuery="select type_id from type_masters where 
  category_id=${category_masters.category_id}"> 

   <field column="category_id" name="id"/>   

  <field column="category_name" name="category_name" indexed="true"    
    stored="true" />
    <field column="category_updated" name="category_updated" indexed="true" 
   stored="true" />
             <field column="category_masters_delete_status" 
     name="category_masters_delete_status" indexed="true" stored="true" />
           </entity><!-- category_masters -->

      </entity><!-- type_masters -->

dont mind the editing ,i dont know iam unable to format my answer in code blocks. — userMadhav, May 11 '15 at 10:37
I got your answer and I have already change data-config file and imported my single table data. I want to search in 10 tables. So should I create only 1 core by joining 10 MySQL tables?? Or Create 10 different cores and join them in solr? — Mayur Champaneria, May 11 '15 at 11:57
my recommendation is create one core,that should do the trick — userMadhav, May 11 '15 at 12:03
Means you telling that I should use a single core that contains data of joined tables ?? Means my "search1" core contains joined results of my "User"+"Details"+"membership" mysql tables?? — Mayur Champaneria, May 15 '15 at 11:07
yes ,currently iam using 12 tables with join in a single core — userMadhav, May 18 '15 at 07:15
as you are using id as uniqueKey, so if the table type_masters has the same id with the table category_updated; you will miss some documents. In short, you must make sure category_id and type_id can't be same. — Stony, Dec 01 '16 at 04:21

Abhijit Bashetti · Accepted Answer · 2015-05-13T07:46:00.977

To import data by using JOIN query in MySQL database and import it

Yes, this is achievable in solr using DIH. With the DIH, as you have to configure your data-config.xml. Here you can write the query using the joins which will fetch the data from all the desired table. Here you can create a single core and can have all the data in the single core. You can create your document using those field. (Documents fields will be mentioned in schema.xml).

Points to consider here for the optimization would be what all fields you want to search on and wanted to show in the result. So you need to sort of this first. Which on fields will you search on and need to displayed.

The fields on which you need search make them as indexed="true". Rest all make as indexed="false". The fields which you need in the result mark them as stored="true". Rest all make as stored="false".

Some may be require as both, like search and show in result. Mark them as indexed="true" and stored="true".

for example I had 15 fields in my document but only 4 are indexed, as I want to search only on those fields. and rest all fields are shown in the result so there are stored.

Now coming to your second question

JOIN solr cores by importing full data separate tables. Yes this is possible in solr since solr 4.0

for a detailed example check the below link https://wiki.apache.org/solr/Join

But also condider the limitations of it.
Fields or other properties of the documents being joined "from" are not available for use in processing of the resulting set of "to" documents (ie: you can not return fields in the "from" documents as if they were a multivalued field on the "to" documents).

So you can consider these points before you take a final call.

Consider here you have two cores

core brands with fields {id,name}
core products with fields{id, name, brand_id}

data in core BRANDS: {1, Apple}, {2, Samsung}, {3, HTC}

data in core PRODUCTS: {1, iPhone, 1}, {2, iPad, 1}, {3, Galaxy S3, 2}, {4, Galaxy Note, 2}, {5, One X, 3}

you would build your query like :

http://example.com:8999/solr/brands/select?q=*:*&fq={!join from=brand_id to=id fromIndex=products}name:iPad

and the Result will be: {id: "1", name:"Apple"}

In a DistributedSearch environment, you can not Join across cores on multiple nodes. If however you have a custom sharding approach, you could join across cores on the same node.
The Join query produces constant scores for all documents that match -- scores computed by the nested query for the "from" documents are not available to use in scoring the "to" documents.

Considering the above points I hope you can decide on which approach you want to take.

Handle multiple tables of MySQL in with Solr 5.1.0?

2 Answers2