Importing data from a MySQL database using a JOIN query
Yes, this is achievable in Solr using the DataImportHandler (DIH).
With DIH, you configure your data-config.xml.
In it you can write a query with JOINs that fetches the data from all the tables you need, so you can create a single core that holds all of the data.
Each row returned by the query becomes a document built from those columns (the document fields are defined in schema.xml).
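To make this concrete, here is a minimal sketch, not a production setup. It uses Python's built-in sqlite3 as a stand-in for MySQL, with hypothetical brands/products tables modeled on the join example later in this answer. It renders a minimal data-config.xml whose single entity runs the JOIN query, and it prints the flattened rows that DIH would turn into documents. The JDBC URL, credentials, entity name and column aliases are all assumptions for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory SQLite database standing in for MySQL (assumption, illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE brands (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, brand_id INTEGER);
INSERT INTO brands VALUES (1, 'Apple'), (2, 'Samsung'), (3, 'HTC');
INSERT INTO products VALUES
  (1, 'iPhone', 1), (2, 'iPad', 1), (3, 'Galaxy S3', 2),
  (4, 'Galaxy Note', 2), (5, 'One X', 3);
""")

# One JOIN query that flattens both tables into a single row per product.
JOIN_QUERY = (
    "SELECT p.id AS id, p.name AS product_name, b.name AS brand_name "
    "FROM products p JOIN brands b ON b.id = p.brand_id ORDER BY p.id"
)

def data_config(join_query, columns):
    """Render a minimal data-config.xml whose single entity runs join_query."""
    root = ET.Element("dataConfig")
    ET.SubElement(root, "dataSource", {
        "type": "JdbcDataSource",
        # Connector/J 5.x class name; 8.x uses com.mysql.cj.jdbc.Driver.
        "driver": "com.mysql.jdbc.Driver",
        "url": "jdbc:mysql://localhost:3306/shop",  # hypothetical database
        "user": "solr",                              # hypothetical credentials
        "password": "secret",
    })
    document = ET.SubElement(root, "document")
    entity = ET.SubElement(document, "entity", {"name": "product", "query": join_query})
    for column in columns:
        # Map each SQL column to a Solr field of the same name (defined in schema.xml).
        ET.SubElement(entity, "field", {"column": column, "name": column})
    return ET.tostring(root, encoding="unicode")

def joined_documents(connection, query):
    """Return one flat dict per joined row; DIH would index each row as one document."""
    cursor = connection.execute(query)
    columns = [c[0] for c in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

docs = joined_documents(conn, JOIN_QUERY)
config_xml = data_config(JOIN_QUERY, list(docs[0].keys()))
print(config_xml)
print(docs[0])  # {'id': 1, 'product_name': 'iPhone', 'brand_name': 'Apple'}
```

Because the brand name is copied onto every product row, a single core can answer searches on either table's fields without any join at query time.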
The main point to consider for optimization is which fields you want to search on and which fields you want to show in the results.
So sort this out first: decide which fields you will search on and which need to be displayed.
Mark the fields you need to search on as indexed="true", and all the others as indexed="false".
Mark the fields you need in the results as stored="true", and all the others as stored="false".
Some fields may be needed for both, i.e. searching and showing in the results; mark those as indexed="true" and stored="true".
For example, I had 15 fields in my document but only 4 were indexed, as I wanted to search only on those fields.
The rest of the fields are shown in the results, so they are stored.
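The indexed/stored rule above can be expressed as a small helper that emits schema.xml field definitions. This is a hypothetical sketch: the field names and the "string"/"text_general" field types are assumptions (use whatever types your schema defines). The id field is kept searchable because Solr requires the uniqueKey field to be indexed.

```python
import xml.etree.ElementTree as ET

def schema_fields(field_types, searchable, displayed):
    """Return schema.xml <field> elements: indexed only if searched on,
    stored only if shown in the results."""
    lines = []
    for name, field_type in field_types.items():
        attrs = {
            "name": name,
            "type": field_type,
            "indexed": "true" if name in searchable else "false",
            "stored": "true" if name in displayed else "false",
        }
        lines.append(ET.tostring(ET.Element("field", attrs), encoding="unicode"))
    return lines

# Hypothetical fields coming from the JOIN query.
field_types = {
    "id": "string",
    "product_name": "text_general",
    "brand_name": "text_general",
    "description": "text_general",
}
searchable = {"id", "product_name", "brand_name"}
displayed = {"id", "product_name", "brand_name", "description"}

fields_xml = schema_fields(field_types, searchable, displayed)
for line in fields_xml:
    print(line)
```

Here description is stored but not indexed: it is returned with each hit but cannot be searched on, which keeps the index smaller.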
Now, coming to your second question:
joining Solr cores after importing the full data of the separate tables into separate cores.
Yes, this is possible in Solr since Solr 4.0.
For a detailed example, check the link below:
https://wiki.apache.org/solr/Join
But also consider its limitations:
Fields or other properties of the documents being joined "from" are not available for use in processing the resulting set of "to" documents
(i.e., you cannot return fields in the "from" documents as if they were a multivalued field on the "to" documents).
So consider these points before you make a final decision.
Suppose you have two cores:
core brands with fields {id, name}
core products with fields {id, name, brand_id}
data in core brands: {1, Apple}, {2, Samsung}, {3, HTC}
data in core products: {1, iPhone, 1}, {2, iPad, 1}, {3, Galaxy S3, 2}, {4, Galaxy Note, 2}, {5, One X, 3}
You would build your query like this:
http://example.com:8999/solr/brands/select?q=*:*&fq={!join from=brand_id to=id fromIndex=products}name:iPad
The result will be: {id: "1", name: "Apple"}
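The request above is written unencoded for readability; when you send it from code or curl, the fq value (spaces, braces, "!") should be URL-encoded. The sketch below builds the encoded request with Python's urllib and, purely as an illustration, mimics the join semantics in memory on the sample data. It is not how Solr executes the join internally, and the helper names are hypothetical. Note that the result contains only the brand fields, which is the "from" fields limitation mentioned earlier.

```python
from urllib.parse import urlencode, parse_qs

# Sample data from the example above, held as plain Python dicts.
brands = [
    {"id": "1", "name": "Apple"},
    {"id": "2", "name": "Samsung"},
    {"id": "3", "name": "HTC"},
]
products = [
    {"id": "1", "name": "iPhone", "brand_id": "1"},
    {"id": "2", "name": "iPad", "brand_id": "1"},
    {"id": "3", "name": "Galaxy S3", "brand_id": "2"},
    {"id": "4", "name": "Galaxy Note", "brand_id": "2"},
    {"id": "5", "name": "One X", "brand_id": "3"},
]

def join_url(base, from_field, to_field, from_index, inner_query):
    """Build an encoded q=*:* request with a {!join} filter query."""
    fq = "{!join from=%s to=%s fromIndex=%s}%s" % (from_field, to_field, from_index, inner_query)
    return base + "?" + urlencode({"q": "*:*", "fq": fq})

def simulate_join(to_docs, from_docs, from_field, to_field, matches):
    """Keep 'to' docs whose to_field equals from_field of some matching 'from' doc.
    Only the 'to' docs' own fields are returned."""
    join_values = {d[from_field] for d in from_docs if matches(d)}
    return [d for d in to_docs if d[to_field] in join_values]

url = join_url("http://example.com:8999/solr/brands/select",
               "brand_id", "id", "products", "name:iPad")
result = simulate_join(brands, products, "brand_id", "id",
                       lambda p: p["name"] == "iPad")
print(url)
print(result)  # [{'id': '1', 'name': 'Apple'}]
```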
In a distributed search environment, you cannot join across cores on multiple nodes.
However, if you have a custom sharding approach, you can join across cores on the same node.
The join query produces constant scores for all documents that match;
scores computed by the nested query for the "from" documents are not available for scoring the "to" documents.
With the above points in mind, I hope you can decide which approach to take.