While working through the example for indexing Wikipedia data in Solr, how can I get query results that have the same structure as the imported data?

Can this be achieved through configuration rather than a group query? My data has many nested tags.

I explored XSLT result transformation, but I am looking for a JSON response.

Imported doc:

<page>
    <title>AccessibleComputing</title>
    <ns>0</ns>
    <id>10</id>
    <redirect title="Computer accessibility" />
    <revision>
        <id>381202555</id>
        <parentid>381200179</parentid>
        <timestamp>2010-08-26T22:38:36Z</timestamp>
        <contributor>
            <username>OlEnglish</username>
            <id>7181920</id>
        </contributor>
    </revision>
</page>

DataImportHandler configuration (data-config.xml):

<dataConfig>
        <dataSource type="FileDataSource" encoding="UTF-8" />
        <document>
        <entity name="page"
                processor="XPathEntityProcessor"
                stream="true"
                forEach="/mediawiki/page/"
                url="data/enwiki-20130102-pages-articles.xml"
                transformer="RegexTransformer,DateFormatTransformer"
                >
            <field column="id"        xpath="/mediawiki/page/id" />
            <field column="title"     xpath="/mediawiki/page/title" />
            <field column="revision"  xpath="/mediawiki/page/revision/id" />
            <field column="user"      xpath="/mediawiki/page/revision/contributor/username" />
            <field column="userId"    xpath="/mediawiki/page/revision/contributor/id" />
            <field column="text"      xpath="/mediawiki/page/revision/text" />
            <field column="timestamp" xpath="/mediawiki/page/revision/timestamp" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
            <field column="$skipDoc"  regex="^#REDIRECT .*" replaceWith="true" sourceColName="text"/>
       </entity>
       </document>
</dataConfig>

Response from the Solr query:

  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "id": "10",
        "timestamp": "2010-08-26T17:08:36Z",
        "revision": 381202555,
        "titleText": "AccessibleComputing",
        "userId": 7181920,
        "user": "OlEnglish"
      }
    ]
  }

Expected response:

"response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "id": "10",
        "timestamp": "2010-08-26T17:08:36Z",
        "revision": 381202555,
        "titleText": "AccessibleComputing",
        "contributor": [{
            "userId": 7181920,
            "user": "OlEnglish"
        }]
      }
    ]
  }
user2551549
  • I think you cannot do that: http://stackoverflow.com/questions/5584857/solr-documents-with-child-elements – anti_social Aug 19 '13 at 09:18
  • Thanks for the reply. While searching I found that the result can be modified using XSLT, but is there any way to do it with a queryResponseWriter? And is there an example of a custom queryResponseWriter? – user2551549 Aug 19 '13 at 09:34
  • I think you can achieve this using a poly field type and a multi-valued field. Poly field types are like `solr.CurrencyField`, which consists of two or more simple fields such as `integer`, `string`, etc. – xiaofeng.li Aug 20 '13 at 02:20

1 Answer

If you don't like the idea of using XSLTResponseWriter (which can also help in outputting the results as JSON), you can create your own SearchComponent that modifies the output. Because a SearchComponent changes the response before it is written, you can still apply any of the ResponseWriters to the output (xml, json, csv, xslt, etc.).

You can learn how to create a custom SearchComponent in this article, for example.

To use XSLTResponseWriter, add this to solrconfig.xml:

<queryResponseWriter name="xslt" class="org.apache.solr.response.XSLTResponseWriter"/>

Add a json.xsl file to the conf/xslt folder. It holds the transformation rules for the XML output (the output you get with wt=xml in your query), something like this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:strip-space elements="*"/>
  <xsl:output method="text" indent="no" media-type="application/json"/>

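  <!-- Start at the result element so responseHeader text is not copied into the output -->
  <xsl:template match="/">
    <xsl:apply-templates select="response/result"/>
  </xsl:template>

  <xsl:template match="result">
    <xsl:text>{"response":{"docs":[</xsl:text>
    <xsl:apply-templates select="doc"/>
    <xsl:text>]}}</xsl:text>
  </xsl:template>

  <!-- Solr's XML writer emits fields as typed elements such as <int name="userId">, so select them by @name -->
  <xsl:template match="doc">
    <xsl:if test="position() &gt; 1">
      <xsl:text>,</xsl:text>
    </xsl:if>
    <xsl:text>{"contributor": [{"userId": </xsl:text><xsl:value-of select="*[@name='userId']"/><xsl:text>, "user": "</xsl:text><xsl:value-of select="*[@name='user']"/><xsl:text>"}]}</xsl:text>
  </xsl:template>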

</xsl:stylesheet>

Then you can get this response using a url like:

http://localhost:8983/solr/select/?q=id:10&wt=xslt&tr=json.xsl
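
If neither XSLT nor a custom SearchComponent is convenient, the same regrouping can also be done on the client after fetching the ordinary wt=json response. The following is only a minimal sketch of that idea, not part of Solr: the `NESTED_FIELDS` mapping and the `nest_doc` helper are hypothetical names, and the field names are taken from the question.

```python
import json

# Hypothetical mapping: flat Solr field name -> (group name, key inside the group).
# The field names come from the question; the grouping itself is an assumption.
NESTED_FIELDS = {
    "userId": ("contributor", "userId"),
    "user": ("contributor", "user"),
}


def nest_doc(doc):
    """Return a copy of a flat Solr doc with mapped fields moved into one-element lists."""
    nested = {}
    groups = {}
    for name, value in doc.items():
        if name in NESTED_FIELDS:
            group, key = NESTED_FIELDS[name]
            groups.setdefault(group, {})[key] = value
        else:
            nested[name] = value
    # Each group becomes a list holding a single object, as in the expected response.
    for group, fields in groups.items():
        nested[group] = [fields]
    return nested


# The flat document returned by Solr in the question.
flat = {
    "id": "10",
    "timestamp": "2010-08-26T17:08:36Z",
    "revision": 381202555,
    "titleText": "AccessibleComputing",
    "userId": 7181920,
    "user": "OlEnglish",
}
print(json.dumps(nest_doc(flat), indent=2))
```

Applying `nest_doc` to every entry of `response.docs` gives the shape from the question's expected response. Extending it to more nested tags only means adding entries to the mapping.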
Artem Lukanin
  • Is there a more complex example of using XsltResponseWriter? The examples I found use only very basic XSLT. – user2551549 Oct 03 '13 at 03:00
  • XSL Transformation is done on XML files. I don't see any XML in your question to transform to JSON. Actually, this is a new question with no direct connection to Solr; it is better to ask it separately with the [XSLT](http://stackoverflow.com/questions/tagged/xslt) tag. – Artem Lukanin Oct 03 '13 at 06:35
  • I am trying to use XsltResponseWriter to achieve what I asked in my question. My file holds a large set of data, and I am able to convert it into Solr documents, but when querying I need the search results in the same structure as the file I imported (i.e. in the example above, "contributor" has sub-fields). I appreciate your help, and thanks for the quick response. – user2551549 Oct 03 '13 at 11:56
  • Added an example; extend it to include all fields. – Artem Lukanin Oct 03 '13 at 12:22