(This took me a while to figure out, so I'm posting both the question and the answer in case it helps others.)
The URL from which the DataImportHandler retrieves its data is served over HTTPS and protected by an additional auth query parameter. The DataImportHandler configuration looks like this:
<dataConfig>
  <dataSource type="URLDataSource"
              baseUrl="https://www.gutscheinpony.de/"
              encoding="UTF-8"/>
  <document>
    <entity name="pony"
            pk="id"
            url="feeds.xml?auth=XXX"
            processor="XPathEntityProcessor"
            forEach="/data/offers/offer"
            xsl="xslt/gutscheinpony.xsl">
      <!-- fields omitted -->
    </entity>
  </document>
</dataConfig>
Running this on a stock Solr 6 installation fails with HTTP 403 Forbidden, while a quick request to the same URL with curl succeeds (only the relevant output is shown):
curl "https://www.gutscheinpony.de/feeds.xml?auth=XXX" -Iv
> Host: www.gutscheinpony.de
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
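A plausible explanation, which the question implies but does not confirm, is that the server filters on the User-Agent header: curl identifies itself as curl/7.43.0, whereas URLDataSource is assumed here to open its connections through the JDK's URLConnection, which by default sends Java/<version>. The following minimal sketch checks what a plain HttpURLConnection sends by starting a throwaway local server on 127.0.0.1 and recording the header it receives. The class name UserAgentProbe and the /feeds.xml path are hypothetical and only mirror the setup above.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical probe class: shows which User-Agent the JDK's
// HttpURLConnection sends when no header is set explicitly.
public class UserAgentProbe {

    public static String captureUserAgent() throws Exception {
        AtomicReference<String> seen = new AtomicReference<>();

        // Local server on an ephemeral loopback port; it records the
        // User-Agent of each request and answers with a tiny body.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            seen.set(exchange.getRequestHeaders().getFirst("User-Agent"));
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body); // closing the body stream ends the exchange
            }
        });
        server.start();

        try {
            int port = server.getAddress().getPort();
            // Plain openConnection() with no headers set, so the JDK fills in
            // its default User-Agent ("Java/" + java.version when the
            // http.agent system property is unset).
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create("http://127.0.0.1:" + port + "/feeds.xml").toURL().openConnection();
            try (InputStream in = conn.getInputStream()) {
                while (in.read() != -1) {
                    // drain the response body
                }
            }
            conn.disconnect();
        } finally {
            server.stop(0);
        }
        return seen.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("User-Agent sent: " + captureUserAgent());
    }
}
```

On a JDK 8 this should print something of the form "User-Agent sent: Java/1.8.0_...", which differs from what curl sent. If the server returns 403 for that value, the User-Agent is the likely culprit.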
Is it possible to set the User-Agent header for DataImportHandler connections without writing custom Java code?