
Solr version: 6.6.1

I am able to import PDF files into Solr using the DIH (DataImportHandler), and the indexing works as expected. However, I want to clear the folder C:/solr-6.6.1/server/solr/core_K2_Depot/Depot once the indexing process has finished successfully.

Is there a way to delete all the files in the folder via the DIH data-config.xml, or some other easy way?

<!--Local filesystem-->
<dataConfig>
  <dataSource type="BinFileDataSource"/>
  <document>
    <entity name="K2FileEntity" processor="FileListEntityProcessor" dataSource="null"
            recursive="true"
            baseDir="C:/solr-6.6.1/server/solr/core_K2_Depot/Depot" fileName=".*pdf" rootEntity="false">

      <field column="file" name="id"/>
      <field column="fileLastModified" name="lastmodified"/>

      <entity name="pdf" processor="TikaEntityProcessor" onError="skip"
              url="${K2FileEntity.fileAbsolutePath}" format="text">

        <field column="title" name="title" meta="true"/>
        <field column="dc:format" name="format" meta="true"/>
        <field column="text" name="text"/>

      </entity>
    </entity>
  </document>
</dataConfig>
Karan
  • That sounds like something that should go into the script you're using to trigger the import, as Solr shouldn't really remove documents from the disk – MatsLindh Nov 20 '17 at 12:00
  • The PDF files are useless once the DIH has finished indexing successfully, so they can be removed. That's why I want to clean the folder. – Karan Nov 20 '17 at 12:12

1 Answer


Usually, in production you run the DIH process from a shell script. The script first copies the files needed for indexing (from FTP, HTTP, S3, etc.), then runs a full-import or delta-import, and then tracks the indexing through the status command. As soon as the import has finished successfully, you just need to execute an rm command:

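SOLR_CORE_URL="http://localhost:8983/solr/core_K2_Depot"   # assumed host/port/core
while true; do                                              # poll the DIH status until it is idle
   sleep 5
   curl -s "$SOLR_CORE_URL/dataimport?command=status&wt=json" | grep -qE '"status" *: *"idle"' && break
done
# Remove the indexed files only if DIH reports that indexing completed
if curl -s "$SOLR_CORE_URL/dataimport?command=status&wt=json" | grep -q 'Indexing completed'; then
   rm -rf C:/solr-6.6.1/server/solr/core_K2_Depot/Depot/*   # files no longer needed for indexing
fi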

Solr itself has no support for deleting external files.
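The whole workflow (trigger the import, poll the status, then clean up) can be sketched as one bash script that clears the folder only when DIH reports success. This is a minimal sketch under assumptions: Solr at http://localhost:8983, the core name core_K2_Depot, and bash with curl available (for example Git Bash or WSL on Windows; adjust DEPOT_DIR to the path form your shell uses). The function names, the environment variables and the "run" argument are hypothetical, not part of Solr. Success is detected by matching the "Indexing completed" status message that DIH reports after a finished import; verify that string against your own status output before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: run a DIH full-import, wait for it to finish, and clear the source
# folder only if Solr reports success.
# ASSUMPTIONS (hypothetical, adjust to your setup): Solr on localhost:8983,
# core "core_K2_Depot", bash + curl available.
set -euo pipefail

SOLR_CORE_URL="${SOLR_CORE_URL:-http://localhost:8983/solr/core_K2_Depot}"
DEPOT_DIR="${DEPOT_DIR:-C:/solr-6.6.1/server/solr/core_K2_Depot/Depot}"

# Map a DIH status response (JSON text) to one of: busy, success, failed.
# An empty or unexpected response counts as "failed" so nothing gets deleted.
dih_state() {
  local json="$1"
  if grep -qE '"status" *: *"busy"' <<<"$json"; then
    echo busy
  elif grep -q 'Indexing completed' <<<"$json"; then
    echo success
  else
    echo failed
  fi
}

run_import_and_cleanup() {
  # Start the import; DIH answers immediately and keeps indexing in the background.
  # With set -e, the script exits here if this request fails (-f: HTTP errors too).
  curl -sf "$SOLR_CORE_URL/dataimport?command=full-import" >/dev/null

  local state=busy
  while [ "$state" = busy ]; do
    # Sleep first so the background import has started before the first check.
    sleep 5
    state=$(dih_state "$(curl -s "$SOLR_CORE_URL/dataimport?command=status&wt=json")")
  done

  if [ "$state" = success ]; then
    # Delete only the folder's contents; the folder itself stays for the next run.
    rm -rf "${DEPOT_DIR:?}"/*
    echo "Import succeeded; cleared $DEPOT_DIR"
  else
    echo "Import did not report success; files kept in $DEPOT_DIR" >&2
    return 1
  fi
}

# Only talk to Solr when explicitly asked (hypothetical "run" argument).
if [ "${1:-}" = run ]; then
  run_import_and_cleanup
fi
```

Usage (file name hypothetical): bash import_and_clean.sh run. A failed import, an unreachable Solr or an unexpected status response all leave the PDFs in place, which errs on the side of not losing files that were never indexed.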

Mysterion