We were running a backup and consistency check on a database. In the __lost_and_found__ collection of the backup we found one file and a __contents__.xml. The file is named "cover.xml".

It has evidently become corrupt somehow, because the exported XML looks like this (no tags, and text that belongs to something else):

<?xml version="1.0" encoding="UTF-8"?>
Most people do not have any problems and are satisfied ...

The __contents__.xml contains this information:

<collection xmlns="http://exist.sourceforge.net/NS/exist" name="/db/__lost_and_found__" version="1" owner="admin" group="dba" mode="0771">
    <resource type="XMLResource" name="cover.xml" skip="no" owner="XXXXXXXXX" group="UK_Territory" mode="644" created="2019-11-13T15:09:41.625Z" modified="2019-11-13T15:10:28.282Z" filename="cover.xml" mimetype="application/xml">
        <acl entries="0" version="1"/>
    </resource>
</collection>

The end of the log reports this:

  DOCUMENT: 35000 of 35011
  DOCUMENT: 35011 of 35011
----------------------------------------------
RESOURCE_ACCESS_FAILED:
Failed to access document data
Document ID: 31499

How do we find that document in /db? We ask because there are over 5000 "cover.xml" files in different collections, and as far as we can see nothing here tells us which collection the broken file is in. Is there a way to work that out from the information above?

line-o
Kevin Brown

1 Answer

I was not able to find anything out of the box (OOTB), so I created a gist that could work for you. In my quick test, at least, it returned the path to the document. I hope it works for you as well.

https://gist.github.com/line-o/59b615bde0a19fe153152eb89f284ca1

NOTE: I was made aware that document IDs are reused in eXist-db so that it does not run out of them. After a document is removed from the database, the next one created might get the same ID, so the document returned by my lookup might change over time.
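
To illustrate why a lookup by ID can drift, here is a toy model in Python. It is only a sketch of ID recycling in general, not eXist-db's actual allocator: the class DocIdAllocator, its methods and the example paths are all hypothetical.

```python
class DocIdAllocator:
    """Toy store that reuses a freed id before minting a new one.

    Hypothetical model for illustration only; it does not reflect
    how eXist-db allocates document ids internally.
    """

    def __init__(self):
        self._next_id = 0   # next never-used id
        self._free = []     # ids released by removed documents
        self.by_id = {}     # id -> document path

    def store(self, path):
        # Prefer a recycled id; otherwise mint a fresh one.
        if self._free:
            doc_id = self._free.pop()
        else:
            doc_id = self._next_id
            self._next_id += 1
        self.by_id[doc_id] = path
        return doc_id

    def remove(self, doc_id):
        # Removing a document makes its id available again.
        del self.by_id[doc_id]
        self._free.append(doc_id)


# Hypothetical paths, chosen only for the example.
db = DocIdAllocator()
first = db.store("/db/a/cover.xml")
second = db.store("/db/b/cover.xml")
db.remove(first)
third = db.store("/db/c/cover.xml")

# The id that used to identify /db/a/cover.xml now resolves elsewhere.
print(first == third, db.by_id[third])
```

So if the broken document was removed, or other documents were created and removed, between the consistency check and the lookup, the ID from the log may already point to an unrelated document. It is safest to run the lookup as soon as possible after the check.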

line-o