Retrieved parent folders from a document aren't always correct

Question

We're migrating library content from FileNet Content Services to FileNet P8.

So we wrote an extractor that outputs both the folder tree and the document list in XML format, each document with its versions, properties and parent folders. This extractor relies on a home-made DLL which virtualizes FileNet objects.

Documents are retrieved this way (a huge SQL request):

Public Function getAllDocumentIds() As ADODB.Recordset
  Dim cmdProperties As New Dictionary

  cmdProperties.Item("Maximum Rows") = 0
  cmdProperties.Item("Page Size")    = 0

  ' The query text is the first argument, cmdProperties the second.
  Set getAllDocumentIds = _
    executeADOQuery("SELECT idmId,            idmVerFileName, "  & vbNewLine & _
                    "       idmVerCreateDate, idmAddedByUser"    & vbNewLine & _
                    " FROM FNDOCUMENT ORDER BY idmId ASC", _
                    cmdProperties)
End Function

But we encounter issues when we retrieve parent folders this way (slightly modified to serve as an example):

Public Function getFolders(document As IDMObjects.document) As Collection
  Dim f As IDMObjects.Folder
  ' [...]
  For Each f In document.FoldersFiledIn
    ' folders retrieval
  Next
End Function

For a small number of documents, some "wrong" parent folders ("folders [the document is] filed in") are reported.

"Wrong" because the following approach doesn't report the document as being filed in them (slightly modified code too):

Public Function getDocumentIds(folder As IDMObjects.Folder) As Collection
  Dim rs As ADODB.Recordset
  Dim cmdProperties As New Dictionary

  ' Actually there is a check here, to keep FileNet from crashing.
  ' Indeed, setting "Page Size" to 0 crashes if no results are returned.
  ' *Maybe* a product bug.

  cmdProperties.Add "SearchFolderName", internalObject.id ' Folder parent
  cmdProperties.Item("Maximum Rows") = 0 ' No limits
  cmdProperties.Item("Page Size") = 0 ' No limits. Crashes if Recordset is empty

  ' Here, the cmdProperties entries are copied to an
  ' ADODB.Command object, into the "Properties" member.
  ' The query is copied to this object, into the "CommandText" member.
  ' Then ADODB.Command.Execute is invoked and returns an ADODB.Recordset.
  ' So this Recordset contains all documents filed in this folder.
  Set rs = executeADOQuery("SELECT * from FnDocument", cmdProperties)

  Exit Function

End Function

We're working on a workaround, which may take more resources (for each document, double-check...). But understanding why we don't get the same results could help us check our library.
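
For illustration, the per-document double-check could look roughly like the sketch below. It is only a sketch: findUnconfirmedFolders and its parameters are hypothetical names, not part of our extractor. It assumes the folder ids reported by FoldersFiledIn and the document ids returned by the per-folder query have already been collected, all as strings, and it relies on Scripting.Dictionary (Microsoft Scripting Runtime).

```vb
' Hypothetical helper, not part of the extractor shown above.
' docId             : id of the document being checked (as a string)
' reportedFolderIds : folder ids (strings) collected from document.FoldersFiledIn
' docIdsByFolder    : Dictionary mapping a folder id to a Dictionary whose
'                     keys are the document ids returned by the per-folder query
' Returns the reported folder ids that the per-folder query does not confirm.
Public Function findUnconfirmedFolders(ByVal docId As String, _
                                       ByVal reportedFolderIds As Collection, _
                                       ByVal docIdsByFolder As Dictionary) As Collection
  Dim unconfirmed As New Collection
  Dim folderId As Variant
  Dim confirmed As Boolean

  For Each folderId In reportedFolderIds
    confirmed = False
    ' A folder that was never queried cannot confirm anything.
    If docIdsByFolder.Exists(folderId) Then
      confirmed = docIdsByFolder.Item(folderId).Exists(docId)
    End If
    If Not confirmed Then unconfirmed.Add folderId
  Next

  Set findUnconfirmedFolders = unconfirmed
End Function
```

An empty result means both retrieval paths agree for that document; any id in the result is one of the "wrong" folders described above.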

Applied-Logic · Answer 1 · 2016-06-17T19:59:18.667

If I understand the problem correctly, I believe the quick answer is that the query cannot guarantee the logical order of parent and child records in the result set. You are making an assumption about the ID sequences. A document can be moved, so nothing guarantees that a folder ID will occur before a document ID or vice versa. For a large document set, to solve this without recursion, defer the child records that have no parent yet and resolve them later (in the sample I used a sentinel to flag/filter such records). Depending on the number of 'orphaned' rows, you may be able to do this in memory, or it may require a second pass. The GetRows method will allow you to handle 'huge' datasets, especially if you're using XML and don't want to run out of memory.
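
A minimal sketch of the deferral idea, reading the recordset in chunks with GetRows. This is not the sample mentioned above: the column layout (field 0 = record id, field 1 = parent id), the chunk size, and the names processInChunks and knownIds are all assumptions made for illustration, and it collects orphans in a separate Collection instead of flagging them with a sentinel.

```vb
' Sketch only; column layout and names are assumptions.
Private Const CHUNK_ROWS As Long = 1000   ' arbitrary chunk size

' rs       : open Recordset; field 0 = record id, field 1 = parent id (assumed)
' knownIds : Dictionary whose keys are the ids already seen; the caller
'            pre-seeds it with any parents known up front (e.g. the root)
' Returns the rows whose parent was not yet known, for a later pass.
Public Function processInChunks(ByVal rs As ADODB.Recordset, _
                                ByVal knownIds As Dictionary) As Collection
  Dim deferred As New Collection
  Dim chunk As Variant
  Dim r As Long
  Dim id As String
  Dim parentId As String

  Do Until rs.EOF
    ' GetRows returns a 2-D array indexed (field, row) and advances the cursor,
    ' so only one chunk is held in memory at a time.
    chunk = rs.GetRows(CHUNK_ROWS)
    For r = 0 To UBound(chunk, 2)
      id = CStr(chunk(0, r))
      parentId = chunk(1, r) & ""          ' a Null parent becomes ""
      If knownIds.Exists(parentId) Then
        knownIds.Item(id) = True           ' parent already seen: record is resolved
      Else
        deferred.Add Array(id, parentId)   ' orphan for now: resolve later
      End If
    Next
  Loop

  Set processInChunks = deferred
End Function
```

The returned rows can then be re-checked against knownIds once the whole set has been read, repeating until no further row resolves; whatever is left has no parent in the data.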

Thanks for your reply, especially because you obviously even created an account to answer this question. We made a workaround for this; I will come back to read your answer more carefully later. — Amessihel, Jun 17 '16 at 19:58
