I am currently working on a solution for streaming huge files from EMC Documentum to the client through Jersey. The Documentum API allows either getting the file as a ByteArrayInputStream or saving it down to a disk area. Using the ByteArrayInputStream is out of the question, as it stores the whole file in memory, which is not acceptable for 20 gigabyte files.

Therefore the only solution is to save the file to a disk area (use of internal classes and functions is also out of the question). To make it faster I want to let Documentum write the data to a file and at the same time read data from this file and stream it to the client through Jersey (by returning an InputStream from Jersey).

The problem is that if the reading thread is faster than the writing thread it will reach the end of the stream and get -1 back, meaning there is no more data in the stream. When this happens Jersey will probably stop streaming the file as it thinks it is done.

Are there any best practices or libraries for this kind of problem? I have been looking on the internet and have some workarounds in mind, but it feels like this should be a common problem and there may already be a solution in the Jersey API that I missed, or in some other library. Is there a class in Jersey that you can return, on which you can explicitly signal when the end of the stream is reached?
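For reference, one of the workarounds I have in mind is a polling wrapper: an InputStream over the growing file that treats a premature -1 as "no data yet" and retries until the writer signals completion. Below is a minimal, self-contained sketch of that idea — `TailingInputStream`, `runDemo` and the completion flag are my own names, and it uses plain `java.io` only, no Documentum or Jersey API:

```java
import java.io.*;
import java.util.concurrent.atomic.AtomicBoolean;

public class TailDemo {

    // Wraps a file that is still being written: a -1 from the underlying
    // stream is treated as "no data yet" until the writer sets the flag.
    static class TailingInputStream extends InputStream {
        private final InputStream in;
        private final AtomicBoolean writerDone;

        TailingInputStream(File file, AtomicBoolean writerDone) throws IOException {
            this.in = new FileInputStream(file);
            this.writerDone = writerDone;
        }

        @Override
        public int read() throws IOException {
            while (true) {
                int b = in.read();
                if (b != -1) {
                    return b;
                }
                if (writerDone.get()) {
                    // one last read: the writer may have flushed bytes
                    // just before setting the flag
                    return in.read();
                }
                try {
                    Thread.sleep(50);  // wait for the writer to catch up
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new InterruptedIOException("interrupted while tailing");
                }
            }
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

    static long runDemo() throws Exception {
        File tmp = File.createTempFile("tail", ".bin");
        tmp.deleteOnExit();
        AtomicBoolean done = new AtomicBoolean(false);

        Thread writer = new Thread(() -> {
            try (OutputStream out = new FileOutputStream(tmp)) {
                for (int i = 0; i < 100; i++) {
                    out.write(i);      // simulate a slow export from Documentum
                    Thread.sleep(5);
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            } finally {
                done.set(true);        // signal "file is complete"
            }
        });
        writer.start();

        long total = 0;
        try (InputStream in = new TailingInputStream(tmp, done)) {
            while (in.read() != -1) {
                total++;
            }
        }
        writer.join();
        return total;                  // all 100 bytes, despite racing the writer
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runDemo());
    }
}
```

In the real application the Jersey resource would return the `TailingInputStream` and the code driving the Documentum export would flip the flag when the file is fully written. Note this relies on the platform allowing further reads from a `FileInputStream` after it has once reported EOF on a still-growing file, which holds on common OSes but is worth verifying for the target environment.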

Paul
  • I don't know Documentum, but writing to and reading from the same file is generally not the thing to do, unless the file's format consists of records of the same size (and then you won't be able to delete records in the middle anyway). Is this the case? Also, define "huge"? – fge May 05 '15 at 09:05
  • The files can be up to 20 gigabytes, and they will contain static binary data. – Paul May 05 '15 at 09:21
  • Do you have a link to the javadoc of the API? – fge May 05 '15 at 09:24
  • "Writing to and reading from the same file at the same time ": always a bad idea already. – user207421 May 05 '15 at 09:39
  • AFAIK there is no solution on the Documentum side of things. You _can_ finetune how UCF transfer is handled, but partial loading, simultaneous write on the same file, etc. makes me think this one's a long shot, at least when dealing with binary files. – eivamu May 05 '15 at 09:50
  • Sorry, I have no link to the api for Documentum. Thank you all for valuable input! – Paul May 07 '15 at 14:07

2 Answers


The API for Documentum allows either to get the file as a ByteArrayInputStream or to save it down to a disk area

Actually, DFC provides two other options for transferring content from the content server:

  1. getCollectionForContent() method (poorly documented, but present in public API):

    IDfCollection collection = null;
    try {
        collection = object.getCollectionForContent(null, 0);
        long total = 0;
        while (collection.next()) {
            // each buffer holds roughly a 64K chunk of the content
            ByteArrayInputStream chunk = collection.getBytesBuffer(null,
                    null, null, 0);
            total += chunk.available();  // consume/forward the chunk here
        }
    } finally {
        if (collection != null) {
            collection.close();
        }
    }
    
  2. The getStream() method of the ISysObjectInternal interface (not a part of the public API, but widely used by EMC applications):

    InputStream stream = null;
    try {
        stream = ((ISysObjectInternal) object).getStream(null, 0, null, false);
    
        // some logic here
    
    } finally {
        if (stream != null) {
            stream.close();
        }
    }
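With either option the per-chunk data still has to be presented to Jersey as one continuous InputStream. A minimal sketch of that glue is below, with the DFC collection mocked by an in-memory iterator of byte arrays — `ChunkConcatDemo`, `concat` and `runDemo` are illustrative names, and only `java.io`/`java.util` are real API here; in real code the iterator would pull from `collection.next()` / `collection.getBytesBuffer()` on demand:

```java
import java.io.*;
import java.util.*;

public class ChunkConcatDemo {

    // Lazily turns a sequence of byte[] chunks into one InputStream.
    // SequenceInputStream only asks for the next chunk once the previous
    // one is exhausted, so nothing is buffered beyond a single chunk.
    static InputStream concat(final Iterator<byte[]> chunks) {
        Enumeration<InputStream> streams = new Enumeration<InputStream>() {
            @Override
            public boolean hasMoreElements() {
                return chunks.hasNext();
            }

            @Override
            public InputStream nextElement() {
                return new ByteArrayInputStream(chunks.next());
            }
        };
        return new SequenceInputStream(streams);
    }

    static String runDemo() throws IOException {
        // stand-ins for the chunks a content collection would yield
        List<byte[]> fakeChunks = Arrays.asList("hello ".getBytes(), "world".getBytes());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = concat(fakeChunks.iterator())) {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(runDemo());  // prints "hello world"
    }
}
```

Returning such a stream from the Jersey resource method keeps memory usage bounded by the chunk size rather than the document size.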
    
Andrey B. Panfilov

EMC Documentum is a DMS (document management system). I am fairly sure you cannot use the same repository object to concurrently read and write the same version of that particular object.

If you really need to stick with Documentum, maybe you could try accessing the real content at the filestore location, whichever filestore type you are using. Then again, with this approach you need to reconsider security issues and the like.

Miki