
I am trying to port part of a desktop application to run in the browser (client-side). I need a sort of virtual file system, in which I can read and write files (binary data). From what I gather, one of the only options that works broadly across browsers is IndexedDB. However, I'm struggling to find examples that read or write larger files. It seems the API only supports passing/obtaining an entire file's contents to/from the database (as a blob or byte array).

What I'm trying to find is something with which I can continuously "stream" data to/from the virtual file system, analogous to how you would do it in any non-browser application. E.g. (pseudo code):

val in = new FileInputStream(someURLorPath)
val chunkSize = 4096
val buf = new Array[Byte](chunkSize)
while (in.hasRemaining) {
  val sz = min(chunkSize, in.remaining)
  in.read(buf, 0, sz)
  processSome(buf, 0, sz)
  ...
}
in.close()

I understand a synchronous API is a problem for browsers; it would also be OK if read were an asynchronous method instead. But I want to go through the file - which can be huge, e.g. several hundred MB - block by block. The block size doesn't matter. This goes both for reading and for writing.

Random-access (being able to seek to a position within the virtual file) would be a plus, but not mandatory.


One idea I have is that one store = one virtual file, with the keys being chunk indices? A bit like the cursor example on MDN, but with each record being a blob or array of a fixed size. Does that make sense? Is there a better API or approach?
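
This is roughly the layout I imagine (a hypothetical, untested sketch; the store name and chunk size are invented):

// One object store per virtual file; key = chunk index, value = fixed-size chunk.
// E.g. reading chunk #5 (bytes [5*4096, 6*4096)) of 'song.wav':
const tx  = db.transaction( 'song.wav', 'readonly' );
const req = tx.objectStore( 'song.wav' ).get( 5 );
req.onsuccess = () => {
    const chunk = req.result as ArrayBuffer | undefined;
    // ...process chunk...
};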


It seems that Streams would conceptually be the API I'm looking for, but I don't know how to "stream to/from" a virtual file system such as IndexedDB.

0__
  • Is the data meant to actually reside on the user's local computer? Or can you stream it from a remote storage server over HTTP? If so, could you use a simple Range request? – Dai Oct 25 '20 at 00:08
  • @Dai thanks for the question. Indeed, both cases will be relevant. Imagine these are sound and video files. I must be able to retrieve them from an online resource and place them in local storage (and I don't want to have an entire 300 MB file in memory at some point between fetch and put), but the web application must also be able to process and create new sound files in the virtual file system. In most cases, they do not need to remain after the user closes the browser tab, but it would be nice if that was possible. – 0__ Oct 25 '20 at 00:11
  • Sounds like you will quickly run into storage size restrictions here – charlietfl Oct 25 '20 at 00:21
  • @charlietfl I think storage manager reports 2 GB (Firefox) which should be fine for most cases. – 0__ Oct 25 '20 at 00:27
  • Can you run a headless webserver process on the user's machine? Or require them to use Chromium-based browsers? If you want to stick to _only_ W3C Recommendations with wide browser support you're kinda stuck for any kind of local storage options (and that's not even considering IE11 support). – Dai Oct 25 '20 at 00:33
  • btw, IO is inherently asynchronous in all modern environments: the C standard library `read` function on many OSes (including Windows) internally starts an asynchronous IO operation and then blocks the thread - which is why end-to-end async APIs are better and are replacing synchronous IO in many programming languages and libraries. – Dai Oct 25 '20 at 00:36
  • Are your data files mutable, strictly immutable, or append-only? – Dai Oct 25 '20 at 00:38
  • @Dai one could say either read-only or append-only write. Strictly speaking, I don't need to be able to overwrite file contents (although it would be convenient to have). – 0__ Oct 25 '20 at 00:44
  • possibly related: https://stackoverflow.com/questions/61122409/is-there-a-way-for-a-progressive-web-app-to-save-a-lot-of-data-without-using-up – 0__ Oct 25 '20 at 01:16

1 Answer


Assuming you want the ability to transparently work with initially remote resources which are cached (and consistent) locally, you can abstract over fetch (with Range: requests) and IndexedDB.

BTW, you'll really want to use TypeScript for this, because working with Promise<T> in pure JavaScript is a PITA.

one could say either read-only or append-only write. Strictly speaking, I don't need to be able to overwrite file contents (although it would be convenient to have)

Something like this - I cobbled it together from MDN's docs. I haven't tested it, but I hope it puts you in the right direction:

Part 1 - LocalFileStore

These classes allow you to store arbitrary binary data in chunks of 4096 bytes, where each chunk is represented by an ArrayBuffer.

The IndexedDB API is confusing at first, as it doesn't use native ECMAScript Promise<T>s but instead its own IDBRequest API with oddly named properties - but the gist of it is:

  • A single IndexedDB database named 'files' holds all of the files cached locally.
  • Each file is represented by its own IDBObjectStore instance.
  • Each 4096-byte chunk of each file is represented by its own record/entry/key-value-pair inside that IDBObjectStore, where the key is the 4096-aligned offset into the file.
    • Note that all IndexedDB operations happen within an IDBTransaction context, hence class LocalFile wraps an IDBTransaction object rather than an IDBObjectStore object.
class LocalFileStore {
    
    static open(): Promise<LocalFileStore> {
        
        return new Promise<LocalFileStore>( ( accept, reject ) => {
            
            // Surprisingly, the IndexedDB API is designed such that you add the event-handlers *after* you've made the `open` request. Weird.
            // Note: object stores can only be created inside an 'upgradeneeded' handler, so a real
            // implementation needs to create (or pre-create) the per-file stores there.
            const openReq = indexedDB.open( 'files' );
            openReq.addEventListener( 'error', () => {
                reject( openReq.error );
            } );
            openReq.addEventListener( 'success', () => {
                accept( new LocalFileStore( openReq.result ) );
            } );
        } );
    }

    constructor(
        private readonly db: IDBDatabase
    ) {    
    }
    
    openFile( fileName: string, write: boolean ): LocalFile {
        
        const transaction = this.db.transaction( fileName, write ? 'readwrite' : 'readonly' );
        
        return new LocalFile( fileName, transaction, write );
    }
}

class LocalFile {
    
    constructor(
        public readonly fileName: string,
        private readonly t: IDBTransaction,
        public readonly writable: boolean
    ) {
    }

    getChunk( offset: bigint ): Promise<ArrayBuffer | null> {
        
        if( offset % 4096n !== 0n ) throw new Error( "Offset value must be a multiple of 4096." );
       
        // Arrow functions are used so `this` refers to the LocalFile instance inside the Promise body:
        return new Promise<ArrayBuffer | null>( ( accept, reject ) => {
        
            const key = offset.toString();
            const req = this.t.objectStore( this.fileName ).get( key );
            
            req.addEventListener( 'error', () => {
                reject( req.error );
            } );
            
            req.addEventListener( 'success', () => {
                const entry = req.result;
                if( entry instanceof ArrayBuffer ) {
                    accept( entry );
                }
                else if( typeof entry === 'undefined' ) {
                    // No such chunk cached locally:
                    accept( null );
                }
                else {
                    reject( "Entry was not an ArrayBuffer or 'undefined'." );
                }
            } );

        } );
    }

    putChunk( offset: bigint, bytes: ArrayBuffer ): Promise<void> {

        if( offset % 4096n !== 0n ) throw new Error( "Offset value must be a multiple of 4096." );
        if( bytes.byteLength > 4096 ) throw new Error( "Chunk size cannot exceed 4096 bytes." );
        
        return new Promise<void>( ( accept, reject ) => {
        
            const key = offset.toString();
            const req = this.t.objectStore( this.fileName ).put( bytes, key );
            
            req.addEventListener( 'error', () => {
                reject( req.error );
            } );
            
            req.addEventListener( 'success', () => {
                accept();
            } );

        } );
    }

    existsLocally(): Promise<boolean> {
        // TODO: Implement a check to see if *any* data for this file exists locally (e.g. via IDBObjectStore.count()).
        throw new Error( "Not yet implemented." );
    }
}
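
An untested usage sketch of the above, inside an async function (the file name is invented, and it assumes an object store for that file already exists in the 'files' database):

const store = await LocalFileStore.open();
const file  = store.openFile( 'movie.mp4', /* write: */ true );

// Write one 4096-byte chunk at offset 8192, then read it back:
await file.putChunk( 8192n, new ArrayBuffer( 4096 ) );
const chunk = await file.getChunk( 8192n );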

Part 2 - AbstractFile

  • This class wraps the IndexedDB-based LocalFileStore and LocalFile classes above and also uses fetch.
  • When you make a read request for a range of a file:
    1. It first checks with the LocalFileStore; if it has the necessary chunks then it will retrieve them.
    2. If it's lacking any chunks in the range then it will fall back to retrieving the requested range using fetch with a Range: header, and cache those chunks locally.
  • When you make a write request to a file:
    • I actually haven't implemented that bit yet, but that's an exercise left up to the reader :)
class AbstractFileStore {
    
    private readonly lfs: Promise<LocalFileStore>;

    constructor() {
        this.lfs = LocalFileStore.open();
    }

    async openFile( fileName: string, writeable: boolean ): Promise<AbstractFile> {
        
        const lfs = await this.lfs;
        return new AbstractFile( fileName, lfs.openFile( fileName, writeable ) );
    }
}

class AbstractFile {
    
    private static readonly BASE_URL = 'https://storage.example.com/';

    constructor(
        public readonly fileName: string,
        private readonly localFile: LocalFile
    ) {
        
    }

    async read( offset: bigint, length: number ): Promise<ArrayBuffer> {

        const anyExistsLocally = await this.localFile.existsLocally();
        if( !anyExistsLocally ) {
            return this.readUsingFetch( offset, length ); // TODO: Cache the returned data into the localFile store.
        }

        const concat = new Uint8Array( length );
        let count = 0;

        // TODO: Exercise for the reader: `calculateChunks` should split `offset + length` into a series of 4096-sized chunks.
        for( const chunkOffset of calculateChunks( offset, length ) ) {
            
            const fromLocal = await this.localFile.getChunk( chunkOffset );
            if( fromLocal !== null ) {
                concat.set( new Uint8Array( fromLocal ), count );
                count += fromLocal.byteLength;
            }
            else {
                const fromFetch = await this.readUsingFetch( chunkOffset, 4096 );
                concat.set( new Uint8Array( fromFetch ), count );
                count += fromFetch.byteLength;
            }
        }

        return concat.buffer;
    }

    private async readUsingFetch( offset: bigint, length: number ): Promise<ArrayBuffer> {
        
        const url = AbstractFile.BASE_URL + this.fileName;

        // Note that the end of an HTTP Range is *inclusive*:
        const lastByte = offset + BigInt( length ) - 1n;

        const headers = new Headers();
        headers.append( 'Range', 'bytes=' + offset.toString() + '-' + lastByte.toString() );

        const opts: RequestInit = {
            credentials: 'include',
            headers    : headers
        };

        const resp = await fetch( url, opts );
        return await resp.arrayBuffer();
    }

    write( offset: bigint, data: ArrayBuffer ): Promise<void> {
        
        throw new Error( "Not yet implemented." );
    }
}
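
One possible (untested) shape for the calculateChunks helper left as an exercise above - it yields the 4096-aligned offsets covering the requested range:

function* calculateChunks( offset: bigint, length: number ): Generator<bigint> {
    const first = offset - ( offset % 4096n ); // Align the start downwards to a 4096-byte boundary.
    const end   = offset + BigInt( length );   // Exclusive end of the requested range.
    for( let chunkOffset = first; chunkOffset < end; chunkOffset += 4096n ) {
        yield chunkOffset;
    }
}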

Part 3 - Streams?

As the classes above use ArrayBuffer, you can make use of existing ArrayBuffer functionality to create a Stream-compatible or Stream-like representation - it will have to be asynchronous, of course, but async + await make that easy. You could write an async generator function that simply yields each chunk asynchronously.
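
For example (untested, and assuming you know the file's size in advance), a hypothetical chunksOf generator over AbstractFile could be adapted to a WHATWG ReadableStream like this:

async function* chunksOf( file: AbstractFile, fileSize: bigint ): AsyncGenerator<ArrayBuffer> {
    for( let offset = 0n; offset < fileSize; offset += 4096n ) {
        yield await file.read( offset, 4096 );
    }
}

function toReadableStream( file: AbstractFile, fileSize: bigint ): ReadableStream<Uint8Array> {
    const it = chunksOf( file, fileSize );
    return new ReadableStream( {
        // `pull` is called whenever the stream's internal queue wants more data:
        async pull( controller ) {
            const { value, done } = await it.next();
            if( done ) {
                controller.close();
            }
            else {
                controller.enqueue( new Uint8Array( value ) );
            }
        }
    } );
}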

Dai
  • Thanks for the extensive answer! I was thinking along these lines indeed. Another, orthogonal, thing I discovered is the [idb.filesystem.js library](https://github.com/ebidel/idb.filesystem.js/blob/2f9e8fb3ece3aeca8388e60ba75d622dd354e6da/src/idb.filesystem.js#L313). Instead of having multiple keys per file (one per chunk), it updates a `Blob` object over and over again and then re-puts it into the store. It seems counter-intuitive at first, but I read that blobs do not have to reside in memory - they are indeed like files - so perhaps the browser is optimised for such updates. – 0__ Oct 25 '20 at 02:37
  • @0__ Technically the browser/engine is free to abstract away _anything_ and _everything_ in a script. Note that a `Blob` does not represent a file on-disk _or_ anything in-memory, it's just an abstraction over *any* arbitrary binary data of fixed-length (e.g. you can also get a `Blob` from a `fetch` `Response` too). I agree that concatenating `Blob` objects is *probably* inefficient - I recommend profiling first! – Dai Oct 25 '20 at 02:39
  • when wrapping calls to put/add, the promise should only resolve when the transaction containing the request has completed, not the request (see the sketch below) – Josh Oct 25 '20 at 08:41
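
To illustrate Josh's point, a minimal (untested) sketch - the hypothetical putChunkCommitted only fulfils once the whole transaction has committed, rather than when the individual put request succeeds:

function putChunkCommitted( t: IDBTransaction, storeName: string, key: string, bytes: ArrayBuffer ): Promise<void> {
    return new Promise<void>( ( accept, reject ) => {
        t.objectStore( storeName ).put( bytes, key );
        // Resolve on the *transaction's* lifecycle events, not the request's 'success' event:
        t.addEventListener( 'complete', () => accept() );
        t.addEventListener( 'error',    () => reject( t.error ) );
        t.addEventListener( 'abort',    () => reject( t.error ) );
    } );
}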