Usually, when working with Hadoop and Flink, opening/reading a file from a distributed file system returns a Source (the counterpart of a Sink) object extending java.io.InputStream.
However, in Apache Ignite, the IgfsSecondaryFileSystem (and more specifically the IgniteHadoopIgfsSecondaryFileSystem) returns an object of type HadoopIgfsSecondaryFileSystemPositionedReadable when its "open" method is called with an IgfsPath.
HadoopIgfsSecondaryFileSystemPositionedReadable offers a "read" method, but it requires knowing details about where the data to be read is located, such as the position in the input stream.
/**
* Read up to the specified number of bytes, from a given position within a file, and return the number of bytes
* read.
*
* @param pos Position in the input stream to seek.
* @param buf Buffer into which data is read.
* @param off Offset in the buffer from which stream data should be written.
* @param len The number of bytes to read.
* @return Total number of bytes read into the buffer, or -1 if there is no more data (EOF).
* @throws IOException In case of any exception.
*/
public int read(long pos, byte[] buf, int off, int len) throws IOException;
How can I determine these details before calling the read method?
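My current guess (I could not find this stated explicitly in the documentation) is that the position is simply the byte offset within the file, so one could start at 0 and advance it by the number of bytes each call returns, until -1 signals EOF. A minimal sketch of what I mean (the import of the readable type is omitted because I am not sure of its package):

import java.io.IOException;

public final class PositionedReadLoop {
    // Reads the whole file sequentially by tracking the offset manually.
    public static void readAll(HadoopIgfsSecondaryFileSystemPositionedReadable readable) throws IOException {
        byte[] buf = new byte[8192];
        long pos = 0;

        while (true) {
            int n = readable.read(pos, buf, 0, buf.length);

            if (n == -1)
                break; // EOF reached.

            // ... process buf[0 .. n-1] here ...

            pos += n; // Advance by the number of bytes actually read.
        }
    }
}

Is that the intended way to use this API, or do the position details have to come from somewhere else?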
I am quite new to these frameworks, so perhaps there is a different way to obtain an InputStream based on an IgfsPath pointing to a file stored in a Hadoop file system?
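If there is no such method, the workaround I currently have in mind is to wrap the positioned readable in a plain java.io.InputStream that keeps the current offset as internal state, roughly like this (again just a sketch, with the readable type's import omitted):

import java.io.IOException;
import java.io.InputStream;

public class PositionedReadableInputStream extends InputStream {
    private final HadoopIgfsSecondaryFileSystemPositionedReadable readable;
    private long pos; // Current offset within the file, starts at 0.

    public PositionedReadableInputStream(HadoopIgfsSecondaryFileSystemPositionedReadable readable) {
        this.readable = readable;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : (one[0] & 0xFF);
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = readable.read(pos, buf, off, len);

        if (n > 0)
            pos += n; // Advance past the bytes just read.

        return n;
    }
}

Is something like this necessary, or would it just reinvent what the frameworks already provide?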
I am trying to achieve what is described here: https://apacheignite-fs.readme.io/docs/secondary-file-system
Thanks in advance for any hints!