I am building an application that scans PDF files and extracts data from them.
I have already built an application that does batch processing using Spark Core, but now I want the files to be picked up continuously as they arrive in a directory.
How can I use Spark Streaming's fileStream method to read PDF files from a directory?
Also, does this directory have to be an HDFS directory?
Thanks in advance.