I have a stream of strings in Java, coming from a CSV file on another machine. I create an InputStream and read the CSV file line by line with a BufferedReader, as follows:
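// Call a method that returns the InputStream of the remote file
InputStream stream = getInputStreamOfFile();
BufferedReader lineStream = new BufferedReader(new InputStreamReader(stream));
String inputLine;
// readLine() returns null once the end of the stream is reached
while ((inputLine = lineStream.readLine()) != null) {
    System.out.println("******************new Line***********");
    System.out.println(inputLine);
}
// Closing the reader also closes the wrapped stream; the second close is a harmless no-op
lineStream.close();
stream.close();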
Now I want to create a Spark RDD or DataFrame from these lines.
One solution is to create a new RDD for each line, maintain a global RDD, and keep taking the union of the two. Is there any other solution?
Note: the file is not on the same machine. It comes from remote storage, and I have its HTTP URL.
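
For reference, here is a rough sketch of the one alternative I can think of: collect all the lines into a single list first, instead of creating one RDD per line. The class name `LineCollector` and the method `readAllLines` are hypothetical names I made up for illustration, and the sketch assumes the file is UTF-8 encoded and small enough to fit in the driver's memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark or any library.
final class LineCollector {

    // Reads every line of the stream into memory and closes the stream.
    // Assumption: the content is UTF-8 and fits in memory.
    static List<String> readAllLines(InputStream stream) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

As far as I understand, the resulting list could then be handed to Spark in a single call, for example `JavaSparkContext.parallelize(lines)`, rather than doing one union per line. I am not sure this is the right approach for large files, though.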