In my MongoDB database I have a collection containing documents like the following:
As you can see, each document contains several nested documents (Decade, Title, Plot, Genres, etc.). These are essentially Map representations of SparseVectors that I came up with; they are generated by another Spark job of mine.
As it turns out, documents of this kind can't easily be read into a Spark DataFrame.
I've been wondering how I can read such documents into a DataFrame so that each subdocument is represented not as a BSON Document but as a plain Map[String, Double], since each of these subdocuments is completely arbitrary and contains an arbitrary number of numeric fields.
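To make the goal concrete, here is a rough sketch of the per-subdocument conversion I have in mind, in plain Scala with no Spark or MongoDB dependencies (the nested document is modelled as a `Map[String, Any]`, and the field names are just illustrative examples):

```scala
object SubdocToMap {
  // Keep only the numeric fields of an arbitrary nested document and
  // widen them all to Double, producing the Map[String, Double] I want.
  def toDoubleMap(subdoc: Map[String, Any]): Map[String, Double] =
    subdoc.collect {
      case (key, value: Number) => key -> value.doubleValue()
    }

  def main(args: Array[String]): Unit = {
    // A stand-in for one of the nested documents (e.g. Decade).
    val decade = Map[String, Any]("1990" -> 1.0, "2000" -> 0.5)
    println(toDoubleMap(decade))
  }
}
```

In Spark terms, I assume each such column would end up with the type `MapType(StringType, DoubleType)`, but I don't know how to get the connector to produce that instead of a BSON Document.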
Is there a way to deal with such documents?