I'm trying to read in data from the Enron emails and then analyse it. Currently the files are all zipped, and once unzipped they are in .pst format.
Is there any way to read the .pst data directly into Spark?
I'm currently going down the route of expanding the PST files in Java using libPST, mapping the messages to JSON, and then loading the JSON into a DataFrame.
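
To make the JSON step concrete, here is a minimal sketch of what I have in mind, using only the Java standard library. The `Email` class, its three fields, and the output file name are placeholders I made up for illustration; the real values would come from whatever libPST returns for each message, and that extraction is not shown. It writes one JSON object per line, which as far as I know is the layout `spark.read().json(path)` expects by default.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class EmailJsonLines {

    // Hypothetical holder for the fields taken from one PST message.
    static final class Email {
        final String from;
        final String subject;
        final String body;

        Email(String from, String subject, String body) {
            // Treat missing fields as empty strings so serialisation never fails.
            this.from = from == null ? "" : from;
            this.subject = subject == null ? "" : subject;
            this.body = body == null ? "" : body;
        }
    }

    // Escape a value so it can sit between double quotes in a JSON document.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters need a \\uXXXX escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Serialise one email as a single-line JSON object.
    static String toJsonLine(Email e) {
        return "{\"from\":\"" + escape(e.from)
             + "\",\"subject\":\"" + escape(e.subject)
             + "\",\"body\":\"" + escape(e.body) + "\"}";
    }

    // Write all emails to a file, one JSON object per line.
    static void write(Path out, List<Email> emails) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (Email e : emails) {
                w.write(toJsonLine(e));
                w.write("\n");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample message, only to show the output shape.
        List<Email> sample = List.of(
            new Email("someone@enron.com", "Test subject", "First line\nSecond line"));
        write(Path.of("emails.jsonl"), sample);
    }
}
```

In practice a JSON library would probably be safer than hand-rolled escaping; I only did it by hand here to keep the example free of dependencies.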