I have PST or email files stored in HDFS. I want to run text analysis on them with whichever Hadoop component suits the task best. How do I get started?
Do I have to first extract the actual content out of these files and store it somewhere (in a text file for example) and then run the analysis on the text file?
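To make the extract-first idea concrete, here is a rough sketch of the extraction step I have in mind. It assumes each email is a plain RFC 822 (.eml-style) message that has already been read as bytes; PST is a binary container and would need a separate conversion step that is not shown. The function name `extract_text` and the sample message are made up for illustration.

```python
from email import message_from_bytes, policy


def extract_text(raw_bytes):
    """Return the plain-text body of one RFC 822 message, or '' if it has none."""
    # policy.default gives the modern EmailMessage API (get_body, get_content).
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    return body.get_content().strip() if body is not None else ""


# Hypothetical sample message, standing in for one file read from HDFS.
sample = (
    b"From: a@example.com\r\n"
    b"To: b@example.com\r\n"
    b"Subject: hello\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Quarterly numbers attached.\r\n"
)

text = extract_text(sample)
print(text)
```

The extracted text could then be written back to HDFS as ordinary text files, or the same function could run inside a Hadoop Streaming mapper so that extraction and analysis happen in one job. I am not sure which of the two is the better design.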
Any suggestions would be appreciated.
P.S. I came across this while searching on Google. Is it the only option, or are there other solutions available?