I want to create a Spark DataFrame by reading some text files. However, the text files have some unusual formatting. Here is an example of one of the text files:
These are the problems I am facing:
The first few lines contain a header block in which a single column name can span 3 lines (e.g. the Student Identification Number header takes 3 lines).
Each student's data spans 2 lines, with the Code and Transfer columns on different lines.
There are a few more header lines in the middle of the file (after No. 00000025), which should be omitted.
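One possible way to handle the three problems above is to pair the lines in plain Python before building the DataFrame. This is only a rough sketch under stated assumptions: a record is assumed to start with an 8-digit No. value (like 00000025), its second line is assumed to be the next non-blank line, and everything else is treated as a header and dropped. The names `RECORD_START` and `parse_records` are hypothetical, and splitting the remaining text into actual columns would still depend on the real file layout.

```python
import re

# Assumption: a record's first line begins with an 8-digit "No." value.
RECORD_START = re.compile(r"^\s*(\d{8})\b(.*)$")


def parse_records(lines):
    """Pair each record line with its continuation line; drop header lines."""
    rows = []
    pending = None  # (no, rest of first line) still waiting for its second line
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        match = RECORD_START.match(line)
        if match:
            if pending is not None:
                # Previous record had no second line; keep it with an empty one.
                rows.append((pending[0], pending[1], ""))
            pending = (match.group(1), match.group(2).strip())
        elif pending is not None:
            # Assumption: the line right after a record start is its second half.
            rows.append((pending[0], pending[1], line.strip()))
            pending = None
        # Otherwise the line is a header (top or mid-file) and is ignored.
    if pending is not None:
        rows.append((pending[0], pending[1], ""))
    return rows
```

The resulting tuples could then be passed to `spark.createDataFrame(rows, ["no", "first_line", "second_line"])` and split into the final columns from there.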
I expect to read the text file into a Spark DataFrame like this: