Aim: Build a small ETL framework to take a huge CSV and dump it into an RDB (say MySQL).
The current approach we are considering is to load the CSV into a DataFrame using Spark, persist it, and later use a framework like Apache Sqoop to load it into MySQL.
Need recommendations on which format to persist the data in, and on the approach itself.
Edit: The CSV will have around 50 million rows with 50-100 columns. Since our task involves lots of transformations before dumping into the RDB, we thought using Spark was a good idea.
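
For concreteness, here is a rough sketch of the Spark side of the pipeline we have in mind (the paths, the explicit-header option, the placeholder transformation, and Parquet as the staging format are just assumptions for illustration, not a settled design):

```scala
// Minimal sketch of the Spark stage: read the CSV, apply transformations,
// and persist to a staging location for the downstream export into MySQL.
// Paths, columns, and the Parquet choice are placeholders.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CsvToStaging {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-to-staging")
      .getOrCreate()

    // Read the large CSV. Schema inference over ~50M rows requires an extra
    // pass, so supplying an explicit schema would likely be preferable.
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "false")
      .csv("hdfs:///data/input/huge.csv")          // placeholder path

    // Placeholder for the "lots of transformations" step.
    val transformed = raw
      .withColumn("load_date", current_date())

    // Persist in a columnar, splittable format as the staging layer
    // before the export step (e.g. Sqoop) pushes it into MySQL.
    transformed.write
      .mode("overwrite")
      .parquet("hdfs:///data/staging/huge_parquet") // placeholder path

    spark.stop()
  }
}
```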