I have a ~4 GB text file (5-6 million lines) that I parse and save to a database. Parsing and saving takes almost 3-4 hours, and this runs every day.
Querying the database is also too slow: a simple average or sum over a single day's data takes 30-40 minutes to compute and return.
I am currently using Python and MySQL. I also tried Spark for this computation, but it takes 30-40 minutes as well. The data is growing, so the file will soon be around 10 GB, and my Spark setup is not able to handle files that large.
Please suggest how I can improve the parsing time, the time to store the data in the database, and the query time.
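
To make the workload concrete, here is a minimal sketch of the general shape of the pipeline: parse each line, insert the rows, then run a per-day average/sum. It is not the actual code. Everything specific in it is an assumption for illustration: the `day,value` line format, the `readings` table, the function names, and the batched inserts. It uses `sqlite3` in place of MySQL only so that it runs standalone.

```python
import sqlite3


def parse_line(line):
    # Hypothetical line format "YYYY-MM-DD,value"; the real file layout is not shown here.
    day, value = line.strip().split(",")
    return day, float(value)


def load(conn, lines, batch_size=1000):
    # Hypothetical table; sqlite3 stands in for MySQL so the sketch is self-contained.
    conn.execute("CREATE TABLE IF NOT EXISTS readings (day TEXT, value REAL)")
    batch = []
    for line in lines:
        batch.append(parse_line(line))
        if len(batch) >= batch_size:
            # Insert the accumulated rows in one call instead of one call per row.
            conn.executemany("INSERT INTO readings VALUES (?, ?)", batch)
            batch.clear()
    if batch:
        # Flush the final partial batch.
        conn.executemany("INSERT INTO readings VALUES (?, ?)", batch)
    conn.commit()


def daily_stats(conn, day):
    # The kind of "simple avg, sum for a particular day" query described above.
    return conn.execute(
        "SELECT AVG(value), SUM(value) FROM readings WHERE day = ?", (day,)
    ).fetchone()


sample = ["2024-01-01,10.0", "2024-01-01,20.0", "2024-01-02,5.0"]
conn = sqlite3.connect(":memory:")
load(conn, sample, batch_size=2)
print(daily_stats(conn, "2024-01-01"))  # (15.0, 30.0)
```

The real job does the same three steps over 5-6 million lines per day, so all three (parse, insert, aggregate) are in scope for suggestions.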