I have a web scraper that scrapes data from an e-commerce site and stores the results in BigQuery tables via pandas DataFrames. Right now I do everything manually: I start the VM instance from the GCP console, connect to it from my local machine over SSH, open a terminal in the project folder, and run
$ python main.py
to start the scraping. After the process completes, I shut down the VM instance manually as well. What I want is to automate this: start the VM instance automatically on the first day of every month, scrape the e-commerce site's data, and then shut the VM instance down automatically once the program has finished.
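For reference, my manual steps roughly correspond to the following gcloud commands (the instance name `scraper-vm`, the zone, and the project path are placeholders, not my actual values):

```shell
# Start the VM instance (name and zone are placeholders)
gcloud compute instances start scraper-vm --zone=us-central1-a

# SSH in and run the scraper from the project folder
gcloud compute ssh scraper-vm --zone=us-central1-a \
    --command="cd ~/scraper-project && python main.py"

# After the run finishes, stop the VM again
gcloud compute instances stop scraper-vm --zone=us-central1-a
```

Essentially I want this whole sequence to happen on a monthly schedule without me touching anything.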
My program takes almost 40 hours to fetch all the data from the e-commerce site. I looked into Cloud Functions, but I have seen that the maximum execution time there is 540 seconds. Since my program runs for so much longer than that, I am not sure whether Cloud Functions will work for my case.
Is there any solution to automate this process? I am very new to GCP, so I apologize if this is a trivial question.