
I am working on a project that involves a large amount of data. Essentially, a website hosts a large repository of Excel files that can be downloaded. The site has several lists of filters, and I have several parameter combinations I am filtering on and then collecting data from. Overall, this process requires me to download upwards of 1,000 Excel files and copy and paste them together.

Does Python have the functionality to automate this process? Essentially what I am doing is setting Filter 1 = A, Filter 2 = B, Filter 3 = C, downloading the file, and then repeating with different parameters and pasting the files together. If Python is suitable for this, can anyone point me toward a good tutorial or starting point? If not, what language would be more suitable for someone with little programming background?

Thanks!

niccalis

1 Answer


Personally, I would use Python for this. In particular, look at the pandas library: it is a powerful data analysis library whose DataFrame object can be used like a headless spreadsheet. I use it for a small number of spreadsheets and it has been very quick. Perhaps take a look at this site for more guidance: https://pythonprogramming.net/data-analysis-python-pandas-tutorial-introduction/
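
For the combining step, here is a minimal sketch of how that might look with pandas, assuming the downloaded files sit in one local folder and share the same column layout (the folder and file names here are made up):

    import glob
    import pandas as pd

    # Assumes all downloaded .xlsx files live in one folder and share
    # the same column layout (hypothetical paths/names).
    files = glob.glob("downloads/*.xlsx")

    # Read each file into a DataFrame, then stack them into one table;
    # this replaces the manual copy-and-paste step.
    frames = [pd.read_excel(f) for f in files]
    combined = pd.concat(frames, ignore_index=True)

    # Write the merged result back out as a single spreadsheet.
    combined.to_excel("combined.xlsx", index=False)

Note that reading .xlsx files this way requires an Excel engine such as openpyxl to be installed alongside pandas.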

I'm not 100% sure whether your question was only about spreadsheets; my first paragraph was really about working on the files once you have downloaded them. If you're interested in actually fetching the files, or 'scraping' the data, you can look at the Requests library for the HTTP side of things; this might be what you could use if there is a RESTful way of doing things. Or look at Scrapy (https://scrapy.org) for web scraping. Sorry if I misunderstood in parts.
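
If the site happens to expose those filters as plain query parameters in the download URL, the fetching loop could be as simple as the sketch below. The URL and parameter names are hypothetical; you would need to inspect the site's actual requests (for example, with your browser's developer tools) to find the real ones:

    import itertools
    import os

    import requests

    # Hypothetical endpoint and filter names: replace these with the
    # real values found by inspecting the site's download requests.
    BASE_URL = "https://example.com/export"
    FILTER1 = ["A1", "A2"]
    FILTER2 = ["B1", "B2"]
    FILTER3 = ["C1", "C2"]

    os.makedirs("downloads", exist_ok=True)

    # One request per combination of filter values, saving each file locally.
    for f1, f2, f3 in itertools.product(FILTER1, FILTER2, FILTER3):
        resp = requests.get(
            BASE_URL,
            params={"filter1": f1, "filter2": f2, "filter3": f3},
        )
        resp.raise_for_status()  # fail loudly on HTTP errors
        with open(os.path.join("downloads", f"{f1}_{f2}_{f3}.xlsx"), "wb") as fh:
            fh.write(resp.content)

If the downloads are only reachable through JavaScript-driven filter menus rather than plain URLs, a browser-automation tool would be the fallback, but the idea of looping over filter combinations stays the same.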

johnr