
I need to write some code that downloads data from different sites (e.g. WHO, Unicef, Eurostat, ...) and then transforms that data into a format I find useful, such as JSON. (If I understand correctly, this means doing web scraping, right?)

The data can come in different formats: HTML, PDF, XLSX, TSV, CSV, etc.

I need to process these files and transform them into a uniform format so that they can then be compared.

Obviously the data collection could be done manually, but I would prefer an automatic procedure that does it for me.

I've never done such a thing and I don't know how to start.

So far I have only used client-side JavaScript and I know little about server-side programming. I've been advised to use Node.js, Express.js and MongoDB. I have also read about the MEAN stack, a set of JavaScript technologies for building dynamic web sites and web applications, but I don't know how to use it.

I've never used Node.js, Express.js or MongoDB. I am very happy to learn, but I need some help.

Can someone help me? I haven't found any tutorials or guides that cover my case.

Thanks!

1 Answer


You just need something that can fetch a URL. You can do that with Node.js or any other framework that provides HTTP client functionality. After that, you can write a parser to extract the data according to your needs.
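As a starting point, here is a minimal sketch of fetching a page with Node's built-in https module (no Express or MongoDB is needed for this step; the URL below is only a placeholder, not one of the actual data sources):

```javascript
const https = require('https');

// Fetch a URL and resolve with the response body as a string.
function fetchPage(url) {
  return new Promise((resolve, reject) => {
    https.get(url, (res) => {
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    }).on('error', reject);
  });
}

// Example usage with a placeholder URL:
fetchPage('https://example.com/')
  .then((html) => console.log(html.slice(0, 200)))
  .catch((err) => console.error(err));
```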

Here is a link to a question that describes how to do it in Node:

In Node.js / Express, how do I "download" a page and gets its HTML?
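Once you have the raw text, turning it into JSON depends on the source format. As an illustration only (assuming a simple CSV with a header row and no quoted fields containing commas), a hand-rolled parser could look like this:

```javascript
// Convert simple CSV text into an array of objects keyed by the header row.
function csvToJson(csvText) {
  const lines = csvText.trim().split(/\r?\n/);
  const headers = lines[0].split(',');
  return lines.slice(1).map((line) => {
    const values = line.split(',');
    const row = {};
    headers.forEach((header, i) => {
      row[header.trim()] = values[i] ? values[i].trim() : '';
    });
    return row;
  });
}

// Example usage with made-up data:
const sample = 'country,year,value\nItaly,2015,42\nFrance,2015,37';
console.log(JSON.stringify(csvToJson(sample), null, 2));
```

For real-world CSV, XLSX or PDF files you would normally reach for a dedicated parsing library rather than hand-rolling this, but the overall flow stays the same: fetch, parse, then write out JSON.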