I need to write some code that downloads data from different sites (e.g. WHO, Unicef, Eurostat, ...) and then transforms that data into a format I find useful, such as JSON. (If I understand correctly, this is called web scraping, right?)
The data can be in different formats: HTML, PDF, XLSX, TSV, CSV, etc.
For example:
- [html] http://apps.who.int/immunization_monitoring/globalsummary/timeseries/tscoveragebcg.html
- [PDF] http://www.salute.gov.it/portale/documentazione/p6_2_8_3_1.jsp?id=20
- [html] http://apps.who.int/flumart/Default?ReportNo=12
- [various formats] http://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=nama_10r_2gdp&lang=en
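For the delimited formats (CSV/TSV), here is a minimal sketch of what the automatic download-and-convert step could look like in Node.js. It assumes Node 18+ (for the built-in global `fetch`); the parser is deliberately naive and does not handle quoted fields that contain the delimiter, and any URL you pass to `downloadAsJson` is your own choice, not something fixed by the sources above:

```javascript
// Parse delimited text (CSV or TSV) into an array of plain objects,
// using the first row as the header row.
// Limitation: does not handle quoted fields containing the delimiter.
function parseDelimited(text, delimiter = ",") {
  const lines = text.trim().split(/\r?\n/);
  const headers = lines[0].split(delimiter);
  return lines.slice(1).map((line) => {
    const cells = line.split(delimiter);
    return Object.fromEntries(headers.map((h, i) => [h, cells[i]]));
  });
}

// Download a delimited file and convert it to an array of objects
// (which JSON.stringify can then turn into JSON text).
// Assumes Node 18+ where fetch is available globally.
async function downloadAsJson(url, delimiter = ",") {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
  return parseDelimited(await response.text(), delimiter);
}
```

For HTML pages (like the WHO tables) you would additionally need an HTML parser such as a DOM library, and PDFs need a dedicated extraction library; those cases are more involved than this sketch.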
I need to process them and transform them into a uniform format so that they can then be compared.
Obviously the data collection could be done manually, but I would prefer an automatic procedure that does it for me.
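As a sketch of what a "uniform format" could mean in practice (the record shape and the field-name mappings below are hypothetical choices for illustration, not a standard), each source's rows could be normalized into one shared shape so they become comparable:

```javascript
// Normalize a raw row (whose field names differ per source) into one
// shared record shape, so rows from WHO, Eurostat, etc. can be compared.
// The mapping object says which raw field holds which piece of data.
function normalizeRow(rawRow, mapping, source) {
  return {
    source,                               // e.g. "WHO", "Eurostat"
    indicator: mapping.indicator,         // fixed per dataset
    country: rawRow[mapping.countryField],
    year: Number(rawRow[mapping.yearField]),
    value: Number(rawRow[mapping.valueField]),
  };
}

// Hypothetical mapping for a WHO BCG-coverage table whose columns
// happen to be named "Country", "Year", and "Coverage".
const whoMapping = {
  indicator: "bcg_coverage",
  countryField: "Country",
  yearField: "Year",
  valueField: "Coverage",
};
```

With one mapping per source, the rest of the pipeline only ever sees `{source, indicator, country, year, value}` records, regardless of whether the input was CSV, HTML, or PDF.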
I've never done such a thing and I don't know how to start.
So far I have only used client-side JavaScript, and I know little about server-side programming. I have been advised to use Node.js, Express.js, and MongoDB.
I have read that the MEAN stack exists (MongoDB, Express.js, Angular, Node.js) for building dynamic web sites and web applications, but I do not know how to use it. I have never used Node.js, Express.js, or MongoDB.
I am very happy to learn, but I need some help. Can someone point me in the right direction? I have not found tutorials or guides that cover my case.
Thanks!