
I am dealing with GDELT data using R and the {GDELTtools} package.

When downloading the GDELT database using GetAllOfGDELT() or through a web browser, it appears that one file (20140319.export.CSV.zip) is missing. This causes GetAllOfGDELT() to fail and also creates problems for subsequent data analyses.

Questions: Is this a temporary issue? Has anyone else run into the same issue?

Here is the related code and output:

> # Download the entire GDELT database
> GetAllOfGDELT(local.folder = "./Data",
+               data.url.root = "http://data.gdeltproject.org/events/", 
+               force = FALSE)
The compressed GDELT data set is currently 12.3GB. It will take a long time to download and
requires a lot of room (12.3GB) where you store it. Please verify that you have sufficient free
space on the drive where you intend to store it.
Are you ready to proceed? (y/n) y
Downloading or verifying 1979.zip succeeded.
Downloading or verifying 1980.zip succeeded.
...
Downloading or verifying 20140317.export.CSV.zip succeeded.
Downloading or verifying 20140318.export.CSV.zip succeeded.
trying URL 'http://data.gdeltproject.org/events/20140319.export.CSV.zip'
Error in download.file(url = paste(data.url.root, f, sep = ""), destfile = paste(local.folder,  : 
  cannot open URL 'http://data.gdeltproject.org/events/20140319.export.CSV.zip'
In addition: Warning message:
In download.file(url = paste(data.url.root, f, sep = ""), destfile = paste(local.folder,  :
  cannot open: HTTP status was '404 Not Found'
>

Here is how the online "All GDELT Event Files" directory listing looks:

20140321.export.CSV.zip (9.9MB) (MD5: d492ca38db3c8f40b657b0eb2415f950)
20140320.export.CSV.zip (10.6MB) (MD5: 8602497fdc0f54861c056d33fb64f3b8)
20140318.export.CSV.zip (10.7MB) (MD5: cf0c2a30b09cdbc28204eb0eca53db1e)
20140317.export.CSV.zip (9.8MB) (MD5: 61e70e4ff79e590abddd6f26f8dfa552)

Source: http://data.gdeltproject.org/events/index.html
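To make the gap in the listing explicit, here is a minimal sketch in base R that looks for days with no file between the first and last listed day. The helper `find_missing_days()` is a hypothetical name introduced only for illustration, and the sketch assumes every daily file name begins with an eight-digit `YYYYMMDD` date, as the four entries above do.

```r
# File names copied from the directory listing above
listed <- c("20140321.export.CSV.zip", "20140320.export.CSV.zip",
            "20140318.export.CSV.zip", "20140317.export.CSV.zip")

# Hypothetical helper: return the days that have no file
# between the earliest and latest listed day
find_missing_days <- function(file_names) {
  # Assumption: the first 8 characters of each name are a YYYYMMDD date
  days <- as.Date(substr(file_names, 1, 8), format = "%Y%m%d")
  all_days <- seq(min(days), max(days), by = "day")
  all_days[!all_days %in% days]
}

missing_days <- find_missing_days(listed)
print(format(missing_days, "%Y%m%d"))
```

Run against the full list of file names from the index page, the same check should show whether 20140319 is the only gap or one of several.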

One partial workaround is provided below, but it only retrieves the event files dated after 2014/03/19; it does not recover the missing file itself.

# Download the entire post-20140319 GDELT database
GetGDELT(start.date = "2014/03/20", 
         end.date = "2015/01/01", 
         local.folder = "./Data", 
         data.url.root = "http://data.gdeltproject.org/events/",
         verbose = TRUE)
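Another possible approach, sketched below in base R only, is to build the daily file names directly, leave out the date that is known to be missing, and wrap each download in `tryCatch()` so that one 404 does not abort the whole run. The helpers `daily_file_names()` and `download_daily_files()` are hypothetical names and not part of {GDELTtools}. The sketch assumes the date range falls entirely within the period served as daily `YYYYMMDD.export.CSV.zip` files, and the download call is left commented out.

```r
# Dates known to be absent from the server
# (only 2014-03-19 is reported in this question)
known_missing <- as.Date("2014-03-19")

# Hypothetical helper: daily file names for a date range, minus known gaps
daily_file_names <- function(start, end, skip = known_missing) {
  days <- seq(as.Date(start), as.Date(end), by = "day")
  days <- days[!days %in% skip]
  paste0(format(days, "%Y%m%d"), ".export.CSV.zip")
}

# Hypothetical helper: download each file, report failures, and keep going
download_daily_files <- function(file_names,
                                 local.folder = "./Data",
                                 data.url.root = "http://data.gdeltproject.org/events/") {
  dir.create(local.folder, showWarnings = FALSE, recursive = TRUE)
  ok <- vapply(file_names, function(f) {
    dest <- file.path(local.folder, f)
    if (file.exists(dest)) return(TRUE)  # already downloaded
    tryCatch({
      download.file(paste0(data.url.root, f), destfile = dest,
                    mode = "wb", quiet = TRUE)
      TRUE
    }, error = function(e) {
      message("Skipping ", f, ": ", conditionMessage(e))
      unlink(dest)  # remove any partial file left behind
      FALSE
    })
  }, logical(1))
  names(ok)[!ok]  # names of the files that could not be fetched
}

files <- daily_file_names("2014-03-17", "2014-03-21")
print(files)
# failed <- download_daily_files(files)  # uncomment to actually download
```

This only fetches the zip files. It does not reproduce whatever additional checks GetAllOfGDELT() performs when it reports "Downloading or verifying".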

Note: There are 0 results on Google for "20140319.export.CSV.zip", but useful results do appear for other files.

blong
Jonathan
  • It seems likely they *ought* to have a file for that date; for example, in [this blog post](http://blog.gdeltproject.org/early-warning-for-epidemic-outbreaks-gdelt-offers-the-earliest-warning-of-ebola-outbreak/) they mention 3/19 data specifically (as part of the Ebola outbreak coverage, which occurred around that time). Unless they pulled it because of some quality issue? – Joe Feb 11 '15 at 03:35
  • Thanks, @Joe. That's some good evidence that the file at least exists. Let's hope it returns in a future update. – Jonathan Feb 12 '15 at 05:06
  • It has been nearly a month, and the file has not returned. Also, none of the supposed GDELT mirrors seem to exist anymore. Does anyone know of a working mirror that could solve the problem? Thanks! – Jonathan Mar 06 '15 at 10:11

0 Answers