
I run a PHP script that uses the Wikipedia API to locate Wikipedia pages about certain movies, based on a long list of titles and years of release. This takes 1-2 seconds per query on average, and I do about 5 queries per minute. This has been working well for years, but since February 11 it has suddenly become very slow: 30 seconds per query seems to be the norm now.

This is an example for a random movie from my list, and the link my script loads with file_get_contents():

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=yaml&rvsection=0&titles=Deceiver_(film)
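For reference, the call looks roughly like this; the endpoint and parameters are taken from the question, while the timeout value and variable names are my own additions (note that http_build_query() percent-encodes the parentheses, which the API accepts just as well):

```php
<?php
// Build the same API URL as above from its parts.
$params = [
    'action'    => 'query',
    'prop'      => 'revisions',
    'rvprop'    => 'content',
    'format'    => 'yaml',
    'rvsection' => 0,
    'titles'    => 'Deceiver_(film)',
];
$url = 'http://en.wikipedia.org/w/api.php?' . http_build_query($params);

// A stream context with a timeout keeps one slow query from stalling
// the whole run for 30+ seconds; 10 s is an arbitrary example value.
$context = stream_context_create(['http' => ['timeout' => 10]]);

$start  = microtime(true);
$result = file_get_contents($url, false, $context);
printf("query took %.2f s\n", microtime(true) - $start);
```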

I can put this link directly in my browser and it takes no more than a few seconds to load. So I don't think the Wikipedia API servers have suddenly become slow. When I load my PHP script from my webserver in a browser, it takes between 20 and 40 seconds before the page loads and the result of one query is shown. When I run the script from the command line on my webserver, I get the same slow loading times. My script still manages to save some results to the database now and then, so I'm probably not blocked either.

Other parts of my PHP scripts have not slowed down. There is a whole bunch of calculations done with the results from the Wikipedia API, and all of that still runs at regular speed, so my webserver is still healthy. The load is always pretty low, and I'm not using this server for anything else. I have restarted Apache, but found no difference in loading times.

My questions:

  • Has something changed in the Wikipedia API recently? Perhaps my way of using it is outdated and I need to use something new?

  • Where could I look for the cause of this slow loading? I've dug through error logs and tested what I could, but I don't even know where things go wrong. If I knew what to look for, I might be able to fix it easily.

Deceti
    or perhaps you're being throttled? – Marc B Feb 17 '13 at 23:38
  • try the php issued request on a different host. that may shed light on network issues. also try using curl, and use curl_getinfo() to get low level debug info. – goat Feb 17 '13 at 23:39
  • I'm now testing with curl_getinfo() on my current host (#1) and a different host (#2). Host #1 takes 5.2 seconds for one query through curl, which is slow; host #2 takes 0.2 seconds, which seems normal. The values 'connect_time' and 'pretransfer_time' on host #1 are about 5 seconds higher than those on host #2. 'speed_download' on #2 is about 20 times the value on #1. Everything else (like 'namelookup_time') is about the same. I'm still trying to make sense of it, but it's something. – Deceti Feb 18 '13 at 00:41
  • I'm also trying out different sites to load through curl, and most of the time both hosts load same sites at very similar speeds. Sometimes that 5 second difference comes up again though. So far I'm not seeing a pattern. Both of those servers I'm testing on are located in the same datacenter. – Deceti Feb 18 '13 at 01:39
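The curl_getinfo() comparison described in the comments can be sketched as a small helper that splits one request into DNS, connect, and transfer phases; the function name and timeout are mine, not from the original script:

```php
<?php
// Break a single HTTP request into its timing phases via curl_getinfo().
// Hypothetical helper for diagnosis; not part of the original script.
function profileRequest(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 30,
    ]);
    curl_exec($ch);
    $info = curl_getinfo($ch);
    curl_close($ch);

    return [
        // Time spent resolving the hostname.
        'dns'      => $info['namelookup_time'],
        // Time from end of DNS lookup until the TCP connection is up.
        'connect'  => $info['connect_time'] - $info['namelookup_time'],
        // Time spent actually transferring the response.
        'transfer' => $info['total_time'] - $info['pretransfer_time'],
    ];
}
```

If the ~5 second gap shows up in 'connect' while 'dns' is normal (as the comment above suggests), the problem is more likely routing or a firewall than the resolver; if it shows up in 'dns', the resolver configuration is the first suspect.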

1 Answer


git log --until=2013-02-11 --merges in operations/puppet.git shows no changes on that day, so this sounds like a DNS issue or something similar on your end.
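To test the DNS hypothesis directly, one can time the lookup on its own, separate from any HTTP request; a minimal sketch (the hostname is hard-coded for illustration):

```php
<?php
// Time just the DNS resolution for the API hostname.
// A healthy resolver answers in well under a second; a result around
// 5 seconds typically means the first nameserver in /etc/resolv.conf
// is timing out before a fallback server answers.
$host  = 'en.wikipedia.org';
$start = microtime(true);
$ip    = gethostbyname($host);   // returns the hostname unchanged on failure
$took  = microtime(true) - $start;
printf("%s resolved to %s in %.2f s\n", $host, $ip, $took);
```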

Nemo