
Does Google's App Engine have excessive downtime, specifically with regard to datastore writes?

Additionally, downtime seems to be scheduled during high-traffic times, e.g., in the middle of the afternoon rather than at 3:00 AM. Is this normal? Will it improve as the technology matures?

David Underhill
Chris Dutrow
  • It's always the middle of the afternoon somewhere. Even when it's 3am to you. – DOK Jul 20 '10 at 18:08
  • @DOK - True. My counter argument would be that Google is based in America... and also that America is better than everyone else j/k. – Chris Dutrow Jul 20 '10 at 18:23
  • A slightly more objective way to look at it would be to measure the total number of hits by hour. I suspect that mid-afternoon in the US is not the lowest activity level for App Engine "users" in aggregate, but it would be an interesting stat to see. – Peter Recore Jul 21 '10 at 00:24
  • +1 I think the question of when downtime occurs, and how App Engine downtime is trending, will be of interest to more than a few GAE developers. – David Underhill Jul 21 '10 at 01:12

1 Answer


Short Answers

  1. Afternoon vs. early-morning downtime. The datastore has been unavailable for writes about 20-30% more often in the afternoon than in the wee hours of the morning (Pacific time; this includes put, update, and delete availability). A sketch of how such a comparison can be computed from the raw data appears after this list.

    Note: I'm sure Google would prefer downtime to fall during off-peak hours, so I expect they'll keep working to minimize it, or to schedule it off-peak whenever possible.

  2. Downtime trending. The number of 15-minute periods during which the datastore has been unavailable has been decreasing. Over the past 366 days, the datastore was unavailable during an average of 3.8 15-minute periods per day; over the past 200 days, that figure fell by roughly 40%, to 2.3 per day. Write downtime over the past few months has been quite good: since March 1st, there have been fewer than 0.25 15-minute chunks of write downtime per day. Here's a graph of datastore write downtime trending: http://imagebin.ca/img/4wkHVQPc.png
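
As a rough illustration of the point-1 comparison, here's a sketch that uses the hourly write-downtime array from the Raw data section below. The window boundaries (noon to 6pm for "afternoon", midnight to 6am for "early morning") are my own choices for illustration; the 20-30% figure may have been computed over different windows:

# Sketch: compare afternoon vs. early-morning write downtime using the
# hourly-binned array defined under "Raw data" below. The window choices
# here are assumptions for illustration only.
hourly = RESULTS_WRITE_DOWNTIME_SINCE_2009JUL20_BIN60  # 24 hourly counts

afternoon = sum(hourly[12:18])     # 12:00-18:00 Pacific
early_morning = sum(hourly[0:6])   # 00:00-06:00 Pacific

print('afternoon: %d, early morning: %d, difference: %.0f%%'
      % (afternoon, early_morning,
         100.0 * (afternoon - early_morning) / early_morning))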


Source of Answers

To answer your question, I wrote this script, which extracts downtime data from GAE's Datastore Status page.
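
For the curious, the general shape of that script is roughly as follows. The URL and the regular expression here are assumptions for illustration; the real status page's markup isn't reproduced in this answer, so treat this as a sketch of the fetch-and-tally pattern rather than the actual parser:

import re

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen         # Python 2

# Hypothetical URL; the real script points at GAE's Datastore Status page.
STATUS_URL = 'http://code.google.com/status/appengine'

html = urlopen(STATUS_URL).read().decode('utf-8', 'ignore')

# Hypothetical row format: assume each outage row carries a start time in a
# cell like <td>13:45</td>. The real markup differs.
outage_times = re.findall(r'<td>(\d{2}:\d{2})</td>', html)

# Tally outages into 96 15-minute bins across the day (bin 0 is 00:00-00:15).
bins = [0] * 96
for hhmm in outage_times:
    hour, minute = int(hhmm[:2]), int(hhmm[3:])
    bins[hour * 4 + minute // 15] += 1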


Graphs

Datastore write downtime from 2009-Jul-20 to 2010-Jul-20 (4 hour bins):

http://imagebin.ca/img/p9ScWTm.png

Datastore write downtime from 2009-Jul-20 to 2010-Jul-20 (1 hour bins):

http://imagebin.ca/img/9FbLut2G.png

Datastore downtime from 2009-Jul-20 to 2010-Jul-20 (4 hour bins):

http://imagebin.ca/img/t3XKLk.png

Datastore downtime from 2010-Jan-01 to 2010-Jul-20 (4 hour bins):

http://imagebin.ca/img/k36T9h.png


Raw data

You can tweak the variables at the top of the script if you'd like to collect your own data with slightly different parameters:

# RAW Data: Each element counts the number of days in which the datastore
# was unavailable for at least some portion of a given 15-minute window. The
# first element corresponds to the time chunk from 00:00 to 00:15, and so on.
RESULTS_SINCE_2010JAN01_BIN15 = [0, 0, 0, 0, 3, 11, 3, 3, 3, 3, 12, 3, 3, 3, 4, 14, 4, 4, 4, 4, 12, 2, 2, 2, 2, 14, 4, 4, 4, 4, 11, 2, 2, 2, 2, 11, 5, 5, 5, 5, 13, 4, 4, 4, 4, 14, 7, 5, 5, 5, 14, 4, 3, 3, 3, 13, 2, 2, 2, 2, 12, 5, 4, 4, 4, 14, 5, 3, 3, 3, 12, 7, 2, 2, 2, 5, 5, 0, 0, 0, 2, 9, 3, 2, 2, 2, 10, 1, 1, 1, 2, 9, 3, 3, 3, 15]
RESULTS_SINCE_2009JUL20_BIN15 = [0, 0, 0, 0, 11, 21, 5, 5, 5, 5, 29, 6, 6, 6, 7, 38, 11, 11, 11, 11, 37, 7, 7, 7, 7, 44, 12, 12, 12, 12, 37, 10, 10, 10, 10, 34, 7, 7, 7, 7, 46, 11, 11, 11, 11, 39, 15, 13, 13, 13, 44, 13, 12, 12, 12, 44, 5, 5, 5, 5, 34, 11, 10, 10, 10, 40, 13, 11, 11, 11, 31, 21, 12, 12, 11, 19, 21, 4, 4, 4, 13, 28, 10, 9, 9, 16, 36, 10, 10, 10, 12, 32, 7, 7, 6, 35]
RESULTS_WRITE_DOWNTIME_SINCE_2009JUL20_BIN15 = [0, 0, 0, 0, 4, 12, 4, 4, 4, 4, 22, 6, 6, 6, 7, 27, 7, 7, 7, 7, 21, 6, 6, 6, 6, 32, 9, 9, 9, 9, 26, 8, 8, 8, 8, 27, 7, 7, 7, 7, 30, 7, 7, 7, 7, 27, 10, 8, 8, 8, 28, 10, 9, 9, 9, 28, 4, 4, 4, 4, 21, 4, 4, 4, 4, 25, 6, 4, 4, 4, 18, 14, 9, 10, 9, 16, 17, 2, 2, 2, 8, 18, 7, 6, 6, 9, 19, 5, 5, 5, 6, 18, 5, 5, 4, 21]

# RESULTS DISTILLED FROM THE RAW DATA ABOVE: re-bin the 15-minute counts
# into 1-hour, 4-hour, and 8-hour bins by summing consecutive elements.
def rebin(counts, factor):
    """Sum each group of `factor` consecutive elements."""
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]

RESULTS_SINCE_2010JAN01_BIN60 = rebin(RESULTS_SINCE_2010JAN01_BIN15, 4)
RESULTS_SINCE_2010JAN01_BIN240 = rebin(RESULTS_SINCE_2010JAN01_BIN60, 4)
RESULTS_SINCE_2010JAN01_BIN480 = rebin(RESULTS_SINCE_2010JAN01_BIN240, 2)
RESULTS_SINCE_2009JUL20_BIN60 = rebin(RESULTS_SINCE_2009JUL20_BIN15, 4)
RESULTS_SINCE_2009JUL20_BIN240 = rebin(RESULTS_SINCE_2009JUL20_BIN60, 4)
RESULTS_SINCE_2009JUL20_BIN480 = rebin(RESULTS_SINCE_2009JUL20_BIN240, 2)
RESULTS_WRITE_DOWNTIME_SINCE_2009JUL20_BIN60 = rebin(RESULTS_WRITE_DOWNTIME_SINCE_2009JUL20_BIN15, 4)
RESULTS_WRITE_DOWNTIME_SINCE_2009JUL20_BIN240 = rebin(RESULTS_WRITE_DOWNTIME_SINCE_2009JUL20_BIN60, 4)
RESULTS_WRITE_DOWNTIME_SINCE_2009JUL20_BIN480 = rebin(RESULTS_WRITE_DOWNTIME_SINCE_2009JUL20_BIN240, 2)
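
The graphs above were produced from arrays like these. As a rough sketch (matplotlib is my assumption; the answer doesn't say what actually drew the images), the 4-hour-bin write-downtime chart could be reproduced along these lines:

import matplotlib.pyplot as plt

# Sketch only: re-plot the 4-hour-bin write-downtime data defined above.
labels = ['00-04', '04-08', '08-12', '12-16', '16-20', '20-24']
plt.bar(range(6), RESULTS_WRITE_DOWNTIME_SINCE_2009JUL20_BIN240)
plt.xticks(range(6), labels)
plt.xlabel('Time of day (Pacific)')
plt.ylabel('Unavailable 15-minute windows (summed over days)')
plt.title('Datastore write downtime, 2009-Jul-20 to 2010-Jul-20')
plt.show()
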
David Underhill