0

I'm an intermediate web developer, and I haven't managed any high-traffic websites. I generally observe that only high-traffic websites go down for maintenance. Even stackoverflow.com goes down for maintenance.

I always wonder. What kind of maintenance do they do? I mean, the process is automated.

user request --> web server --> server-side programs --> database server

What is there to maintain?

masegaloeh
claws
  • There's always something... – Justin Ethier Aug 09 '10 at 18:31
  • Data model change, update/correct data based on those changes... – OMG Ponies Aug 09 '10 at 18:32
  • 11
    Yes, fully automated. Sysadmins don't really do anything. We just collect salaries. – John Gardeniers Aug 09 '10 at 18:45
  • 3
    Monkeys need food/water and sometimes sleep, hamster-wheels need oiling, etc. – jscott Aug 09 '10 at 18:46
  • @John Gardeniers: Lol!! You know that's not what I meant. I'm just a student. I'm sorry! I don't mean any offense, but you are not completely wrong. I don't have a great opinion of sysadmins (maybe because the sysadmins of our computer labs do *nothing*). I think so little of the sysadmin role that last week, when I was called in for a sysadmin interview at Google, I turned it down. I repeat, I don't mean to offend any of you. I guess I need to spend more time on serverfault.com and learn what exactly sysadmins do. If you have anything that would help me, feel free to suggest it. – claws Aug 09 '10 at 20:57
  • @John Gardeniers: I think I'm not the only one with this feeling. http://serverfault.com/questions/4176/what-sysadmin-things-should-every-programmer-know – claws Aug 09 '10 at 21:14
  • 2
    @claws, every job looks easier when someone else is doing it but consider that while very few programmers are sysadmins, many sysadmins are also programmers, so we see both sides and generally show more respect. As for that automation, remember it's an admin that designed and implemented it. – John Gardeniers Aug 09 '10 at 23:05
  • @John Gardeniers: I'm sorry, I didn't mean any disrespect. As I said, I want to know more about sysadmins' work. – claws Aug 10 '10 at 03:02

7 Answers

6

Usually the highest traffic sites don't go down for maintenance. They're designed so they don't have to. (Depending on the site, that can be very tricky. It's not just a case of running multiple servers, although obviously that's the starting point.)

However, usually "Site down for maintenance" means any of:

  • Web application software upgrade (adding new features etc)
  • Hardware change (e.g. moving to a different data centre; during the switchover)
  • Something's gone terribly wrong and they're trying to fix it (e.g. there's been a power outage at the data centre; change the DNS entry to point to a static "site is down" page elsewhere until the power comes back)
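
The static "site is down" page mentioned in the last item can be sketched in a few lines. This is only an illustration using Python's standard `http.server`; the port, the page body, and the `Retry-After` value are assumptions, and the names `maintenance_response`, `MaintenanceHandler`, and `make_server` are hypothetical. It answers every GET with HTTP 503 so that clients and crawlers treat the outage as temporary.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed values for illustration; nothing here comes from a real site.
RETRY_AFTER_SECONDS = 3600
MAINTENANCE_BODY = b"<h1>Site down for maintenance</h1>"


def maintenance_response():
    """Return (status, headers, body) for a temporary-outage reply."""
    headers = {
        "Retry-After": str(RETRY_AFTER_SECONDS),
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(MAINTENANCE_BODY)),
    }
    # 503 Service Unavailable signals "temporarily down", unlike a 200 page.
    return 503, headers, MAINTENANCE_BODY


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Serve the same maintenance reply for every GET path."""

    def do_GET(self):
        status, headers, body = maintenance_response()
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8080):
    """Build (but do not start) the placeholder server on localhost."""
    return HTTPServer(("127.0.0.1", port), MaintenanceHandler)
```

Calling `make_server().serve_forever()` would start it; pointing the DNS entry at such a host is the separate step the answer describes.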
Jon Skeet
  • 1
    In that case, why do they regularly take it down for maintenance? Software/Hardware changes are not regular. – claws Aug 09 '10 at 18:34
  • I've seen stackoverflow.com down many times. Is it one of the reasons you mentioned every time? It's hard to believe. – claws Aug 09 '10 at 18:36
  • @claws: Software maintenance certainly *is* regular in many cases. The Stack Overflow team regularly rolls out new features and bug fixes, for example. If you have a specific case in mind, please tell us - the "they" in your comment is ambiguous. – Jon Skeet Aug 09 '10 at 18:36
  • https://www.onlinesbi.com/ This is an Indian online banking site. Every night it's down for maintenance. Like now! – claws Aug 09 '10 at 18:38
  • 1
    With financial sites, there's often a lot of data to move, and at end-of-day, they want an atomic change (plus things need time to be calculated and updated) – Matt Simmons Aug 09 '10 at 18:46
  • 2
    Claws - My own bank is a small US-based credit union, and their site is down every night for a few hours. It's a small bank, they don't have a big budget, so maybe they take everything down so that they can do offline backups, or do a big data load. Who knows? There are a lot of reasons. – mfinni Aug 09 '10 at 18:48
  • 1
    If they need to take offline backups, they are IDIOTS. Every database system worth using has supported transactionally consistent online backups for the last 20 years or so. – TomTom Dec 06 '11 at 05:29
  • 2
    @TomTom For some banks, "the last 20 years or so" in our world are their distant future. – ceejayoz May 27 '15 at 01:35
3

They may want to run updates (or fixes) on many of the different pieces of software running on the server, including (but not limited to):

  • The operating system
  • The webserver software itself
  • Any scripting frameworks
  • Databases
  • Etc

Beyond that, they could also be doing hardware maintenance, such as adding a new hard drive, upgrading a motherboard, putting in faster RAM, or swapping out network cards. There are plenty of things, both hardware and software, that can be upgraded or modified, really.

Now if they have a backup server (or a cluster or something of the sort), this can be transparent, but if it's literally one box serving the pages...well, it pretty much has to go down.
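
The difference between one box and a cluster can be shown with a toy simulation. This is a sketch under stated assumptions, not how any particular site does it: the server names are made up, `rolling_upgrade` is a hypothetical helper, and the "load balancer" is just a Python set of servers currently taking traffic.

```python
def rolling_upgrade(servers, upgrade):
    """Upgrade servers one at a time, keeping the rest in rotation.

    servers: list of server names (hypothetical).
    upgrade: callable applied to each server while it is out of rotation.
    Returns the smallest number of servers that were serving at any moment.
    """
    in_rotation = set(servers)
    min_serving = len(in_rotation)
    for name in servers:
        in_rotation.discard(name)   # drain: stop sending traffic here
        min_serving = min(min_serving, len(in_rotation))
        upgrade(name)               # patch, reboot, swap hardware, ...
        in_rotation.add(name)       # put it back behind the balancer
    return min_serving
```

With three servers the function returns 2, so visitors never notice. With a single server it returns 0, which is the "site down for maintenance" case.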

eldarerathis
2

Since you're coming from a coding background, I'll base my analogy there. Imagine that being a sysadmin is just like programming, except you'll be called on to code in a different language every couple of hours. And sometimes it's Pascal.

Truly, though, it could mean anything. Sometimes a mouse chews its way into a warm place. Or a single point of failure makes itself known. Eliminating downtime is what we pursue ... like writing code that works perfectly on the first compile.

Kara Marfia
2

Liken a single server to a running vehicle. If you turn off the vehicle, your 'server' is down.

There are some things you can do while the car is running - add fuel, oil, washer fluid, clean the windshield, change gears, etc.

However, you can't replace the fuel line in the car while it's running - liken fuel to data; you don't want to lose any, or you'll have unhappy customers.

These downtimes vary with the administrators' skill and the complexity of the changes. On larger, high-traffic sites, the only way this could feasibly happen is a major architecture change: something where, no matter how many servers and redundancies you have, the architecture needs to change all at once.

This is rare for very large systems. I liken it to replacing the fuel line on a running vehicle: for many, it's not feasible (or not worth the effort and risk) at their skill and resource levels. Places that do have the skills and resources can perform a fuel line replacement on a running vehicle. Liken that to an architecture migration; they do it, but it's a lot more complex.

thinice
1

It could be:

  • An upgrade of servers, frameworks, or databases
  • A move to a new datacenter, with the old servers shut down so that nobody can connect
  • Patching of operating systems or of software that runs on those servers

Basically, anything that could make the site unavailable for a certain amount of time.

SQLMenace
0

Regular maintenance items would be things like rebuilding caches, upgrading software and/or templates, doing some data trawling for statistics, various routine maintenance tasks like backups (which work better on quiet systems), and a variety of other expensive, infrequent tasks.

Some tasks just require poring over a lot of data, and it's not really efficient to do that after each change. Recommendation databases are one thing that comes to mind, as you don't need up-to-the-second data, and it's rather expensive to calculate common purchase patterns across many different users. This is an N^2 complexity problem with some algorithms, and tends to take both a lot of data trawling and lots of memory.
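
To make the cost concrete, here is a minimal sketch of one such batch job. The answer does not name an algorithm, so pairwise co-purchase counting is an assumption, and the function names and sample baskets are hypothetical. The work grows quadratically with the number of items per basket, and the counter holds every pair seen, which is where the memory goes.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how often each pair of items appears in the same basket."""
    pair_counts = Counter()
    for basket in baskets:
        # set() drops repeats; sorting makes (a, b) and (b, a) one key.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return pair_counts


def recommend(pair_counts, item, top_n=3):
    """Return the items most often bought together with `item`."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(top_n)]


# Hypothetical purchase history, rebuilt in bulk rather than per order.
baskets = [["book", "lamp"], ["book", "lamp", "pen"], ["pen", "book"]]
counts = co_purchase_counts(baskets)
print(recommend(counts, "lamp"))  # ['book', 'pen']
```

Because the table is rebuilt from the whole history, running it once during a quiet period is cheaper than updating it after every purchase.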

Financial institutions may use the down time to calculate and make interest payments to accounts, or close outstanding transactions and calculate reconciliation balances. This data in theory should never change after reconciliation, so it makes sense to write it to WORM storage at this point.

Backups are a major item that's often done during downtime, because high disk I/O tends to bring even very powerful servers to their knees, and taking the site offline can help speed up the backup process. I remember one organization I was at where they had a very large customer RAID array, and the backup team kept complaining because their backup window for this one customer typically extended to 22-24 hours, and at one point 26 hours. A small amount of quiet time can decrease that window substantially.

-6

Defrag the disk arrays. It's faster and safer to defrag servers when they are offline, allowing the CPU and disks to focus on that task rather than running 1000 websites. It's better to tell people to come back later than to give them a poor user experience.

If it's a Windows server, you can crash it by running defrag while memory usage is over 50%. This is because at that point Windows starts to rev up the page file. I learned this the hard way.

  • 8
    I have no idea what you're talking about... I don't know anyone who defrags arrays or SANs, or even DAS or *anything* these days. Additionally, I've also never heard of crashing because of page file related activity. I suspect your server might have had bigger issues... – Mark Henderson Dec 06 '11 at 05:16
  • What Mark said -- I think you're running on some very outdated (hell, positively **ANCIENT**) information there... – voretaq7 Dec 06 '11 at 21:49