I wonder if I should publish a set of tools and patches that make it easier to write close-to-zero-downtime-without-users-noticing-that-half-the-servers-are-gone applications. Guess I'll put in a bit of effort to make them better suited for public release, like translating all the documentation comments from Russian to English :)
The basic idea behind the tools is to provide a side-attached layer that handles failures gracefully and retries an operation whenever it decides the operation might still succeed. The idea is simple, yet it really works even for something as crude as pulling the plug on half of the servers: users notice only a slight delay (a few seconds at most) in their page load times.
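To make the retry idea a bit more concrete, here is a minimal Python sketch of what such a layer could look like. This is not the actual tooling, just an illustration under assumptions: the names `TransientError`, `call_with_failover`, `fetch_page` and the `web1`..`web4` server names are all made up, and a real layer would need a smarter way to decide which failures are worth retrying.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a dead server)."""


def call_with_failover(operation, servers, deadline_s=5.0, pause_s=0.1):
    """Call operation(server) on each server in turn until one succeeds.

    Only TransientError is retried; any other exception propagates at once.
    Gives up with TimeoutError once the time budget (deadline_s) is spent.
    """
    deadline = time.monotonic() + deadline_s
    last_error = None
    while time.monotonic() < deadline:
        for server in servers:
            try:
                return operation(server)
            except TransientError as exc:
                last_error = exc  # this server looks dead, try the next one
        time.sleep(pause_s)  # every server failed; wait a little and go again
    raise TimeoutError("no server answered within the retry budget") from last_error


# Assumed scenario: the plug was pulled on half of the servers.
dead = {"web1", "web2"}


def fetch_page(server):
    if server in dead:
        raise TransientError(f"{server} is unreachable")
    return f"page served by {server}"


result = call_with_failover(fetch_page, ["web1", "web2", "web3", "web4"])
print(result)  # page served by web3
```

The time budget is what keeps the delay bounded: the caller either gets an answer from a surviving server or a clear failure after a few seconds, never an indefinite hang.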