2011-11-17

Homegrown replication

Recently I have been improving the homegrown file replication system we use for redundant image storage. It currently serves tens of millions of files that occupy about 4 TiB, with about 8-10 gigabytes added per day. Strictly speaking, it is not a "replication system" at all: it simply delays writes to unavailable targets until they become available again. It has turned out to be very efficient, and resilient even in "we just lost two disks" cases, without any centralized authority.
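To illustrate the delayed-write idea, here is a minimal Python sketch. It is not the actual system: the class name `DeferredReplicator`, the in-memory queue, and the assumption that a target is "available" when its root directory exists are all hypothetical choices made for the example. A real implementation would need to persist the queue so that pending writes survive a restart.

```python
import os
import shutil
from collections import defaultdict, deque


class DeferredReplicator:
    """Copy each file to every target; queue copies for targets that are down."""

    def __init__(self, targets):
        self.targets = list(targets)       # root directory of each storage target
        self.pending = defaultdict(deque)  # target -> queued (source, rel_path)

    def _copy(self, source, target, rel_path):
        # Assumption: a target counts as available if its root directory exists
        # (for example, the disk is mounted). Otherwise the write fails.
        if not os.path.isdir(target):
            raise FileNotFoundError(target)
        dest = os.path.join(target, rel_path)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copyfile(source, dest)

    def store(self, source, rel_path):
        """Write to every target now; remember the ones that fail."""
        written = 0
        for target in self.targets:
            try:
                self._copy(source, target, rel_path)
                written += 1
            except OSError:
                self.pending[target].append((source, rel_path))
        return written

    def retry(self):
        """Flush queued writes, in order, to targets that have come back."""
        flushed = 0
        for target, queue in self.pending.items():
            while queue:
                source, rel_path = queue[0]
                try:
                    self._copy(source, target, rel_path)
                except OSError:
                    break  # target is still down; keep the rest queued
                queue.popleft()
                flushed += 1
        return flushed
```

Calling `store()` on every upload and `retry()` on a timer is enough for every node to make progress on its own, which is why no central coordinator is needed in this sketch.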
We considered custom filesystems for both Linux and Windows, but all of them required some kind of central management server or servers, which we would rather not have. We also considered storing the files in MongoDB GridFS, but a short session with a calculator told us that replacing a node (taking it down, adding another one, and syncing it before the oplog is exhausted) would be prohibitive for such volumes and such large items. Copying a virtual disk image is much simpler and faster than pushing the same data through the database layer. So while the idea of specialized file storage in a database is very appealing, a MongoDB GridFS deployment for terabyte-scale file storage under an intensive write load requires a considerable amount of preplanning, and that defeats what is (well, for me) the main feature of MongoDB: simplicity.
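The kind of calculator session mentioned above can be sketched in a few lines of Python. Only the 4 TiB total and the 10 GB/day write rate come from this post. The sync throughput (20 MB/s) and oplog size (20 GB) are made-up placeholders, and the model assumes that every byte written also lands in the oplog, so substitute your own figures.

```python
TIB = 2**40
GB = 10**9
MB = 10**6


def hours_to_copy(total_bytes, bytes_per_second):
    """Time for an initial sync at a sustained throughput."""
    return total_bytes / bytes_per_second / 3600


def oplog_window_hours(oplog_bytes, write_bytes_per_day):
    """How long the oplog lasts if all written data passes through it."""
    return oplog_bytes / write_bytes_per_day * 24


def sync_fits_in_oplog(total_bytes, bytes_per_second, oplog_bytes, write_bytes_per_day):
    """True if the initial sync finishes before the oplog rolls over."""
    return hours_to_copy(total_bytes, bytes_per_second) < oplog_window_hours(
        oplog_bytes, write_bytes_per_day
    )


if __name__ == "__main__":
    data = 4 * TIB          # from the post
    daily_writes = 10 * GB  # from the post (upper end of 8-10 GB/day)
    throughput = 20 * MB    # hypothetical sustained sync speed, bytes/second
    oplog = 20 * GB         # hypothetical oplog size

    print("sync hours:  ", round(hours_to_copy(data, throughput), 1))
    print("oplog hours: ", round(oplog_window_hours(oplog, daily_writes), 1))
    print("fits:        ", sync_fits_in_oplog(data, throughput, oplog, daily_writes))
```

With these placeholder numbers the sync takes longer than the oplog window, so the new node would never catch up. The point is the shape of the calculation, not the specific figures.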
