Sunday, December 6, 2015

ASP.NET browser capabilities caching gotcha

If you're doing ASP.NET (pre-vNext) development and use browser capabilities checking (e.g. Request.Browser), add the following to your config file:

<browserCaps userAgentCacheKeyLength="256" />

You'll thank me later.
In short, ASP.NET caches capabilities based on the first N characters of the user agent string, where N is 64 by default. After a "mobile" Googlebot visits your site (with a user agent of "Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12F70 Safari/600.1.4 (compatible; Googlebot/2.1; +"), your iPhone users will be treated as crawlers, which is probably not what you want.
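
To make the collision concrete, here is a minimal Python sketch. It only models the cache key as "the first N characters of the user agent"; the function name cache_key is made up for illustration, and the real ASP.NET caching code is certainly more involved. It also assumes that a regular iPhone Safari user agent is the same string without the Googlebot suffix. The Googlebot string is copied as quoted above, including its cut-off ending.

```python
# Hypothetical model of the cache key: just the first N characters of the UA.
def cache_key(user_agent, key_length=64):
    return user_agent[:key_length]

# Assumed user agent of a regular iPhone visitor (illustrative).
IPHONE_UA = ("Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) "
             "AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 "
             "Mobile/12F70 Safari/600.1.4")
# As quoted in the post (the URL at the end is cut off there as well).
GOOGLEBOT_UA = IPHONE_UA + " (compatible; Googlebot/2.1; +"

# With the default length both visitors map to the same cache entry.
same_at_64 = cache_key(IPHONE_UA) == cache_key(GOOGLEBOT_UA)
# With 256 characters the Googlebot suffix becomes part of the key.
same_at_256 = cache_key(IPHONE_UA, 256) == cache_key(GOOGLEBOT_UA, 256)
print(same_at_64, same_at_256)
```

The two strings only start to differ well past the 64th character, which is why whichever of them arrives first decides the cached capabilities for both.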

Thursday, October 30, 2014

mod_gridfs v0.4: tag sets

Short news: mod_gridfs is now v0.4, supporting read preference tag sets along with some performance and robustness improvements. The addition of tag sets allows many interesting things, for example fair 1-to-1 load balancing between shard replicas.

Monday, November 11, 2013

Backups to Amazon S3 -- simple and efficient

We use Amazon S3 as a part of our backup strategy -- all of our backup servers in the datacenter replicate local backup images to S3 daily. We keep at least six physical copies of each backup (3 copies, each on a different machine, with all backup disks in RAID1), but an offsite backup for DR remains very important, even if we go multi-datacenter and replicate data in semi-realtime.
During the lifetime of this installation we used different approaches. The first one was s3cmd -- while very functional and reliable, it was too slow for us: there was no real way to determine what had changed between the local and remote "filesystem", and simply copying hundreds of gigabytes per host per day took too long. We thought that something like rsync would be much better, so we moved to s3fs+rsync. Unfortunately, that combination was very unstable, and it either required a second copy of the files (to cache remote attributes) or was prone to downloading parts of files to compare them with the originals in order to decide whether a file should be copied. We also evaluated duplicity (it consumed large amounts of temporary space) and a couple of commercial solutions, but none of them were good enough for us.
So, I decided to write a simple utility that would do this kind of a sync -- s3backup.
  • easy to use -- configure AWS credentials, specify source and destination, put it into your crontab and you're set
  • resource-efficient -- never downloads remote files, compares file sizes (good if your backups are named differently for each day) and/or MD5 checksums (good for other cases, consumes a bit more CPU), does not attempt to read an entire file into RAM, etc.
  • works with large files -- currently, the upper limit is about 500GB (10,000 chunks, 50MB each), this can easily be increased up to 5TB if you don't need MD5 checksum comparison
  • supports recycling -- locally-removed files are removed from S3 only after they reach a specified age
You're welcome to try it out and use it. Feedback (especially about the file sizes you manage and the RAM constraints you have on backup machines) is also very welcome.
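
The size limit and the comparison approach from the list above can be sketched in a few lines of Python. This is not s3backup's actual code, just an illustration under assumptions: all function names are hypothetical, "50MB" is taken as 50 * 10^6 bytes, and how the remote size and checksum are obtained is left out entirely.

```python
import hashlib

S3_MAX_PARTS = 10_000  # S3 multipart uploads allow at most 10,000 parts

def max_upload_bytes(chunk_bytes):
    """Largest file that fits into S3_MAX_PARTS chunks of the given size."""
    return S3_MAX_PARTS * chunk_bytes

def streaming_md5(path, block_bytes=1024 * 1024):
    """MD5 of a file, read block by block so it is never held in RAM whole."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_bytes), b""):
            digest.update(block)
    return digest.hexdigest()

def needs_upload(local_size, remote_size, local_md5=None, remote_md5=None):
    """Decide from metadata alone; remote content is never downloaded."""
    if remote_size is None or local_size != remote_size:
        return True  # missing remotely, or the size differs
    if local_md5 is not None and remote_md5 is not None:
        return local_md5 != remote_md5  # same size, compare checksums
    return False  # same size and no checksums to compare

# 10,000 chunks of 50MB each give the "about 500GB" limit mentioned above.
print(max_upload_bytes(50 * 10**6) // 10**9, "GB")
```

Comparing sizes first keeps the common case cheap: the checksum is only worth computing when the sizes already match.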

.NET ResourceManager loading incorrect resources

Recently, we found that some of our backends were serving pages in the wrong language. A simple application domain recycle helped, but the problem manifested itself again several days later, after another publish. I spent about 4 hours investigating and found out that we're not the only people with this problem. It appears that there is a bug in .NET's ResourceManager class that leads to the wrong resources being loaded. There is some kind of race condition that occurs when multiple threads try to access the same underlying resource file. Thankfully, it was easy enough to reproduce, but it won't be easy to fix (externally), so I reported it and am hoping for a fix from Microsoft. While the temporary workaround seems obvious (just access resources under some kind of a lock), the most common case of ResourceManager usage (the autogenerated *.Designer.cs files that are created when you add a resources file) would be very hard to patch by hand. Maybe a custom "ThreadSafeResXFileCodeGenerator" would do, if Microsoft doesn't come up with a quick solution.
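
The "access resources under a lock" workaround is language-independent, so here is a minimal sketch of the idea in Python rather than in C#. Nothing in it is a .NET API: the LockedResources class and the load_resource callable are hypothetical stand-ins for whatever actually loads a string for a given culture.

```python
import threading

class LockedResources:
    """Serializes every lookup through one lock (hypothetical wrapper)."""

    def __init__(self, load_resource):
        # load_resource is assumed to be a callable (name, culture) -> str
        # that is not safe to call from several threads at once.
        self._load_resource = load_resource
        self._lock = threading.Lock()

    def get_string(self, name, culture):
        # Only one thread at a time may touch the underlying loader.
        with self._lock:
            return self._load_resource(name, culture)
```

The cost is that all resource lookups are serialized, which is acceptable as a stopgap but is exactly what makes patching every generated accessor by hand so unattractive.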

Sunday, February 17, 2013

Linux guests on Hyper-V: I/O and CPU utilization caveats

This is probably known to experienced Hyper-V admins, but I've recently found out that high "system" CPU utilization in a Linux guest (Ubuntu 12.04.2 x64) on Hyper-V Server 2012 might actually be an indicator of I/O capacity being exhausted.

We are running PostgreSQL on one of the VMs, and it had a per-core 5-minute load average greater than 5 (actually 7-10). When I checked monitoring, I was surprised that the CPU breakdown was as follows: ~20% softirq (expected, since this VM has about 100Mbps in/300Mbps out and is not the only VM on the host system, while the NICs on this machine don't support SR-IOV), 30-40% user (expected, since we don't do sequential scans), ~4% iowait (unexpected, since the working set of the database does not completely fit in RAM) and 30-40% system (completely unexpected and unexplainable).

I tried numerous changes including adjusting PostgreSQL settings, adding/removing memory and cores, turning off NUMA both at the guest and the host level, trying a newer OS kernel... until I decided to stop and think it over.

When the monitoring system told us that CPU load was too high, it also mentioned that the time spent doing disk I/O on one of the partitions was high too (about 85-90% disk time). I disregarded this warning at first, since iowait was low. But after I monitored PostgreSQL per-table statistics using a simple tool built around a query like 'select relid, relname, heap_blks_read + idx_blks_read + coalesce(toast_blks_read, 0) + coalesce(tidx_blks_read, 0) from pg_statio_user_tables' and shuffled tables around to balance reads between different partitions that are mapped to different disks, the system CPU utilization dropped by half.
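
A tool like that can be very small. The following Python sketch is a guess at its general shape, not the actual tool: it assumes you run the query twice with any PostgreSQL client and pass the two result sets in, and the names STATIO_QUERY and hottest_tables are made up for illustration.

```python
# The query from the post, with an alias added for the summed column.
STATIO_QUERY = """
select relid, relname,
       heap_blks_read + idx_blks_read
       + coalesce(toast_blks_read, 0) + coalesce(tidx_blks_read, 0) as blks_read
from pg_statio_user_tables
"""

def hottest_tables(before, after, top=5):
    """Rank tables by blocks read between two snapshots.

    Each snapshot is a list of (relid, relname, blks_read) rows, as the
    query above would return them.
    """
    previous = {relid: blks for relid, _name, blks in before}
    deltas = []
    for relid, relname, blks in after:
        # A missing or NULL (None) counter is treated as zero, so tables
        # that appear only in the second snapshot count from zero.
        delta = (blks or 0) - (previous.get(relid) or 0)
        deltas.append((relname, delta))
    deltas.sort(key=lambda item: item[1], reverse=True)
    return deltas[:top]
```

The counters in pg_statio_user_tables are cumulative, so it is the difference between two snapshots, not the raw value, that shows which tables are causing reads right now.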

This is different from running on real hardware, where iowait CPU utilization went much higher when disk performance capacity was about to be exhausted, while system CPU utilization stayed mostly the same. I believe that this has something to do not only with virtualization, but also with the I/O scheduler used (we're using deadline on hardware and noop on VMs).

P.S. Microsoft's latest hypervisor is actually better at running Linux than many Linux ones and is very good at running Windows guests (no surprises here) -- we evaluated XenServer and KVM, and both had problems not only with Windows stability and with network and I/O performance, but also with Linux kernels newer than 2.6.x.

Wednesday, December 19, 2012

mod_gridfs v0.3

Well, it's been a long time since my last post, but at least now I have something to report. mod_gridfs, an Apache 2.2+ module that serves files from MongoDB GridFS, is now v0.3, supporting authentication, custom prefixes and read preference (currently only the mode string is supported, tags will be added later), with an improved memory footprint and better error logging.

UPDATE: Custom collection names are also supported now.

Friday, April 13, 2012

mod_gridfs performance

In my previous post, I announced mod_gridfs. Now, it's time for some numbers. Serving a 3KiB file over a gigabit network on modern hardware, 100 concurrent requests, MongoDB replica set of 3 machines as a backend:
  • NGINX + nginx-gridfs: 1.3krps
  • Apache + mod_gridfs: 6.6krps
  • Apache + mod_gridfs with SlaveOk and one slave: 12.2krps
I'm not testing with larger files, because that way I would be benchmarking OS I/O performance instead of user-mode code.