Behind the Mirrors
mod_zrkadlo, the openSUSE Download Redirector
<ftpadmin at suse.de>
Introducing Myself
- Working on opensuse.org download infrastructure
- Build Service command-line client ("osc")
- Have been with SUSE for 8 years now
- Maintained Apache, OpenSSL, DHCP
- PPC port to IBM iSeries
- Medical Degree :-)
What do we serve?
openSUSE 10.1, 10.2 and 10.3 releases, unstable snapshots,
3 architectures, sources, debuginfos, test trees,
the drpmsync tree, and last but not least, the Build Service repositories
- Oh, that's all there is to it!
- There is no big fat pipe like that...
- There is not even a mirror like that...
- Quote from a mirror:
"That sounds onerous - a full ubuntu mirror (including ISO's) is 260GB,
debian without ISO's is 320GB"
Content Delivery Networks??
- Our (commercial) competition uses Content Delivery Networks (CDNs).
- Affordable for us? Maybe 1 day per year...
- Thus, not a solution.
Mirrors Come To Help!
- They mirror us for their own benefit (saves their bandwidth)
- Some make a business out of it, some want to help us
- They mirror us whether we want it or not.
- We can facilitate.
All Mirrors Are Equal?
- In the good old times, they were :)
- Today, mirrors differ from each other.
- It has become impractical for users to stick to a single, fixed mirror.
- With our content size, there are no complete mirrors
- Some may mirror the released distro tree... some may mirror the "update" tree
and (only) the latest distro tree (10.3)... some may mirror only the "build service repositories"...
- Some might not have caught up and still mirror only older trees
=> partial mirrors
Size Matters II
- Only a few crazy guys will mirror 800 GB.
- It is even hard to find somebody to mirror the "Build Service repositories" alone (200 GB)
- Not even the largest SourceForge mirror...
=> partial mirrors
Speed Matters, Too
- Much of our content has an extremely high turnover rate
- Changes faster than it can be rsync'ed
- So there are no up-to-date mirrors in reality
=> partial mirrors
Precision Matters Also
Dealing with openSUSE binary packages, ...
- Content (binary packages repository) is referenced in "repository metadata"
- Clients will access content through that metadata
- Everything is checksummed and cryptographically signed
- => Clients will trip over the tiniest mistakes
- So how can we use mirrors if they are incomplete and outdated?
- Hard to maintain
- Too static
- Can hardly ever be correct
- Low granularity. Would need lists for each of the file subtrees, at least...
Mirror lists for humans
- Let users figure out which mirror works?
- Works to some extent, for downloads of a few files
- Trial and error... annoying, when mirrors are out of date,
incomplete, or unreliable.
- "SourceForge effect" - people will all end up trying to use the same "most reliable" mirror
Highly dynamic mirror lists
We can't control our mirrors... but we can observe them!
Redirect to mirrors, but know which one.
- Keep an inventory of the mirrors' contents
- Periodically probe mirrors for availability
- Then HTTP-redirect the clients
- It's lightweight, scalable, and works with every standard HTTP client
While we're at it, we can
- "Geolocate" the client through its IP address
- Load-balance requests based on the mirrors' capabilities
- Fully control caching through HTTP headers, for files where this is critical
- Or deliver certain files ourselves
- Which is one reason why we use only HTTP, not FTP.
(FTP has no way of controlling caching. Content is (nearly) arbitrarily cached between server and clients.)
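A minimal sketch of such a per-file policy. The path patterns (repository metadata and signatures treated as "critical") and the header values are hypothetical assumptions; the slides don't say which files mod_zrkadlo delivers itself or which headers it sends:

```python
from fnmatch import fnmatchcase

# Hypothetical policy: patterns and header values are illustrative
# assumptions, not mod_zrkadlo's actual configuration.
SERVE_LOCALLY = ["*/repodata/*", "*.asc", "*.key"]


def decide(path: str) -> tuple[str, dict[str, str]]:
    """Return ("serve" or "redirect") plus extra response headers for a path."""
    if any(fnmatchcase(path, pattern) for pattern in SERVE_LOCALLY):
        # Metadata and signatures change in place and must match exactly,
        # so deliver them ourselves and make caches revalidate every time.
        return "serve", {"Cache-Control": "no-cache"}
    # Assumption: package files never change under the same name, so the
    # redirect to a mirror may be cached for a while.
    return "redirect", {"Cache-Control": "max-age=3600"}
```

Over HTTP these headers go out with every response; FTP has no equivalent knob.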
- Build that inventory by scanning mirrors via rsync, FTP or HTTP
- Maintain the inventory in a database
- this is robust against sync lags
- because files disappear on the master first
- it doesn't matter if a deleted file is still listed in the inventory, because we neither serve it nor redirect requests for it.
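The inventory logic above, sketched in memory with plain Python sets standing in for the database (the names `record_scan` and `mirrors_for` are hypothetical). The key property: a path must still exist on the master before any mirror is even considered, so stale inventory entries are harmless:

```python
def record_scan(inventory: dict[str, set[str]], mirror: str, scanned: list[str]) -> None:
    """Replace a mirror's inventory with the file list from its latest scan."""
    inventory[mirror] = set(scanned)


def mirrors_for(path: str, master_files: set[str],
                inventory: dict[str, set[str]]) -> list[str]:
    """Return the mirrors that may receive a redirect for `path`."""
    if path not in master_files:
        # Deleted on the master: we neither serve it nor redirect for it,
        # no matter what the (possibly stale) inventory still says.
        return []
    return sorted(m for m, files in inventory.items() if path in files)
```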
History of the implementation
- Idea and first implementations by Christoph Thiel (FOSDEM 2006)
- It didn't scale anymore
- Redesign from scratch, early in 2007
- Rewrite as Apache module in C: mod_zrkadlo
- Other role models for mod_zrkadlo: mod_offload (icculus.org), Bouncer (mozilla.org), sourceforge :-)
When I was travelling in Slovakia, I went to this concert:
Za Zrkadlom - Behind the mirror. And I needed a name for an Apache module ;-)
How It Works
The redirector proceeds like this:
- Can the file be redirected?
- canonicalize the filename
- look up country and continent of client IP via GeoIP
- look up possible mirrors in the SQL database
- previously used mirror is preferred (per client)
- then, mirrors from the same country are considered; next, those from the same region
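A sketch of the candidate selection, assuming the mirror list has already been narrowed to mirrors that carry the file. The `Mirror` fields, the `weak` flag (modelled on the "weak" mirrors further down, which should only get local requests) and the `previous` argument are hypothetical; how the redirector remembers a client's previous mirror isn't described here:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mirror:
    name: str
    country: str        # e.g. "de"
    continent: str      # e.g. "EU"
    score: int = 100
    weak: bool = False  # hypothetical: only takes same-country requests


def candidates(mirrors: list[Mirror], client_country: str,
               client_continent: str,
               previous: Optional[str] = None) -> list[Mirror]:
    """Return the tier of mirrors to choose from, most specific first."""
    # A mirror this client was sent to before is preferred.
    if previous is not None:
        for m in mirrors:
            if m.name == previous:
                return [m]
    same_country = [m for m in mirrors if m.country == client_country]
    if same_country:
        return same_country
    # Beyond the client's own country, "weak" mirrors are left out.
    strong = [m for m in mirrors if not m.weak]
    same_continent = [m for m in strong if m.continent == client_continent]
    if same_continent:
        return same_continent
    return strong
```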
How It Works II
- choose a mirror at random
- ...influenced by a "score", defined for each mirror, which determines the probability of its selection
- return HTTP status code 302 Found and a Location: header with the redirection URL
- or serve the file directly
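The weighted random choice and the redirect itself, as a sketch. `pick` and `redirect_response` are hypothetical names, and treating scores as plain relative weights (so a score of 0 is never picked) is an assumption about how the score maps to probability:

```python
import random


def pick(mirrors: list[tuple[str, int]], rng: random.Random) -> str:
    """Choose a mirror base URL; probability is proportional to its score.

    At least one score must be positive.
    """
    urls = [url for url, _ in mirrors]
    scores = [score for _, score in mirrors]
    return rng.choices(urls, weights=scores, k=1)[0]


def redirect_response(base_url: str, path: str) -> tuple[str, list[tuple[str, str]]]:
    """Build the status line and headers of a 302 redirect to the mirror."""
    location = base_url.rstrip("/") + "/" + path.lstrip("/")
    return "302 Found", [("Location", location)]
```

A mirror with score 300 is thus picked three times as often as one with score 100, so load follows capacity without any mirror being chosen exclusively.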
HEAD request demonstrating the redirect:
% curl -sI 'http://go-oo.zrkadlo.org/go-oo/2.3.0-4/GoOo-langpack-en-GB-2.3.0-4.zip'
HTTP/1.0 302 Moved Temporarily
Date: Thu, 28 Feb 2008 15:33:41 GMT
Server: Apache/2.2.8 (Linux/SUSE) mod_zrkadlo/1.5
Content-Type: text/html; charset=iso-8859-1
- lots of ;-)
- redirects are cheap
- integrates transparently into the web services
- redirect only where it makes sense
- maximum control over how content is served
- highly scalable (openSUSE release day: load about 1, minimal memory footprint (100 MB))
- rock-solid: during the release peak I could enjoy my vacation :-)
- Opportunity to count downloads
- Support for integration with a real CDN
- Can serve live mirrorlists instead of redirecting
- Makes small & partial mirrors useful
- in fact, we wouldn't need large mirrors, because we can serve the odd stuff ourselves...
- what we need: lots of mirrors that carry the most popular 10% of the content
- if a "broken file" is discovered, it can immediately be fixed and delivered by the server itself, without waiting for mirror synchronisation first
- Completely open source. You can use it, too!
- Implementation is generic - not openSUSE specific. You can use it, too!
- Mirrors die without warning
- That's the thing with mirrors...
- Reliability is only as good as the parts.
- So our motto is "monitor and react as quickly as possible, automatically"
- There is a time window between a mirror failing and it being disabled automatically (we use a 3-minute probe interval)
- Some failures are very hard to detect (think sporadic firewall quirks; we had to deal with that twice)
- Potential for the openSUSE download client: it could immediately fall back to another mirror by using live mirrorlists. Live mirrorlists are as cheap as redirects.
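A minimal sketch of such a probe loop with the 3-minute interval. Counting a mirror as up when a HEAD request to its base URL succeeds is an assumption, as are the names `http_probe`, `probe_round` and `run_forever`; catching sporadic firewall quirks would need more than this:

```python
import time
import urllib.request
from typing import Callable

PROBE_INTERVAL = 180.0  # seconds: the 3-minute probe interval


def http_probe(base_url: str, timeout: float = 10.0) -> bool:
    """Consider a mirror up if a HEAD request to its base URL succeeds."""
    request = urllib.request.Request(base_url, method="HEAD")
    # urlopen raises URLError/HTTPError (both OSError subclasses) on
    # connection problems and on error status codes.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return 200 <= response.status < 400


def probe_round(status: dict[str, bool], probe: Callable[[str], bool]) -> list[str]:
    """Probe every mirror once, update `status`, return URLs that changed state."""
    changed = []
    for base_url, was_up in status.items():
        try:
            is_up = probe(base_url)
        except Exception:
            is_up = False  # timeouts, refused connections, HTTP errors...
        if is_up != was_up:
            status[base_url] = is_up
            changed.append(base_url)
    return changed


def run_forever(status: dict[str, bool]) -> None:
    while True:
        for base_url in probe_round(status, http_probe):
            print(base_url, "enabled" if status[base_url] else "disabled")
        time.sleep(PROBE_INTERVAL)
```

Passing the probe function in keeps `probe_round` testable without a network. It also makes the failure window visible: a mirror that dies right after a round keeps receiving redirects for up to one full interval.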
- One address to be accessed by all
- Availability is crucial -- nothing works without it
- Single point of failure?
- Depends on your setup
- It needs to be backed by a load balancer and an HA setup
- Single machine: single point of failure.
- No reason not to run multiple redirector nodes (and database slaves) off-site
Setup Of openSUSE.org
- ...historically has employed only a single piece of hardware (tight budget)
- ...needs to throw hardware at it
- ...but the (specialized) openSUSE download client could
use cached mirror URLs for fallback
openSUSE Zypp heroes -- please step on the giants' shoulders :-)
Assumptions / observations about server downtimes:
- For human users, occasional downtimes are acceptable.
- For machines, downtimes tend to cause problems. Users can't or don't want to deal with them.
With the mirror database, there is a unique potential
for a specialized download client, like the openSUSE installer:
- Even if there's a single point of failure due to a tight budget,
- The client doesn't need to live with it
- Don't forget: even with a highly available server, connectivity between client and server or mirrors could be broken
- But after all, there are mirrors available for fallback
- So far, no client-side support for this. Hopefully soon :-)
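What such client-side fallback could look like, as a sketch: the client holds a list of mirror URLs (from a live mirrorlist, or cached from an earlier one), tries them in order, and also moves on when a download fails its checksum, which is how an outdated mirror shows up. `fetch_with_fallback` and its `fetch` parameter are hypothetical; no existing openSUSE client API is implied:

```python
import hashlib
from typing import Callable, Sequence


def fetch_with_fallback(path: str, mirror_urls: Sequence[str],
                        expected_sha256: str,
                        fetch: Callable[[str], bytes]) -> bytes:
    """Try each mirror in turn; return the first download with the right checksum."""
    errors = []
    for base in mirror_urls:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            data = fetch(url)
        except OSError as exc:  # unreachable mirror, HTTP error, timeout
            errors.append(f"{url}: {exc}")
            continue
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
        errors.append(f"{url}: checksum mismatch (outdated mirror?)")
    raise OSError("all mirrors failed:\n" + "\n".join(errors))
```

In a real client, `fetch` would wrap the HTTP library; passing it in keeps the fallback logic independent of it.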
- low resource usage is crucial for scalability and headroom
- database connection pooling with the new Apache DBD framework
- a lookup is only a single SQL query
- using a database engine with row-based write locks, transactions and non-blocking table optimization, to keep the database 100% responsive
- files are addressed as hashes (keeps the database small)
- directory indices are not redirected, so one can check what's on the master
- optimized caching headers, for good cacheability
- send "weak" mirrors only local requests (critical feature for them!)
- permit "fragile" mirrors in remote regions if they are the only ones available
- respect the special network topology and connectivity of some countries (e.g. New Zealand).
- circadian variation of selection probability for certain mirrors
- we run a "primary" mirror close to the master (widehat.opensuse.org)
- it also serves as public rsync server
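A sketch of the "single SQL query over hashed filenames" idea, using Python's built-in `sqlite3` as a stand-in for the production database. The schema, the table and column names, and MD5 as the path hash are all assumptions for illustration:

```python
import hashlib
import sqlite3


def path_key(path: str) -> bytes:
    """Fixed-size 16-byte key for a path (MD5 here is an illustrative choice)."""
    return hashlib.md5(path.encode("utf-8")).digest()


# Each (file, mirror) pair is one row keyed by a 16-byte hash instead of
# the full path, which keeps the table small.
SCHEMA = """
CREATE TABLE mirror (id INTEGER PRIMARY KEY, base_url TEXT NOT NULL,
                     country TEXT NOT NULL, score INTEGER NOT NULL,
                     enabled INTEGER NOT NULL DEFAULT 1);
CREATE TABLE file_on_mirror (file_key BLOB NOT NULL,
                             mirror_id INTEGER NOT NULL REFERENCES mirror(id),
                             PRIMARY KEY (file_key, mirror_id));
"""

# One query per request: all enabled mirrors with a positive score
# that are known to carry the file.
LOOKUP = """
SELECT m.base_url, m.country, m.score
FROM file_on_mirror f JOIN mirror m ON m.id = f.mirror_id
WHERE f.file_key = ? AND m.enabled = 1 AND m.score > 0
"""


def lookup(conn: sqlite3.Connection, path: str) -> list[tuple[str, str, int]]:
    return conn.execute(LOOKUP, (path_key(path),)).fetchall()
```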
Some Numbers From openSUSE
- openSUSE 10.3 release, October 2007
- Peak bandwidth served: 13 Gbit/s; 100 TB in a day.
(counting only the CDN we redirected to)
- ...plus what other mirrors served
- Normal traffic: 15,000,000 to 40,000,000 requests each day
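Converting these figures to average rates gives a feel for the load. This assumes decimal units (1 TB = 10^12 bytes) and spreads the traffic evenly over 24 hours, so real peaks are higher:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def tb_per_day_to_gbit_per_s(terabytes: float) -> float:
    """Average bit rate for a daily volume in TB (decimal units)."""
    return terabytes * 1e12 * 8 / SECONDS_PER_DAY / 1e9


def per_second(per_day: float) -> float:
    """Average rate per second for a daily count."""
    return per_day / SECONDS_PER_DAY


print(f"100 TB/day ~ {tb_per_day_to_gbit_per_s(100):.1f} Gbit/s on average")
print(f"{per_second(15e6):.0f} to {per_second(40e6):.0f} requests/s on average")
```

That is roughly 9.3 Gbit/s averaged over the release day, and about 174 to 463 requests per second on a normal day.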
How About Other Approaches?
- Web caches (Squid): would work best, but require people to set up Squid proxies ;)
- "real" CDN: wide area traffic management, which adds intelligence to standard DNS
- Coral CDN: uses standard DNS but is not transparent
- Static mirror lists: out of the question :-)
- mod_offload: requires script on mirror, which makes it act as "active" cache, files are mirrored on demand. Practical if you control all mirrors
How About Other Approaches? II
- Bouncer: (Mozilla project) essentially similar approach, but different implementation (PHP script); (I think) more specialized to Mozilla software download structure
- Fedora MirrorManager / Yum: in principle a similar approach, but done differently ;) They evolved from static lists to pregenerated mirror lists. Works with less granularity (directory-wise). The client does have fallback capability.
- trees too big -- need better rsync modules => "opensuse-hotstuff"
- mirrors mirror what we offer... example: 14 mirrors were once mirroring the drpmsync tree...
- infrastructure on thin ice (no redundancy). Plan: HA setup
- scanner does the bare minimum
- probe daemon does the bare minimum
- window between detection of failure and disabling
- undetected failures
- the old static mirror lists in the openSUSE wiki are not well maintained
- no colourful world maps ;)
- the free GeoIP database doesn't allow finer localization (e.g. within Germany)
- scanner needs to detect mirrors which are not large-file-capable
- no client support to select (or prefer) a local mirror
- conflicting syncs
- the openSUSE wiki page about the mirror infrastructure uses too much openSUSE lingo (its target audience is public mirror admins)
- metalink support conceivable
- provide means to mirror files based purely on popularity
- ad-hoc rsync modules?
- massive space-savings on mirrors conceivable
- hack rsyncd to directly update the database
- implement client feedback - could trigger reactive mirror probing
- implement a "degraded" mode (for fallback?) which works without files
- use finer geolocation of clients
- more important: stay scalable
- however, the server can (at no cost) include geographical coordinates of mirrors, which a smart client could use
- client retrieves list and uses the fastest mirror, or many in parallel, or keeps the urls for later (fallback if redirector is gone fishing)
- external access for mirror admins, to disable hosts, change priority or trigger re-scan
- provide means to send mirrors traffic from their local clients (network prefix?)
- stickiness of (large) files to certain mirrors, to make better use of buffer caches?
- count traffic to keep track of the bandwidth caused per mirror. This needs to account for range requests and outliers (clients running wild), though
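For the traffic-counting idea, a sketch of how a redirector could estimate bytes per mirror from the request's `Range` header, and cap what one client can add for one file. The cap (`max_bytes_factor`) is a hypothetical way to discount clients running wild, and counting multi-range requests as the whole file is a simplifying assumption:

```python
import re
from collections import Counter, defaultdict
from typing import Optional

# A single byte range, as in "bytes=0-499", "bytes=500-" or "bytes=-200".
SINGLE_RANGE = re.compile(r"bytes=(\d*)-(\d*)")


def bytes_requested(file_size: int, range_header: Optional[str]) -> int:
    """Estimate how many bytes a redirected request will fetch from the mirror."""
    if not range_header:
        return file_size
    match = SINGLE_RANGE.fullmatch(range_header.strip())
    if match is None:
        return file_size  # multi-range or unparsable: count the whole file
    first, last = match.groups()
    if first == "" and last == "":
        return file_size
    if first == "":
        return min(int(last), file_size)  # suffix range: the final N bytes
    start = int(first)
    if start >= file_size:
        return 0
    end = file_size - 1 if last == "" else min(int(last), file_size - 1)
    return max(0, end - start + 1)


class TrafficCounter:
    """Per-mirror byte totals; each (client, file) pair adds at most a capped amount."""

    def __init__(self, max_bytes_factor: int = 3):
        self.per_mirror = Counter()
        self._per_client_file = defaultdict(int)
        self.max_bytes_factor = max_bytes_factor

    def record(self, mirror: str, client: str, path: str, file_size: int,
               range_header: Optional[str] = None) -> int:
        wanted = bytes_requested(file_size, range_header)
        key = (client, path)
        budget = self.max_bytes_factor * file_size - self._per_client_file[key]
        counted = max(0, min(wanted, budget))
        self._per_client_file[key] += counted
        self.per_mirror[mirror] += counted
        return counted
```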
This space intentionally left blank
We love our mirrors!
Because they make us visible :-)
Any more questions?