Iozone Author Speaks Out… Hilariously

27 02 2009

When my experiment in benchmarking ZFS-Fuse yielded more data than I knew what to do with, I googled around a bit and found at least one other person in a similar position who contacted the author (Don Capps) of my benchmarking tool (Iozone) to get his take on the results. I figured it was worth a shot, and found Don to be extremely generous with his time and expertise. He distilled my numbers down into this graph, which made things a lot easier for me to grasp:

Distilled Worst-Case Stats for ZFS-FUSE vs. XFS

Basically, Don only looked at the results where the file size exceeded the system’s RAM size, since any transfer that fits in RAM isn’t going to tell you much about the underlying filesystem technology.
He also told me that if I was getting those kinds of speeds on commodity hardware, I should feel pretty good about my results.
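For anyone who wants to apply the same filter to their own Iozone output, here is a minimal Python sketch of the idea. The rows, the 2 GB RAM figure, and the function names are all made up for illustration (they are not my actual results), and taking the minimum throughput is only my guess at what "worst case" means here, not necessarily how Don built the graph.

```python
# Hypothetical rows: (file size in KB, record size in KB, throughput in KB/s).
# These numbers are invented for illustration; they are not from my runs.
RAM_KB = 2 * 1024 * 1024  # assumption: a machine with 2 GB of RAM

rows = [
    (524288, 64, 950000),   # 512 MB file: fits in RAM, mostly measures cache
    (1048576, 64, 910000),  # 1 GB file: still fits in RAM
    (4194304, 64, 61000),   # 4 GB file: has to hit the disks
    (8388608, 64, 58000),   # 8 GB file: has to hit the disks
]

def beyond_ram(rows, ram_kb):
    """Keep only the results whose file size is larger than RAM."""
    return [r for r in rows if r[0] > ram_kb]

def worst_case(rows):
    """Lowest throughput among the given rows."""
    return min(r[2] for r in rows)

kept = beyond_ram(rows, RAM_KB)
print(len(kept), "rows kept; worst case", worst_case(kept), "KB/s")
```

With the invented rows above, the two small-file results are dropped and only the 4 GB and 8 GB runs are left to compare.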

So I posted a little about these results in a relevant mailing list, and someone else posted some numbers for a competing FUSE filesystem that were so much better that I had to ask Don for his opinion. Naturally, Don wanted details about the poster’s hardware and testing protocol, and suggested that we were very likely seeing pure cache effect in those numbers. Unfortunately, I’ve been unable to get any details, which is just one among several reasons I’m not giving those numbers much weight. In any case, Don followed up with a few messages about what sort of setup one would need to reproduce those claimed speeds. Those followups are the point of this post. They’re reposted here with Don’s permission, and IMO they speak for themselves:

Since you are interested in the science, I thought
I would describe some ways to get 1.2 Gbytes/sec
off the platter. ( It can be done, not easily, but
if one has ~infinite resources…..)

Note: Assume all values below are ballpark and not any
specific hw.

Assuming ~40 Mbytes/sec/disk ( Typical modern disk drive )

Then to get to 1.2 Gbytes/sec == 1200 Mbytes/sec
1200/40 == 120/4 == 30 disk drives.

Now we need some way to connect 30 disks. Assuming
we can get 10 disks in a JBOD, we’ll need 3 disk
enclosures… Well… Not exactly. We still need
to have an aggregate interconnect of 1200 Mbytes/sec.
Ok.. Fibre Channel (1 Gigabit FC) can do around 100 Mbytes/sec
so… 1200/100 = 12 fibre channel connections. That’s
a bit of a bummer as most PC’s don’t have 12 PCI slots.
So… We will need to go to 2 Gigabit fibre and use 6
slots,… Oh darn… Most PC’s don’t have 6 free
PCI slots, so we’ll probably need 3 dual ported 2 Gigabit
FC cards. Since each of these cards is going to be
sustaining 400 Mbytes/sec, it’s probably a good
idea to make these PCI express slots.
So far we now have:
30 disks
6 Disk enclosures with 5 disks in each.
3 Dual ported 2 Gigabit FC cards.

Now on to the next bottleneck… Be sure that one
starts with a motherboard that has a backplane that
can sustain 1200 Mbytes/sec.

Next up, integrity… I doubt that most folks are
going to want to be ripping through data at 1.2 Gbytes/sec
and not care about their data. So… Chances are good
that they’ll want some level of RAID. RAID 1 would be
a good choice for speed, but it does mean that we’ll
need 60 disks instead of 30. If we use RAID 5 then we
will still need more disks, but not as many more. The
bummer of RAID 5 is that it generally slows down the
writer. To make up for that issue, we’ll have to
either choose more disks (RAID1) or a smarter RAID
enclosure, that can do the RAID5 XOR ops independently
of the system CPU, and hopefully double buffered, and
with multiple XOR engines and data paths. All doable,
but it does increase the cost of the system.

So.. Here we are. We can do 1.2 Gbytes/sec, but it
is not going to be cheap or easily achieved. If we
ballpark this we get something like:

* 3 dual ported 2Gbit FC controllers with multiple RAID5
XOR engines… ~ $3,000
* 40 to 60 disks .. ~$4,000 to $6,000.
* 6 JBOD enclosures.. ~$6,000
* 6 FC cables… ~$600
* PC with nice MB for that 1200 Mbytes/sec
backplane.. ~$2,000
~$15,600 to ~$17,600

( And that could go up higher if one wanted dual path
HA type connectivity as it would push one to dual
ported enclosures and quad ported FC cards )

Don Capps

P.S. The above may get one to 1.2 Gbytes/sec for sequential
workloads, but it will not be nearly so speedy if
that workload were to shift towards a random I/O
access pattern… 🙂

P.P.S. Once you have this beast built, then you can
start thinking about the environmental impact.
It’s pretty likely that these 6 disk enclosures,
60 disks, and the PC, are generating a fairly
significant quantity of heat, noise, and making
the electric meter spin at a rate you have never
seen before, and can not afford to sustain 🙂
BUT, it will be beautiful and a work, of both,
science and art 🙂 … Make sure you install
plenty of blue LEDs, as the blinking lights with
this many disks is mesmerizing, and will satisfy your
wife that you have constructed something really
interesting and have not been simply wasting your
time… 🙂
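
Don’s sizing arithmetic is easy to replay with different ballpark figures, so here is a minimal Python sketch of it. The function and parameter names are mine, not Don’s. The rule of one enclosure per FC link is my own reading of why he ends up with 6 enclosures of 5 disks each rather than 3 enclosures of 10.

```python
import math

def size_array(target_mb_s=1200, disk_mb_s=40, link_mb_s=200, ports_per_card=2):
    """Ballpark part counts for a sequential-throughput target.

    Defaults follow the email: 1200 Mbytes/sec target, ~40 Mbytes/sec per
    disk, 2 Gigabit FC at ~200 Mbytes/sec per link, dual-ported cards.
    """
    disks = math.ceil(target_mb_s / disk_mb_s)
    links = math.ceil(target_mb_s / link_mb_s)
    cards = math.ceil(links / ports_per_card)
    # Assumption (mine): one enclosure per FC link, filled with only as many
    # disks as that link can keep busy.
    disks_per_enclosure = link_mb_s // disk_mb_s
    enclosures = math.ceil(disks / disks_per_enclosure)
    return {
        "disks": disks,
        "links": links,
        "cards": cards,
        "mb_s_per_card": ports_per_card * link_mb_s,
        "disks_per_enclosure": disks_per_enclosure,
        "enclosures": enclosures,
    }

plan = size_array()
print(plan)
print("1 Gigabit FC links needed instead:", size_array(link_mb_s=100)["links"])
print("Disks with RAID 1 mirroring:", 2 * plan["disks"])
```

With the defaults this gives the same counts as the email: 30 disks, 6 links, 3 cards at 400 Mbytes/sec each, and 6 enclosures of 5 disks. Like Don’s figures, it only describes sequential throughput.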

Don, this is hilarious and educational. Do you mind if I post it on my blog
(with attribution, of course)?


I don’t mind. But I did leave off a few other thoughts…

Environmental impact continued:

It’s fairly likely you’ll need to hire an electrician to
come out and put in a special circuit and rewire the
bedroom (where you have the storage system) as the current
draw is probably going to exceed the typical breaker used
for a bedroom. + $500

You also may need to call the air-conditioner folks and
upgrade that 3 ton handler to a 5 ton handler, as the
thermal load is pretty high and without additional cooling
capacity, your house may become a sauna. + $6,000
It may also be possible to construct the beast inside of
a water cooled chamber and put a heat exchanger outside
your house. A small cooling tower should do the trick
but you may wish to check with the homeowners association
before you install the external tower.

Make sure the room is very dark and those blinking blue
LEDs look their best, otherwise you may need to explain
to your wife why you spent $23,500 on this project, instead
of a new car, a fur coat, a diamond ring, or a European
vacation … Trust me, you really want those LEDs to be

Don Capps

P.S. If you would like I could send photos of my home
bedroom lab. Yep… My wife really liked the blinking
blue lights, but then again, she is a computer scientist
too 🙂
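
To tally the bill from both messages, here is a small Python sketch that sums the ballpark line items as (low, high) pairs. The dictionary layout and names are mine. The dollar figures are the ones Don quotes, with the disk range kept as a range, and the cooling tower left out because he gives no price for it.

```python
# Ballpark line items from the two emails, as (low, high) dollar estimates.
base = {
    "FC controllers": (3000, 3000),
    "disks": (4000, 6000),
    "JBOD enclosures": (6000, 6000),
    "FC cables": (600, 600),
    "PC": (2000, 2000),
}
extras = {
    "electrician": (500, 500),
    "air conditioning": (6000, 6000),
}

def total(items):
    """Sum the low and high estimates separately."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high

print("hardware only:", total(base))
print("with house upgrades:", total({**base, **extras}))
```

The hardware-only total matches Don’s ~$15,600 to ~$17,600, and adding the electrician and air conditioning gives a range of $22,100 to $24,100, which brackets the $23,500 he mentions.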

I’m not going to post a picture of the inside of Don’s bedroom here, but I can tell you that while the rack is impressive, it’s not nearly as scary as I expected. My guess is he’s not trying to reproduce these claimed performance numbers 😉



