TPC-C - The Benchmarketing Hall of Fame/Shame
Benchmarking, and vendor benchmarking in particular, is a dirty word in IT. Only the naive take vendor benchmarks at face value; the wiser know that if you want to know how a proposed configuration will perform, you have to test it yourself, on exactly the configuration you plan to put in production. Otherwise you are committing to a purchase based strictly on the ability of vendor X to beat a certain benchmark into submission, most likely by throwing unrealistic amounts of hardware at the problem and doing some very questionable tuning. I do not want to pick on IBM as if it were the only vendor doing this, since many companies engage in "creative benchmarking", but IBM in my opinion has perfected the art. IBM has always treated the TPC-C benchmark as its oyster, squeezing out top results not so much courtesy of the stellar performance of IBM products as courtesy of its deep-pocketed ability to throw ungodly amounts of resources at the benchmark.
Now, TPC-C itself has long been questioned as a benchmark, with many experts stating that it does not really reflect real-world workloads (even IBM nodded in agreement), and most of the smarter observers now pay more attention to TPC-H as being more realistic, but that is beside the point. I recently had a good laugh looking at the #1 TPC-C benchmark result, held, not surprisingly, by the IBM p595:
http://www.tpc.org/tpcc/results/tpcc_perf_results.asp
On the surface it all looks nice, with IBM scoring an amazing 6,085,166 tpmC at a very respectable 2.81 USD/tpmC, but if you so much as scratch the surface by looking at the disclosure reports that reveal the internals of this setup, you will be amazed by the sheer absurdity of the configuration:
http://www.tpc.org/results/individual_results/IBM/IBM_595_20080610_ES.pdf
http://www.tpc.org/results/FDR/TPCC/IBM_595_20080610_fdr.pdf
The first number that should immediately jump out at you is the rather enormous cost of this config: a mind-boggling 17 million dollars! And that is after a 20 million dollar IBM discount! The storage subsystem is especially ridiculous in its unbounded hugeness: IBM employs 746TB of storage for this setup, using a grand total of 11,000 disks! And it is not that IBM needs even half of the 746TB for the database: less than a quarter of that amount is actually needed, 171TB to be exact. In other words, IBM is striping the hell out of the available storage to squeeze out every iota of physically possible storage performance, irrespective of how unrealistic that is. I can tell you for sure that no one in the real world will sacrifice that much capacity for a performance improvement; the setup is totally and utterly unrealistic. If you are really planning to use this IBM benchmark as the foundation for an educated decision about the real-world performance of a potential configuration, well, you need to use more common sense, to put it mildly.
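For anyone who wants to sanity-check these figures, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above; the function and variable names are my own, the disk count is the rounded 11,000 figure, and treating 1TB as 1000GB is an assumption, so take the per-disk results as approximate.

```python
# Back-of-the-envelope check of the figures quoted in this post.
# All inputs are the rounded numbers from the text, not exact disclosure values.

TPMC = 6_085_166          # reported throughput, tpmC
USD_PER_TPMC = 2.81       # reported price/performance
TOTAL_STORAGE_TB = 746    # configured storage
DATABASE_TB = 171         # storage the database actually needs
DISK_COUNT = 11_000       # rounded disk count


def summarize(tpmc, usd_per_tpmc, total_tb, db_tb, disks):
    """Derive a few ratios from the headline benchmark figures."""
    return {
        # Price/performance times throughput gives the implied system price.
        "implied_cost_usd": tpmc * usd_per_tpmc,
        # Share of the configured storage that the database really needs.
        "storage_used_fraction": db_tb / total_tb,
        # Average configured capacity per disk (assumes 1 TB = 1000 GB).
        "gb_per_disk": total_tb * 1000 / disks,
        # How thinly the workload is spread across spindles.
        "tpmc_per_disk": tpmc / disks,
    }


if __name__ == "__main__":
    for name, value in summarize(
        TPMC, USD_PER_TPMC, TOTAL_STORAGE_TB, DATABASE_TB, DISK_COUNT
    ).items():
        print(f"{name}: {value:,.2f}")
```

The implied cost comes out at roughly 17.1 million dollars, which matches the price tag in the disclosure, and the database occupies only about 23% of the configured storage. The rest of the capacity is there purely to buy spindles.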
And again, I am not picking on IBM specifically. TPC-C is a hall of fame for "benchmarketing" by other well-known captains of industry too, with HP, Fujitsu and NEC in the top 10, challenging IBM with equally ridiculous, out-of-this-world configurations that have very little to do with real-world configurations and workloads. If there is one message to take away from all this, it is this: do your own benchmarking with the configuration and the workloads you are actually planning to use, and do not rely on the benchmarketing pushed by vendors to create the illusion of world-beating performance. As your parents probably would have said, "use your own head".
