Thursday, 2 July 2009

Unix Pax to the Rescue - Extracting Absolute Path Names from tar Archives

Dealing with tar archives containing absolute path names can range from just annoying to potentially dangerous - unwittingly unpacking such an archive may leave you anywhere from silently growling at the simpleton co-worker who created it to the sudden gut-wrenching realization that you just overwrote some important files in unintended locations. Needless to say, creating tar archives with absolute path names is just bad practice. The usual solution is to yell at the poor dope who built the archive and teach him not to do it again. But what if you really need to get to the contents of the doggone archive without going through the contortions of creating dummy directory hierarchies to accommodate the files being extracted?

The solution to this conundrum is a powerful but vastly underappreciated Unix tool called pax. The pax utility (pax means 'peace' in Latin, by the way) supports a number of archive formats (tar and cpio included) and can perform one notably wonderful function: manipulating the paths of the restored files, which, as you can imagine, is our way out of the ugly tar conundrum. All you have to do is provide a sed-type substitution expression to the -s command line switch of the pax command, and it will perform the path name transformation for you. For instance, you may have a tar archive in which '/usr/local' is the part of each file name that is getting in the way, and you would really prefer not to overwrite things under /usr/local while extracting it. pax makes this very easy:

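# pax -r -s ':^/usr/local/::' < doggone_archive.tar

All we have done above is provide a sed-type expression that matches every path starting with /usr/local/ (the ^/usr/local/ part) and replaces that prefix with the null string (the empty space between the last two ':' delimiters). Note the trailing slash in the pattern: stripping only /usr/local would leave names such as /bin/foo, which are still absolute. With the whole prefix gone, the files are restored relative to the current directory. Solving problems is easy when you have the right tools.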
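
Before letting pax loose on a real archive, it can help to preview what a substitution does to the member names. The sketch below is only an illustration: it feeds a couple of made-up sample paths through sed, which accepts the same style of expression, so preview_rewrite is a hypothetical helper and not part of pax itself.

```shell
# Preview how a pax -s style substitution would rewrite member names.
# sed is used as a stand-in because it accepts the same kind of
# expression; the sample paths below are made up for illustration.
preview_rewrite() {
    # $1 is the substitution in pax -s form, e.g. ':^/usr/local/::'
    sed "s$1"
}

# Pattern with the trailing slash: the names come out relative.
printf '%s\n' /usr/local/bin/tool /usr/local/etc/tool.conf |
    preview_rewrite ':^/usr/local/::'

# Pattern without the trailing slash: the name still starts with '/'.
printf '%s\n' /usr/local/bin/tool | preview_rewrite ':^/usr/local::'

# On a real archive one would typically list the contents first and
# only then extract (commands shown as comments, not executed here):
#   pax -f doggone_archive.tar
#   pax -r -s ':^/usr/local/::' -f doggone_archive.tar
```

The first call prints bin/tool and etc/tool.conf, while the second prints /bin/tool, which is exactly the kind of leftover absolute path you want to catch before extracting anything.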

Pax vobiscum

Friday, 8 May 2009

Oracle to Boost Investment in SPARC

If you're following the press, many so-called industry experts are proclaiming that a cloud of uncertainty is gathering over SPARC in light of the recent Oracle-buying-Sun announcement. Most of this speculation, which largely comes off as the competitive FUD you hear from IBM and HP, rests on the theory that Oracle really wants the software from Sun and that the hardware side of the business is just icing on the cake that Oracle will eventually throw away. Well, not so fast. It looks like Oracle is actually quite enticed by SPARC and is planning to increase its investment in chip development. A quote directly from Larry Ellison:

"Q. Oracle's done integrated hardware and software design with the Exadata database machine. But Exadata uses standard Intel chips. Are you going to discontinue Sun's SPARC chip?

A. No. Once we own Sun we're going to increase the investment in SPARC. We think designing our own chips is very, very important. Even Apple is designing its own chips these days. Right now, SPARC chips do some things better than Intel chips and vice-versa. For example, SPARC is much more energy efficient than Intel while delivering the same performance on a per socket basis. This is not just a green issue, it's an economic issue. Today, database centers are paying as much for electricity to run their computers as they pay to buy their computers. SPARC machines are much less expensive to run than Intel machines."

You can find the transcript of the entire Q+A session with Larry at the following URL:

http://www.reuters.com/article/marketsNews/idINN0740285120090507?rpc=44

Friday, 1 May 2009

Laugh Of The Day

There is not as much ribbing on the Microsoft theme as there used to be in the Windows 95 days, but now and again some gems still pop up on the web. Here is a good piece on the theme of "If Everything Was Made by Microsoft" courtesy of Cracked.com (I almost split my sides laughing):

http://www.cracked.com/article_17323_if-everything-was-made-by-microsoft.html

Thursday, 12 March 2009

TPC-C - The Benchmarketing Hall of Fame/Shame

Benchmarking, and vendor benchmarking specifically, is a dirty word in IT - only the naive take benchmarks at face value, and the wiser know that if you want to know the performance of a proposed configuration, you have to measure it yourself on exactly the configuration you're planning to put in production. Otherwise you'll be committing yourself to a purchase based strictly on the ability of vendor X to muscle and beat a certain benchmark into submission, most likely by throwing unrealistic amounts of hardware at the problem and doing some very questionable tuning. I do not want to pick on IBM as if it were the only vendor doing this sort of thing, as many companies engage in "creative benchmarking", but IBM in my opinion has perfected this art. IBM has always treated the TPC-C benchmark as its oyster, squeezing out good results not so much courtesy of the stellar performance of IBM products, but rather through its deep-pocketed ability to throw ungodly amounts of resources at getting the top score. Now, TPC-C itself has long been questioned as a benchmark, with many experts stating that it does not really reflect real world workloads (even IBM nodded in agreement), and most smarter observers are now paying more attention to TPC-H instead as being more realistic, but that is beside the point. I recently had a good laugh looking at the #1 TPC-C benchmark result, not surprisingly held by the IBM p595:

http://www.tpc.org/tpcc/results/tpcc_perf_results.asp

On the surface it all looks nice, with IBM scoring an amazing 6,085,166 tpmC at a very respectable 2.81 USD/tpmC, but if you dig even slightly into the disclosure reports revealing the internals of this setup, you are set to be amazed by the sheer absurdity of this configuration:

http://www.tpc.org/results/individual_results/IBM/IBM_595_20080610_ES.pdf
http://www.tpc.org/results/FDR/TPCC/IBM_595_20080610_fdr.pdf

The first number that should immediately jump out at you is the rather enormous cost of this config - a mind-boggling 17 million dollars! And that is after a 20 million dollar IBM discount!!! The storage subsystem configuration is especially ridiculous in its unbounded hugeness - IBM employs 746TB of storage for this setup using a grand total of 11,000 disks! And it is not that IBM needs even half of the 746TB for the database storage requirements; less than a quarter of that amount is actually needed - 171TB to be exact. In other words, IBM is striping the hell out of the available storage to squeeze out every iota of physically possible storage performance, irrespective of how unrealistic that is in the real world - I can tell you for sure that no one in the real world will sacrifice that much capacity for a performance improvement. The setup is totally and utterly unrealistic. If you're really planning to use this IBM benchmark as the foundation for an educated decision on the real world performance of a potential configuration, well, you need to use more common sense, to put it mildly.
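
If you want to sanity-check those numbers yourself, a few lines of shell and awk will do. This is just a back-of-the-envelope sketch using the figures quoted above; the 11,000 disk count is the rounded number from this post, and tpcc_summary is a name made up for the example.

```shell
# Back-of-the-envelope check of the TPC-C figures quoted in the post.
# awk does the floating-point arithmetic; tpcc_summary is a
# hypothetical helper name, not a TPC tool.
tpcc_summary() {
    awk 'BEGIN {
        tpmc     = 6085166   # reported throughput, tpmC
        usd_tpmc = 2.81      # reported price/performance, USD per tpmC
        total_tb = 746       # configured storage, TB
        db_tb    = 171       # storage the database actually needs, TB
        disks    = 11000     # approximate disk count quoted in the post

        printf "implied system price: %.1f million USD\n", tpmc * usd_tpmc / 1e6
        printf "storage actually needed: %.0f%% of configured\n", 100 * db_tb / total_tb
        printf "throughput per disk: %.0f tpmC\n", tpmc / disks
    }'
}

tpcc_summary
```

The implied price comes out at about 17.1 million USD, which matches the sticker shock above, and the database needs only about 23% of the configured storage.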

And again, I'm not picking on IBM specifically: the TPC-C benchmark is a hall of fame for "benchmarketing" by other well known captains of industry too, with HP, Fujitsu and NEC in the top 10, challenging IBM with equally ridiculous out-of-this-world configurations that have very little to do with real world configurations and workloads. If there is one message to take away from all this, it is to do your own benchmarking with the configuration and workloads you're actually planning to use, and not to rely on the benchmarketing pushed by vendors to create illusions of world-beating performance. As your parents probably would have said, "use your own head".


Sunday, 8 March 2009

Solaris vs. Linux in the Data Center - Maturity vs. Gloss

If you've been around the Unix discussion groups, most likely you have run into countless arguments on the merits of Solaris vs. Linux and which one is better for a particular task or which one is technically superior. The scenario is quite typical: Solaris proponents claim superiority on technical grounds, pointing to DTrace and ZFS, while Linux fans claim that Linux is more ubiquitous, more polished and easier to use. I would agree with both sides - Solaris is the undisputed winner on technical grounds, with more advanced technology packed into it, and Linux is more of a consumer-friendly type. If we go to the world of car analogies, I would liken Linux to a small pickup truck that has a nice stereo and a pretty paint job. Solaris, on the other hand, is more like a giant Caterpillar earth-moving truck - it doesn't have the prettiest paint job in the world and it may not have the stereo, but it is brutally efficient at what it was designed to do - moving dirt, moving lots of it and doing it in the most efficient manner. Predictably, Linux fans quip that the DTrace, ZFS, high scalability, etc. of Solaris are something they can live without by compromising on old-school LVM/RAID, SystemTap (a poor man's imitation of DTrace) and scaling horizontally on small PC-type servers. It's all good and true, but a compromise is a compromise, and all it is saying is that Linux is still second best technically - its technology is still not on the same level as Solaris. But even if you ask me whether I would choose Solaris over Linux when we pretend that ZFS and DTrace don't exist, I will still say that Solaris is the better fit as a serious data center type OS - Solaris still has better tool chains for managing your systems on a daily basis. Solaris LiveUpgrade is a prime example of that - this feature alone would steer me to Solaris, and no amount of Linux prettiness would compensate for it.
If you're not familiar with Solaris LiveUpgrade - it is essentially a means of easily creating and maintaining multiple boot environments under Solaris, which, while sounding rather simplistic, permits tremendous savings in downtime when performing upgrades and patching. In essence, any OS upgrade, regardless of how significant and lengthy, can be reduced to just the time it takes to reboot the machine - you create an alternate boot environment which is an identical copy of your current OS and then patch it or upgrade it while the system is running like nothing special is going on! Then you activate the new boot environment and reboot. Your system comes up and voilà - you're running an upgraded/patched OS! The only disruption you would see is a reboot. No huge change windows required to prepare the system for a potentially risky change and no taking the system out of service to install countless patches and updates that take what seems like forever. And the best part is that you can easily return to the previous pre-patch/pre-upgrade state, again with a simple reboot. Needless to say, LiveUpgrade adds a lot to my peace of mind when going through upgrades with Solaris, which is something I can't say about Linux, where all you have is the much less pretty option of rolling back to a mirror that must be split off before the patching/upgrade. So as I said, even forgetting the latest whiz-bang technology of Solaris 10 and OpenSolaris, all the Linux gloss is just not enough to overcome the industrial strength features of Solaris that made it the OS of choice in the Unix data center.
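
To make the create / upgrade / activate / reboot cycle a bit more concrete, here is a dry-run sketch that only prints the sequence of Live Upgrade commands rather than running them. The boot environment name, disk slice and OS image path are hypothetical placeholders, lu_plan is a made-up helper, and the exact options depend on your Solaris release and root file system, so treat it as an outline and check the man pages before trying it on a real box.

```shell
# Dry-run sketch: prints a typical Live Upgrade command sequence
# instead of executing it. All arguments are hypothetical placeholders.
lu_plan() {
    be=$1      # name for the new boot environment
    slice=$2   # spare disk slice that will hold the copy of /
    media=$3   # path to the OS image to upgrade from

    echo "lucreate -n $be -m /:$slice:ufs"   # copy the running OS into a new BE
    echo "luupgrade -u -n $be -s $media"     # upgrade the inactive BE
    echo "luactivate $be"                    # mark the new BE for the next boot
    echo "init 6"                            # the only downtime: a reboot
}

lu_plan new_be /dev/dsk/c0t1d0s0 /mnt/os_image
```

Falling back follows the same pattern: activate the previous boot environment with luactivate and reboot again.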

Thursday, 26 February 2009

Unix is Coming Back IDC Report Says


Not so long ago Unix was almost universally disparaged as a platform slowly but surely heading for the dust bin, a dim light soon to be stomped out by better-than-sliced-bread Windows and then Linux. Well, even in the dark times of the global recession Unix is actually looking like a comeback kid. The fresh-off-the-press IDC Worldwide Server Quarterly survey paints a very interesting picture indeed, with Unix easily eclipsing both Windows and Linux in terms of generated revenue and growth:

  • The UNIX market (excluding Linux) saw very strong growth Quarter over Quarter (QoQ), with a revenue increase of 30.4% ($3,741M to $4,877M) and a unit increase of 8.3% (114,845 to 124,346).
  • UNIX (excluding Linux) was the largest OS segment by revenue last quarter, eclipsing Windows, which slipped to #2. Also on the processor front RISC saw a QoQ 32.7% increase in revenue and a 15.3% increase in units shipped.
  • Whether this marks a return to the more premium UNIX platforms from vendors like Sun and IBM remains to be seen. But either way it represents a healthy trend in the UNIX/RISC market.

It is hard to provide an easy interpretation for this phenomenon, which goes against the grain of most of the pat industry analysts' predictions - both Linux and Windows slowed down and Unix accelerated (relatively speaking). My interpretation would be that both Windows and Linux have spent enough time in the market to show their true value (or lack thereof), failing to provide a sufficient business case against the tried and true Unix platforms entrenched in the high end and mission critical segments. Unix lives yet another day and lives quite well, I have to say.

Friday, 9 January 2009

Another one bites the dust - PA-RISC is officially dead


Another piece of not so pleasant news - the RISC processor bunch has lost yet another member, with HP announcing that PA-RISC has finally been End-of-Life'd. You can no longer order an HP 9000 even if you really wanted to. Here is the complete announcement from the HP site:

http://www.hp.com/products1/evolution/9000/eol_announcement.html


It is sad to see this processor go, especially considering that there was no technical reason for retiring it. PA-RISC was a very strong architecture with a significant following, especially in technical computing circles, and the only reason HP mothballed it is that Intel Itanium was supposed to be the best thing since sliced bread. Well, we know how that turned out - Itanium is barely dragging its own weight, and even SPARC, which is much criticized by HP as a dwindling platform, is outshipping Itanium by a hefty margin on unit shipments. And to this day there are hardly any indications that Itanium is going to live up to even a fraction of the original hype created by HP and Intel. At the end of the day it turned out that Itanium was more powerful as a propaganda machine than as a processor.