Solaris Perl CPAN

May 23, 2007 at 11:36 AM | categories: solaris, troubleshooting, system administration, perl

Today, I was searching Google for help installing Perl modules through CPAN using the default Solaris Perl. Sadly, my own blog was one of the search results, and it was no help. I guess this entry is going to make the situation even worse, so I suppose I should include some useful information:

  • Solaris Perl is compiled using Sun Studio and not gcc
  • Perl modules that contain C (XS) code must be compiled with the same compiler Perl itself was built with
  • The Blastwave Perl is also uselessly compiled using Sun Studio and not gcc
  • Sun Studio, which used to cost thousands of dollars, is now a free download
  • The Sunfreeware Perl package is compiled with gcc. Go sanity!
I'm sure if you cared enough and wanted to waste time, you could download the Sun Studio compiler just for your handful of Perl modules, or you could download the Sunfreeware package and use gcc, the compiler that God intended you to use. Your choice, man. BTW, Sun, you suck.
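Before building anything, you can check which compiler a given perl was built with. `perl -V:cc` prints that configuration value in the form `cc='...';`. The Python sketch below is a hypothetical helper: `parse_config_line` and `perl_compiler` are my own names and not part of any tool. It runs that command and extracts the compiler name so you can compare it with the one you plan to use. It assumes `perl` is on your PATH.

```python
import re
import subprocess

# Matches the "-V:name" output format of perl, e.g. cc='gcc';
_CONFIG_LINE = re.compile(r"^(\w+)='(.*)';\s*$")


def parse_config_line(line):
    """Split a line like "cc='gcc';" into ("cc", "gcc")."""
    match = _CONFIG_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected perl -V output: {line!r}")
    return match.group(1), match.group(2)


def perl_compiler(perl="perl"):
    """Return the C compiler recorded in perl's configuration (assumes perl is installed)."""
    output = subprocess.run(
        [perl, "-V:cc"], capture_output=True, text=True, check=True
    ).stdout
    _name, value = parse_config_line(output)
    return value
```

Call `perl_compiler("/usr/perl5/bin/perl")` to check the system Perl. If the result is not gcc, CPAN builds of XS modules will try to use that compiler and its flags.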

Amazon S3 Backup Solution

May 21, 2007 at 12:48 PM | categories: unix, web 2.0, system administration, perl

Although I've had an Amazon Simple Storage Service (S3) account for a while, I haven't used it. For those of you who aren't familiar with S3, Amazon has opened up their resources for everyday people to use. In this case, you can use their servers as a place to dump your files online. Currently they charge $0.15 per gigabyte per month for storage, plus a fee for the bandwidth used to transfer data in and out.

With this setup, they take care of the administration, backup, redundancy, and troubleshooting, and the storage scales automatically to whatever you need. I've been searching for a good backup script so I can back up everything I have running on this web host, but most of the ones I found were either still in beta or a pain to set up. Today I finally installed Brackup through CPAN, along with all the requisite Perl modules. I've already tested a backup and restore, and it seems it will fit my needs well.
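Here is a minimal Python sketch for estimating what a backup costs at the storage price quoted above ($0.15 per GB per month). `monthly_cost` is a hypothetical helper, not part of Brackup or any S3 library. The bandwidth rate is only a parameter because no figure is given here. Its default of 0 means "storage only", so plug in Amazon's current transfer price yourself.

```python
from decimal import Decimal

# Storage price quoted in the post: $0.15 per gigabyte per month.
STORAGE_PER_GB_MONTH = Decimal("0.15")


def monthly_cost(stored_gb, transferred_gb=0, transfer_per_gb="0"):
    """Estimate one month's S3 bill in dollars.

    stored_gb       -- average amount of data kept in S3 during the month
    transferred_gb  -- data moved in or out during the month
    transfer_per_gb -- bandwidth price per GB (an assumption; not given in the post)
    """
    stored = Decimal(str(stored_gb))
    transferred = Decimal(str(transferred_gb))
    return stored * STORAGE_PER_GB_MONTH + transferred * Decimal(str(transfer_per_gb))
```

For example, `monthly_cost(10)` gives 1.50, so keeping 10 GB of backups costs about $1.50 a month in storage before bandwidth. `Decimal` avoids floating-point rounding surprises in money arithmetic.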


Convert Floppy Image to an ISO (Solaris/Linux)

March 19, 2007 at 12:51 PM | categories: solaris, troubleshooting, system administration

Device manufacturers still haven't caught on that floppy drives are no longer standard equipment on most modern machines. I recently ran into this when trying to install a RAID driver on a Solaris 10 (x86) box, and solved it thusly:

    # lofiadm -a /export/home/mmichie/tmp/ARCMSR.DD
    /dev/lofi/1
    # mount -F pcfs /dev/lofi/1 /mnt/floppy/
    # mkisofs -R -J -o driverdisk.iso /mnt/floppy/
    Total translation table size: 0
    Total rockridge attributes bytes: 2428
    Total directory bytes: 16384
    Path table size(bytes): 122
    Max brk space used 10000
    278 extents written (0 MB)

In other words, download the raw floppy image, attach it as a loopback device with lofiadm, and mount it. Then use mkisofs to turn its contents into an ISO. Burn the ISO with your favorite CD-R burning software and install your driver disk.

This can be done similarly on Linux. The main difference is how you mount the floppy image:

    # mount -o loop driverdisk.img /mnt

The mkisofs command is the same as on Solaris. Just point it at /mnt instead of /mnt/floppy/.
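The Linux version of these steps can be scripted. This is a minimal sketch, not a tested tool. `floppy_to_iso_commands` and `run` are hypothetical helper names, the paths are placeholders, and the script assumes it runs as root with `mount`, `umount` and `mkisofs` installed. It builds the command lines first so you can inspect them in a dry run before anything executes.

```python
import subprocess


def floppy_to_iso_commands(image, mount_point, iso):
    """Return the Linux commands that loop-mount a floppy image and master an ISO from it."""
    return [
        ["mount", "-o", "loop", image, mount_point],      # attach the image as a loop device
        ["mkisofs", "-R", "-J", "-o", iso, mount_point],   # Rock Ridge + Joliet, as in the post
        ["umount", mount_point],                           # detach the image when done
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

`run(floppy_to_iso_commands("driverdisk.img", "/mnt", "driverdisk.iso"))` only prints the three commands. Pass `dry_run=False` to execute them.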


Cron error: bad user (root) or setgid failed (root)

February 14, 2007 at 12:24 PM | categories: solaris, system administration

For all you Googlers out there: If you see the following in /var/cron/log on Solaris:

! bad user (root) or setgid failed (root)
The solution is restarting cron.
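To catch this condition before users notice missing jobs, you can scan the cron log for the message. Below is a minimal Python sketch. `find_bad_user_errors` and `scan_cron_log` are hypothetical helpers. The code matches only the message text quoted above, and the default path is the Solaris log location mentioned here. On Solaris 10, where cron runs under SMF, the restart itself is typically `svcadm restart cron`.

```python
import re

# The message quoted above, e.g. "! bad user (root) or setgid failed (root)"
BAD_USER = re.compile(r"bad user \((?P<user>[^)]+)\) or setgid failed")


def find_bad_user_errors(lines):
    """Return the user names mentioned in cron 'bad user' errors, in log order."""
    users = []
    for line in lines:
        match = BAD_USER.search(line)
        if match:
            users.append(match.group("user"))
    return users


def scan_cron_log(path="/var/cron/log"):
    """Read a cron log file and return the users affected by the error."""
    with open(path, errors="replace") as log:
        return find_bad_user_errors(log)
```

A non-empty result from `scan_cron_log()` is a hint that cron needs a restart.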

System Administration as Science

October 21, 2006 at 02:36 AM | categories: philosophy, system administration

One goal in my day-to-day work is to quantify events in a systematic way. System administrators are in a unique position to view the network, servers, clients, software, and the ways they interact. While good software development depends on abstracting away as much as you can, good system administration depends on understanding how the layers interact.

For example, a good developer will abstract away the type of database he is connecting to: a small shim can be adjusted so that the program runs unchanged on Oracle or PostgreSQL. The Java language itself depends on abstracting away the entire computer by implementing a virtual machine that behaves consistently across different operating systems and even different CPU architectures. A Java programmer doesn't care whether he is running on Solaris SPARC, Linux MIPS, or Windows x86, or whether the CPU is big-endian or little-endian.

However, a good system administrator does care, and should know the difference. System administration is about removing layers to solve problems that occur when the abstractions break down. Joel Spolsky refers to this as "The Law of Leaky Abstractions."

All non-trivial abstractions, to some degree, are leaky.
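Here is a concrete case of that leak. The same 32-bit integer is laid out in opposite byte orders on big-endian machines (such as SPARC) and little-endian machines (such as x86). This short Python sketch uses the standard `struct` module to show both layouts. It also shows what happens when bytes written in one order are read back in the other, which is what you get when a binary file written in native byte order moves between the two platforms.

```python
import struct
import sys


def layouts(value):
    """Return the 4-byte big-endian and little-endian encodings of an unsigned int."""
    big = struct.pack(">I", value)     # big-endian (SPARC, network byte order)
    little = struct.pack("<I", value)  # little-endian (x86)
    return big, little


big, little = layouts(0x01020304)
print("big-endian:   ", big.hex())
print("little-endian:", little.hex())
print("this machine: ", sys.byteorder)  # 'big' or 'little'

# Reading little-endian bytes as if they were big-endian yields a different number.
misread = struct.unpack(">I", little)[0]
print(hex(misread))
```

The two encodings print as `01020304` and `04030201`, and the misread value comes out as `0x4030201` instead of `0x1020304`.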

Some have compared system admins to the plumbers of the IT world. Like plumbing, the effects of system administration disappear when everything is working. Only when things start to leak, and shit starts to hit the fan (literally or figuratively) does it become noticeable. There seems to be one breed of system administrator that thrives on fixing problems. Imagine the server going down, and the mayor frantically paging the heroic sysadmin with the Bat Signal.

Our hero drops into the storm with his combat boots and trusty Leatherman, typing arcane commands, drinking Mountain Dew and cursing at everyone around him. Suddenly, joyous shouts erupt as the users discover their work can continue. Everyone cheers the SysOp, while he struts back to his Bat Cave, until the next Bat Time, at the same Bat Channel.

How does one measure the performance of the lone rogue troubleshooter against another sysadmin who carefully schedules downtime so that the system "just works"? Is the system with less downtime more reliable because of the work of its system administrator, or is he just lucky? How does one compensate the hero who solves every problem versus someone who never demonstrates this ability because the system never goes down?

What of the sysadmin who has unreliable hardware or buggy software forced on him by upper management or customer demand? A lot of companies want to measure metrics like uptime, but is it even possible to properly measure 99.99% uptime, and does that have any correlation to the person running the system?

99.9% uptime amounts to about 43 minutes of downtime in a 30-day month, but many of the tools used to measure the availability of a system have a minimum time resolution of one minute. Suppose you want to check that your website is up and available to your users, so you write a script that makes an HTTP request, checks the result, and sends you e-mail if it doesn't get a response. However, the standard UNIX cron utility, which schedules such tasks, can run a job at most once per minute.
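The downtime budget is simple arithmetic: allowed downtime equals (1 minus availability) times the period. A minimal Python sketch, assuming a 30-day month (a 31-day month allows slightly more):

```python
from datetime import timedelta

MONTH = timedelta(days=30)  # assumption: a 30-day month


def allowed_downtime(availability_pct, period=MONTH):
    """Downtime permitted over `period` at the given availability percentage."""
    return period * (1 - availability_pct / 100)


for pct in (99.0, 99.9, 99.99):
    minutes = allowed_downtime(pct).total_seconds() / 60
    print(f"{pct}% -> {minutes:.1f} minutes per month")
```

That works out to 432 minutes at 99%, about 43 minutes at 99.9%, and only about 4.3 minutes at 99.99%. At 99.99%, a one-minute polling interval is a large fraction of the whole monthly budget.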

With a CPU running billions of instructions per second and servers typically having multiple processors, one minute is a long time between checks. But if we magically invented a utility that could schedule and execute your script once per second, your server might suddenly be overwhelmed by these requests, and your script itself could bring the system to a halt. What if you have a process that crashes and restores itself in less time than your monitoring tool's polling interval? You wouldn't consider a server that crashed every 30 seconds reliable, but most monitoring software can't tell the difference.
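This blind spot is easy to demonstrate. The sketch below models a monitor that polls every `interval` seconds. An outage counts as seen only if a poll lands while the service is down. The timings are made-up illustration data, not measurements: a hypothetical service that crashes every 30 seconds and comes back 5 seconds later, over one hour.

```python
def detected_outages(outages, interval, start=0.0, end=None):
    """Count outages that at least one poll lands inside.

    outages  -- list of (down_at, duration) pairs in seconds
    interval -- seconds between polls (60 for a once-a-minute cron job)
    """
    if not outages:
        return 0
    if end is None:
        end = max(t + d for t, d in outages)
    polls = []
    t = start
    while t <= end:
        polls.append(t)
        t += interval
    seen = 0
    for down_at, duration in outages:
        # The service is down during [down_at, down_at + duration).
        if any(down_at <= p < down_at + duration for p in polls):
            seen += 1
    return seen


# Hypothetical service: crashes every 30 s, restarts 5 s later, for one hour.
crashes = [(t + 10.0, 5.0) for t in range(0, 3600, 30)]
print(len(crashes), "outages;", detected_outages(crashes, interval=60), "seen by a 60 s poll")
```

With these numbers, the once-a-minute poll sees none of the 120 outages, because every poll lands between crashes. Polling once a second (`interval=1`) catches all of them, at the cost of the extra load described above.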

Recently, I upgraded our company's e-mail server because it was crashing under an ever-increasing load of spam. The new software was more efficient and no longer crashed; however, this meant it was also more efficient at delivering spam. I was happy because I wasn't getting paged to restart the mail server, but the average user actually saw more spam in their inbox. It is difficult to explain to the average person, who just wants to read and send e-mail, how complex the system is and why upgrading the software was the right thing to do.

Most people don't understand that e-mail delivery isn't guaranteed to be instant, and that mail servers will retry delivery if they can't get through to the receiving server. In our case, when the server was flooded by spammers, all the legitimate e-mail eventually got through while some spam probably didn't (spammers typically won't retry delivery when they can't connect). Now, both spam and ham get through equally quickly. Of course, we are working on ways to reduce the spam, but it is an almost intractable problem when you have thousands of people around the world working day and night to devise clever ways to deliver their junk.

One thing that is important from a sysadmin's point of view is to document and explain the problem both upward to management and downward to clients and customers. To quantify the problem, I'm using log analysis tools to graph it over time. Now that I have hard data, I can start to formalize the problem and test hypotheses about how to solve it.
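The tools aren't named here, but the core of that kind of graph is counting events per time bucket. A minimal sketch, assuming a hypothetical log format: each line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, and rejected spam is marked with the word `REJECT`. Real mail server logs use their own formats, so the parsing would need adapting. The sample lines are invented for illustration.

```python
from collections import Counter
from datetime import datetime


def daily_counts(lines, marker="REJECT"):
    """Count log lines containing `marker`, bucketed by calendar day.

    Assumes (hypothetically) that each line begins with "YYYY-MM-DD HH:MM:SS".
    Lines without a parseable timestamp are skipped.
    """
    counts = Counter()
    for line in lines:
        if marker not in line:
            continue
        try:
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue
        counts[stamp.date()] += 1
    return dict(sorted(counts.items()))


# Invented sample lines in the assumed format.
sample = [
    "2006-10-19 08:15:02 REJECT from 192.0.2.10",
    "2006-10-19 09:40:55 accepted from 198.51.100.7",
    "2006-10-20 01:02:03 REJECT from 192.0.2.44",
    "2006-10-20 23:59:59 REJECT from 203.0.113.9",
]
for day, n in daily_counts(sample).items():
    print(day, n)
```

The per-day totals (here 1 on the 19th and 2 on the 20th) are what you would feed to a plotting tool to see the trend.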

The challenge, as with uptime statistics, is to find numbers that are accurate without introducing a sort of Heisenberg effect from the monitoring itself, and then to present those numbers so that the people who depend on the sysadmin to get their work done can evaluate whether that person is doing a good job. I'm not sure there is any magic bullet, but it is clear to me that applying some science to the art of system administration can aid communication, diagnosis, and ultimately problem resolution.

It is an area I will be expending more brain slices on in the future and on this blog.

