Everything is Miscellaneous

internet, philosophy, psychology, video 3 Comments »

“David Weinberger’s new book covers the breakdown of the established order of ordering. He explains how methods of categorization designed for physical objects fail when we can instead put things in multiple categories at once, and search them in many ways. This is no dry book on taxonomy, but has the insight and wit you’d expect from the author of The Cluetrain Manifesto, Small Pieces Loosely Joined, and a former writer for Woody Allen.”

The video starts out a little slow and I kept thinking, “yes, this is obvious,” but it picks up pace and puts all the things that are happening in the taxonomy / folksonomy field into perspective. Also, Melvil Dewey, of Dewey Decimal System fame, sounds a lot more insane than I ever realized. David Weinberger really gets to some issues that I hated about the way that universities somewhat arbitrarily divide learning into colleges.

How does it make sense, for example, that Computer Science and Painting are both in the College of Arts & Sciences, yet Electrical Engineering is in the College of Engineering? Some important cross-fertilization is missed simply because students are physically separated into different buildings. Damn you Aristotle, damn you! *Shakes fist*

Update: I just read an article in the NY Times about a library abandoning the Dewey Decimal System.

Regular Expressions, Lisp, SQL, Parsing, Domain Specific Languages

code, lisp, philosophy, programming, software engineering, unix 3 Comments »

I’ve been trying to code some more on Project Shelob (my web server) in my spare time. I’m to the point of needing a configuration file, so I can start up the server using different ports and directories for testing. Speaking of testing, I’m also to the point of needing automated test suites. I was refactoring some of the HTTP code, and when I got done, it was far more readable, and there was much rejoicing! Unfortunately, two days later I discovered I had introduced a subtle bug in keep-alive handling during a 404 event. Oops.

Anyway, I decided to use JSON as my configuration language. Simple, accommodated everything I needed, and later I would be able to easily write an AJAX GUI front end to configure the whole thing. Should be slick, right? Not as easy as it might sound. Though I have written parsers by hand, I’d rather not. Ok, so I’m using C++, surely someone has written an easy to use open source library that I can just stick in my rules and get out a nice data structure, right?
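For a sense of what such a configuration might look like, here is a minimal sketch, written in Python for brevity rather than the C++ the server uses. The key names (`port`, `document_root`, `keep_alive`), the default values, and the `load_config` helper are all assumptions invented for illustration, not Project Shelob's actual format.

```python
import json

# Hypothetical defaults; the real server's settings are not shown in this post.
DEFAULTS = {"port": 8080, "document_root": "./htdocs", "keep_alive": True}


def load_config(text):
    """Parse a JSON configuration string and fill in defaults for missing keys."""
    config = dict(DEFAULTS)
    config.update(json.loads(text))
    # Reject ports outside the valid TCP range before the server tries to bind.
    if not 0 < config["port"] < 65536:
        raise ValueError("port out of range: %r" % config["port"])
    return config


# A test instance that overrides the port and directory but keeps keep_alive.
example = '{"port": 9090, "document_root": "/tmp/test-site"}'
print(load_config(example))
```

Starting a second instance for testing would then only need a second JSON string or file with a different port and directory.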

Well, kind of. There is Boost Spirit, which would do everything that I want it to do, but it also required me to translate the EBNF grammar of JSON into Boost’s strange amalgamation of YACC and C++. Okay, well and good, but surely there is something better? After some more searching, I ran across ANTLR, which seems to be the spiritual successor to LEX and YACC/Bison. It even has a nice Java GUI, and someone had kindly done the ANTLR rules for JSON.

Still, the C++ backend wasn’t fully supported and required installing libraries and was complicated. Not 100% what I needed or wanted. All of which got me thinking about domain specific languages. Most programmers don’t consider it, but SQL and Regular Expressions are good examples of Domain Specific Languages (DSL), as are lex and yacc/bison. Up till now, I’ve frowned on the whole idea of DSLs in general. It had always seemed like bad software engineering practice to invent a new language for each problem. After all, did we really want to learn an entirely new programming language with each assignment? Who is going to maintain the code?

However, the fact is that you have to learn an entire API anyway, and the API really just layers what you’re trying to do over a language that wasn’t quite expressive enough to do the job natively to begin with. Which of course leads me to Lisp, by way of Martin Fowler, who makes some good points:

“One of the most obviously DSLy parts of the world is the Unix tradition of writing little languages. These are external DSL systems, that typically use Unix’s built in tools to help with translation. While at university I played a little with lex and yacc - similar tools are a regular part of the Unix tool-chain. These tools make it easy to write parsers and generate code (often in C) for little languages. Awk is a good example of this kind of mini-language.”

While I’ve been using SQL, regular expressions, awk, lex, and yacc for years, I’d never really classified them in my mind as DSLs. I’ve been well aware of the power of small specialized utilities aggregated together to perform a bigger task and why UNIX has been so successful at this, but I hadn’t made the leap to apply this to my programming. Fowler continues:

“Lisp is probably the strongest example of expressing DSLs directly in the language itself. Symbolic processing is embedded into the name as well as practice of lispers. Doing this is helped by the facilities of lisp - minimalist syntax, closures, and macros present a heady cocktail of DSL tooling. Paul Graham writes a lot about this style of development. Smalltalk also has a strong tradition of this style of development.”

I’ve heard “grey-beards” and academics talk about the power of Lisp for years, and though I did some trivial functional programming in college, I’ve dismissed the rants of the Lisp guys as nothing more than rants. Today though, the ideas are crystallizing in my head, and I’m excited to explore this more.

Microsoft, true innovation

humor, microsoft, philosophy, unix No Comments »

Wes: check out introducing pipes
Matt: “Those who do not understand Unix are condemned to reinvent it, poorly.”
Matt: I hear vista finally has symlinks. Wake me up when they invent mount points and finally kill drive letters
Wes: I think you can do that somehow.
Matt: yeah sure, and break everything *nerd rage*
Wes: yeah, junction point. junction points (technet)
Matt: “Those who do not understand Unix are condemned to reinvent it, poorly.”

Update, Wes says, if you want to know more see his blogs at:

I was a ghost in the machine until the machine woke up

philosophy, random, video, web 2.0 1 Comment »

Found this video randomly today… This is why I do what I do. Computers are great.

System Administration as Science

philosophy, system administration 3 Comments »

One goal in my day-to-day work is to quantify events in a systematic way. System administrators are in a unique position to view the network, servers, clients, software and the ways that they interact. While good software development depends on abstracting away as many things as you can, good system administration depends on understanding how the layers interact.

For example, a good developer will abstract away the type of database he is connecting to. There is a small shim that can be adjusted so that the program runs with no changes on Oracle or PostgreSQL, for example. The Java language itself depends on abstracting away the entire computer by implementing a virtual machine that acts consistently over differing operating systems, or even different CPU architectures. A Java programmer doesn’t care whether he is running on Solaris SPARC or Linux MIPS or Windows x86, or whether the CPU is big-endian or little-endian.

However, a good system administrator does care, and should know the difference. System administration is about removing layers to solve problems that occur when the abstractions break down. Joel Spolsky refers to this as “The Law of Leaky Abstractions.”

“All non-trivial abstractions, to some degree, are leaky.”

Some have compared system admins to the plumbers of the IT world. Like plumbing, the effects of system administration disappear when everything is working. Only when things start to leak, and shit starts to hit the fan (literally or figuratively) does it become noticeable. There seems to be one breed of system administrator that thrives on fixing problems. Imagine the server going down, and the mayor frantically paging the heroic sysadmin with the Bat Signal.

Our hero drops into the storm with his combat boots and trusty Leatherman, typing arcane commands, drinking Mountain Dew and cursing at everyone around him. Suddenly, joyous shouts erupt as the users discover their work can continue. Everyone cheers the SysOp, while he struts back to his Bat Cave, until the next Bat Time, at the same Bat Channel.

How does one measure the performance of the lone rogue sysadmin troubleshooter against another who has carefully scheduled downtime, and whose system “just works”? Is the system with less downtime more reliable because of the work of the system administrator, or are they just lucky? How does one compensate the hero who fixes every problem, versus someone who never demonstrates this ability because the system never goes down?

What of the sysadmin who has unreliable hardware or buggy software forced on him by upper management or customer demand? A lot of companies want to measure metrics like uptime, but is it even possible to properly measure 99.99% uptime, and does that have any correlation to the person running the system?

99.9% uptime amounts to approximately 43 minutes of downtime in a 30-day month, but many of the tools used to measure the availability of the system have a minimum time resolution of 1 minute. For example, you want to test that your website is up and available to your users, so you write a script that makes an HTTP request and returns the result. It sends you e-mail if it doesn’t get a response. However, the standard UNIX cron utility that schedules tasks can run a job at most once per minute.
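The arithmetic behind an uptime target is easy to check. The sketch below assumes a 30-day month by default; `downtime_budget_minutes` is a name made up for this example.

```python
def downtime_budget_minutes(availability_percent, days=30):
    """Return the minutes of downtime allowed over `days` at the given availability."""
    total_minutes = days * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (100.0 - availability_percent) / 100.0


for target in (99.0, 99.9, 99.99):
    budget = downtime_budget_minutes(target)
    print("%.2f%% uptime allows %.1f minutes of downtime per 30 days" % (target, budget))
```

At 99.99% the whole monthly budget is a little over four minutes, which is only a handful of once-per-minute checks.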

With a CPU running millions of instructions per second and servers typically having multiple processors, one minute is too long. But, if we magically invent a utility that can schedule and execute your script once per second, suddenly your server is overwhelmed by these requests and your script itself brings the system to a halt. What if you have a process that crashes and restores itself in less time than your monitoring tool checks? You wouldn’t consider a server that crashed every 30 seconds reliable, but most monitoring software can’t tell the difference.
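A small simulation makes the blind spot concrete. This is only a sketch under invented assumptions: the `flapping` service below is hypothetical, going down for 5 seconds out of every 30, and the function names are made up for the example.

```python
def observed_availability(is_up, duration_s, probe_interval_s):
    """Fraction of probes that find the service up when polling every probe_interval_s seconds."""
    probes = range(0, duration_s, probe_interval_s)
    return sum(1 for t in probes if is_up(t)) / len(probes)


def true_availability(is_up, duration_s):
    """Fraction of whole seconds during which the service is actually up."""
    return sum(1 for t in range(duration_s) if is_up(t)) / duration_s


def flapping(t):
    """Hypothetical service: up for 25 seconds, then down for 5, repeating every 30 seconds."""
    return (t % 30) < 25


hour = 3600
print("actual availability:  ", true_availability(flapping, hour))
print("probing once a minute:", observed_availability(flapping, hour, 60))
print("probing once a second:", observed_availability(flapping, hour, 1))
```

Because the once-a-minute probe always lands while the service happens to be up, it reports a perfect record for a service that is down one sixth of the time. The once-a-second probe sees the truth, at the cost of sixty times the monitoring load.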

Recently, I upgraded our company’s e-mail server because it was crashing under an ever-increasing load of spam. The new software was more efficient and no longer crashed; however, this meant it was also more efficient at delivering spam. I was happy because I wasn’t getting paged to restart the mail server, but the average user actually saw more spam in their in-boxes. It is difficult to explain to the average person who just wants to read and send e-mail how complex the system is and how upgrading the software was the right thing to do.

Most people don’t understand that e-mail isn’t guaranteed instant delivery, and that mail servers will attempt redelivery if they can’t get through to a server. In our case, when the server was flooded by spammers, all the legitimate e-mail eventually got through while some spam probably didn’t (spammers typically won’t retry delivery when they can’t connect). Now, both spam and ham get through equally quickly. Of course, we are working on ways to reduce the spam, but it is an almost intractable problem when you have thousands of people around the world working day and night to devise clever ways to deliver their junk.

One thing that is important from a sysadmin point of view is to document and explain the problem both upwards to management and downwards to the clients and customers. To quantify the problem I’m using log analysis tools to graph the problem over time. Now that I have hard data, I can start to formalize the problem and test the validity of various hypotheses to solve it.
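As a rough idea of what that kind of log analysis involves, the sketch below buckets mail log lines by hour and verdict. The log format (`YYYY-MM-DD HH:MM:SS verdict ...`), the sample lines, and the function name are all invented for illustration; a real mail server's logs will look different and need their own parsing.

```python
from collections import Counter


def count_by_hour(lines):
    """Count log lines per (hour, verdict), assuming 'YYYY-MM-DD HH:MM:SS verdict ...' lines."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        hour = parts[0] + " " + parts[1][:2]
        counts[(hour, parts[2])] += 1
    return counts


# Made-up sample data, not taken from any real server.
sample = [
    "2007-07-01 09:02:11 spam from=a@example.com",
    "2007-07-01 09:15:40 ham from=b@example.com",
    "2007-07-01 09:47:03 spam from=c@example.com",
    "2007-07-01 10:01:59 spam from=d@example.com",
]

# Print a crude text graph: one '#' per message in each bucket.
for (hour, verdict), n in sorted(count_by_hour(sample).items()):
    print(hour, verdict.ljust(4), "#" * n)
```

Once the counts exist, they can be fed to whatever graphing tool is at hand to compare the periods before and after a change.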

The challenge, as with uptime statistics, is to find numbers that are accurate without introducing a sort of Heisenberg effect from the monitoring itself, and then to present those numbers so that the people who depend on the sysadmin to get their work done can evaluate whether that person is doing a good job or not. I’m not sure there is any magic bullet, but it is clear to me that applying some science to the art of system administration can aid in communication, diagnosis and ultimately problem resolution.

It is an area I will be expending more brain slices on in the future and on this blog.
