Everything is Miscellaneous

internet, philosophy, psychology, video 3 Comments »

“David Weinberger’s new book covers the breakdown of the established order of ordering. He explains how methods of categorization designed for physical objects fail when we can instead put things in multiple categories at once, and search them in many ways. This is no dry book on taxonomy, but has the insight and wit you’d expect from the author of The Cluetrain Manifesto, Small Pieces Loosely Joined, and a former writer for Woody Allen.”

The video starts out a little slow and I kept thinking, “yes, this is obvious,” but it picks up pace and puts everything that is happening in the taxonomy / folksonomy field into perspective. Also, Melvil Dewey, of Dewey Decimal System fame, sounds a lot more insane than I ever realized. David Weinberger really gets at some issues that I hated about the way universities somewhat arbitrarily divide learning into colleges.

How does it make sense, for example, that Computer Science and Painting are both in the College of Arts & Sciences, yet Electrical Engineering is in the College of Engineering? Some important cross-fertilization is missed simply because students are physically separated into different buildings. Damn you Aristotle, damn you! *Shakes fist*

Update: I just read an article in the NY Times about a library abandoning the Dewey Decimal System.

Qwest won’t commit to Seattle fiber, doesn’t want city to do it

internet, seattle No Comments »

Fiber isn’t coming anytime soon, according to Qwest CEO Dick Notebaert. That’s the answer Notebaert gave a Seattle Times reporter in a recent interview, and it looks as though it will echo throughout Qwest’s service region. Although the telecom has built some very small FTTP and FTTN networks for specific subdivisions, it has shown little interest in the kind of widescale deployment its larger brethren are undertaking.

Seattle is anxious to get fiber one way or another, and last year a city task force recommended it develop its own fiber network in order to remain competitive. Predictably, Qwest wasn’t too impressed. In fact, it was downright critical of the task force’s report.

Via Ars Technica

The Coming Battle Over Grid Computing and Internet Services

google, grid computing, hardware, internet, microsoft, yahoo 3 Comments »

A comment I left on Wes Maldonado’s blog has started a conversation about grid computing. He posted on Digipede, a Windows-centric way to do distributed computing, and I responded that it would be “nice” not to be forced to do this type of work on an operating system that requires a GUI. That set off another post about cost effectiveness and using existing infrastructure, points I don’t disagree with.

In an IT environment with a lot of computers running Windows and a problem that allows you to do distributed algorithms easily, Digipede seems pretty exciting. Ever since the original Distributed.net, I’d wondered if a company would bring a product like this to market. It is something that I would have a lot of fun playing with.

That said, my point on electricity and running supercomputer clusters on Windows still stands. My comment wasn’t intended to disparage Digipede so much as to point out the problem that Microsoft is going to have competing with companies like Google and Yahoo for the next generation of Internet services. Some have estimated that Google’s data centers have well over 100,000 COTS PCs set up in a distributed grid. Google is running Linux, which can run headless, without a video card or any GUI package installed. Linux has been “designed” to be completely scriptable from a command line interface.

Windows, however, appears to have a tight integration between the GUI layers and the NT kernel. As far as I know, it is impossible to install Windows on a machine without a video card. Obviously, the GUI layer will be paged to disk on all these machines, but the cost of a video card multiplied by several hundred thousand is a needless expense. The other competitive advantage Google and Yahoo have is the scriptability of Linux and FreeBSD. While PowerShell is a step forward for Microsoft, my view is that the UNIX environment wins on system administration scriptability.

The key to building supercomputer clusters is easy system administration. Perhaps Microsoft can leverage their existing infrastructure and prove that GUI tools can do everything the UNIX ones can and more, but they are starting with less experience. Google has already proven they can do it effectively. My back-of-the-envelope estimate is that Google Linux sysadmins are each responsible for between 1,000 and 2,000 servers. I don’t see a Microsoft solution for that yet, and I don’t think the Digipede product is intended to compete in that type of environment. Digipede probably isn’t going to compete in the National Labs supercomputer arena either (at least not yet).

The second problem is that any kind of parallel programming is really hard. Even threads prove a huge challenge within a single application. MapReduce is clever, but I don’t think it is a magic bullet either. A lot of algorithms simply don’t scale linearly with computing power, so adding more hardware just burns a hole in your wallet and in your data center’s air conditioning. All that said, there are plenty of places that products like Digipede would fit perfectly.
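One standard way to put a number on that scaling limit is Amdahl's law. It isn't something either product above advertises, just the usual textbook model: if only a fraction p of a job can run in parallel, then n machines give a speedup of at most 1 / ((1 - p) + p / n). A minimal sketch, where the function name amdahl_speedup is made up for illustration:

```cpp
#include <cassert>

// Amdahl's law: best-case speedup on n machines when only a fraction p
// (between 0 and 1) of the work can be spread across them.
// The remaining (1 - p) still runs serially no matter how much hardware you add.
double amdahl_speedup(double p, double n) {
    assert(p >= 0.0 && p <= 1.0 && n >= 1.0);
    return 1.0 / ((1.0 - p) + p / n);
}
```

With 90% of the work parallelizable, 10 machines give a bit over 5x, and 1,000 machines still stay under 10x. Everything past that point is just heat.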

Mainly, I am interested to see how this all shakes out, as we see Microsoft, Google and Yahoo building their data centers close to hydro-electric power to cut costs. Sun is also a dark horse in this whole race, building out their grid infrastructure and custom chips that suck less juice. I can’t wait to see more!

Parsing, Priv Separation and chroot

code, http, internet, security No Comments »

I fixed up the parsing issues on Shelob so that it is somewhat respectable, instead of a bunch of hacks. Once I started looking at what the client was actually sending me (the LiveHTTP headers Firefox extension rocks), it was obvious that I needed to break the request up line by line and then separate each line into a name and a value.

After rewriting the getHeaders() function to use STL hash tables, not only is the code more flexible, but it is also cleaner. For example:

[code]
log.writeLogLine(inet_ntoa(sock->client.sin_addr), request_line, 200, size, headermap["Referer"], headermap["User-Agent"]);
[/code]

Here, with the headermap, it is obvious what values I am passing. Before the rewrite, I just had a bunch of tokens[3], tokens[5], etc.
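For illustration, here is a minimal sketch of that kind of name/value parsing. It is not Shelob's actual getHeaders(): the names parse_headers and header_or are hypothetical, and it uses std::map (a tree, not a hash table) just to stay inside the standard library.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Strip leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
    const std::size_t first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Parse the header lines of a request (everything after the request line)
// into a name -> value map. Parsing stops at the blank line that ends the
// header block; lines without a ':' are skipped. Names keep the case the
// client sent, so this sketch does not handle case-insensitive lookup.
std::map<std::string, std::string> parse_headers(const std::string& raw) {
    std::map<std::string, std::string> headers;
    std::istringstream in(raw);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) break;
        const std::size_t colon = line.find(':');  // split on the FIRST colon only
        if (colon == std::string::npos) continue;
        headers[trim(line.substr(0, colon))] = trim(line.substr(colon + 1));
    }
    return headers;
}

// Look a header up without inserting an empty entry when it is missing.
std::string header_or(const std::map<std::string, std::string>& headers,
                      const std::string& name, const std::string& fallback) {
    const auto it = headers.find(name);
    return it == headers.end() ? fallback : it->second;
}
```

Splitting on the first colon only matters because values such as a Referer URL contain colons of their own. A lookup like header_or(headers, "Referer", "-") also gives the logger a placeholder when the client doesn't send that header.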

I’m also toying around with the idea of privilege separation and chroot jails. This sort of flows with the previous post’s micro-kernel type approach, similar to how Postfix works. While it is more secure, the programming challenges are pretty high. I may leave that for a later version. I still have a bit of cleanup to do before a release.
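To make the chroot idea concrete, here is a rough sketch of the usual sequence on a UNIX system: chroot into a directory, then give up root for good. The function name enter_jail and its parameters are invented for this example, and nothing here is taken from Shelob's code.

```cpp
#include <cstdio>
#include <grp.h>
#include <sys/types.h>
#include <unistd.h>

// Confine the process to jail_dir, then permanently drop root privileges.
// Has to be called while the process is still root. Returns false at the
// first step that fails, in which case the caller should exit rather than
// keep serving requests.
bool enter_jail(const char* jail_dir, uid_t uid, gid_t gid) {
    if (chroot(jail_dir) != 0) { std::perror("chroot"); return false; }
    if (chdir("/") != 0)       { std::perror("chdir");  return false; }

    // Order matters: supplementary groups, then gid, then uid. Once the uid
    // is dropped the process is no longer allowed to change the other two.
    if (setgroups(1, &gid) != 0) { std::perror("setgroups"); return false; }
    if (setgid(gid) != 0)        { std::perror("setgid");    return false; }
    if (setuid(uid) != 0)        { std::perror("setuid");    return false; }

    // Sanity check: getting root back must now be impossible.
    if (setuid(0) == 0) return false;
    return true;
}
```

Anything the server needs after this point (listening sockets, log files) has to be opened before the call, which is a big part of why the programming gets harder.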

Aside:

Theo de Raadt gave a nice presentation on the exploit mitigation techniques that OpenBSD is using, which relates to some of these ideas.

More Hacking Shelob

code, http, internet 1 Comment »

I fussed around more with logging today, which led me to the parseHeader() function. Parsing is one of the weakest areas right now. For simplicity, I had implemented it by simply tokenizing on “space”, shoving the tokens into a string vector and then iterating over that vector for the tokens I needed.
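Roughly, that approach looks like the sketch below. It is a reconstruction for illustration rather than the real parseHeader(); the name tokenize is made up, and stream extraction splits on any whitespace, not only spaces.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a line on whitespace and collect the pieces in order.
std::vector<std::string> tokenize(const std::string& line) {
    std::vector<std::string> tokens;
    std::istringstream in(line);
    std::string tok;
    while (in >> tok) tokens.push_back(tok);
    return tokens;
}
```

This is fine for a request line like "GET /index.html HTTP/1.1", which comes back as three tidy tokens. It falls apart on headers whose values contain spaces: "User-Agent: Mozilla/5.0 (X11; Linux)" turns into four tokens, and the caller has to know which positions to glue back together.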

So far, I’ve not peeked at anyone else’s source code; Shelob is a clean room implementation of a basic HTTP server. However, I really need to clean up the parser. I thought about going with a full lexer using flex or something, but that is probably overkill. Plus, I’d rather not add another dependency. More thought on this is needed, and maybe some research into how other people are doing this. Parsing is very much an area where security can go wrong, so it needs to be done right.

The other thought I had while poking around is that I could make each component into its own server, sort of a mini-microkernel approach. I could imagine a swarm of different servers, all able to communicate. You could have the log server running on one host, separate CGI servers for each user, as well as different backends. The only thing I’m not sure about is how much overhead this would add. A lot of the interprocess communication could happen over local UNIX sockets, FIFOs, or even shared memory, but it would be awesome if it all worked fast over a regular socket. Yet more thought needed here.
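As a small taste of the local UNIX socket option, here is a sketch that pushes one log line from a "web server" end to a "log server" end of a socketpair(). All of the function names are hypothetical, both ends live in one process to keep the example short, and error handling (EINTR, partial reads) is deliberately minimal.

```cpp
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Write one newline-terminated message. Returns true if every byte went out.
bool send_line(int fd, const std::string& line) {
    const std::string msg = line + "\n";
    std::size_t sent = 0;
    while (sent < msg.size()) {
        const ssize_t n = write(fd, msg.data() + sent, msg.size() - sent);
        if (n <= 0) return false;
        sent += static_cast<std::size_t>(n);
    }
    return true;
}

// Read bytes up to the next newline and return them without the newline.
std::string recv_line(int fd) {
    std::string line;
    char c;
    while (read(fd, &c, 1) == 1 && c != '\n') line.push_back(c);
    return line;
}

// Send one message across a connected pair of local UNIX sockets and
// return what arrived at the other end (empty string on failure).
std::string round_trip(const std::string& line) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return "";
    std::string got;
    if (send_line(fds[0], line)) got = recv_line(fds[1]);
    close(fds[0]);
    close(fds[1]);
    return got;
}
```

In a real split, the process would fork() after socketpair() and each side would keep one descriptor. The same send_line/recv_line pair would work unchanged over a TCP socket, which is where measuring the overhead would get interesting.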

So far I’m having a blast playing with this program. It is nice to write something for yourself and make only the trade-offs you decide on. I don’t have any customer or management trying to shoehorn this thing into something I don’t want. Even if I never release it, it is a good brain exercise.

Google AdSense and the Magic of the Long Tail

google, internet 2 Comments »

Lem Bingley at IT Week blogs about the millions of blogs now running AdSense that rarely, if ever, break the $100 limit that Google requires before they cut you a check. This made sense in the early days of AdSense, since they were still mailing checks to everyone. It certainly isn’t cost effective for Google to mail out $0.10 checks all over the world.

However, with electronic transfers now enabled, they’ve kept the limit the same. Even banks don’t make this much money off the float. If Yahoo or MSN really wanted to cut into the long tail of AdSense, they would lower the minimum payout for electronic transfers to something more like $25/month. The other major complaint that I have with AdSense is that I am not allowed to set a bid price on what ads can appear on my site. Google controls it. If they determine that my page rates $0.01 ads, that’s the ad they place.

Granted, it is in their best interest as well as mine to put up the ad most likely to receive a legit click-through. However, it may not be in my best interest to clutter up my page with $0.01 advertisements. I should be able to set a minimum bid price for an ad to appear on my site. If I bid too high, then ads don’t show up, but since I’m not making much money anyway, I probably won’t care. My visitors will be more likely to come back and read something else I wrote.

I am very much looking forward to good competition in this space. I’ve tried the Yahoo Beta program and it isn’t close yet. I hope it gets better soon.

Yahoo Slurpy Verifier

internet, yahoo 3 Comments »

Some kind of beta web crawler from Yahoo has been hitting my site in weird ways. So far today it has crawled one page 41 times, sometimes less than a minute apart. Uhhh? Yahoo? The user agent is Slurpy Verifier/1.0 and it is coming from 66.228.164.201/rdev25.yst.corp.yahoo.com.

  1. 24/Apr/2006:04:54:17
  2. 24/Apr/2006:04:55:20
  3. 24/Apr/2006:05:09:19
  4. 24/Apr/2006:05:10:12
  5. 24/Apr/2006:05:25:35
  6. 24/Apr/2006:05:26:51
  7. 24/Apr/2006:05:40:03
  8. 24/Apr/2006:05:53:00
  9. 24/Apr/2006:06:08:24
  10. 24/Apr/2006:06:23:49
  11. 24/Apr/2006:06:39:04
  12. 24/Apr/2006:06:55:50
  13. 24/Apr/2006:07:10:21
  14. 24/Apr/2006:07:26:04
  15. 24/Apr/2006:07:41:15
  16. 24/Apr/2006:07:57:16
  17. 24/Apr/2006:08:13:54
  18. 24/Apr/2006:08:25:44
  19. 24/Apr/2006:08:41:38
  20. 24/Apr/2006:08:56:35
  21. 24/Apr/2006:09:09:55
  22. 24/Apr/2006:09:25:36
  23. 24/Apr/2006:09:42:03
  24. 24/Apr/2006:09:57:09
  25. 24/Apr/2006:10:11:12
  26. 24/Apr/2006:10:25:53
  27. 24/Apr/2006:10:41:22
  28. 24/Apr/2006:10:56:33
  29. 24/Apr/2006:11:11:46
  30. 24/Apr/2006:11:25:46
  31. 24/Apr/2006:11:39:40
  32. 24/Apr/2006:11:55:19
  33. 24/Apr/2006:12:13:24
  34. 24/Apr/2006:12:25:08
  35. 24/Apr/2006:12:40:09
  36. 24/Apr/2006:12:53:50
  37. 24/Apr/2006:13:11:00
  38. 24/Apr/2006:13:24:19
  39. 24/Apr/2006:13:41:08
  40. 24/Apr/2006:13:53:56
  41. 24/Apr/2006:14:09:42
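If you want to check the spacing between hits like these yourself, a small sketch along the following lines will do it. The function names are made up for this example, and it assumes both timestamps fall on the same day, as they do in the list above.

```cpp
#include <cstdio>
#include <string>

// Seconds since midnight for a log stamp such as "24/Apr/2006:04:54:17".
// Returns -1 if the string does not match that layout.
int seconds_of_day(const std::string& stamp) {
    int day, year, h, m, s;
    char mon[4];
    if (std::sscanf(stamp.c_str(), "%d/%3[A-Za-z]/%d:%d:%d:%d",
                    &day, mon, &year, &h, &m, &s) != 6) {
        return -1;
    }
    return h * 3600 + m * 60 + s;
}

// Gap in seconds between two stamps from the same day.
int gap_seconds(const std::string& earlier, const std::string& later) {
    return seconds_of_day(later) - seconds_of_day(earlier);
}
```

For the list above, hits 1 and 2 are 63 seconds apart and hits 3 and 4 are only 53 seconds apart, after which the crawler settles into a visit roughly every 15 minutes.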

My Internet Drives Me Crazy

comcast, internet No Comments »

UPDATE: I upgraded the firmware on my wireless router and things started working. This was one of the most bizarre network issues I’ve seen, since I hadn’t touched anything on that router for about 2 years, and the problem was intermittent. I got rid of the old Sveasoft WRT54 firmware and upgraded it to DD-WRT.

In the past I haven’t had a lot of trouble with Comcast Broadband, but for the past couple of days it has been dropping packets all over the place. ARRRRRGH.

--- comcast.net ping statistics ---
624 packets transmitted, 95 packets received, 84% packet loss
round-trip min/avg/max/stddev = 68.991/400.410/3449.330/765.964 ms
