Jun 20 2008

Project Euler and Prime Factorization

I have been doing some of the exercises at Project Euler lately. Project Euler describes themselves as:

A series of challenging mathematical/computer programming problems that will require more than just mathematical insights to solve. Although mathematics will help you arrive at elegant and efficient methods, the use of a computer and programming skills will be required to solve most problems.

It has been a lot of fun to code these up in my language du jour, Python. There are a couple of problems that Python’s built in libraries have made trivial. I have to admit the most enjoyable part for me is having problems that require efficiency in algorithm. For the simpler problems, I usually just quickly hack together the “naive” brute force method, figure out that it doesn’t scale and then start investigating how I can fix it. Doing this, you will exercise your mathematics, computer science and programming skills, something that a lot of programming doesn’t do.

I convinced my girlfriend to work with me on one of the exercises, and of course she picked one of the Prime Factorization problems. The naive brute force algorithm would not be an option for the large composite number given, so we ended up hacking together a Sieve of Eratosthenes. Ultimately, we got a version working, but it was still pretty inefficient, only returning the answer in about an hour. An optimal version should be able to do it within seconds. Obviously there is some “refactoring” to do.


Jun 10 2007

Regular Expressions, Lisp, SQL, Parsing, Domain Specific Languages

I’ve been trying to code some more on Project Shelob (my web server) in my spare time. I’m to the point of needing a configuration file, so I can start up the server using different ports and directories for testing. Speaking of testing, I’m also to the point of needing automated test suites. I was refactoring some of the HTTP code, and when I got done, it was far more readable, and there was much rejoicing! Unfortunately, two days later I discovered I had introduced a subtle bug in keep-alive handling during a 404 event. Oops.

Anyway, I decided to use JSON as my configuration language. Simple, accommodated everything I needed, and later I would be able to easily write an AJAX GUI front end to configure the whole thing. Should be slick, right? Not as easy as it might sound. Though I have written parsers by hand, I’d rather not. Ok, so I’m using C++, surely someone has written an easy to use open source library that I can just stick in my rules and get out a nice data structure, right?

Well, kind of. There is Boost Spirit which would do everything that I want it to do, but it also required me translating the EBNF grammar of JSON into Boost’s strange amalgamation of YACC and C++. Okay well and good, but surely there is something better? After some more searching, I run across ANTLR which seems to be the spiritual successor to LEX and YACC/Bison. It even has a nice Java GUI and someone had kindly done the ANTLR rules for JSON. Check out the graphical goodness:

Still, the C++ backend wasn’t fully supported and required installing libraries and was complicated. Not 100% what I needed or wanted. All of which got me thinking about domain specific languages. Most programmers don’t consider it, but SQL and Regular Expressions are good examples of Domain Specific Languages (DSL), as are lex and yacc/bison. Up till now, I’ve frowned on the whole idea of DSLs in general. It had always seemed like bad software engineering practice to invent a new language for each problem. After all, did we really want to learn an entirely new programming language with each assignment? Who is going to maintain the code?

However, the facts point out that you have to learn an entire API anyway, and the API really just layers over what you’re really trying to do with a language that wasn’t quite expressive enough to do the job natively to begin with. Which of course leads me to LISP and through Martin Fowler who makes some good points here:

“One of the most obviously DSLy parts of the world is the Unix tradition of writing little languages. These are external DSL systems, that typically use Unix’s built in tools to help with translation. While at university I played a little with lex and yacc – similar tools are a regular part of the Unix tool-chain. These tools make it easy to write parsers and generate code (often in C) for little languages. Awk is a good example of this kind of mini-language.”

While I’ve been using SQL, regular expressions, awk, lex, and yacc for years, I’d never really classified them in my mind as DSLs. I’ve been well aware of the power of small specialized utilities aggregated together to perform a bigger task and why UNIX has been so successful at this, but I hadn’t made the leap to apply this to my programming. Fowler continues:

“Lisp is probably the strongest example of expressing DSLs directly in the language itself.. Symbolic processing is embedded into the name as well as practice of lispers. Doing this is helped by the facilities of lisp – minimalist syntax, closures, and macros present a heady cocktail of DSL tooling. Paul Graham writes a lot about this style of development. Smalltalk also has a strong tradition of this style of development.”

I’ve heard “grey-beards” and academics talk about the power of Lisp for years, and though I did some trivial functional programming in college, I’ve dismissed the rants of the Lisp guys as nothing more than rants. Today though, the ideas are crystallizing in my head, and I’m excited to explore this more.


Oct 16 2006

Great Java RFE Bug

I love snarky bug reports for some reason. It cracks me up that it took 8 years for Sun to add password prompting to Java. The users increasingly becoming irate in the bug reports is awesome. I wish the programmers would have responded back in a big flame war. I can only imagine what they were saying inside Sun. Good stuff.

Improved interactive console I/O (password prompting, line editing)


Oct 15 2006

Yet More Shelob Hacking

I’ve been fixing some on my web server again, for fun. Last weekend, I refactored a lot of code, added dynamic mime typing and CGI support! I was going to continue fleshing it out today, but I had to do even more refactoring to clean up some messy code paths.

I broke the Http:sendFile() method into several new ones and moved HTTP/1.1 keep-alive handling into a more central location instead of just tacked on to the first place that it worked. One thing that is driving me slightly insane, is that there appears to be a tiny memory leak. I can’t figure out where it is happening since I eliminated almost all dynamic allocations in the stack. As I wrote that sentence, I think I figured out where it could be coming from, but I’ll need to rewrite some more code to fix it. It is so small, you don’t start to notice it until you get at least 1,000 requests.

Overall, I’m happy with the design so far. This is my first C++ program and my first serious “server” program. I didn’t do any real upfront design except for drawing on past experience and my gut. I’ve added a number of features and it has been extendable. OOP purists would probably frown on it, but I’m using the subset of C++ and OOP in general that makes sense to me and is practical for what I’m doing. Is there a cleaner way? Likely *shrug*.

I’m getting to the point where a couple of patterns probably make sense. This program is growing organically, but under a tight enough constraint that it isn’t turning into a mess (at least not yet). The other thing that starts making sense is Unit Testing. I’ve been doing more refactoring than I have been adding new features and it would be awesome to be able to run a test suite and know that I haven’t broken anything. I’m not even really sure where to begin on that, but it is obvious that Shelob is becoming more of a “real” program and less of a toy.

A couple more good weekends and it would actually be semi-useful.


Sep 22 2006

Simple unix tools in Haskell

Simple unix tools written in Haskell.

This is intended as a beginners tutorial for learning Haskell from a “Lets just solve things already!” point of view. The examples should help give a flavour of the beauty and expressiveness of Haskell programming.


Jul 15 2006

Why Functional Programming Matters


Jul 14 2006

Java theory and practice

IBM DeveloperWorks has a great article on the details of the JVM garbage collector, including some neat foreshadowing of escape analysis that will be present in Java Mustang.

“The Java language does not offer any way to explicitly allocate an object on the stack, but this fact doesn’t prevent JVMs from still using stack allocation where appropriate. JVMs can use a technique called escape analysis, by which they can tell that certain objects remain confined to a single thread for their entire lifetime, and that lifetime is bounded by the lifetime of a given stack frame. Such objects can be safely allocated on the stack instead of the heap. Even better, for small objects, the JVM can optimize away the allocation entirely and simply hoist the object’s fields into registers.”


Jul 7 2006

Python vs Perl vs Ruby vs PHP vs Java


Jun 13 2006

Concurrent Port Scanner in Haskell

Tom Moertel wrote a very concise port scanner in Haskell. I have never looked at Haskell in any detail, but this program seems pretty impressive. You wouldn’t be able to do a port scanner quite so neatly in most other languages. I guess I’ll have to put that on the long stack of things to look into in more someday.