Archive for June, 2005

Ever wanted to program extreme?

Sunday, June 26th, 2005

Have you ever wanted to be as cool as Beck, Cunningham and Jeffries and program extreme, but never had a partner to do it with? Those days might be over thanks to your virtual pair programming partner from Azzurri.
I found out about this amazing piece of technology when I searched the web for useful Eclipse plug-ins. After installation the tool plugs into your Eclipse workbench, and after a short analysis of your current project your new friend starts giving hints on your code. This is very helpful if you are stuck with a difficult implementation decision or if you are just unsure about your last few lines. The advice is surprisingly sophisticated and seems to be based on a huge database of best practices and programming patterns.

The example shows “Birdie” in action:
[Image: Soloprogramming]
I start the Solo Programming Eclipse plugin every time I need something to smile about ;-).

Review of “Data Crunching”

Tuesday, June 21st, 2005

If you’re reading this, you probably spend some quality time developing software. If you’re developing software, chances are that you have to move data around on a daily basis (lucky you, if you don’t). Be it getting data from one text format to another, moving data from a legacy system to a newer project’s database, transforming XML into some more readable format for your boss or trying to get some useful data out of a former colleague’s own binary format: whatever you do in that manner, you’re crunching data. Greg Wilson seems to have spent a lot of time crunching data and wants to share his wisdom with the world of pragmatic programmers. The book’s coding focus is on Python and Java. I, for my part, haven’t worked with Python yet, but being familiar with Ruby and Groovy, I didn’t find it that hard to get an idea of what the Python code does (and I’m starting to like Python). Consider yourself warned.

Being a big fan of The Pragmatic Programmers’ bookshelf I didn’t hesitate to buy a copy of “Data Crunching” as well. Since I spend a lot of time doing stuff with some more or less usable data I thought it might be a good read to get some fresh ideas. And as it turns out that was a good choice.

Let’s dive into the world of crunching data. Greg takes it easy on the reader in the introduction. He starts off with short examples from his professional career. This helps a lot to get an idea of what data crunching actually is. If you didn’t already know, reading the first chapter will give you some hints. The book is split up in a simple way. The following chapters take you through most of the data sources and formats you’ll most likely get in touch with. Mainly, that’s text, regular expressions, XML, binary data, and relational databases. The book ends with a short chapter about the so-called horseshoe nails, that is, things that didn’t fit anywhere else. But we’ll get to that later. Not surprisingly, every chapter ends with a short summary.

The (more or less) simplest data you can work with is text. While some genius programmer in some company whose products you use (or once used) can always come up with a great new text format that nobody will ever understand, there’s a good chance that you’ll at least get an idea of its meaning by looking at a text file. Greg takes an example from the introduction some steps further to show the basics of working with text files, and also how to work with and around the common pitfalls. This being a pragmatic book, you also get an idea of how to keep your data crunching code nice and clean, and how to deal with normalising, collision detection and, of course, the basics of working with the UNIX shell (the tool of my choice for dealing with most “normal” text). After reading this chapter you have a very good idea about dealing with text. Packing more information about dealing with text into a single chapter should be almost impossible.

Ah, regular expressions. The sheer joy of getting to know all the differences between grep, sed, awk, vim, Perl RE and the like just keeps me alive. Giving probably the best and shortest (but still understandable) introduction to working with regular expressions, Greg also gives good examples of what you can and what you cannot (or should not) do with regular expressions. Skimming through the pages you’ll find that regular expressions can be applied to a lot of problems when it comes to handling input.

Working with XML is something I never really got comfortable with, but I gotta say Greg managed to change my mind here. He introduces the basic techniques for working with XML, namely SAX and DOM, showing their strengths and their weaknesses. Pretty much nothing else to say here. The good thing is that he prefers showing how to work with JDOM (Java) and xml.dom.minidom (Python) rather than the clunky C-style DOM API. The real beauty of working with XML is XPath, at least for me. On the other hand there is XSLT, which is more verbose than useful. You might get a similar impression reading this chapter. But I’m not here to judge (well, not about XPath and XSLT, anyway); the day might come when I’ll have to get back to XSLT. It’s always good to know the choices you have.

If you didn’t get a chance to work with binary data yet, then the next chapter is for you. One could debate whether there’s still a need to fiddle with binary data in the modern world. Or you could just give it a shot. The examples are pretty straightforward and understandable. Greg does an impressive job of working through the ups and downs of binary data. My fears certainly turned into curiosity after this chapter. After a short introduction into the world of 0 and its buddy 1 you’ll learn how to pack and unpack different data types in fixed and variable formats with metadata.
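To give an idea of what packing and unpacking means in practice, here is a minimal sketch in Java. It is not taken from the book; the class name, the record layout (an int, a double and a length-prefixed string) and the byte order are my own assumptions for illustration. The length prefix is the “metadata” that makes a variable-length field readable again.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class PackDemo {
    // Pack two fixed-size fields (int, double) followed by a variable-length
    // UTF-8 string. The string is preceded by its byte length so that the
    // reader knows how many bytes to consume.
    static byte[] pack(int id, double value, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + 4 + nameBytes.length)
                                   .order(ByteOrder.BIG_ENDIAN);
        buf.putInt(id);
        buf.putDouble(value);
        buf.putInt(nameBytes.length); // metadata: length of the field that follows
        buf.put(nameBytes);
        return buf.array();
    }

    // Read the fields back in exactly the order they were written.
    static String unpack(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN);
        int id = buf.getInt();
        double value = buf.getDouble();
        byte[] nameBytes = new byte[buf.getInt()];
        buf.get(nameBytes);
        return id + " " + value + " " + new String(nameBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] packed = pack(42, 1.5, "crunch");
        System.out.println(packed.length + " bytes: " + unpack(packed));
    }
}
```

Writer and reader have to agree on field order, sizes and byte order; that agreement is the whole “format”.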

The chapter on relational databases starts off with the best summary of SQL I’ve read so far, including joins, nested queries and normalisation. Besides text, databases definitely are one of my favourite tools for data crunching, whatever dialect of SQL they speak. The SQL you’ll learn in this chapter might be almost everything you’ll ever need for working with data from MySQL, Oracle, and the like. Since working with SQL in your code is not the hardest part here, Greg keeps the focus on showing what you can do with SQL itself.

The grand finale is a small collection of so-called horseshoe nails, miscellaneous techniques that will help you while crunching data but didn’t really fit anywhere else. I definitely agree with Greg that those nails didn’t fit anywhere else, but they’re very much worth reading anyway. He introduces some basic tools like JUnit, diff and Make. He finishes with some short information about encoding/decoding, floating point arithmetic and working with dates and times.

This book is a gold mine for the software developer, be it a beginner or one who has been crunching data on a daily basis for years. The examples are very applicable to the everyday life of a developer. They are simple enough to be immediately understood, but powerful enough to be good snippets to reuse, work on or build your own data crunching code on. Greg does an amazing job of keeping the examples and the text at a level that is both understandable and helpful for every developer. The book should be on your shelf (as well as on mine) whenever you have to write a small script or program to work with yet another data format the world didn’t know existed. Greg will keep you sane and on track with his book. It’s, after all, a pragmatic book! Just like with the other ones (which I can recommend without hesitation), you’ll find tons of information packed into an entertaining, but nonetheless helpful book.

VoodooPad 2.1 released

Tuesday, June 14th, 2005

VoodooPad is one of those tools that have become indispensable for me. In short, it’s a Wiki for your desktop (well, Mac OS X desktop, to be correct). I use it for everything: outlining, keeping my action lists, organizing my thoughts, and keeping track of information I need every day, like code snippets, small FAQs and so on.

While v2.0 was a great tool, v2.1 takes it to the next level and introduces support for Spotlight, Categories (a.k.a. tags), inter-document linking and lots of small things. One of my favourite features is still the export to iPod. An awesome tool, be sure to check it out.

Executing Oracle PL/SQL from Ant

Monday, June 13th, 2005

This is not exactly a new problem, but I ran into it today, so if you ever hit it yourself, here’s what you can do. I had the specific task of dropping all tables in a specific user’s schema. Ant’s <sql> task can’t run PL/SQL without some extra options, because by default it splits its input (be it a file or SQL embedded in your build file) at every semicolon and executes each piece separately.

So I have a small PL/SQL script that looks like this:

declare
   -- all tables of the current user, skipping recycle bin entries (BIN$...)
   cursor usertables is select table_name from user_tables where table_name not like 'BIN$%';
begin
   for next_row in usertables
   loop
      -- DDL has to go through execute immediate inside PL/SQL
      execute immediate 'drop table ' || next_row.table_name || ' cascade constraints';
   end loop;
end;
/

Don’t worry about any inefficiencies, that this snippet might have, for now ;)

So if you stuff this piece of code into Oracle using Ant’s <sql> task, you get a nice error complaining about the second line containing declare.... We build on the following snippet of Ant build file code:

<sql rdbms="oracle"
     userid="scott"
     password="tiger"
     driver="oracle.jdbc.OracleDriver"
     url="jdbc:oracle:thin:@myhost:1521:orcl"
     classpathref="classpath">
   <transaction src="drop-tables.sql"/>
</sql>

So this is the code that will throw an error when executing a bunch of PL/SQL code. But don’t worry, fixing it is easy:

<sql rdbms="oracle"
     userid="scott"
     password="tiger"
     driver="oracle.jdbc.OracleDriver"
     url="jdbc:oracle:thin:@myhost:1521:orcl"
     classpathref="classpath"
     delimiter="/"
     delimitertype="row"
     keepformat="yes">
   <transaction src="drop-tables.sql"/>
</sql>

That’s it, you’re saying? Yes! Adding the attributes delimiter and delimitertype tells the task to throw the whole bunch of code directly over to Oracle without worrying about anything except a / on a line by itself. delimiter tells Ant to treat / as the separator that ends a single set of statements to be executed at once. Since we’re all huge fans of SQL*Plus, we’ll even tell Ant to accept the / only on a line by itself (that’s what the attribute delimitertype is for). Only then will the bunch of statements that occurred since the last / be thrown over to Oracle and executed. Ah, the joy of delegating.

As a bonus, we’ll throw in the attribute keepformat, since we want to see the PL/SQL code executed as we have it in the file, right? But this only comes in handy when debugging Ant. Newlines will not be removed so that the output will look like the input.
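To make the effect of a row delimiter more tangible, here is a small Java sketch of the splitting rule described above. It is an illustration only and not Ant’s actual implementation; the class and method names are made up, and trimming whitespace around the delimiter line is my own assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class RowDelimiterSketch {
    // Splits a script into statements. A statement ends only where a line
    // consists of nothing but the delimiter, so semicolons inside a PL/SQL
    // block do not tear the block apart.
    static List<String> split(String script, String delimiter) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : script.split("\n")) {
            if (line.trim().equals(delimiter)) {
                if (current.length() > 0) {
                    statements.add(current.toString());
                    current.setLength(0);
                }
            } else {
                // Keep the line untouched, newline included.
                current.append(line).append('\n');
            }
        }
        // Anything left after the last delimiter counts as a final statement.
        if (current.toString().trim().length() > 0) {
            statements.add(current.toString());
        }
        return statements;
    }

    public static void main(String[] args) {
        String script = "begin\n  null;\nend;\n/\nbegin\n  null;\nend;\n/\n";
        System.out.println(split(script, "/").size() + " statements");
    }
}
```

Each block between two / lines ends up as one unit, which is what Oracle needs to see for an anonymous PL/SQL block.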

Ruby Doc in a Widget

Thursday, June 9th, 2005

Ah, finally people start thinking about really useful widgets for Apple’s Dashboard. This one allows you to display, filter and, of course, read the RDoc documentation in a nice and clean widget (whose color is, of course, ruby-ish). A must for the lazy and/or efficient Ruby programmer.

(Via Pragmatic Dave)

My new mate: TextMate

Tuesday, June 7th, 2005

This blog post by James Duncan Davidson and some other raves motivated me to check out TextMate, the editor of choice (it seems) of people working with Ruby on Rails. Like every serious programmer, I’m looking for the perfect text editor. Well, TextMate might not be the one, but it’s pretty darn good anyway. I used Vim a lot, but since I made the switch to Mac OS X, it just didn’t feel right to use Vim as the default text editor. I’ve tried a few: SubEthaEdit (which is great in its very own ways, but not flexible enough), TextWrangler (I don’t like Carbon apps and it doesn’t feel that intuitive to me, but it’s very powerful anyway), Smultron and jEdit (way too much for my needs, plus Swing just looks ugly on Mac OS X). I haven’t tried BBEdit, because shelling out 199 bucks for a text editor is not what I had in mind; that’s more than twice the price I paid for Tiger.

Working with TextMate is a real pleasure. You have an existing project you’re working on? Import it and continue in TextMate. You want to record a macro, add code snippets, execute a command on your source file? You want to fold code, edit text in blocks (a.k.a. column typing), use code completion? Do it with TextMate (at least the current beta version). And, the most important thing, “Pipe through command”. That’s the neatest thing (and a simple feature to ask for, I’d say). Yeah, I know that Vim does a lot of that too, but you know, I just want things to work. Vim’s integration with Mac OS X is not that great and I don’t like to mess with the Vim config file every time I want to add a macro or a snippet or whatever. I still like Vim, but it seems that its days as my desktop editor of choice are over. Luckily there’s still the Terminal, since I like using Vim and its command mode. Quite powerful stuff, if you master it.

There are some downsides though, but it seems the developer is quite responsive and open to new stuff and users’ wishes. And v1.1 is still in beta, so I’ll cut him some slack.

If you haven’t checked it out yet, I suggest you have a look (22 MB Ruby on Rails introduction video) at it or, better yet, try it yourself.
Welcome aboard my toolset, TextMate! Or should I say: G’day, mate!

CruiseControl Widget for Tiger

Saturday, June 4th, 2005

If you’re up and about with Tiger, Apple’s latest hit at the operating system market, and use CruiseControl to ensure your projects’ compilability (if you don’t, think again, read this sample chapter and do it!), then you might wanna check out this little Dashboard widget for CruiseControl. That’s the second useful widget I’ve come across so far. Otherwise there’s just crap out there ;)

(via)

The Value of Code Metrics

Thursday, June 2nd, 2005

For quite some time now I’ve been using, or at least keeping an eye on, tools to measure different kinds of code metrics. And I’ve been torn about their usefulness. I’ve been amazed by Clover and used it with pleasure, but there’s always the risk of being dogmatic about code metrics. The question is: who do you work for? Yourself, and therefore the project you’re working on, or the tool?

I’ve seen colleagues spend hours making their code look good in Checkstyle’s reports, and I’ve caught myself working down that list of “errors” in an almost endless manner. When I talked to a friend lately about a project he’s joined, he told me that he checked in his first piece of code and not long after was contacted by the developers, telling him that he had caused a lot of Checkstyle errors. This got me thinking about code metrics and code style. Well, a recent discussion on the Pragmatic Programmers Mailing List may have added some thoughts as well.

The first thing is those code metrics. Robert C. Martin did a good job of introducing one set of metrics which I consider useful. JDepend is a simple, yet useful implementation of Martin’s metrics, and I like to use it, though not obey it, as much as I can. Static code analysis can be a tricky thing, as you can see in FindBugs. It’s one of those tools that can find real bugs (or, if that sounds better to you, common programming mistakes), but can report a lot of false positives as well. So there’s a thin line here. You can spend a lot of time chasing those “bugs” until you conclude that there is no bug (you expected the word spoon here, didn’t you?).

But then how do you decide what needs fixing and what doesn’t? If you have thousands of those “nasty” Checkstyle errors in front of you, do you ignore them (and therefore make this kind of code metric useless) or do you work down the list and fix them all? Or, even worse, do you change the Checkstyle configuration to include fewer checkers to reduce the overall number of errors and let your code formatter do the rest? I’ve seen and done all three of those, and lately I asked myself: what’s the need for tools like Checkstyle then? Is anybody really giving in to those numbers? If you have one of those huge documents describing the code style of your company, do you really enforce it?

When it comes to getting around enforcement like that, programmers tend to get very creative and do all sorts of stuff just to program the way they’re used to. Getting back to the discussion on the Pragmatic Programmers Mailing List (which is a terrific read, by the way), here are some examples. There was the possibility of having a code formatter run over your code on check-out and then again on check-in. On check-out the code is formatted to the programmer’s style and on check-in it’s being converted back to conform to the company’s coding conventions. Now that’s creativity! But hey, that’s what we’re being paid for, right?

Another way of “enforcing” code conventions is having a formatter hook into your SCM and format checked-in code as it arrives. I don’t want to see that merge, when you check in a file (being reformatted on the SCM server), decide to make some big changes to the code and then check in again. If programmers tend

But anyway, code conventions are another item that could be discussed in an endless manner. Those were just examples of programmer creativity to avoid being forced to do something with or about code metrics.

All that makes me think about how useful those metrics really are. JDepend can give you really useful information about how loosely or closely coupled your code is. It has a steep learning curve and you’ve gotta learn to interpret the results correctly and draw your conclusions appropriately. But if you think again, that’s what you’ll have to do with every single one of these tools. With JDepend it just becomes more apparent that the results are not that easy to interpret. A list of formatting errors or bugs tempts a novice to work down that list until everyone is happy and cheers because Checkstyle doesn’t give any errors at all.
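For those who haven’t met Martin’s package metrics yet, here is a minimal Java sketch of the three numbers that matter most: instability I = Ce / (Ca + Ce), abstractness A = abstract classes / total classes, and the distance from the main sequence D = |A + I - 1|. The class name and the example figures are made up for illustration, and returning 0 for an empty package is simply a choice of this sketch.

```java
public class PackageMetrics {
    // Instability: share of outgoing (efferent) couplings among all couplings.
    // 0 means maximally stable, 1 means maximally unstable.
    static double instability(int afferent, int efferent) {
        int total = afferent + efferent;
        return total == 0 ? 0.0 : (double) efferent / total;
    }

    // Abstractness: share of abstract classes and interfaces in the package.
    static double abstractness(int abstractClasses, int totalClasses) {
        return totalClasses == 0 ? 0.0 : (double) abstractClasses / totalClasses;
    }

    // Distance from the main sequence (A + I = 1); closer to 0 is better.
    static double distance(double abstractness, double instability) {
        return Math.abs(abstractness + instability - 1.0);
    }

    public static void main(String[] args) {
        // Hypothetical package: 6 incoming and 2 outgoing dependencies,
        // 1 abstract class out of 4.
        double i = instability(6, 2);
        double a = abstractness(1, 4);
        System.out.println("I=" + i + " A=" + a + " D=" + distance(a, i));
    }
}
```

A stable, concrete package like this one sits well away from the main sequence, which is a hint to look closer, not a verdict.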

The same goes for Clover. While it’s a great tool for developers and managers alike to see the quality of the code ensured in some way (unit tests in this case), it’s hard to judge just how much code coverage is acceptable. Using test-driven development will ensure in its very own way that your code has good coverage. But again, it’s a judgement call whether every single method needs to be tested or whether you’re better off testing as much state as necessary to ensure a correctly working class or code base. I definitely prefer the latter, but I’ve seen programmers bump up their tests just to see the coverage go as high as possible. It might sound stupid, but I’m sure there are even some out there who bump up the logging level just to have these kinds of statements included in the coverage results:

if (logger.isDebugEnabled()) logger.debug("logging for the fun of it");

So the question is, what are those tools good for, if not to make the programmer feel good about himself? Do they bring any value to the progress of a project? You gotta be very careful about choosing the right tools. You gotta be careful about configuring them. And you gotta teach your fellow programmers to interpret those results correctly. Being tempted to fix the errors in your code is always good karma, but spending too much time on fixing doesn’t help a project succeed.

If you really take the time to learn how those tools can help you, they’ll add some value to the project, your code, the way you develop. But you gotta keep it small. Having more and more of these tools run over your code to give you their opinion will just make you panic and be an even bigger waste of time. I prefer having Clover and JDepend on my side, both in appropriate and not-time-wasting doses. And a little bit of Checkstyle as well, but not with every available checker turned on. Tools like FindBugs can be a nice way to check your code every now and then, but using them on a daily basis is overkill in my opinion.

Different stories or opinions, anyone?

And now, for the finishing touch: a Technorati link to claim this weblog ;)