[Home]

ProgrammingStyle

I was trying to explain programming to my wife whose background is in law, history and writing. In the process of doing this I brought some issues around programming style into focus in my own mind. I was using the analogy of writing a legal brief, but, as I explained this, I realized the analogy fell apart. Most writing for humans has a single audience in mind, so her legal brief must be clear and understandable to the judge and other lawyers, but not to me.

Here's the problem: Programming has two audiences. First is the compiler/interpreter, and this is the audience that most programming courses focus on (sadly). The second audience is humans, those who need to debug and maintain your code, those who need to review your code, and even your future self. I have had the displeasure of looking at a piece of code I wrote 15 years ago, and being simultaneously confused and appalled. The latter audience is obviously the most important, but the most frequently forgotten.

During my career I have seen many programming style guides. Most of them get tied up in minutiae of where to place braces, whether to use tabs or spaces, or, worse yet, spend pages explaining an arcane naming standard. But they all missed the key issue: Clarity. And while well-formatted code is the first step towards clarity, I think such issues should be instinctual to any competent programmer (and these days Emacs and most IDEs will take care of such mundane details for you).

The real components of clarity are usually missed.

Cosmetic Issues

These items could each induce their own flamewar, but here goes:

Variable names

The obvious directive is: make variable names meaningful to others reading the code.

I have seen arcane rules for naming variables, like "hungarian notation". But I think this sort of things is pointless for the compiler, and often hinders "clarity". For example, I found this variable name in a co-worker's code:

 my $haHLPaths=shift;    # list of filespecs to search
 ...
 foreach my $sHLFind(@$haHLPaths){ ...

What the hell is that? The perl sigils indicate the type, a list reference, and, therefore the comment is wrong. I would argue that this would be much clearer:

 my $searchpaths = shift;  # reference to a list of pathnames to search
 ...
 foreach my $p (@$searchpaths) { ...

Yes, I used a single character variable. It is short lived and the letter is meaningful (p => path). Having a long variable name in such a place could actually hinder clarity.

Variable Scope

Here are my directives about variables:

Some elaborations of the latter point:

 if (my ($suffix) = /\.(.*?)$/)  { my $lang = $suffix2lang->{$suffix}; ... }

 foreach my $vob (@voblist) { ... do something with $vob ... }

Cascading ifs

Consider this fragment of code:

 if (not thing1) {
     warn "thing1 is wrong!";
 } else {
     if (not thing2) {
         warn "thing2 is wrong";
     } else {
         if (not thing3) {
             warn "thing3 is wrong";
         } else {
             # ... do some real work here ...
         }
     }
 }

Compare that with this:

 if (not thing1) {
     warn "thing1 is wrong!";
     return;
 }
 if (not thing2) {
     warn "thing2 is wrong";
     return;
 }
 if (not thing3) {
     warn "thing3 is wrong";
     return;
 }
 # ... do some real work here ...

I'm sure some purists would object to the latter style, but I think it is clearer for two reasons. First, the "real work" code is indented way off to the side, forcing people to widen their screens, and secondly, it is not clear what happens after an error, there could be more code down below. If the if clauses are sizeable, you would have to scan down (while carefully keeping track of indentation) to verify that other things are not happening after the error message is issued.

Of course, in Perl you can use an "elsif" chain, but, I would argue that using constructs like "next", "last", and "return" saves the effort of figuring out what happens after the "elsif" chain.

Error Messages

Error messages must be meaningful to people running the program. They must include: what went wrong, what file/resource was being manipulated, why that file/resource was being manipulated and some marker that clearly indicates this is an error. Here is a sample sequence of an error message that is gradually being improved. Consider which one would be most helpful to you.
  1. "not able to open file"
  2. "not able to open file: Permission Denied"
  3. "not able to write file: Permission Denied"
  4. "not able to write foo.html: Permission Denied"
  5. "not able to write /tmp/foo.html: Permission Denied"
  6. "not able to write report output to /tmp/foo.html: Permission Denied"
  7. "Error: not able to write report output to /tmp/foo.html: Permission Denied"

Pipelining

Cansider the following code fragment:

 @files = qx(cleartool find -all -print);
 foreach my $e (@files) { my @out = qx(cleartool desc $e); ... }

There is nothing wrong with this code given that you are working in a small vob. But, let's imagine you are running that line of code in /vobs/vob_test. That vob has nearly 400,000 elements, so assuming that each pathname is about 60 characters (which is a low estimate for that vob), that means your array will take up a minimum of 24 megs!

Furthermore, assuming find can return elements at a rate of 30 per second (I found this number using the following command:

 ct find /vobs/vob_test -all -print | /vobs/cm/utils/sys/flow.pl > /dev/null
This means that find command will run for 3.7 hours! During that time, your script does not issue any status updates (which can make people think that nothing is happening or that it has crashed), and, more importantly, your script cannot do any other useful work.

So, for each element name returned by "find" we need to run "desc", and let's assume that "desc" can do about 15 elements a second

 ct find /vobs/vob_test -all -print | while read i; do echo desc -fmt '%n\n' \"$i\"; done | cleartool | /vobs/cm/utils/sys/flow.pl > /dev/null
If we run the "find" and "desc" sequentially, we are talking about 11 hours (3.7+7.4), but if we can do the "desc" commands in parallel, as soon as we get an element name from find the entire process should take about 7.4 hours (max(3.7, 7.4)). Note, that the find runs twice as fast as the "desc" commands, this means that if we were real clever, we could run the "desc" commands two at a time, which could cut the time in half again (however, that requires more cleverness than I have at the moment).

So, a better way to write that code is like so:

 open(F, "cleartool find -all -print |") || die "Error: ct find failed: $!\n";
 while(<F>) { my @out = qx(cleartool desc $e); ... }
 close(F);



VeganMilitia | RecentChanges | What links here | Preferences | Upload | Page List
This page is read-only | View other revisions
Last edited April 15, 2008 5:15 pm by inet-nc01-o.oracle.com (diff)
Search: SearchHelp
Powered by Laughing Squid