The Terror of the Long Comment

A code comment can be a wonderful thing. It can offer a gem of context around a quirky bit of code that will make the reader’s life easier for years to come:

// Fields c and d are intentionally switched in version 3 of the
// protocol; see RFC 2324 S 12.
fields.push_back( a );
fields.push_back( b );
fields.push_back( d );
fields.push_back( c );

If I were reading the above code snippet without the comment, I’d probably be very skeptical about the order in which things were pushed into the fields container. The comment cures this skepticism and justifies the odd-looking but entirely correct code.

I find that most useful comments are about the length of the one above, and they usually have the same purpose: to explain something that would otherwise give the reader some pause. The other necessary condition for a useful comment is that there isn’t any obvious way to rewrite the code so that no comment is required for it to be easily understood. In the case of the above code snippet, it could perhaps be rewritten as follows:

void reorder_fields_for_version_3( container& fields )
{
    assert( fields.size() >= 4 );
    std::swap( fields[2], fields[3] ); // See RFC 2324 S 12.
}

fields.push_back( a );
fields.push_back( b );
fields.push_back( c );
fields.push_back( d );
reorder_fields_for_version_3( fields );

I think that each of the above code snippets could be appropriate, depending on its context. If the field-switching only occurs in one place, the first approach would be the simplest. Otherwise, if the field-switching happens in several places, the second approach would probably be best. At any rate, if I ran across either of the solutions in some code I was reading, I would understand what was going on.

This brings me to the topic of this post, which is that while comments can be extremely useful, they can also be one of the most terrifying implements of the confused programmer — striking pain and suffering directly into the soul of the reader. Comments of this type can twist the reader’s contented mood into one of abject pessimism:

// When using this function, ensure that you obtain the
// base_class_lock_ before you obtain the send_queue_lock_,
// otherwise when the base class' munge() function is called,
// it will update the target map prematurely -- and if that
// happens, the next asynchronous call to pop_queue() will skip
// over some of the records in the queue.  Normally this would be
// acceptable, but other_class_x depends on the records coming
// out of the queue in the order they were put in (unless
// WEIRD_MACRO_Z is defined).
queue_result enqueue( const record& r )
{
    /* ... */
}

If you’re anything like me, you will anticipate the pain of that comment before you start reading it, simply because of its length. I almost never see six-line comments that do not hurt my brain. The only occasion on which such a lengthy comment is really justified is when it describes some complicating factor that is external to the code itself. For example, a comment in a piece of client software might need to be lengthy to describe some bizarre behavior in the server it is meant to talk to. That doesn’t seem to be avoidable. However, if the code that one is currently writing requires a six-line comment to explain itself, that should be seen as a huge red flag. Such a long comment should tell the author to step back and look at their design, to figure out how to simplify it so that no such comments are necessary. For instance, the author might read that comment and ask themselves the following questions:

// When using this function, ensure that you obtain the
// base_class_lock_ before you obtain the send_queue_lock_,

Do there need to be two locks? If so, could the send_queue_lock_ be integrated directly into the send queue? Could the locking be factored out into a separate function so that it always happens in the right order?

// otherwise when the base class' munge() function is called,
// it will update the target map prematurely -- and if that
// happens, the next asynchronous call to pop_queue() will skip
// over some of the records in the queue.

Can the munge() function be changed so that it always updates the target map correctly? Maybe it should be responsible for obtaining its own locks. Why should pop_queue() ever be allowed to skip records? Can it be changed so that it doesn’t?

// Normally this would be
// acceptable, but other_class_x depends on the records coming
// out of the queue in the order they were put in (unless
// WEIRD_MACRO_Z is defined).

The user of this function shouldn’t have to know about what other_class_x does. And is WEIRD_MACRO_Z necessary? If it’s specific to other_class_x, why does the user of this function need to know about it?

By addressing a few of these questions, it might be possible for the author to shrink that big, complicated, painful, six-line comment down to a reasonable size. And by doing so, they might save some of the future maintainers of their code from developing severe drinking problems. More importantly, though, they will probably end up with less buggy code, because in my experience, the more complicated rules that I have to hold in my head while writing code, the more I screw up. The code that contains the most lengthy, most frequent comments is commonly the worst code.

Comments are disabled for this post