The Terror of the Long Comment

A code com­ment can be a won­der­ful thing. It can offer a gem of con­text around a quirky bit of code that will make the read­er’s life eas­i­er for years to come:

// Fields c and d are intentionally switched in version 3 of the
// protocol; see RFC 2324 S 12.
fields.push_back( a );
fields.push_back( b );
fields.push_back( d );
fields.push_back( c );

If I were read­ing the above code snip­pet with­out the com­ment, I’d prob­a­bly be very skep­ti­cal about the order in which things were pushed into the fields con­tain­er. The com­ment cures this skep­ti­cism and jus­ti­fies the odd-look­ing but entire­ly cor­rect code.

I find that most use­ful com­ments are about the length of the one above, and they usu­al­ly have the same pur­pose: to explain some­thing that would oth­er­wise give the read­er some pause. The oth­er nec­es­sary con­di­tion for a use­ful com­ment is that there isn’t any obvi­ous way to rewrite the code so that no com­ment is required for it to be eas­i­ly under­stood. In the case of the above code snip­pet, it could per­haps be rewrit­ten as follows:

void reorder_fields_for_version_3( container& fields )
{
    assert( fields.size() >= 4 );
    std::swap( fields[2], fields[3] ); // See RFC 2324 S 12.
}

fields.push_back( a );
fields.push_back( b );
fields.push_back( c );
fields.push_back( d );
reorder_fields_for_version_3( fields );

I think that each of the above code snip­pets could be appro­pri­ate, depend­ing on its con­text. If the field-switch­ing only occurs in one place, the first approach would be the sim­plest. Otherwise, if the field-switch­ing hap­pens in sev­er­al places, the sec­ond approach would prob­a­bly be best. At any rate, if I ran across either of the solu­tions in some code I was read­ing, I would under­stand what was going on.

This brings me to the top­ic of this post, which is that while com­ments can be extreme­ly use­ful, they can also be one of the most ter­ri­fy­ing imple­ments of the con­fused pro­gram­mer — strik­ing pain and suf­fer­ing direct­ly into the soul of the read­er. Comments of this type can twist the read­er’s con­tent­ed mood into one of abject pessimism:

// When using this function, ensure that you obtain the
// base_class_lock_ before you obtain the send_queue_lock_,
// otherwise when the base class' munge() function is called,
// it will update the target map prematurely -- and if that
// happens, the next asynchronous call to pop_queue() will skip
// over some of the records in the queue.  Normally this would be
// acceptable, but other_class_x depends on the records coming
// out of the queue in the order they were put in (unless
// WEIRD_MACRO_Z is defined).
queue_result enqueue( const record& r )
{
    /* ... */
}

If you’re any­thing like me, you will antic­i­pate the pain of that com­ment before you start read­ing it, sim­ply because of its length. I almost nev­er see six-line com­ments that do not hurt my brain. The only occa­sion on which such a lengthy com­ment is real­ly jus­ti­fied is when it describes some com­pli­cat­ing fac­tor that is exter­nal to the code itself. For exam­ple, a com­ment in a piece of client soft­ware might need to be lengthy to describe some bizarre behav­ior in the serv­er it is meant to talk to. That does­n’t seem to be avoid­able. However, if the code that one is cur­rent­ly writ­ing requires a six-line com­ment to explain itself, that should be seen as a huge red flag. Such a long com­ment should tell the author to step back and look at their design, to fig­ure out how to sim­pli­fy it so that no such com­ments are nec­es­sary. For instance, the author might read that com­ment and ask them­selves the fol­low­ing questions:

// When using this function, ensure that you obtain the
// base_class_lock_ before you obtain the send_queue_lock_,

Do there need to be two locks? If so, could the send_queue_lock_ be inte­grat­ed direct­ly into the send queue? Could the lock­ing be fac­tored out into a sep­a­rate func­tion so that it always hap­pens in the right order?

// otherwise when the base class' munge() function is called,
// it will update the target map prematurely -- and if that
// happens, the next asynchronous call to pop_queue() will skip
// over some of the records in the queue.

Can the munge() func­tion be changed so that it always updates the tar­get map cor­rect­ly? Maybe it should be respon­si­ble for obtain­ing its own locks. Why should pop_queue() ever be allowed to skip records? Can it be changed so that it doesn’t?

// Normally this would be
// acceptable, but other_class_x depends on the records coming
// out of the queue in the order they were put in (unless
// WEIRD_MACRO_Z is defined).

The user of this func­tion should­n’t have to know about what other_class_x does. And is WEIRD_MACRO_Z nec­es­sary? If it’s spe­cif­ic to other_class_x, why does the user of this func­tion need to know about it?

By address­ing a few of these ques­tions, it might be pos­si­ble for the author to shrink that big, com­pli­cat­ed, painful, six-line com­ment down to a rea­son­able size. And by doing so, they might save some of the future main­tain­ers of their code from devel­op­ing severe drink­ing prob­lems. More impor­tant­ly, though, they will prob­a­bly end up with less bug­gy code, because in my expe­ri­ence, the more com­pli­cat­ed rules that I have to hold in my head while writ­ing code, the more I screw up. The code that con­tains the most lengthy, most fre­quent com­ments is com­mon­ly the worst code.

Comments are disabled for this post