Monday, August 4, 2008

Perl and is_valid_aref()

This post is about Perl, the politics of interacting with fellow developers, and Duck Typing (although we didn't really know it at the time). It's also fairly stream-of-consciousness.

I used to work at a Perl shop, and in a big rewrite we did, we made heavy use of array references. A dynamic language like Perl (or Ruby, or whatever) will encourage conscientious developers to have some tests and/or error checking, so we often wanted to ensure that the input parameters to subroutines that expected array refs were actually valid arefs. How should one do this checking in Perl?

Here's one obvious approach.

sub is_valid_aref_mere_mortals {
    # ref returns 'ARRAY' for an unblessed array reference
    return ref( shift ) eq 'ARRAY';
}

Here we have a straightforward subroutine. We use Perl's shift to pull the first parameter off the argument list, get the type of thing it refers to with ref, and check with the string equality operator eq whether or not that type is 'ARRAY'.
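To make the ref part concrete, here's a minimal sketch of what ref reports for a few kinds of values. The sample values and variable names are mine, purely for illustration.

```perl
use strict;
use warnings;

# ref returns the empty string for non-references (including undef)
# and the underlying type name for unblessed references.
my @samples = (
    [ 'array ref',    [ 1, 2, 3 ] ],
    [ 'hash ref',     { a => 1 }  ],
    [ 'code ref',     sub { 1 }   ],
    [ 'plain string', 'ARRAY'     ],
    [ 'undef',        undef       ],
);

for my $sample ( @samples ) {
    my( $label, $value ) = @{ $sample };
    printf "%-12s ref gives '%s'\n", $label, ref $value;
}
```

Only the array ref should report 'ARRAY'. The plain string 'ARRAY' is not a reference at all, so ref gives the empty string for it.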

This wasn't good enough for an ex co-worker of mine, who fancied himself a Perl Wizard (with some justification, in fairness to him).

His concern: some idiot could bless a scalar that doesn't behave like an ARRAY as a ref to an ARRAY. My solution in that case is to berate the offender in public for having done something so horrible. Such a blessing is a direct violation of the duck typing idea that you don't have a datum pretend to be a specific type (however your language defines type) unless it can implement all the pertinent behavior expected of that type in that context. With great power comes great responsibility.

We also had no instances of such blessing in our app. All of our args were either simple arefs or errors (usually an undef value). So in practice, I think a simple ref check would work fine. However, if you're concerned about the blessed-as-aref issue, the solution below has some advantages.

sub is_valid_aref {
    my( $arg ) = @_;
    # truthy only if $arg is truthy and can be dereferenced as an array
    $arg and eval { @{ $arg } or 1 };
}

It's a bit tricky, so I'll explain it. You extract the first parameter from the argument list as $arg. The and keyword ensures that $arg is truthy before we proceed to the eval. The eval attempts to dereference $arg into a regular (non-reference) array. Rubyists can think of this use of eval in Perl as akin to a begin rescue end block.

If the dereferencing results in an error, it will stop evaluation of the eval block, resulting in an undef value, meaning that the expression returned by the function is false.

Why is there an or 1 at the end of the eval block? That's for the situation in which your $arg is an empty array ref. An array of zero length evaluated in boolean (scalar) context in Perl is false. However, we want such a data structure to produce a truthy return value from this function. When a valid but empty array ref is dereferenced into an actual array, it is false but does not break out of the eval block. The expression within the eval block then continues with the or 1, which ends up being truthy, ensuring that a valid empty array ref is considered valid by this function.
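Here's a small sketch of the pieces described above, pulled apart so each can be seen on its own. The variable names are mine. The last two cases rest on an assumption the original code doesn't show: whether use strict is in effect. Under strict refs a plain string dies inside the eval, but without it Perl treats the string as a symbolic reference to an (empty) package array, and the test passes by accident.

```perl
use strict;
use warnings;

my $empty_aref = [];
my $href       = {};
my $string     = 'plain string';

# Deref succeeds, but an empty array counts as false in boolean context.
my $empty_without_or = eval { scalar @{ $empty_aref } };

# The "or 1" rescues the empty-but-valid case.
my $empty_with_or = eval { @{ $empty_aref } or 1 };

# Dereferencing a hash ref as an array dies; eval traps it and yields undef.
my $hash_result = eval { @{ $href } or 1 };

# Under strict refs, a plain string also dies inside the eval.
my $string_strict = eval { @{ $string } or 1 };

# Without strict refs, the string becomes a symbolic reference.
my $string_lax = do {
    no strict 'refs';
    eval { @{ $string } or 1 };
};

for my $case (
    [ 'empty aref, no "or 1"'    => $empty_without_or ],
    [ 'empty aref, with "or 1"'  => $empty_with_or    ],
    [ 'hash ref'                 => $hash_result      ],
    [ 'string, strict refs'      => $string_strict    ],
    [ 'string, no strict refs'   => $string_lax       ],
) {
    my( $label, $result ) = @{ $case };
    printf "%-26s %s\n", $label, $result ? 'truthy' : 'falsy';
}
```

If you rely on the eval version, it seems worth making sure strict refs is on wherever the check runs.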

I lie when I say this was my ex co-worker's solution. He strongly objected to putting this test inside a function, because developers should just know Perl idioms, and the function call would add too much overhead. I thought that giving this obscure bit of code a name that states its purpose would be worth it. He disagreed. So he would have tests like the line of code below copied and pasted, with no explanatory comments, wherever we needed to check aref validity:

fail if not( $arg and eval { @{ $arg } or 1 } );

I saw that for the first time and thought WTF? I then added a comment explaining what that particular monstrosity was supposed to do, something like 'checks if a valid aref'. He objected strongly (again), saying that coders should just know Perl idioms.

We finally got him to accept the presence of the comment after about 30 minutes of arguing.

Try them out for yourself, if you like. I use $proc as a generic name for a procedure/function/subroutine/code reference. Some other people prefer $cref for code ref or $sref for subroutine ref.

my( %proc_of ) = (
    'mere mortals' => \&is_valid_aref_mere_mortals,
    'eval version' => \&is_valid_aref,
);

sub report {
    my( $label, $candidate, $proc_of_href ) = @_;
    # run the candidate through every validity check and print the verdict
    for my $proc_name ( keys %{ $proc_of_href } ) {
        my( $proc ) = $proc_of_href->{$proc_name};
        print $proc->( $candidate )
            ? "$label is a valid aref according to $proc_name\n"
            : "$label is not a valid aref according to $proc_name\n";
    }
}

report( '[]', [], \%proc_of );
report( '()', (), \%proc_of );
report( '{}', {}, \%proc_of );
report( '(undef)', (undef), \%proc_of );

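This results in the output below. The '()' case prints nothing: an empty list flattens away in the argument list, so the hash ref ends up in $candidate and report has no procs to loop over. Note that I haven't included the deceptive blessing situation, on which the two subs would differ: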

[] is a valid aref according to mere mortals
[] is a valid aref according to eval version
{} is not a valid aref according to mere mortals
{} is not a valid aref according to eval version
(undef) is not a valid aref according to mere mortals
(undef) is not a valid aref according to eval version
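For the curious, here is a sketch of how I read the deceptive blessing situation. It repeats the two subs so it can run on its own. The package name My::List and the variable names are hypothetical, made up for this example.

```perl
use strict;
use warnings;

sub is_valid_aref_mere_mortals {
    return ref( shift ) eq 'ARRAY';
}

sub is_valid_aref {
    my( $arg ) = @_;
    $arg and eval { @{ $arg } or 1 };
}

# A hash ref blessed into a package that happens to be called ARRAY.
# ref reports the package name, so the simple check is fooled.
my $impostor = bless {}, 'ARRAY';

# A real array ref blessed into an ordinary (hypothetical) class.
# ref reports 'My::List', so the simple check rejects it.
my $array_object = bless [ 1, 2 ], 'My::List';

for my $case (
    [ 'hash blessed into ARRAY'     => $impostor     ],
    [ 'array blessed into My::List' => $array_object ],
) {
    my( $label, $candidate ) = @{ $case };
    printf "%-28s ref check: %-4s eval check: %s\n",
        $label,
        ( is_valid_aref_mere_mortals( $candidate ) ? 'yes' : 'no' ),
        ( is_valid_aref( $candidate )              ? 'yes' : 'no' );
}
```

The two checks should disagree on both candidates: the ref check goes by the name, while the eval check goes by whether the thing can actually be dereferenced as an array.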

It seems to me that the clarity improvement from having the name is_valid_aref for that code is worth the relatively small overhead of the function call. I also think that it serves a pedagogical purpose - a new developer who isn't familiar with the blessing concern could in fact be educated about it by seeing this code under such an explanatory label. But I'd be interested in any other reasons to avoid putting this test inside a named subroutine. If I'm being stubborn for insufficient reason, I'd like to be illuminated and corrected.
