Perl Best Practices

Damian Conway


Presents guidelines on the art of coding with Perl, covering such topics as naming conventions, data and control structures, program decomposition, interface design, and error handling.


Mentioned in questions and answers.

I'm planning to learn Perl 5 and as I have only used PHP until now, I wanted to know a bit about how the languages differ from each other.

As PHP started out as a set of "Perl hacks", it has obviously cloned some of Perl's features.

  • What are the main differences in the syntax? Is it true that with Perl you have more options and ways to express something?

  • Why is Perl not used for dynamic websites very often anymore? What made PHP gain more popularity?

I've noticed that most PHP vs. Perl pages seem to be of the

PHP is better than Perl because <insert lame reason here>

ilk, and rarely make reasonable comparisons.

Syntax-wise, you will find PHP is often easier to understand than Perl, particularly when you have little experience. For example, trimming a string of leading and trailing whitespace in PHP is simply

$string = trim($string);

In Perl it is the somewhat more cryptic

$string =~ s/^\s+//;
$string =~ s/\s+$//;

(I believe this is slightly more efficient than a single-line capture and replace, and also a little more understandable.) However, even though PHP is often more English-like, it sometimes still shows its roots as a wrapper for low-level C. For example, strpbrk and strspn are probably rarely used, because most PHP dabblers write their own equivalent functions for anything too esoteric rather than spending time exploring the manual. I also wonder about programmers for whom English is a second language, as everybody is on equal footing with things such as Perl, having to learn it from scratch.
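As a minimal sketch of the two trimming approaches compared above, the helpers below wrap each one in a sub. The names trim_two_pass and trim_capture are hypothetical, and nothing here measures which is faster.

```perl
use strict;
use warnings;

# Two anchored substitutions, as shown in the answer.
sub trim_two_pass {
    my ($string) = @_;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

# The single-line "capture and replace" alternative.
sub trim_capture {
    my ($string) = @_;
    $string =~ s/^\s*(.*?)\s*$/$1/s;
    return $string;
}

print '[', trim_two_pass("  hello world \t"), "]\n";
print '[', trim_capture("  hello world \t"),  "]\n";
```

Both calls print the same bracketed string with the surrounding whitespace removed.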

I have already mentioned the manual. PHP has a fine online manual, and unfortunately it needs it. I still refer to it from time to time for things that should be simple, such as order of parameters or function naming convention. With Perl, you will probably find you are referring to the manual a lot as you get started and then one day you will have an a-ha moment and never need it again. Well, at least not until you're more advanced and realize that not only is there more than one way, there is probably a better way, somebody else has probably already done it that better way, and perhaps you should just visit CPAN.

Perl does have a lot more options and ways to express things. This is not necessarily a good thing, although it allows code to be more readable if used wisely, and at least one of the ways is likely to be familiar to you. There are certain styles and idioms that you will find yourself falling into, and I can heartily recommend reading Perl Best Practices (sooner rather than later), along with Perl Cookbook, Second Edition to get up to speed on solving common problems.

I believe the reason Perl is used less often in shared hosting environments is that historically the perceived slowness of CGI and hosts' unwillingness to install mod_perl due to security and configuration issues has made PHP a more attractive option. The cycle then continued, more people learned to use PHP because more hosts offered it, and more hosts offered it because that's what people wanted to use. The speed differences and security issues are rendered moot by FastCGI these days, and in most cases PHP is run out of FastCGI as well, rather than leaving it in the core of the web server.

Whether or not this is the case or there are other reasons, PHP became popular and a myriad of applications have been written in it. For the majority of people who just want an entry-level website with a simple blog or photo gallery, PHP is all they need so that's what the hosts promote. There should be nothing stopping you from using Perl (or anything else you choose) if you want.

At an enterprise level, I doubt you would find too much PHP in production (and please, no-one point at Facebook as a counter-example, I said enterprise level).

One of my colleagues recently interviewed some candidates for a job and one said they had very good Perl experience.

Since my colleague didn't know Perl, he asked me for a critique of some code written (off-site) by that potential hire, so I had a look and told him my concerns (the main one was that it originally had no comments and it's not like we gave them enough time).

However, the code works, so I'm loath to say no-go without some more input. Another concern is that this code basically looks exactly how I'd code it in C. It's been a while since I did Perl (and I didn't do a lot, I'm more a Python bod for quick scripts) but I seem to recall that it was a much more expressive language than what this guy used.

I'm looking for input from real Perl coders, and suggestions for how it could be improved (and why a Perl coder should know that method of improvement).

You can also wax lyrical about whether people who write one language in a totally different language should (or shouldn't) be hired. I'm interested in your arguments but this question is primarily for a critique of the code.

The spec was to successfully process a CSV file as follows and output the individual fields:

User ID,Name , Level,Numeric ID
pax, Pax Morgan ,admin,0
gt,"  Turner, George  " rubbish,user,1
ms,"Mark \"X-Men\" Spencer","guest user",2
ab,, "user","3"

The output was to be something like this (the potential hire's code actually output this):

User ID,Name , Level,Numeric ID:
   [User ID]
   [Name]
   [Level]
   [Numeric ID]
pax, Pax Morgan ,admin,0:
   [pax]
   [Pax Morgan]
   [admin]
   [0]
gt,"  Turner, George  " rubbish,user,1:
   [gt]
   [  Turner, George  ]
   [user]
   [1]
ms,"Mark \"X-Men\" Spencer","guest user",2:
   [ms]
   [Mark "X-Men" Spencer]
   [guest user]
   [2]
ab,, "user","3":
   [ab]
   []
   [user]
   [3]

Here is the code they submitted:

#!/usr/bin/perl

# Open file.

open (IN, "qq.in") || die "Cannot open qq.in";

# Process every line.

while (<IN>) {
    chomp;
    $line = $_;
    print "$line:\n";

    # Process every field in line.

    while ($line ne "") {
        # Skip spaces and start with empty field.

        if (substr ($line,0,1) eq " ") {
            $line = substr ($line,1);
            next;
        }

        $field = "";
        $minlen = 0;

        # Detect quoted field or otherwise.

        if (substr ($line,0,1) eq "\"") {
            $line = substr ($line,1);
            $pastquote = 0;
            while ($line ne "") {
                # Special handling for quotes (\\ and \").

                if (length ($line) >= 2) {
                    if (substr ($line,0,2) eq "\\\"") {
                        $field = $field . "\"";
                        $line = substr ($line,2);
                        next;
                    }
                    if (substr ($line,0,2) eq "\\\\") {
                        $field = $field . "\\";
                        $line = substr ($line,2);
                        next;
                    }
                }

                # Detect closing quote.

                if (($pastquote == 0) && (substr ($line,0,1) eq "\"")) {
                    $pastquote = 1;
                    $line = substr ($line,1);
                    $minlen = length ($field);
                    next;
                }

                # Only worry about comma if past closing quote.

                if (($pastquote == 1) && (substr ($line,0,1) eq ",")) {
                    $line = substr ($line,1);
                    last;
                }
                $field = $field . substr ($line,0,1);
                $line = substr ($line,1);
            }
        } else {
            while ($line ne "") {
                if (substr ($line,0,1) eq ",") {
                    $line = substr ($line,1);
                    last;
                }
                if ($pastquote == 0) {
                    $field = $field . substr ($line,0,1);
                }
                $line = substr ($line,1);
            }
        }

        # Strip trailing space.

        while ($field ne "") {
            if (length ($field) == $minlen) {
                last;
            }
            if (substr ($field,length ($field)-1,1) eq " ") {
                $field = substr ($field,0, length ($field)-1);
                next;
            }
            last;
        }

        print "   [$field]\n";
    }
}
close (IN);

I would argue writing C in Perl is a much better situation than writing Perl in C. As is often brought up on the SO podcast, understanding C is a virtue that not all developers (even some good ones) have nowadays. Hire them, buy them a copy of Perl Best Practices, and you will be set. Follow that with a copy of Intermediate Perl and they could work out.
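As a hedged illustration of the kind of clean-up Perl Best Practices encourages (not a rewrite of the submitted parser), the sketch below uses strict and warnings, a lexical filehandle with three-argument open, and a regex in place of the character-by-character trailing-space loop. The sub name strip_trailing_spaces and the in-memory sample line are assumptions made so the example runs on its own, and the naive split ignores quoting entirely.

```perl
use strict;
use warnings;

# Hypothetical helper replacing the submitted "strip trailing space" loop.
# $min_length protects characters that came from inside a quoted field.
sub strip_trailing_spaces {
    my ($field, $min_length) = @_;
    my $protected = substr $field, 0, $min_length;
    my $rest      = substr $field, $min_length;
    $rest =~ s/[ ]+\z//xms;
    return $protected . $rest;
}

# Sample input held in a string so the sketch is self-contained;
# a real script would pass a file name such as 'qq.in' instead.
my $sample = "pax, Pax Morgan ,admin,0\n";

open my $in, '<', \$sample
    or die "Cannot open sample data: $!";

while (my $line = <$in>) {
    chomp $line;
    print "$line:\n";
    for my $field (split /,/, $line) {   # naive split: no quote handling
        $field =~ s/\A[ ]+//xms;         # skip leading spaces
        $field = strip_trailing_spaces($field, 0);
        print "   [$field]\n";
    }
}

close $in or die "Cannot close sample data: $!";
```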

Any suggestion how I can document my Perl code? What do you use and what tools are available to help me?

Which module do you use to convert pod to html?

You might also want to check out Perl Best Practices by Damian Conway. I used some of the tips to clean up a small Perl code base I inherited.
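For the documentation question, a minimal sketch of POD embedded next to the code it describes is shown here. The sub greet and its wording are hypothetical; the POD directives themselves (=head1, =cut) are standard.

```perl
use strict;
use warnings;

=head1 NAME

greet - build a greeting string (hypothetical example sub)

=head1 SYNOPSIS

    my $text = greet('World');

=head1 DESCRIPTION

Returns a greeting for the given name. Dies if no name is supplied.

=cut

sub greet {
    my ($name) = @_;
    die "greet() needs a name\n" if !defined $name;
    return "Hello, $name!";
}

print greet('World'), "\n";
```

The perldoc and pod2html tools that ship with Perl (the latter built on the core Pod::Html module) can render such a file; something like pod2html --infile=script.pl --outfile=script.html should work, with the file names being placeholders.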

I've been writing Perl for several years now and it is my preferred language for text processing (many of the genetics/genomics problems I work on are easily reduced to text processing problems). Perl as a language can be very forgiving, and it's possible to write very poor, but functional, code in Perl. Just the other day, my friend said he calls Perl a write-only language: write it once, understand it once, and never ever try to go back and fix it after it's finished.

While I have definitely been guilty of writing bad scripts at times, I feel like I have also written some very clear and maintainable code in Perl. However, if someone asked me what makes the code clear and maintainable, I wouldn't be able to give a confident answer.

What makes Perl code maintainable? Or maybe a better question is what makes Perl code hard to maintain? Let's assume I'm not the only one that will be maintaining the code, and that the other contributors, like me, are not professional Perl programmers but scientists with programming experience.

I don't use all of Perl Best Practices, but that's the thing that Damian wrote it for. Whether or not I use all the suggestions, they are all worth at least considering.

I have not done Perl for about 8 years and now I'm going into a project that's heavily utilizing object-oriented Perl, so I need to resharpen my Perl skills and do it quickly. During these past years I mainly did all sorts of Java development and some PHP. I'm very good at OO and I'm not a novice programmer by any remote extent.

So here comes the question: what are the best resources/sites/practices/ways/books you guys can recommend to pick up on my rusty Perl skills and learn Perl "the new way"? Your suggestions will be much appreciated.

P.S. I did research some previous answers. I want to emphasize that I'm not looking for a novice book/resource (syntax, core principles, etc.) but specifically ones that cover OOP capabilities that weren't there when I was programming in Perl (or that I may have overlooked at that time).

P.P.S. Thanks to everyone for their suggestions and tips. After some consideration I went with @MBO's answer since it was the first to mention Moose, which I really like so far.

Well, first of all Higher-Order Perl is really good, but it's about functional programming, not objects.

Perl Best Practices is an excellent book, but it has limitations, and one of them is that Conway recommends using his own Class::Std module to do inside-out objects, and the general consensus seems to be (1) that if you're going to do inside-out objects, Object::InsideOut and Class::InsideOut are better ways to do it (2) and anyway, using "Moose" based objects is a better way to go.

This illustrates what is probably the major difference between the Java and Perl world: There's rarely one standard way of doing anything with Perl. Starting as someone who feels comfortable with objects in another language, I would guess that the most interesting thing about Conway's now slightly dated Object Oriented Perl is watching him gradually develop different ways of adding OOP features you've been taking for granted.

I like the basic blessed-hash style of Perl OOP myself, but you need to understand that its encapsulation is really weak, and that while method-inheritance works, there's typically no data-inheritance. There are also some rather perlish tricks in wide use, such as automatically generating accessors using an AUTOLOAD routine.
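A minimal sketch of that blessed-hash style follows, with read accessors supplied by AUTOLOAD and a line showing how weak the encapsulation is. The class name Counter and its attributes are hypothetical.

```perl
use strict;
use warnings;

package Counter;   # hypothetical class name

our $AUTOLOAD;

sub new {
    my ($class, %args) = @_;
    my $self = { count => 0, label => 'unnamed', %args };
    return bless $self, $class;
}

# Supply a read accessor on demand for any key already in the hash.
sub AUTOLOAD {
    my ($self) = @_;
    (my $name = $AUTOLOAD) =~ s/.*:://;
    return if $name eq 'DESTROY';
    die "No such attribute: $name\n"
        if !ref $self || !exists $self->{$name};
    return $self->{$name};
}

package main;

my $c = Counter->new(label => 'hits');
print $c->label, "\n";       # accessor supplied by AUTOLOAD

# Weak encapsulation: nothing stops outside code reaching into the hash.
$c->{count} = 42;
print $c->count, "\n";
```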

As for what you can read on the subject, don't neglect the on-line documentation that comes with Perl: perldoc. Note the "Tutorials" section at the top. If you're rusty on Perl's references and data structures, read the first two: perldoc perlreftut and perldoc perldsc. A little down the list, you'll see multiple OOP tutorials. These are largely about simple href-based objects, though there are some serious oddities in there, such as Tom Christiansen's scheme for closure-based objects down at the bottom of: perldoc perltoot

If you're interested in some of the newer ways people do things, you might want to start with Moose which is supposed to be the closest you can get to perl6 objects while still writing perl5 code. By the way: ignore the word "postmodern" there, it's a silly joke that doesn't make any sense.

If you're interested in inside-out objects (which have bullet-proof encapsulation, but are perhaps a little annoying to debug: you can't just use Data::Dumper on the object to get its status), I'd suggest starting with this perl5 wiki page.
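To show what "inside-out" means in practice, here is a hand-rolled sketch that uses only the core Scalar::Util module rather than Class::Std or the other modules named above. The class name Account and its single attribute are hypothetical.

```perl
use strict;
use warnings;

package Account;   # hypothetical class name

use Scalar::Util qw(refaddr);

# Attribute storage lives in a lexical hash keyed by object address,
# so it cannot be reached through the object reference itself.
my %balance_of;

sub new {
    my ($class, $balance) = @_;
    my $self = bless \do { my $anon_scalar }, $class;
    $balance_of{ refaddr $self } = $balance;
    return $self;
}

sub balance {
    my ($self) = @_;
    return $balance_of{ refaddr $self };
}

# Inside-out objects must clean up their own storage.
sub DESTROY {
    my ($self) = @_;
    delete $balance_of{ refaddr $self };
    return;
}

package main;

use Data::Dumper;

my $acct = Account->new(100);
print $acct->balance, "\n";
print Dumper($acct);   # dumps a blessed empty scalar, not the balance
```

The Dumper call illustrates the debugging annoyance mentioned above: the object itself carries no data to inspect.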

I have this code

foreach my $key (keys %ad_grp) {

    # Do something
}

which works.

What would the same look like if I don't have %ad_grp, but a reference, $ad_grp_ref, to the hash?

As others have stated, you have to dereference the reference. The keys function requires that its argument starts with a %:

My preference:

foreach my $key (keys %{$ad_grp_ref}) {

According to Conway:

foreach my $key (keys %{ $ad_grp_ref }) {

Guess who you should listen to...

You might want to read through the Perl Reference Documentation.

If you find yourself doing a lot of stuff with references to hashes and hashes of lists and lists of hashes, you might want to start thinking about using Object Oriented Perl. There's a lot of nice little tutorials in the Perl documentation.
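Putting the dereferencing forms above into a complete loop gives a sketch like the following. The sample hash contents are hypothetical stand-ins for the asker's %ad_grp.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the asker's %ad_grp.
my %ad_grp     = (admins => 3, users => 12);
my $ad_grp_ref = \%ad_grp;

# Block form of the dereference; sorting makes the order predictable.
for my $key (sort keys %{ $ad_grp_ref }) {
    print "$key => $ad_grp_ref->{$key}\n";
}

# The braces may be dropped when the reference is a simple scalar.
my $count = keys %$ad_grp_ref;
print "$count groups\n";
```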

Perl has a conditional operator that is the same as C's conditional operator.

To refresh, the conditional operator in C and in Perl is:

(test) ? (if test was true) : (if test was false)

and if used with an lvalue you can assign and test with one action:

my $x=  $n==0 ? "n is 0" : "n is not 0";

I was reading Igor Ostrovsky's blog on A neat way to express multi-clause if statements in C-based languages and realized this is indeed a "neat way" in Perl as well.

For example: (edit: used Jonathan Leffler's more readable form...)

# ternary conditional form of if / elsif construct:
my $s=
      $n == 0     ? "$n ain't squawt"
    : $n == 1     ? "$n is not a lot"
    : $n < 100    ? "$n is more than 1..."
    : $n < 1000   ? "$n is in triple digits"
    :               "Wow! $n is thousands!" ;  #default

Which reads a LOT easier than what many would write in Perl: (edit: used cjm's more elegant my $t=do{ if }; form in rafi's answer)

# Perl form, not using Switch or given / when
my $t = do {
    if    ($n == 0)   { "$n ain't squawt"        }
    elsif ($n == 1)   { "$n is not a lot"        }
    elsif ($n < 100)  { "$n is more than 1..."   }
    elsif ($n < 1000) { "$n is in triple digits" }
    else              {  "Wow! $n is thousands!" }
};

Are there any gotchas or downside here? Why would I not write an extended conditional form in this manner rather than use if(something) { this } elsif(something) { that }?

The conditional operator has right associativity and low precedence. So:

a ? b : c ? d : e ? f : g

is interpreted as:

a ? b : (c ? d : (e ? f : g))

I suppose you might need parentheses if your tests used one of the few operators of lower precedence than ?:. You could also put blocks in the form with braces I think.
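The grouping described above can be checked with a small sketch. The subs describe and describe_grouped are hypothetical helpers; the last few lines show one case (assignment inside a test) where parentheses are needed because = binds more loosely than ?:.

```perl
use strict;
use warnings;

# Hypothetical helper using the chained form.
sub describe {
    my ($n) = @_;
    return $n == 0    ? 'zero'
         : $n == 1    ? 'one'
         : $n < 100   ? 'small'
         :              'large';
}

# The same logic with the implicit grouping written out.
sub describe_grouped {
    my ($n) = @_;
    return $n == 0 ? 'zero'
                   : ($n == 1 ? 'one' : ($n < 100 ? 'small' : 'large'));
}

for my $n (0, 1, 42, 5000) {
    printf "%4d: %s / %s\n", $n, describe($n), describe_grouped($n);
}

# Assignment has lower precedence than ?:, so it must be parenthesized
# when it appears inside a test.
my $seen;
my $label = ($seen = 1) ? 'assigned' : 'not assigned';
print "$label\n";
```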

I do know about the deprecated use Switch or about Perl 5.10's given/when constructs, and I am not looking for a suggestion to use those.

These are my questions:

  • Have you seen this syntax used in Perl? I have not, and it is not in perlop or perlsyn as an alternate to switch.

  • Are there potential syntax problems or 'gotchas' with using a conditional / ternary operator in this way?

  • Opinion: Is it more readable / understandable to you? Is it consistent with Idiomatic Perl?

-------- Edit --

I accepted Jonathan Leffler's answer because he pointed me to Perl Best Practices. The relevant section is 6.17 on Tabular Ternaries. This allowed me to investigate the use further. (If you Google Perl Tabular Ternaries, you can see other comments.)

Conway's two examples are:

my $salute;
if ($name eq $EMPTY_STR) {
    $salute = 'Dear Customer';
}
elsif ($name =~ m/\A ((?:Sir|Dame) \s+ \S+)/xms) {
    $salute = "Dear $1";
}

elsif ($name =~ m/([^\n]*), \s+ Ph[.]?D \z/xms) {
    $salute = "Dear Dr $1";
}
else {
    $salute = "Dear $name";
}

VS:

           # Name format...                            # Salutation...
my $salute = $name eq $EMPTY_STR                       ? 'Dear Customer'
           : $name =~ m/ \A((?:Sir|Dame) \s+ \S+) /xms ? "Dear $1"
           : $name =~ m/ (.*), \s+ Ph[.]?D \z     /xms ? "Dear Dr $1"
           :                                             "Dear $name"
           ;

My conclusions are:

  • Conway's ?: example is more readable and simpler to me than the if/elsif form, but I could see how the form could get hard to understand.

  • If you have Perl 5.13.1, use my $t=do { given { when } }; as an assignment as rafi has done. I think given/when is the best idiom now, unless the tabular ternary format is better for your particular case.

  • If you have Perl 5.10+ use given/when in general instead of Switch or if you need some sort of case type switch.

  • For older Perls, this is a fine form for simple alternatives or as an alternate to a case statement. It is better than using Switch I think.

  • The right-to-left associativity determines how the clauses nest, not the order of evaluation: the tests still run top to bottom and stop at the first one that is true. Remember that when using...

The layout shown for the conditional operator is hard to read. This is more like what I recall Perl Best Practices recommending:

my $s = $n == 0   ? "$n ain't squawt"
      : $n == 1   ? "$n is not a lot"
      : $n < 100  ? "$n is more than 1..."
      : $n < 1000 ? "$n is in triple digits"
      :             "Wow! $n is thousands!";  # default...

And there are times when it is better to use a more compact notation with the if notation, too:

  if    ($n == 0)   { $t = "$n ain't squawt";        }
  elsif ($n == 1)   { $t = "$n is not a lot";        }
  elsif ($n < 100)  { $t = "$n is more than 1...";   }
  elsif ($n < 1000) { $t = "$n is in triple digits"; }
  else              { $t = "Wow! $n is thousands!" ; }  

Both these reformattings emphasize the similarity of the various sections of the code, making it easier to read and understand.

I'm using Damian Conway's "inside-out" objects as described in his wonderful book Perl Best Practices to construct an object-oriented interface to a security system at my client. I'm coming across the need to use internal helper methods within my module that I would normally designate as "_some_method". However, this seems to break encapsulation since they can be called directly via the package name. Is there any way of making these methods truly private? As an example,

use SOD::MyOOInterface;

my $instance1 = SOD::MyOOInterface->new();
$instance1->_some_method;  #this produces an error: 
SOD::MyOOInterface::_some_method;   # this results in a 
                                    # successful method call 

Obviously I don't want the direct call of _some_method to succeed. Is there any way of guaranteeing this?

Don't use the PBP for object practices. It is very old. In fact, now the best practices regarding Perl and objects can be found in Moose, an almost must-have for Perl.

In short, because of the way Perl blurs namespaces and classes, most methods can be called statically on the class. This is not a bad thing, just don't document it. There is really no reason to want to seal the methods into the instance. Not having private methods is kind of annoying, but the convention of not relying on undocumented methods is so strong it has sufficed for our community.

A trait is effectively a role (doesn't permit instantiation) that can be compiled into an object at runtime. This will further obscure the origin of the methods from your typical user (because they won't be in the original class), but it comes at a runtime cost. See MooseX::Traits for more information on traits.

The leading underscore is a great convention to further signal to prying eyes that the method is private.

As a last note if you really want to push this issue, you might be able to create an anonymous class with those methods using Class::MOP::Class->create_anon_class()
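As a hedged aside, one technique not mentioned in this answer is to keep a helper in a lexical code reference, which has no entry in the package's symbol table. The sketch below contrasts it with an underscore-named method; SOD::Demo is a hypothetical stand-in for the asker's class and uses a plain blessed hash rather than inside-out objects.

```perl
use strict;
use warnings;

{
    package SOD::Demo;   # hypothetical stand-in for the asker's class

    sub new {
        my ($class) = @_;
        return bless {}, $class;
    }

    # Convention only: the underscore does not stop outside callers.
    sub _some_method { return 'underscore helper' }

    # A lexical code reference is visible only inside this block.
    my $private_helper = sub {
        my ($self) = @_;
        return 'lexical helper';
    };

    sub report {
        my ($self) = @_;
        return $self->$private_helper();
    }
}

my $obj = SOD::Demo->new;
print SOD::Demo::_some_method(), "\n";   # still callable by package name
print $obj->report, "\n";                # helper reached only via report
```

The trade-off is that subclasses cannot call or override such a helper either.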

I have been delivering training on Programming Practices and on Writing Quality Code to participants who have been working with Java for some time. Object Oriented Analysis and Design is the base, and I cover the S.O.L.I.D. principles and excerpts from books like Clean Code, Code Complete 2 and so on.

I am scheduled to deliver training to Perl programmers (with less than 1 year of experience in Perl) in two days, and they do not use Moose (an extension of the Perl 5 object system which brings modern object-oriented language features).

I am now confused as to how to structure my training, as they don't follow OOP.

Any suggestions?

Regards, Shardul.

Take a good look at 'Perl Best Practices' by Damian Conway. It has lots of solid material in it, and you won't go far wrong taking his advice.

Be aware, though, that Getopt::Clade is only available as a placeholder package - it is vapourware, in other words.

Currently, I am using following code to convert an irregular multidimensional array into one dimensional array.

use Data::Dumper;

my $array = [0, 
        [1],
        2,
        [3, 4, 5],
        [6, 
            [7, 8, 9 ],
        ],
        [10],
        11,
        ];

my @mylist;
getList($array);

print Dumper (\@mylist);


sub getList

{

        my $array = shift;

        return if (!defined $array);
        if (ref $array eq "ARRAY")
        {
               foreach my $i (@$array)
               {
                   getList($i);
               }
        }
        else
        {
               print "pushing $array\n";
               push (@mylist, $array);
        }
}

This is based on recursion, where I check each element. If an element is a reference to an array, the function calls itself recursively with that array.

Is there a better way to solve this kind of problem?

First of all your function should never return data by modifying a global variable. Return a list instead.

As for efficiency, Perl has a surprisingly large function call overhead. Therefore for large data structures I would prefer a non-recursive approach. Like so:

use Data::Dumper;
my $array = [
  0, 
  [1],
  2,
  [3, 4, 5],
  [6, [7, 8, 9 ]],
  [10],
  11,
];

my @mylist = get_list($array);

print Dumper (\@mylist);

sub get_list {
    my @work = @_;
    my @result;
    while (@work) {
        my $next = shift @work;
        if (ref($next) eq 'ARRAY') {
            unshift @work, @$next;
        }
        else {
            push @result, $next;
        }
    }
    return @result;
}

Note that the formatting that I am using here matches the recommendations of perlstyle. We all know the futility of arguing the One True Brace Style. But at the least I'm going to suggest that you reduce your 8 space indent. There is research into this, and code comprehension has been shown to be improved with indents in the 2-4 space range. Read Code Complete for details. It doesn't matter where you are in that range for young people, but older programmers whose eyesight is going find 4 a better indent. Read Perl Best Practices for more on that.

I have read some articles online on the naming convention for Perl style which suggest using lowercase letters and separating words by underscore for functions or methods names. Some others use the first word in lowercase then capitalize the other words. Of course Windows .NET etc Capitalize every word and no underscore.

I have some package methods with many words in their names, like entriesoncurrentpage. If I follow the suggested Perl style I should write it like:

sub entries_on_current_page {...}

This adds three underscore characters to the method name. The other style is:

sub entriesOnCurrentPage {...}

and Windows style should be:

sub EntriesOnCurrentPage {...}

PHP sometimes uses all lowercase with underscores, like mysql_real_escape_string(), and sometimes all lowercase without underscores, like htmlspecialchars. Of course, PHP function names are not case sensitive, which is not the case in Perl.

So the question is, for the long name with many words what is the best style to use for Perl coding.

Originally, most Perl developers used camel casing with the first letter lowercased. This is the standard with most programming languages. Names with first letter capitalized were used for classes and methods.

Later on, Damian Conway's book Perl Best Practices suggested using underscores rather than camel casing. Damian argued that it increased readability, and was not that much harder to type.

Damian Conway's suggestion on names became the standard because 1). He was correct. It's much more legible and isn't that much harder to type, and most importantly 2). It was incorporated into Perltidy. Perltidy is a program that helps standardize your code according to Damian's suggestions. Perltidy is much like CheckStyle in Java.

Are these arbitrary standards? Yes, all standards are somewhat arbitrary in nature. You have a few candidate suggestions for rules, and you must make a decision:

  • Should the curly brace in while loops and if statements be appended to the end of the line, or go under the while or if statement? In standard C style, curly braces are cuddled. In Java, they're not supposed to be, according to CheckStyle. In Kornshell, the then goes under the if. In Bash, the standard is now that the then goes on the same line even though the Bash interpreter doesn't really like it. (You have to add a semicolon before the then because it's considered a separate command.)
  • How should variable names be done? In most languages, CamelCase rules. In .NET, you even capitalize the first character, but in Perl, we use underscores.
  • Should constants be all uppercase? Most languages have agreed with that. However, in shell script, you usually reserve all uppercase variable names for special environment variables such as $PWD, $PATH, etc. Therefore, in Bash and Kornshell scripts, constant variables are all camelCased like regular variables.

The idea is to follow the standard for that language. Why? Because the standard says so. Because you can't argue with The Standard as you can with your fellow programmers about whether or not curly braces are cuddled. The main thing is to realize that most standards may be somewhat arbitrary, but they don't really affect the way you program. By everyone following a standard, you make it easier to understand other people's code.

I have the following data I wish to access:

$data1=   
{'Family' => {
    'House' => [
        {
            'Id' => '1111',
            'Name' => 'DFG'
        },
        {
            'Id' => '211',
            'Name' => 'ABC'
        }
               ]
            }
}

I want to access the each Name field value. I am using this code:

foreach(keys%$data1) {

    if(ref($data1->{$_}) eq 'HASH') {

        foreach my $inner_key (keys%{$data1->{$_}})    {

            print "Key:$inner_key and value:$data1->{$_}->{$inner_key}\n";   
        }
    } 
    else {

            print "Key: $_ and Value: $data1->{$_}\n"
    } 
}

It prints Key:House and value:ARRAY(OXXX).

I know I am doing something wrong. Since the data in 'House' is an array of hashes, I even tried accessing through $data1->{$_}->{$inner_key}[0]. What is wrong in the code???

[Edit: I typed too slowly while answering, so this response basically duplicates @mpapec's below. I will leave the references here and you can vote me up for those ;-) but do not accept my response as the answer.]


Try something like the following to see if it works:

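use feature 'say';   # say is available from Perl 5.10

# Each element of the House array is itself a hash reference.
for my $inner_hash (@{ $data1->{Family}{House} }) {
    say "Name: $inner_hash->{Name}";
}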

since you need to get the inner hashes' values from inside the elements of the array (that is what value:ARRAY(OXXX) is telling you).

You can use perldoc to look at the perldata, perlref, perlreftut and perldsc PODs to learn more about data structures and dereferencing. If keeping your data structure in mind while you are writing code gets to be too hard to do, it may mean you need to simplify things: either the data itself, or by writing subs to make it easier to access, or by making use of some of the excellent utility modules from CPAN.

There's also some good perl data structure related tutorials out there. The POD/perldoc documentation that ships with perl (along with Chapter 9 of Programming Perl) is the canonical reference, but you might browse these nodes from perlmonks:

NB Above I'm using the perlcritic and Perl Best Practices style of dereferencing: e.g.: @{ $data1->{Family}{House} } so the syntax reminds me that the inner hashes (or inner-inner?) are inside an array. There's a cool new way of dereferencing being introduced in perl 5.20 called postfix dereferencing which will be great, but you can't go wrong following the recommendations of PBP.
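For comparison, here is a sketch of the block-style dereference next to the postfix form on the data structure from the question. It assumes Perl 5.24 or later, where postfix dereferencing is no longer experimental; the array names are hypothetical.

```perl
use v5.24;      # enables strict and say; postfix deref is stable here
use warnings;

# The data structure from the question.
my $data1 = {
    Family => {
        House => [
            { Id => '1111', Name => 'DFG' },
            { Id => '211',  Name => 'ABC' },
        ],
    },
};

# Circumfix (block) dereference, the Perl Best Practices style.
my @names_block = map { $_->{Name} } @{ $data1->{Family}{House} };

# Postfix dereference of the same array reference.
my @names_postfix = map { $_->{Name} } $data1->{Family}{House}->@*;

say "Block:   @names_block";
say "Postfix: @names_postfix";
```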

"Until you start thinking in terms of hashes, you aren't really thinking in Perl." -- Larry Wall

Cheers,