
I have an existing project that requires license headers to be used at the beginning of every source file. The problem is that the license header is not static:

#+======================================================================
# \$HeadURL [filled in by svn] \$
# \$Id [filled in by svn] \$
#
# Project       : Project blah blah - only one line
#
# Description   : A longer description
#                 which may or may not span multiple lines
#
# Author(s)     : some author text
#                 but there may be a few more authors, too!
#
# Copyright (c) : 2010-YYYY Some company,
#                 and a few fixed lines of 
#                 address
#
# and a few more lines of fixed license code
# that does not change
#
#-======================================================================

I have an existing Perl script that scans a list of files to determine each file's type (C, Java, bash, etc.) and does a rudimentary check to see whether a license preamble exists.

If it does not, it can insert a blank license header which must be manually updated.

But I would like to know how I can:

  1. Detect an existing license with non-static information, and
  2. Extend the existing Perl processFile($fileName, $type) function (below) to preserve the existing "Project", "Description" and "Author(s)" information?

I suspect that I may need to place markers in the license templates to indicate dynamic text that should be preserved in the regenerated header?

Can you please give me pointers on how to use Perl regexes or pattern matchers to grab the current variable information, so that I can re-insert it into the header and update the year?

I can see that all the magic needs to happen in the "for ($i = 0; $i < 5; ++$i)" loop...

sub processFile {
    my $i;
    my $lineno = 0;
    my $filename = $_[0];
    my $type = $_[1];
    my @license = split(/\n/, $licenses{$type});
    my @contents;

    #print "$filename is a $type file\n";
    tie @contents, 'Tie::File', $filename or die $!;

    if ($prolog{$type}) {   # should not insert license at line 0
        my $len = scalar(@contents);
        while ($lineno < $len) {
            if ($contents[$lineno] =~ /^$prolog{$type}$/) {
                last;
            } else {
                $lineno++;
            }
        }
        if ($lineno >= $len) {
            # no prolog, so let's just insert it into the start
            $lineno = 0;
        } else {
            $lineno = $lineno + 1;
        }
    } else {
        $lineno = 0;
    }

    # Compare the first 5 lines excluding prolog with the license
    # header. If they match, the license header won't be inserted.
    for ($i = 0; $i < 5; ++$i) {
        my $line = $contents[$i + $lineno];
        $line =~ s/\$(\w+)\:.*\$/\$$1\$/;
        if ($line ne $license[$i]) {
            splice @contents, $lineno, 0, @license;
            push @processedFiles, $filename;
            last
        }
    }

    untie @contents;
}
KevinM
  • What you have shown could be the header for bash or Perl. Java and C *could* share another. I think you would need to take a "language handler" approach, partitioned--*probably*--by file extension. – Axeman Jan 18 '13 at 13:01
  • A lot of preprocessing has already taken place. This preprocessing has determined the file type (e.g. bash or Java) and has selected the default license text to insert. So by the time "processFile" is called, I already know the file language and have a matching header (in the @licenses array). – KevinM Jan 18 '13 at 20:40

1 Answer


To answer your question in more depth, I would have to see more of the stock headers. As I wrote in my comment, the one you show here could be used for bash or Perl, and possibly Python or Ruby (though you don't mention those).

As I indicated, you might take a handler approach. You set up an array of handlers, and each handler tells the main processing loop whether or not it handles a file of a given name.

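package CommentFileHandler;
use strict;
use warnings;
use Carp ();

# Stub for "abstract" methods: croaks, naming the method a subclass forgot to override.
sub _unimpl { 
    local $Carp::CarpLevel = $Carp::CarpLevel + 1;
    Carp::croak( $_[0] . ' needs to implement ' . ( caller 1 )[3] );
}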

sub accept_file     { &_unimpl; }
sub scan_for_header { &_unimpl; }
sub has_header      { &_unimpl; }
sub insert_header   { &_unimpl; }

package Perlish::CommentHandler;
use strict;
use warnings;
use parent -norequire, 'CommentFileHandler';

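sub accept_file { 
    my ( $self, $file_name, $h ) = @_;
    return 1 if $file_name =~ m/\.(?:cgi|(?:ba|[ck])sh|p[lmy]|rb)$/;
    # No known extension: fall back to the shebang, if we were given a handle.
    return unless $h;
    my $first_line = <$h>;
    # A shebang only counts on the very first line.
    return unless defined $first_line and $first_line =~ m/^#!/;
    # Interpreter name, optional version suffix, then whitespace or end of line.
    return $first_line =~ m/\b(?:python|perl|ruby|(?:ba|[ck])?sh)[\d.]*(?:\s|$)/;
}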

sub scan_for_header {
    # might be as simple as this:
    my ( $self, $h ) = @_;
    my $text = <$h> . <$h>;
    return $text =~ m{
        \A        # Start of all text
        [#]       # literal hash
        [+]       # literal '+'
        ={70}     # 70 '='
        \s*?      # any number of spaces (wide accepter pattern) -- non-greedy
        $         # end of one line
        \s*?      # possible other control character
        ^         # start of the next
        [#]       # literal '#'
        \W+       # At least one "Non-word" word=[_0-9A-Za-z]
        HeadURL\b # The word 'HeadURL' 
    }msx; # x - means that we can expand it as above.
}

sub insert_header { 
    my ( $self, $h ) = @_;
    # expect $h to be rewound back to top of file: seek( $h, 0, 0 )
    print $h <<'END_COMMENT_HEADER';
#+======================================================================
# \$HeadURL [filled in by svn] \$
# \$Id [filled in by svn] \$
#
# Project       : Project blah blah - only one line
#
# Description   : A longer description
#                 which may or may not span multiple lines
#
# Author(s)     : some author text
#                 but there may be a few more authors, too!
#
# Copyright (c) : 2010-YYYY Some company,
#                 and a few fixed lines of 
#                 address
#
# and a few more lines of fixed license code
# that does not change
#
#-======================================================================
END_COMMENT_HEADER
}

You would then keep track of every file that was modified, so that you have a list of files whose freshly inserted headers still need to be edited by hand.

You could load your handlers, like so:

use feature 'say';
use Carp ();
use File::Find ();
use File::Temp ();
use List::Util qw<first>;

my @file_handlers = qw<Perlish::CommentHandler CStyle::CommentHandler ...>;

# no_chdir keeps $File::Find::name usable as a path from the current directory.
# @selected_roots is the list of directories to walk.
File::Find::find( { no_chdir => 1, wanted => sub { 
        # sorry, I'm an "early return" guy.
        return unless -f $File::Find::name;
        unless ( -r $File::Find::name and -w $File::Find::name ) { 
            say "File: $File::Find::name could not be processed.";
            return;
        }

        return unless my $handler 
            = first { $_->accept_file( $File::Find::name ) } @file_handlers
            ;

        # We *have* a handler from here on out.

        # Declare $fh outside the open() so it is still in scope below.
        my $fh;
        unless ( open( $fh, '<', $File::Find::name )) {
            Carp::carp( "Could not open $File::Find::name!" );
            return;
        }

        # We *have* an open file from here on out...

        # Nothing to do if the header is already there.
        return if $handler->scan_for_header( $fh );

        seek( $fh, 0, 0 ); # back to the start of the file.
        my $tmp = File::Temp->new( UNLINK => 1, SUFFIX => '.dat' );
        $handler->insert_header( $tmp );
        print {$tmp} $_ while <$fh>;   # no comma after the filehandle
        $tmp->seek( 0, 0 );
        $fh->close;

        unless ( open( $fh, '>', $File::Find::name )) {
            Carp::carp( "Could not open $File::Find::name to write out header!" );
            return; # You have to check on this
        }

        print {$fh} $_ while <$tmp>;
        $tmp->close;
        $fh->close;
        # printing out is an easy enough way to "record" our changes.
        say "File $File::Find::name was modified at " . localtime;

    } } => @selected_roots 
    ); 
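
To preserve the variable parts of a header that already exists, one option is to parse the labelled fields out of the old header before regenerating it. The following is only a minimal sketch under assumptions: it assumes the `#` comment prefix and the "Label : value" layout shown in the question, and the helper names `parse_header_fields`, `format_field` and `update_copyright_year` are hypothetical, not part of the original script.

```perl
use strict;
use warnings;

# Labels whose text varies from file to file (taken from the header in the question).
my $LABEL_RE = qr/Project|Description|Author\(s\)|Copyright \(c\)/;

# Hypothetical helper: returns ( label => [ value lines ] ) for each labelled field.
sub parse_header_fields {
    my @lines = @_;
    my ( %fields, $current );
    for my $line (@lines) {
        if ( $line =~ /^#\s*($LABEL_RE)\s*:\s*(.*?)\s*$/ ) {
            # "# Label : first line of value"
            $current = $1;
            $fields{$current} = [$2];
        }
        elsif ( defined $current and $line =~ /^#\s{2,}(\S.*?)\s*$/ ) {
            # indented continuation line belonging to the current field
            push @{ $fields{$current} }, $1;
        }
        else {
            # a bare '#' (or anything else) ends the current field
            undef $current;
        }
    }
    return %fields;
}

# Hypothetical helper: re-render one field in the column layout of the template.
sub format_field {
    my ( $label, @values ) = @_;
    my @out = ( sprintf '# %-14s: %s', $label, shift @values );
    push @out, '#' . ( ' ' x 17 ) . $_ for @values;
    return @out;
}

# Hypothetical helper: replace the end year (or the YYYY placeholder) of a range.
sub update_copyright_year {
    my ( $text, $year ) = @_;
    $text =~ s/^(\d{4})-(?:\d{4}|YYYY)\b/$1-$year/;
    return $text;
}

# Example: made-up header lines standing in for what was read from a file.
my @old_header = (
    '# Project       : Widget tools',
    '#',
    '# Description   : Parses widget files',
    '#                 and reports errors',
    '#',
    '# Author(s)     : A. Author',
    '#',
    '# Copyright (c) : 2010-2012 Some company,',
);
my %fields = parse_header_fields(@old_header);
my $year   = (localtime)[5] + 1900;
$fields{'Copyright (c)'}[0]
    = update_copyright_year( $fields{'Copyright (c)'}[0], $year );

for my $label ( 'Project', 'Description', 'Author(s)', 'Copyright (c)' ) {
    print "$_\n" for format_field( $label, @{ $fields{$label} } );
}
```

The fixed license lines would still come from the template; only the parsed fields are carried over, so explicit markers in the template may not be needed as long as the labels stay stable.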
Axeman
  • Thanks, but what is not clear to me is this: once I know the file type and have selected the "default" license text, how do I scan the *file's* header (which has variable content: some files have one author, some have two, etc.) and compare it to the "default" license to ensure that all sections exist? I have made some progress by searching for the "Project", "Description", etc. section headings. I can now determine whether all the sections *exist*. Perhaps that is good enough for the moment - an expansion would be to check which are missing and fill them in with defaults, while copying the rest. – KevinM Jan 18 '13 at 20:44
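
The expansion described in the last comment (finding which sections are missing and filling them in with defaults) might look roughly like the sketch below. It is an illustration only: the section list comes from the header in the question, while the helper name `missing_sections` and the `TODO` default text are assumptions.

```perl
use strict;
use warnings;

# Sections expected in every header, with hypothetical default text.
my @SECTIONS = ( 'Project', 'Description', 'Author(s)' );
my %DEFAULT  = (
    'Project'     => 'TODO',
    'Description' => 'TODO',
    'Author(s)'   => 'TODO',
);

# Returns the labels from @SECTIONS that do not appear in the given header lines.
sub missing_sections {
    my @lines = @_;
    my %seen;
    for my $line (@lines) {
        for my $label (@SECTIONS) {
            # \Q...\E quotes the parentheses in "Author(s)"
            $seen{$label} = 1 if $line =~ /^#\s*\Q$label\E\s*:/;
        }
    }
    return grep { !$seen{$_} } @SECTIONS;
}

# Example: a made-up header that lacks its Description section.
my @header = (
    '# Project       : Widget tools',
    '#',
    '# Author(s)     : A. Author',
);

# Append a default entry for each missing section; existing ones are kept as-is.
for my $label ( missing_sections(@header) ) {
    push @header, '#', sprintf( '# %-14s: %s', $label, $DEFAULT{$label} );
}
print "$_\n" for @header;
```

Appending keeps the sketch short; a real script would probably splice each default into its proper position in the template order.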