6

I have grades.tsv file with three columns that show students' names, subjects and grades:

Liam    Mathematics 5
Liam    History 6
Liam    Geography   8
Liam    English 8
Aria    Mathematics 8
Aria    History 7
Aria    Geography   6
Isabella    Mathematics 9
Isabella    History 4
Isabella    Geography   7
Isabella    English 5
Isabella    Music   8

I wanted to calculate the average grade for each student and add it to a separate column. For this I used two filehandles DATA and OUT opening the same file:

use strict;
use warnings;

# Open file with grades for calculation of average grade for each student
open (DATA,"grades.tsv") or die "Cannot open file\n";

my %grade_sums;
my %num_of_subjects;

# Calculate sum of grades and number of subjects for each student
while( <DATA> ) {

   chomp;
   my ($name, $subject, $grade) = split /\t/;

   $grade_sums{$name} += $grade;
   $num_of_subjects{$name} += 1;
}

close DATA;


# Open file with grades again but this time for a purpose of adding a separate column with average grade and printing a result
open (OUT,"grades.tsv") or die "Cannot open file\n";

while ( <OUT> ) {
   chomp;
   my ($name, $subject, $grade) = split /\t/;

   # Calculate average grade
   my $average_grade = $grade_sums{$name} / $num_of_subjects{$name};
   my $outline = join("\t", $name, $subject, $grade, $average_grade);

   # Print a file content with new column
   print "$outline\n";
}

close OUT;

The code works but I am not sure if it is a proper way for this task. Is it a good practice or are there better ways that should be preferred?

zubenel
  • 300
  • 4
  • 10
  • 2
    You should not use the name DATA, it is a [special package handle](https://perldoc.pl/perldata#Special-Literals). – Grinnz Jan 30 '20 at 17:33
  • 3
    You should use three-argument open() and lexical filehandles: `open my $data, '<', 'grades.tsv' or die ...; while (my $line = <$data>) { ... }` – Grinnz Jan 30 '20 at 17:34
  • 1
    The only danger I can see - if your code does not work properly you overwrite original file (loose it's content, keep backup copy). Why not to create a new file with summary per student? `student number_of_subjects_taken average_mark` – Polar Bear Jan 30 '20 at 19:38
  • 2
    I'd also mention that split \t and join \t are not sufficient to deal with TSV files. See http://thomasburette.com/blog/2014/05/25/so-you-want-to-write-your-own-CSV-code/. Consider [Text::CSV](https://metacpan.org/pod/Text::CSV) – Grinnz Jan 30 '20 at 20:10

3 Answers3

7

Reopening the file is fine. One alternative would be to seek to the start of the file.

use Fcntl qw( SEEK_SET );

seek(DATA, 0, SEEK_SET);

Seeking is more efficient because it doesn't have to check for permissions, etc. It also guarantees that you get the same file (but not that noone changed it).

The other alternative would be to load the entire file into memory. That's what I'd usually do.


Note that

open(FH, $qfn) or die "Cannot open file\n";

is better written as

open(my $FH, '<', $qfn)
   or die("Can't open file \"$qfn\": $!\n");
  • Three-arg open avoids some problems.
  • Including the error reason in the error message is beneficial.
  • Including the path in the error message is beneficial.
  • One should avoid DATA as Perl sometimes creates a handle by that name automatically.
  • One should avoid using global variables (e.g. FH) in favour or lexical variables (my $FH).
ikegami
  • 367,544
  • 15
  • 269
  • 518
4

There's another thing to consider in this sort of operation. What do you do if you mess up when writing the new data? How are you going to tolerate a program that truncates the original file but fails to completely write the new data?

Instead of opening the write filehandle on the same filename, use a temporary file. File::Temp is part of the Standard Library:

use File::Temp;
my( $temp_fh, $tempfile ) = tempfile();

Now, write everything to $temp_fh until you are satisfied that you were able to complete the output. After that, use rename to move the completed file into place:

rename $tempfile => $original;

Shawn also correctly points out that this will change the inode, thus breaking hard links. You can copy the new file into the old if that matters to you, but I've rarely seen a situation where the technology was that advanced :)

If you mess up, the original data are still there and you can try again. Note: this assumes that the two files are on the same partition since that's a requirement of rename.

Although this might not matter in your case, you also have to consider what other consumers do in the time it takes you to write the new file. If another program wants to read the original file immediately after you've truncated it but haven't written the data (or incompletely written it), what happens? Typically, you want to ensure the file is complete before it's available to other programs.

If you don't like the temp file, there are other ways to handle the problem. Move the original file to a backup name then read that and write to the original name. Or, write to a different filename and move it into place. See, for example, Perl's adjustments to the -i command-line switch for just this problem.

brian d foy
  • 129,424
  • 31
  • 207
  • 592
  • 1
    These techniques will not preserve hard links. To do that, `use File::Copy qw( copy );`. Copy the original to the backup copy, and if successful, copy the new file to the original. – shawnhcorey Jan 31 '20 at 14:21
  • 1
    I expanded my comments in [Use a temporary file instead of clobbering data](https://www.learning-perl.com/2020/01/use-a-temporary-file-instead-of-clobbering-data/) and [Copy instead of renaming to preserve hard links](https://www.learning-perl.com/2020/02/copy-instead-of-renaming-to-preserve-hard-links/). – brian d foy Feb 14 '20 at 22:34
3

Sample code for student's report card

#!/usr/bin/perl
#
# USAGE:
#   prog.pl
#
# Description:
#   Demonstration code for StackOverflow Q59991322
#
# StackOverflow: 
#   Question 59991322
#
# Author:
#   Polar Bear    https://stackoverflow.com/users/12313309/polar-bear
#
# Date: Tue Jan 30 13:37:00 PST 2020
#

use strict;
use warnings;
use feature 'say';

use Data::Dumper;

my $debug = 0;      # debug flag
my %hash;
my $student;
my ($subject,$mark);

map{
    chomp;
    my($name,$subject,$mark) = split "\t",$_;
    $hash{$name}{subjects}{$subject} = $mark;
    $hash{$name}{compute}{Total} += $mark;
    $hash{$name}{compute}{Num_subjects}++;
} <DATA>;

say Dumper(\%hash) if $debug;

foreach $student ( sort keys %hash ) {
    $hash{$student}{compute}{GPA} = $hash{$student}{compute}{Total}/$hash{$student}{compute}{Num_subjects};
    $~ = 'STDOUT_REPORT';
    write;
    print_marks($student);
    $~ = 'STDOUT_REPORT_END';
    write;
}

sub print_marks {
    my $student = shift;

    $~ = 'STDOUT_MARKS';

    while( ($subject,$mark) = each %{$hash{$student}{subjects}} ) {
        write;
    }
}

format STDOUT_REPORT = 
+----------------------------+
| Student: @<<<<<<<<<<       |
$student
+----------------------------+
.

format STDOUT_REPORT_END =
+----------------------------+
| Subjects taken:     @<<    |
$hash{$student}{compute}{Num_subjects}
| Grade average:      @<<    |
$hash{$student}{compute}{GPA}
+----------------------------+

.

format STDOUT_MARKS =
| @<<<<<<<<<<<<<<     @<<    |
$subject, $mark
.

__DATA__
Liam    Mathematics 5
Liam    History 6
Liam    Geography   8
Liam    English 8
Aria    Mathematics 8
Aria    History 7
Aria    Geography   6
Isabella    Mathematics 9
Isabella    History 4
Isabella    Geography   7
Isabella    English 5
Isabella    Music   8

Output

+----------------------------+
| Student: Aria              |
+----------------------------+
| Mathematics         8      |
| History             7      |
| Geography           6      |
+----------------------------+
| Subjects taken:     3      |
| Grade average:      7      |
+----------------------------+

+----------------------------+
| Student: Isabella          |
+----------------------------+
| Music               8      |
| Mathematics         9      |
| History             4      |
| English             5      |
| Geography           7      |
+----------------------------+
| Subjects taken:     5      |
| Grade average:      6.6    |
+----------------------------+

+----------------------------+
| Student: Liam              |
+----------------------------+
| Geography           8      |
| English             8      |
| History             6      |
| Mathematics         5      |
+----------------------------+
| Subjects taken:     4      |
| Grade average:      6.7    |
+----------------------------+
Polar Bear
  • 6,762
  • 1
  • 5
  • 12
  • 2
    Tip: The forms provided by Perl6::Form (a module for Perl5) are much cleaner than the builtin ones – ikegami Jan 30 '20 at 22:20
  • 2
    I don't really like the use of `map` in void context. Using `for` instead would be clearer imo. – Dada Jan 31 '20 at 07:26
  • I still have a soft spot for formats. So much so that I got the rights to post the [Formats chapter](https://www.learning-perl.com/2014/07/formats/) from the original [Learning Perl](https://www.learning-perl.com) book. – brian d foy Jan 31 '20 at 13:35
  • @brian d foy, Perl6::Form is basically the same without the weird global vars. – ikegami Jan 31 '20 at 14:24
  • I don't value the tradeoff of some sort of purity for a lot more unnecessary dependencies. People can make their own value judgments based on their situation. – brian d foy Jan 31 '20 at 14:59