
I have a data file that looks like the one below. I would like to get the interpolated final value (final, P) by looking up two number ranges (scoreA and scoreB). Take "Eric": his scoreA is 35 (a value in the range 30.00 - 40.00) and his scoreB is 48 (a value in the range 45.00 - 50.00). That gives him two ranges of final values, (22.88,40.90) and (26.99,38.99). I would like to get the final values of "Eric" and "George" from the data file. "George" has scoreA = 38 and scoreB = 26.

After applying the formula, I want to get the exact final value when his scoreA=35 & scoreB=45. Let's assume the formula is P=X+Y (P is the final value). So far I have been trying the code shown below, but it does not pick out the correct lines.

How can I get the exact range of final values by referring to the given data?

data file

Student_name ("Eric")   
/* This is a junk line */   
scoreA ("10.00, 20.00, 30.00, 40.00")  
scoreB ("15.00, 30.00, 45.00, 50.00, 55.00")     
final (  
"12.23,19.00,37.88,45.98,60.00",\  
"07.00,20.11,24.56,45.66,57.88",\  
"05.00,15.78,22.88,40.90,57.99",\  
"10.00,16.87,26.99,38.99,40.66"\)  

Student_name ("Liy") 
/* This is a junk line */   
scoreA ("5.00, 10.00, 20.00, 60.00")  
scoreB ("25.00, 30.00, 40.00, 55.00, 60.00")     
final (  
"02.23,15.00,37.88,45.98,70.00",\  
"10.00,28.11,34.56,45.66,57.88",\  
"08.00,19.78,32.88,40.90,57.66",\  
"10.00,27.87,39.99,59.99,78.66"\)

Student_name ("Frank") 
/* This is a junk line */   
scoreA ("2.00, 15.00, 25.00, 40.00")  
scoreB ("15.00, 24.00, 38.00, 45.00, 80.00")     
final (  
"02.23,15.00,37.88,45.98,70.00",\  
"10.00,28.11,34.56,45.66,57.88",\  
"08.00,19.78,32.88,40.90,57.66",\  
"10.00,27.87,39.99,59.99,78.66"\)

Student_name ("George") 
/* This is a junk line */   
scoreA ("10.00, 15.00, 20.00, 40.00")  
scoreB ("25.00, 33.00, 46.00, 55.00, 60.00")     
final (  
"10.23,25.00,37.88,45.98,68.00",\  
"09.00,28.11,34.56,45.66,60.88",\  
"18.00,19.78,32.88,40.90,79.66",\  
"17.00,27.87,40.99,59.99,66.66"\) 

Coding

data();      
sub data() {   
    my $cnt = 0;
    while (my @array = <FILE>) {
        foreach $line(@array) {    
            if ($line =~ /Student_name/) {
                $a = $line;

                if ($a =~ /Eric/ or $cnt > 0 ) {
                    $cnt++;
                }
                if ( $cnt > 1 and $cnt <= 3 ) {
                    print $a;
                }
                if ( $cnt > 2 and $cnt <= 4 ) {
                    print $a;
                }
                if ( $cnt == 5 ) {
                    $cnt  =  0;  
                }
            }
        }
    }
}

Result

Eric    final=42.66  
George  final=24.30  
Zoe
  • There are two separate problems here: (1) how to parse the file, and (2) how to calculate the scores. Parsing is fairly easy, but I don't understand how you calculate the scores: What data points are you using? Can you clarify the algorithm? Can you explain your terminology? – amon May 30 '13 at 14:22
    Please explain 1) Where the values for scoreA and scoreB for each person come from? 2) What meaning the `final` values have in the data file? 3) What meaning the two pairs of final values has? 4) What relation the final value `P` has to the four "final values" – Borodin May 30 '13 at 14:28
  • @amon-This is a simple example similar to my file. Actually I use bilinear interpolation equation for calculation. – Zoe May 31 '13 at 01:36
  • [google](http://www.ajdesigner.com/phpinterpolation/bilinear_interpolation_equation.php) – Zoe May 31 '13 at 01:57
  • @Borodin-Those example numbers are random generated by computer, I would like to find out the exact final value after bilinear interpolation calculation if his score fall in between the range. For an example "Eric", note: * is computer generated formula and calculation. scoreA (x0, x1, x2, x3) scoreB (y0, y1, y2, y3, y4) each final value obtained: first row(x0*y0),(x0*y1),(x0*y2),(x0*y3),(x0*y4)//second row (x1*y0),(x1*y1),(x1*y2)... – Zoe May 31 '13 at 01:58
  • I added a few notes about implementing bilinear interpolation to my answer. However, I don't understand the meaning of your last comment, and why you suddenly talk about *single* values for `scoreA` and `scoreB` in the first paragraph of your question when they are lists. – amon May 31 '13 at 09:52

1 Answer


In my comment I said that parsing is fairly easy. Here is how it could be done. As the question lacks a proper specification of the file format, I will assume the following:

The file consists of properties, which have values:

document ::= property*
property ::= word "(" value ("," value)* ")"

A value is a double-quoted string containing numbers separated by commas, or a single word:

value ::= '"' ( word | number ("," number)* ) '"'

Spaces, backslashes, and comments are irrelevant.

Here is a possible implementation; I will not go into the details of explaining how to write a simple parser.

package Parser;
use strict; use warnings;

sub parse {
  my ($data) = @_;

  # perform tokenization

  pos($data) = 0;
  my $length = length $data;
  my @tokens;
  while(pos($data) < $length) {
    next if $data =~ m{\G\s+}gc
         or $data =~ m{\G\\}gc
         or $data =~ m{\G/[*].*?[*]/}gc;
    if ($data =~ m/\G([",()])/gc) {
      push @tokens, [symbol => $1];
    } elsif ($data =~ m/\G([0-9]+[.][0-9]+)/gc) {
      push @tokens, [number => 0+$1];
    } elsif ($data =~ m/\G(\w+)/gc) {
      push @tokens, [word => $1];
    } else {
      die "unrecognized token at:\n", substr $data, pos($data), 10;
    }
  }

  return parse_document(\@tokens);
}

sub token_error {
  my ($token, $expected) = @_;
  return "Wrong token [@$token] when expecting [@$expected]";
}

sub parse_document {
  my ($tokens) = @_;
  my @properties;
  push @properties, parse_property($tokens) while @$tokens;
  return @properties;
}

sub parse_property {
  my ($tokens) = @_;
  $tokens->[0][0] eq "word"
    or die token_error $tokens->[0], ["word"];
  my $name = (shift @$tokens)->[1];
  $tokens->[0][0] eq "symbol" and $tokens->[0][1] eq '('
    or die token_error $tokens->[0], [symbol => '('];
  shift @$tokens;
  my @vals;
  VAL: {
    push @vals, parse_value($tokens);
    if ($tokens->[0][0] eq 'symbol' and $tokens->[0][1] eq ',') {
      shift @$tokens;
      redo VAL;
    }
  }
  $tokens->[0][0] eq "symbol" and $tokens->[0][1] eq ')'
    or die token_error $tokens->[0], [symbol => ')'];
  shift @$tokens;
  return [ $name => @vals ];
}

sub parse_value {
  my ($tokens) = @_;
  $tokens->[0][0] eq "symbol" and $tokens->[0][1] eq '"'
    or die token_error $tokens->[0], [symbol => '"'];
  shift @$tokens;

  my $value;

  if ($tokens->[0][0] eq "word") {
    $value = (shift @$tokens)->[1];
  } else {
    my @nums;
    NUM: {
      $tokens->[0][0] eq 'number'
        or die token_error $tokens->[0], ['number'];
      push @nums, (shift @$tokens)->[1];
      if ($tokens->[0][0] eq 'symbol' and $tokens->[0][1] eq ',') {
        shift @$tokens;
        redo NUM;
      }
    }
    $value = \@nums;
  }

  $tokens->[0][0] eq "symbol" and $tokens->[0][1] eq '"'
    or die token_error $tokens->[0], [symbol => '"'];
  shift @$tokens;

  return $value;
}

1; # a module file such as Parser.pm must end with a true value

Now, we get the following data structure as output from Parser::parse:

(
  ["Student_name", "Eric"],
  ["scoreA", [10, 20, 30, 40]],
  ["scoreB", [15, 30, 45, 50, 55]],
  [
    "final",
    [12.23, 19, 37.88, 45.98, 60],
    [7, 20.11, 24.56, 45.66, 57.88],
    [5, 15.78, 22.88, 40.9, 57.99],
    [10, 16.87, 26.99, 38.99, 40.66],
  ],
  ["Student_name", "Liy"],
  ["scoreA", [5, 10, 20, 60]],
  ["scoreB", [25, 30, 40, 55, 60]],
  [
    "final",
    [2.23, 15, 37.88, 45.98, 70],
    [10, 28.11, 34.56, 45.66, 57.88],
    [8, 19.78, 32.88, 40.9, 57.66],
    [10, 27.87, 39.99, 59.99, 78.66],
  ],
  ...,
)

As a next step, we want to transform it into nested hashes, so that we have the structure

{
  Eric => {
    scoreA => [...],
    scoreB => [...],
    final  => [[...], ...],
  },
  Liy => {...},
  ...,
}

So we simply run it through this small sub:

sub properties_to_hash {
  my %hash;
  while(my $name_prop = shift @_) {
    $name_prop->[0] eq 'Student_name' or die "Expected Student_name property";
    my $name = $name_prop->[1];
    while( @_ and $_[0][0] ne 'Student_name') {
      my ($prop, @vals) = @{ shift @_ };
      if (@vals > 1) {
        $hash{$name}{$prop} = \@vals;
      } else {
        $hash{$name}{$prop} = $vals[0];
      }
    }
  }
  return \%hash;
}

So we have the main code

my $data = properties_to_hash(Parser::parse( $file_contents ));
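
Here $file_contents has to hold the whole data file as one string. The following is a minimal sketch, using core Perl only, of one way to fill it and of trimming the resulting hash down to selected students. The names slurp_file and keep_students and the abbreviated stand-in hash are hypothetical and only serve as illustration; they are not part of the parser above.

```perl
use strict; use warnings;

# Hypothetical helper: read a whole file into one string.
sub slurp_file {
  my ($filename) = @_;
  open my $fh, "<", $filename or die "Can't open $filename: $!";
  local $/;                 # no input record separator: <$fh> returns the entire file
  my $contents = <$fh>;
  close $fh;
  return $contents;
}

# Hypothetical helper: drop every student whose name is not in the wanted list.
sub keep_students {
  my ($data, @wanted) = @_;
  my %wanted = map { $_ => 1 } @wanted;
  delete $data->{$_} for grep { !$wanted{$_} } keys %$data;
  return $data;
}

# Stand-in for the hash that properties_to_hash() returns (values abbreviated).
my $data = {
  Eric   => { scoreA => [10, 20, 30, 40] },
  Liy    => { scoreA => [5, 10, 20, 60] },
  George => { scoreA => [10, 15, 20, 40] },
};
keep_students($data, "Eric", "George");
print join(", ", sort keys %$data), "\n";   # prints "Eric, George"
```

With the parser loaded, the two helpers would be combined as `keep_students(properties_to_hash(Parser::parse(slurp_file($filename))), "Eric", "George")`, where $filename is whatever path the data file lives at.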

Now we can move on to Part 2 of the problem: calculating your scores. That is, once you make clear what you need done.

Edit: Bilinear interpolation

Let f be the function that returns the value at a certain coordinate. If we have a value at those coordinates, we can return that. Else, we perform bilinear interpolation with the next known values.

The formula for bilinear interpolation[1] is:

f(x, y) = 1/( (x_2 - x_1) · (y_2 - y_1) ) · (
              f(x_1, y_1) · (x_2 - x) · (y_2 - y)
            + f(x_2, y_1) · (x - x_1) · (y_2 - y)
            + f(x_1, y_2) · (x_2 - x) · (y - y_1)
            + f(x_2, y_2) · (x - x_1) · (y - y_1)
          )
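
As a worked example, the formula can be evaluated by hand for the numbers quoted in the question: Eric's scoreA of 35 lies between 30 and 40, his scoreB of 48 lies between 45 and 50, and the four surrounding table entries are 22.88, 40.90, 26.99 and 38.99. The sketch below plugs these values straight into the formula; the variable names $f11 to $f22 are chosen here for illustration, and the result is only what this formula yields for these inputs.

```perl
use strict; use warnings;

my ($x, $y)     = (35, 48);   # requested position (scoreA, scoreB)
my ($x_1, $x_2) = (30, 40);   # neighbouring scoreA positions
my ($y_1, $y_2) = (45, 50);   # neighbouring scoreB positions

# Known values at the four corners, read from Eric's "final" table
my $f11 = 22.88;   # f(x_1, y_1)
my $f12 = 40.90;   # f(x_1, y_2)
my $f21 = 26.99;   # f(x_2, y_1)
my $f22 = 38.99;   # f(x_2, y_2)

my $p = 1 / (($x_2 - $x_1) * ($y_2 - $y_1)) * (
    $f11 * ($x_2 - $x) * ($y_2 - $y)
  + $f21 * ($x - $x_1) * ($y_2 - $y)
  + $f12 * ($x_2 - $x) * ($y - $y_1)
  + $f22 * ($x - $x_1) * ($y - $y_1)
);
printf "%.3f\n", $p;   # prints 33.941
```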

Now, scoreA denotes the positions of the data points in the final table on the first axis, and scoreB the positions on the second axis. We have to do the following:

  1. assert that the requested values x, y are inside the bounds
  2. fetch the next smaller and next larger positions
  3. perform interpolation


sub f {
   my ($data, $x, $y) = @_;

   # do bounds check:
   my ($x_min, $x_max, $y_min, $y_max) = (@{$data->{scoreA}}[0, -1], @{$data->{scoreB}}[0, -1]);
   die "indices ($x, $y) out of range ([$x_min, $x_max], [$y_min, $y_max])"
      unless $x_min <= $x && $x <= $x_max
          && $y_min <= $y && $y <= $y_max;

To fetch the boxing indices x_1, x_2, y_1, y_2 we need to iterate through all possible scores. We'll also remember the physical indices x_i1, x_i2, y_i1, y_i2 of the underlying arrays.

   my ($x_i1, $x_i2, $y_i1, $y_i2);
   for ([$data->{scoreA}, $x, \$x_i1, \$x_i2], [$data->{scoreB}, $y, \$y_i1, \$y_i2]) {
      my ($scores, $pos, $a_i1, $a_i2) = @$_;
      # scan from the back to find the largest known position that is <= the requested one
      for my $i (reverse 0 .. $#$scores) {
         if ($scores->[$i] <= $pos) {
            # at the last index there is no next entry, so step back by one
            ($$a_i1, $$a_i2) = $i == $#$scores ? ($i-1, $i) : ($i, $i+1);
            last;
         }
      }
   }
   my ($x_1, $x_2) = @{$data->{scoreA}}[$x_i1, $x_i2];
   my ($y_1, $y_2) = @{$data->{scoreB}}[$y_i1, $y_i2];

Now, interpolation according to above formula can be performed, but each access at a known index can be changed to an access via physical index, so f(x_1, y_2) would become

$data->{final}[$x_i1][$y_i2]

Detailed Explanation of sub f

  • sub f { ... } declares a sub with name f, although that is probably a bad name. bilinear_interpolation might be a better name.

  • my ($data, $x, $y) = @_ states that our sub takes three arguments:

    1. $data, a hash reference containing fields scoreA, scoreB and final, which are array references.
    2. $x, the position along the scoreA-axis where interpolation is required.
    3. $y, the position along the scoreB-axis where interpolation is required.
  • Next, we want to assert that the positions for $x and $y are valid, i.e. inside the bounds. The first value in $data->{scoreA} is the minimal value; the maximal value is in the last position (index -1). To get both at once, we use an array slice. A slice accesses multiple values at once and returns a list, like @array[1, 2]. Because we use complex data structures which use references, we have to dereference the array in $data->{scoreA}. This makes the slice look like @{$data->{scoreA}}[0, -1].

    Now that we have the $x_min and $x_max values, we throw an error unless the requested value $x is inside the range defined by the min/max values. This is true when

    $x_min <= $x && $x <= $x_max
    

    Should either $x or $y be out of bounds, we throw an error which shows the actual bounds. So the code

    die "indices ($x, $y) out of range ([$x_min, $x_max], [$y_min, $y_max])"
    

    could, for example, throw an error like

    indices (10, 500) out of range ([20, 30], [25, 57]) at script.pl line 42
    

    Here we can see that the value for $x is too small, and $y is too large.

  • The next problem is to find neighbouring values. Assuming scoreA holds [1, 2, 3, 4, 5], and $x is 3.7, we want to select the values 3 and 4. But because we can pull some nifty tricks a bit later, we would rather remember the positions of the neighbouring values, not the values themselves. So this would give 2 and 3 in the above example (remember that arrays are zero-based).

    We can do this by looping over the indices of our array from the back. When we find a value that is ≤ $x, we remember the index. E.g. 3 is the first value we meet that is ≤ $x, so we remember the index 2. For the next higher value, we have to be a bit careful: Obviously, we can just take the next index, so 2 + 1 = 3. But now assume that $x is 5. This passes the bounds check. The first value that is ≤ $x would be value 5, so we can remember position 4. However, there is no entry at position 5, so we could use the current index itself. Because this would lead to division by zero later on, we would be better off remembering positions 3 and 4 (values 4 and 5).

    Expressed as code, that is

    my ($x_i1, $x_i2);
    my @scores = @{ $data->{scoreA} };  # shortcut to the scoreA entry
    for my $i (reverse 0 .. $#scores) { # iterate over all indices from the back: `$#arr` is the last idx of @arr
       if ($scores[$i] <= $x) {         # do this if the current value is ≤ $x
          if ($i != $#scores) {         # if this isn't the last index
             ($x_i1, $x_i2) = ($i, $i+1);
          } else {                      # so this is the last index
             ($x_i1, $x_i2) = ($i-1, $i);
          }
          last;                         # break out of the loop
       }
    }
    

    In my original code I chose a more complex solution to avoid copy-pasting the same code for finding the neighbours of $y.

    Because we also need the values, we obtain them via a slice with the indices:

    my ($x_1, $x_2) = @{$data->{scoreA}}[$x_i1, $x_i2];
    
  • Now we have all surrounding values $x_1, $x_2, $y_1, $y_2 which define the rectangle in which we want to perform bilinear interpolation. The mathematical formula is easy to translate to Perl: just choose the correct operators (*, not · for multiplication), and the variables need dollar signs before them.

    The formula I used is recursive: The definition of f refers to itself. This would imply an infinite loop, unless we do some thinking and break the recursion. f symbolizes the value at a certain position. In most cases, this means interpolating. However, if $x and $y are both equal to values in scoreA and scoreB respectively, we don't need bilinear interpolation, and can return the final entry directly.

    This can be done by checking if both $x and $y are members of their arrays, and doing an early return. Or we can use the fact that $x_1, ..., $y_2 all are members of the arrays. Instead of recursing with values we know don't need interpolating, we just do an array access. This is what we have saved the indices $x_i1, ..., $y_i2 for. So wherever the original formula says f(x_1, y_1) or similar, we write the equivalent $data->{final}[$x_i1][$y_i1].
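
Putting the three steps together, a complete sub might look like the sketch below. It is one possible way to finish the half-implementation, not a definitive version: it assumes that scoreA and scoreB are sorted in ascending order, hold at least two distinct values each, and that final is indexed as [scoreA index][scoreB index]. The name bilinear_interpolation and the hand-written %data hash (standing in for the parsed file) are illustrative choices.

```perl
use strict; use warnings;

sub bilinear_interpolation {
  my ($data, $x, $y) = @_;
  my ($xs, $ys, $final) = @$data{qw(scoreA scoreB final)};

  # 1. bounds check (assumes both axes are sorted in ascending order)
  my ($x_min, $x_max, $y_min, $y_max) = (@$xs[0, -1], @$ys[0, -1]);
  die "indices ($x, $y) out of range ([$x_min, $x_max], [$y_min, $y_max])"
    unless $x_min <= $x && $x <= $x_max
        && $y_min <= $y && $y <= $y_max;

  # 2. neighbouring indices on each axis: scan from the back for the
  #    largest known position that is <= the requested one
  my @idx;
  for ([$xs, $x], [$ys, $y]) {
    my ($scores, $pos) = @$_;
    for my $i (reverse 0 .. $#$scores) {
      next unless $scores->[$i] <= $pos;
      # at the last index there is no next entry, so step back by one
      push @idx, $i == $#$scores ? ($i - 1, $i) : ($i, $i + 1);
      last;
    }
  }
  my ($x_i1, $x_i2, $y_i1, $y_i2) = @idx;
  my ($x_1, $x_2) = @$xs[$x_i1, $x_i2];
  my ($y_1, $y_2) = @$ys[$y_i1, $y_i2];

  # 3. the formula, with f(x_1, y_1) etc. replaced by table lookups
  return 1 / (($x_2 - $x_1) * ($y_2 - $y_1)) * (
      $final->[$x_i1][$y_i1] * ($x_2 - $x) * ($y_2 - $y)
    + $final->[$x_i2][$y_i1] * ($x - $x_1) * ($y_2 - $y)
    + $final->[$x_i1][$y_i2] * ($x_2 - $x) * ($y - $y_1)
    + $final->[$x_i2][$y_i2] * ($x - $x_1) * ($y - $y_1)
  );
}

# Stand-in for the parsed data file, restricted to the two students of interest
my %data = (
  Eric => {
    scoreA => [10, 20, 30, 40],
    scoreB => [15, 30, 45, 50, 55],
    final  => [
      [12.23, 19.00, 37.88, 45.98, 60.00],
      [ 7.00, 20.11, 24.56, 45.66, 57.88],
      [ 5.00, 15.78, 22.88, 40.90, 57.99],
      [10.00, 16.87, 26.99, 38.99, 40.66],
    ],
  },
  George => {
    scoreA => [10, 15, 20, 40],
    scoreB => [25, 33, 46, 55, 60],
    final  => [
      [10.23, 25.00, 37.88, 45.98, 68.00],
      [ 9.00, 28.11, 34.56, 45.66, 60.88],
      [18.00, 19.78, 32.88, 40.90, 79.66],
      [17.00, 27.87, 40.99, 59.99, 66.66],
    ],
  },
);

printf "Eric    final=%.2f\n", bilinear_interpolation($data{Eric},   35, 48);  # prints "Eric    final=33.94"
printf "George  final=%.2f\n", bilinear_interpolation($data{George}, 38, 26);  # prints "George  final=18.35"
```

The printed numbers are simply what this formula yields for the tables in the question; they need not match the Result lines there, since the question only sketches its own formula.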

amon
  • @amon-thanks for your guidance. I have questions about coding. 1) The main code: my $data = properties_to_hash(Parser::parse( $file_contents )); what is $file_contents? 2) what if I want only Eric and George' final values to be shown? – Zoe Jun 04 '13 at 02:05
  • there are errors like: Bareword found where operator expected at line 53, near "document ::" (Do you need to predeclare document?) Bareword found where operator expected at line 54, near "property ::" (Do you need to predeclare property?)String found where operator expected at Dvalue.pl line 55, near ") '"'" (Missing operator before '"'?) syntax error at line 53, near "document ::" Can't use global @_ in "my" at line 58, near "= @_" syntax error at line 81, near "}" // I don't understand how to use parser and declaration – Zoe Jun 04 '13 at 02:49
  • (1) The `$file_contents` are the contents of the file you want to parse. E.g. `use File::Slurp; my $file_contents = read_file("data.file");`. (2) Once you have fully parsed the file into a hash, you can delete any entries you are not interested in, e.g. `/^(?:Eric|George)$/ or delete $data->{$_} for keys %$data`. (3) I dont know where your errors come from, my code worked for me. The parser can be put in a file `Parser.pm` in the same dir as your script. Then `use Parser` to load the code. – amon Jun 04 '13 at 06:10
  • Ah, now I see your problem: my first two code sections are *not* Perl code, but [EBNF](https://en.wikipedia.org/wiki/EBNF) notation for the grammar (the structure) of your file format, which is used to describe Parsers. – amon Jun 04 '13 at 06:13
  • @amon- meaning, document ::= property*; #property ::= word "(" value ("," value)* ")"; are not perl code right. How do the coding handle input from my database file by using parser? – Zoe Jun 04 '13 at 08:56
  • You load the file into a string, e.g. via `File::Slurp`. The `Parser::parse` function then operates on that string. I won't write out the whole code for you, I already covered all the difficult parts. What is left for you to do really isn't hard, provided that you have minimal experience with programming, and a basic grasp of Perl. – amon Jun 04 '13 at 09:06
  • @amon- Perhaps my perl version cannot locate parser.pm. After compilation, it shows: can't locate File/Slurp.pm in my perl version pakage. Anyway, Thanks a lot for your guidance and help =D Very much appreciate. – Zoe Jun 04 '13 at 09:36
  • `File::Slurp` is not a core module, but can be installed from CPAN. Equivalently, you can do `my $file_contents = do { open my $fh, "<", $filename or die "Can't open $filename: $!"; local $/; <$fh>};` – amon Jun 04 '13 at 09:41
  • @amon- I wrote this to find specific name after interpolation. sub name { if ($name =~ /^(?:Eric|George)$/) { print "[$x_i1][$y_i2]\n"; } } is this fine? It seems wrong in somewhere. I have another question. In interpolation function, I have fixed scoreA and scoreB as reference numbers to get the nearest range. in sub f { my $a = 38; my $b = 26; if ( $x_min < $a < $x_max) { @{$data->{scoreA}}[0, -1]; } if ( $y_min < $b < $y_max) { @{$data->{scoreB}}[0, -1]; } } Is this the way to find the correct range? Can you explain what does [0, -1] means? Thanks... – Zoe Jun 04 '13 at 12:44
  • I added a detailed explanation of my half-implementation for bilinear interpolation which explains some shortcuts, and the general algorithm. Note: in Perl, you have to write `$x < $y < $z` as `$x < $y && $y < $z`. – amon Jun 04 '13 at 13:59