
I have the following Python code snippet:

import zlib

def object_read(repo, sha):
    # Loose objects are stored at <repo>/objects/<first 2 hex chars>/<remaining 38>
    path = repo + "/objects/" + sha[0:2] + "/" + sha[2:]

    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
        return len(raw)  # length in bytes; raw is never decoded

print(object_read(".git", "1372c654fd9bd85617f0f8b949f1405b0bd71ee9"))

and one of its Perl 6 counterparts:

#!/usr/bin/env perl6
use Compress::Zlib;

sub object-read( $repo, $sha ) {
    my $path = $repo ~ "/objects/" ~ $sha.substr(0, 2) ~ "/" ~
               $sha.substr(2, *);

    given slurp($path, :bin) -> $f {
        my $raw = uncompress($f).decode('utf8-c8'); # Probable error here?!
        return $raw.chars;
    }

}

put object-read(".git", "1372c654fd9bd85617f0f8b949f1405b0bd71ee9")

However, when I run them, their results differ by one:

$ python bin.py
75
$ perl6 bin.p6
74
jjmerelo
uzluisf
  • What is "raku"? – melpomene Mar 31 '19 at 14:56
  • Why are you calling `decode` in the Perl6, but not the Python version? – melpomene Mar 31 '19 at 14:57
  • @melpomene Raku is the codename for Perl 6. See the redirect at http://raku.do/ – phd Mar 31 '19 at 15:13
  • @phd That's Rakudo, not Raku. – melpomene Mar 31 '19 at 15:14
  • @phd Oh, I found it in the [FAQ](https://docs.perl6.org/language/faq): "*Perl 6 (which can also be called "Raku") is the definition of the language.*" (And Rakudo is an implementation.) That's news to me. Last time I looked it wasn't there. :-) But then why are there separate `perl6` / `raku` tags? – melpomene Mar 31 '19 at 15:19
  • It's a long and painful story. My personal takes: https://liztormato.wordpress.com/2018/11/06/on-raku/ and https://liztormato.wordpress.com/2018/11/09/on-raku-again/ . Please note that these are personal opinions. – Elizabeth Mattijsen Mar 31 '19 at 15:32
  • @melpomene "why are there separate perl6 / raku tags?" See [Synonym Perl6 and Raku](https://meta.stackoverflow.com/questions/376267/synonym-perl6-and-raku), especially Pat's answer. – raiph Jun 05 '19 at 10:59

1 Answer


@melpomene has hit the spot. You are not decoding in Python, so `len(raw)` counts bytes, whereas the Perl 6 version counts characters after decoding, and the raw byte count may well be higher. To see it, insert

say uncompress($f).elems;

before decoding to `$raw` and you will see that (for this file, on my system) it is 2 bytes more. Decoding via `utf8-c8` merges each multi-byte UTF-8 sequence into a single codepoint, and `.chars` then counts graphemes, each of which may span several codepoints. In general, the number of characters will be less than or equal to the number of bytes in an IO stream, with equality only when every byte is plain ASCII.
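A minimal Python sketch of the same effect, so both counts can be compared in one place. It does not read a real repository: the payload is a hypothetical object in git's `blob <size>\0<content>` layout whose content contains one two-byte UTF-8 character ("é"). It also uses strict `utf-8` decoding as an approximation; Raku's `utf8-c8` additionally tolerates invalid byte sequences, and Python's `len` on a `str` counts codepoints rather than graphemes.

```python
import zlib

# Hypothetical loose-object payload: header "blob <size>\0" followed by
# content that contains one non-ASCII character ("é" is 2 bytes in UTF-8).
content = "café\n".encode("utf-8")
payload = b"blob " + str(len(content)).encode("ascii") + b"\0" + content

# Round-trip through zlib, as object_read() does when it reads the file.
raw = zlib.decompress(zlib.compress(payload))

# What the Python version reports: the number of bytes.
byte_count = len(raw)

# Roughly what the Perl 6 version reports: characters after decoding.
# (Strict UTF-8 here; utf8-c8 would also accept malformed bytes.)
char_count = len(raw.decode("utf-8"))

print(byte_count, char_count)
```

This prints `13 12`: the single two-byte "é" accounts for the off-by-one, just as in the question.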

jjmerelo