A few questions scratch an itch around Perl 6 grammars and raster (or binary in general) data. As I understand it, the text approach is to work at the grapheme level through grammars. Can we approach raster data the same way? Can we define a custom grapheme, or some other basic unit of binary data, so that such data can be parsed with grammars?

Seeing that Perl 6 is itself defined by Perl 6 grammars, can we define similar grammars as a kind of "validation" test, where the basic case is: if the grammar can parse the data, the data is well-formed and structurally valid? For text data this approach is fairly obvious, since the basic units of grammars are text-oriented. But can we customize those back-end definitions (for example, it is possible to override what :sigspace matches, so that rules parse with a different separator between graphemes) to bring the power of grammars into binary data territory?

Thanks!

For the background part:

Over the past few weeks, I have been learning Perl 6 out of personal interest. After seeing this talk at FOSDEM 2019, I began asking myself (and the people around me) about using grammars to inspect and parse binary data. My use case would be, for example, to replicate the Cloud Optimized GeoTIFF validator without the support of a GDAL binding (I have not seen one yet in Perl 6). It is clearly a learning project for me.

The Spec for Cloud Optimized Geotiff

For now, the basic idea is to parse the binary structure with Perl 6 grammars, if that is possible, as a first step. The main goal is to be able to inspect the data and metadata.

Note: I am not a native speaker. If some parts need rewriting or clarification, feel free to point them out.

  • cf https://stackoverflow.com/questions/48202133/parsing-binary-structure-with-perl6-grammar – raiph Feb 03 '19 at 16:03
  • "for example to ... without the support of a ... binding (I didn't see one yet in perl6).". cf https://stackoverflow.com/questions/54465122/perl6-equivalent-of-perls-store-or-use-storable#comment95750043_54465122 and https://stackoverflow.com/questions/54487122/cannot-import-perl5-module-using-inlineperl5-into-perl6#comment95796367_54495124 and nearby comments. – raiph Feb 03 '19 at 16:10
  • A short time ago, this article was published on using Perl 6 grammars for GFX3 files http://blogs.perl.org/users/sylvain_colinet/2019/01/mis-using-perl-6-grammars-decompressing-zelda-3-gfx.html, which I understand is a binary format. So I understand it can be done, although of course you'd have to put the GeoTIFF in grammar form to parse it. – jjmerelo Feb 03 '19 at 16:34
  • @raiph I'll dig into the pack module. And yes, I was aware that modules from other languages can be imported into Perl 6, but I was focusing more on the grammar approach. – notagoodidea Feb 03 '19 at 21:35
  • @jjmerelo I hadn't found that link, thanks, I'll dig into it. But as I understand it, Perl 6.c does not currently handle binary data natively in grammars. For example, in the linked blog article: "Since grammars does not really support pure binary data you have to pass your data that you store originally in a buf as latin1 encoded string. " The logic in the article is what I'm looking for, thanks! – notagoodidea Feb 03 '19 at 21:36
  • @notagoodidea here's an answer to the "what's perl6's definition of a grapheme?"; perl6 adheres to unicode's algorithm for grapheme cluster rules; here's a link to the algorithm concisely shown: https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundary_Rules - I hope that makes things clear! There is no mechanism in perl6 to change how graphemes work for strings, at least to my knowledge – timotimo Feb 03 '19 at 23:36

1 Answer

Since only comments were posted, I will summarize here the answers I got from the comments, my further research, and the #perl6 IRC channel.


Concerning bindings for a given library X (GDAL in my test case), the strategy within the Perl 6 community is one of the following:

  • Use the Inline::Foo modules, which launch and give access to the ecosystem of the Foo language (for example Inline::Perl5, Inline::Python, and so on). See the list of Inline::X modules in the Perl 6 Module Directory;
  • Use or write a binding with NativeCall, which binds to dynamic libraries that follow the C calling convention;
  • Use or write a native Perl 6 module.

Concerning the parsing of binary data, I'll split the subject into two parts:

  1. Generally speaking;
  2. Leveraging grammars.

1. Generally speaking

Using the P5pack module, or Inline::Perl5 to access Perl 5's pack/unpack, is currently (with Perl 6.c) the best way to parse binary data structures (the former seems favored, as it is a native module). See the first comment from @raiph, which links to an SO answer showing a basic use case.
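To show what the pack/unpack style looks like on my use case, here is a minimal sketch. It is written in Python with its `struct` module rather than with P5pack, so it only illustrates the idea and is not Perl 6 code. It decodes the 8-byte header of a classic TIFF file: a byte-order mark ("II" or "MM"), the magic number 42, and the offset of the first IFD. The function name `read_tiff_header` and the sample bytes are made up for the illustration.

```python
import struct


def read_tiff_header(data: bytes) -> dict:
    """Decode the 8-byte classic TIFF header (hypothetical helper)."""
    if len(data) < 8:
        raise ValueError("a TIFF header needs at least 8 bytes")

    # The first two bytes say how every later integer is stored.
    order = data[:2]
    if order == b"II":
        endian = "<"  # little-endian
    elif order == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("unknown byte-order mark")

    # H = unsigned 16-bit magic number, I = unsigned 32-bit offset of the first IFD.
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (magic number is not 42)")

    return {
        "byte_order": order.decode("ascii"),
        "magic": magic,
        "ifd_offset": ifd_offset,
    }


# Hand-written sample: little-endian header whose first IFD starts at byte 8.
sample = bytes([0x49, 0x49, 0x2A, 0x00, 0x08, 0x00, 0x00, 0x00])
header = read_tiff_header(sample)
```

A validator in this style is a sequence of such unpack steps with a check after each one, which is the part I was hoping a grammar could express declaratively.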

2. Leveraging grammars

With Perl 6.c, grammars can only parse text. However, the question of parsing binary data seems to be moderately hot (based on feedback seen on the #perl6 IRC channel), and a few documented but not yet implemented features seem to pave the way, with the hope of seeing it happen in the (near or distant?) future.

The last part of @raiph's answer lists many resources pointing in that direction. Moreover, Synopsis 05 - Regexes and Rules (line 432) mentions a :bytes modifier. We will have to see when those modifiers get implemented and what is missing to bring them to the language. On the #perl6 IRC channel, MasterDuke said « also, i think the nqp binary reading/writing ops that jnthn recently specced and nine implemented were a prerequisite for anything further ». I still have to investigate what exactly he is talking about, but it seems to go in the right direction.

One of the main points, IMO, is the grapheme definition, which is based on the Unicode one. If we were able to override the grapheme definition with a custom one for a specialized grammar, the way we can currently override what :sigspace matches to change the separators used by rules, it would open a new way to work with data structures and grammars. For now, the grapheme is defined at the string level, not at the grammar or meta level. See @timotimo's comment linking to the Unicode document describing the Grapheme Cluster Boundary Rules.

A way to bend the rules was linked by @jjmerelo: Parsing GFX3 format with Perl 6 grammars.
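The trick quoted in the comments (pass the buffer to the grammar as a latin1-encoded string) works because Latin-1 maps every byte 0x00-0xFF to the code point with the same number, so each byte becomes exactly one character and nothing is lost. Below is a minimal sketch of that idea, again in Python rather than Perl 6, with a plain regular expression standing in for a grammar. The pattern `HEADER` and the sample bytes are made up for the illustration. Python strings are sequences of code points, not graphemes, so the sketch does not reproduce Perl 6's grapheme handling.

```python
import re

# Hand-written sample: a big-endian classic TIFF header, first IFD at byte 8.
raw = bytes([0x4D, 0x4D, 0x00, 0x2A, 0x00, 0x00, 0x00, 0x08])

# Latin-1 decoding is lossless: one byte in, one character out.
text = raw.decode("latin-1")

# A text pattern describing the binary layout (stand-in for a grammar).
HEADER = re.compile(
    r"(?P<order>II|MM)"              # byte-order mark
    r"(?P<magic>\x2a\x00|\x00\x2a)"  # magic number 42 in either byte order
    r"(?P<offset>.{4})",             # 4 raw bytes: offset of the first IFD
    re.DOTALL,                       # let "." match any character, newline included
)

match = HEADER.fullmatch(text)

# Turn the captured characters back into bytes before interpreting them.
offset_bytes = match.group("offset").encode("latin-1")
byteorder = "little" if match.group("order") == "II" else "big"
ifd_offset = int.from_bytes(offset_bytes, byteorder)
```

The pattern only recognizes structure. The numeric interpretation still happens outside it, on bytes recovered by re-encoding the captures as Latin-1.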