
Imagine that I have a long file of Rebol-formatted data, a million lines long, that looks something like this:

REBOL []

[
    [employee name: {Tony Romero} salary: $10,203.04]
    [employee name: {Marcus "Marco" Marcami} salary: default]
    [employee name: {Serena Derella} salary: ($10,000 + $203.04)]

...

    [employee name: {Stacey Christie} salary: (10% * $102,030.40)]
]

If the enclosing block wasn't there, I could use LOAD/NEXT to read through the employee items one at a time (as opposed to parsing the entire file into structured data with LOAD). Is there any way to do something similar if the enclosing block is there?

What if I wanted to go back to a previously visited item? Could there be a "structural seek"?

Is there a viable database solution for this kind of access to Rebol-structured data, one that might even permit random-access insertions?

3 Answers


If you are happy to tweak your file format a little, so that the file has one record per line, with no enclosing block and no REBOL header:

employee-name: {Tony Romero} salary: $10203.04
employee-name: {Marcus "Marco" Marcami} salary: 'default
employee-name: {Serena Derella} salary: ($10000 + $203.04)
employee-name: {Stacey Christie} salary: (10% * $102030.40)

Then....

data: read/lines %data-file.txt

....gets you a block of unloaded strings

One way to work with them is like this:

foreach record data [
    record: make object! load/all record
    probe record
]

I had to tweak your data format too to make it easily loadable by REBOL:

  • employee-name rather than employee name
  • $10203.04 rather than $10'203.04
  • 10% -- only works with REBOL3

If you can't tweak the data format like that, you could always do some edits on each string prior to LOAD/ALL to normalise it for REBOL.
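A minimal sketch of that kind of pre-LOAD edit, assuming Rebol 2 and lines that still use the original field names. The helper name normalise-record and the two replacements it makes are illustrative assumptions, not part of any library, and plain string replacement like this would also rewrite matching text that happens to appear inside a name string:

```rebol
; Hypothetical helper: rewrite one raw line so that LOAD/ALL
; followed by MAKE OBJECT! accepts it.
normalise-record: func [line [string!]] [
    line: copy line                                     ; leave the caller's string untouched
    replace line "employee name:" "employee-name:"      ; two words -> one set-word
    replace line "salary: default" "salary: 'default"   ; quote the word so it is not evaluated
    line
]

; Example with an inline string instead of a file
raw: {employee name: {Tony Romero} salary: $10203.04}
probe make object! load/all normalise-record raw
```

The same call could be dropped into the FOREACH loop above in place of the bare LOAD/ALL.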

Sunanda
  • Yes, this is possible... but I was hoping for something more robust with respect to the full spectrum of the Rebol data format. Ladislav brought up PARSE, but it won't work on PORT! http://stackoverflow.com/questions/4127569/using-parse-on-a-port-value – HostileFork says dont trust SE Nov 08 '10 at 20:06

I recall that it was you who proved that this should be doable in PARSE? ;-)

Nevertheless, to give you a useful answer: the code I wrote for the link text does, in essence, exactly that. It parses REBOL without using the default LOAD/NEXT, for cases where you need something else. So have a look, read the documentation, run the tests, write some tests, and if you have more questions, just ask.

Ladislav
  • To use PARSE for this is very interesting, and you made me wonder how LOAD is working under the hood. When I looked I was surprised it is a mezzanine, and it seems to READ the entire data source (even if you're just doing /NEXT!) Unsure of the precise details, but would a PARSE-based LOAD mezzanine have a more incremental reading nature, and be capable of LOAD/BACK? – HostileFork says dont trust SE Nov 08 '10 at 19:44
  • One issue is that PARSE won't work on a PORT! currently, see http://stackoverflow.com/questions/4127569/using-parse-on-a-port-value – HostileFork says dont trust SE Nov 08 '10 at 20:05
  • Regarding LOAD/BACK: that would surely be possible to write, but there is a tradeoff. It takes time and effort, and it would be needed only in exceptional cases. – Ladislav Dec 27 '10 at 09:55
  • As for PARSE working on a port: I wrote code that parses port input piece by piece ("per partes"); it is publicly available as part of an open source protocol (BEER). But I do not think a general solution exists. There are things like buffer overflow, backtracking, timeouts, etc. that may need ad hoc solutions. – Ladislav Dec 27 '10 at 09:58

Sunanda's answer breaks down if a record can span multiple lines. You can use something like this instead:

data: {REBOL []

[
    [employee name: {Tony Romero} salary: $10'203.04]
    [employee name: {Marcus "Marco" Marcami} salary: default]
    [employee name: {Serena Derella} salary: ($10'000 + $203.04)]
]}

unless all [
    set [value data] load/next data
    value = 'REBOL
][  print "Not a REBOL data file!" halt ]
set [header data] load/next data
print ["data-file-header:" mold header]
data: find/tail data #"["

attempt [
    ;you must use attempt as there will be at least one error at the end of file!
    ;** Syntax Error: Missing [ at end-of-block
    indexes: copy []
    while [
        append indexes data
        set [loaded-row data] load/next data
        data
    ][
        probe loaded-row
    ]

]
print "done"

remove back tail indexes ;removes the last erroneous position

foreach data-at-pos reverse indexes [
    probe first load/next data-at-pos
]

So the output would be:

[employee name: "Tony Romero" salary: $10203.04]
[employee name: {Marcus "Marco" Marcami} salary: default]
[employee name: "Serena Derella" salary: ($10000.00 + $203.04)]
done
[employee name: "Serena Derella" salary: ($10000.00 + $203.04)]
[employee name: {Marcus "Marco" Marcami} salary: default]
[employee name: "Tony Romero" salary: $10203.04]
Oldes
  • Also, if you have a large data file, you can add some buffering so that you do not read the whole file at once. – Oldes Nov 05 '10 at 10:34
  • By the way, the best solution is not to use the enclosing block for the data :) – Oldes Nov 05 '10 at 10:45
  • The ATTEMPT is obviously a little dodgy since it assumes your data is well-formed. But thanks for putting the effort in to write code for a workaround for the specific scenario described in the question. I was really musing about a more-generalized "seek" that would work on a disk file, but then realized that even if the parser were capable of LOAD/BACK then the inability to make modifications would mean it would only be useful for a very narrow set of circumstances... – HostileFork says dont trust SE Nov 08 '10 at 19:27
  • If you really need to store data on disk and be able to seek/modify it without loading it completely, then you don't want the data in REBOL format, but rather in some binary form, as real databases do it. – Oldes Nov 11 '10 at 15:44
  • Also, I expect that in a real-life case you would be pretty sure that the data is well-formed. And you can always add error type detection as well. – Oldes Nov 11 '10 at 15:49
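
The error type detection mentioned in the last comment might look roughly like the sketch below. It assumes Rebol 2 (DISARM does not exist in Rebol 3) and that data is positioned just after the opening "[" of the enclosing block, as in Oldes's answer. The word rows and the rule "a syntax error is acceptable only when nothing but the closing bracket remains" are assumptions made for illustration:

```rebol
rows: copy []

if error? err: try [
    while [
        set [loaded-row data] load/next data
        data
    ][
        append/only rows loaded-row
    ]
    true    ; make sure TRY returns a value when no error occurs
][
    err: disarm err    ; Rebol 2: turn the error into an inspectable object
    ; DATA was not advanced by the failing LOAD/NEXT, so it still points
    ; at the text that could not be loaded.
    unless all [
        err/type = 'syntax
        "]" = trim/all copy data    ; only the closing bracket is left
    ][
        print ["Malformed data, error id:" err/id]
        halt
    ]
]

foreach row rows [probe row]
```

Unlike a blanket ATTEMPT, this stops on a syntax error in the middle of the file instead of silently treating it as the end of the data.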