I am opening a .gz file and reading it chunk by chunk to decompress it.

The uncompressed data looks like aRSbRScRSd: a record separator (RS, ASCII code 30) sits between the records (a, b, c and d in this dummy example).

    import std.stdio, std.string, std.zlib;

    void main()
    {
        // Open in binary mode: the file holds compressed bytes, not text.
        File file = File("mylog.gz", "rb");
        auto uc = new UnCompress();
        foreach (ubyte[] curChunk; file.byChunk(4096 * 1024))
        {
            auto uncompressed = cast(string) uc.uncompress(curChunk);
            writeln(uncompressed);
            auto stringRange = uncompressed.splitLines();
            foreach (string line; stringRange)
            {
                // ... do something with line
            }
        }
    }

The output of the code above is abcd; unfortunately the record separators (ASCII 30) are missing.

Examining the data, I realized the record separators go missing after I cast ubyte[] to string.

Now I have two questions:

  • What should I change in the code to keep the record separators?

  • How can I write the code above without the explicit foreach loops? I want to read the data line by line.

Edit

A more general and easier-to-follow example for the first question:

    ubyte[] temp = [ 65, 30, 66, 30, 67];
    writeln(temp);
    string tempStr = cast(string) temp;
    writeln (tempStr);

The result is ABC, which is not what I want.

halfer
Kadir Erdem Demir
  • Are you sure the result is ABC? The value 30 in the array is also cast to a character, and it is not a printable character according to [ASCII 30](http://www.theasciicode.com.ar/ascii-control-characters/record-separator-ascii-code-30.html) – xtreak Feb 04 '15 at 08:23
  • I've also checked the small sample in two different programs. In Windows cmd the display is OK; in another program it shows ' ABC' (note that the whitespace is displayed before the letters!). Which program do you use to display the string? I think it's possible that you were led to believe there is a parsing error when it's actually just a display issue. – Abstract type Feb 04 '15 at 11:59

1 Answer

Character 30 is a non-printable control character, although some editors may show a symbol in its place. It is not lost by the cast; it simply does not render when printed. Also note that casting a ubyte[] to string is usually incorrect because a ubyte[] array is mutable while a string is immutable. It is better to cast a ubyte[] to a char[].
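One way to check this is to look at the length and the raw bytes after the cast, and then split on the separator itself rather than on line endings. The following is only a minimal sketch using the sample bytes from the question; `splitRecords` is a hypothetical helper name introduced for illustration, not something from the original code.

```d
import std.algorithm : equal, splitter;
import std.array : array;
import std.stdio : writeln;

// Hypothetical helper: split text on the ASCII record separator (30, 0x1E).
char[][] splitRecords(char[] text)
{
    return text.splitter('\x1E').array;
}

void main()
{
    ubyte[] temp = [65, 30, 66, 30, 67];

    // Cast to char[] rather than string, because the data is mutable.
    char[] text = cast(char[]) temp;

    // The length is unchanged, so the cast did not drop anything.
    assert(text.length == 5);
    assert(text[1] == '\x1E');

    // Printing the bytes as numbers makes the invisible separators visible.
    writeln(cast(ubyte[]) text);

    // Split on the record separator instead of calling splitLines().
    auto records = splitRecords(text);
    assert(equal(records, ["A", "B", "C"]));
    foreach (record; records)
        writeln(record);
}
```

Note that `splitLines()` only breaks on line terminators, so it would not separate these records; splitting on `'\x1E'` is the assumption made here about what "line by line" should mean for this data.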

yaz
  • Sounds like what would be nice in this case is the reverse of `std.string.representation`. `representation` retains the proper constness when it does the cast (which is part of why it's better than using a cast), but it's going from `char` to `ubyte` rather than `ubyte` to `char`, so it's the reverse of what's required here. – Jonathan M Davis Feb 05 '15 at 12:15
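
As far as I know, Phobos releases newer than this discussion ship `std.string.assumeUTF`, which behaves like the reverse of `representation` described in the comment above. The sketch below assumes that function is available in your compiler's standard library; it reuses the sample bytes from the question.

```d
import std.string : assumeUTF, representation;

void main()
{
    ubyte[] raw = [65, 30, 66, 30, 67];

    // assumeUTF reinterprets the bytes as UTF-8 code units and keeps the
    // qualifier: a mutable ubyte[] becomes a mutable char[].
    char[] text = assumeUTF(raw);
    assert(text.length == raw.length);
    assert(text[1] == '\x1E');

    // An immutable(ubyte)[] becomes a string, with no explicit cast.
    immutable(ubyte)[] frozen = raw.idup;
    string s = assumeUTF(frozen);
    assert(s.length == 5);

    // representation goes the other way, again preserving constness.
    immutable(ubyte)[] back = s.representation;
    assert(back == frozen);
}
```

Unlike a plain cast, this cannot accidentally turn mutable data into an immutable string, which is the constness concern raised in the answer.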