
I'm trying to convert VTK (vtu) XML-format files from base64 binary strings to ASCII strings. The files look a bit like this:

<Points>
    <DataArray type="Float32" NumberOfComponents="3" format="binary">
gJQGAGp7+sJTMbPCVWiWv4RP+8LbKrTCj0yDv1kC+8J5w7PCUe0xv34YAMNqprTCtsRDv7yw/8IgdLTCUE0lv/8 (etc...)
    </DataArray>
</Points>

You can also have these files in ASCII format, so in ASCII the same thing looks like:

<Points>
    <DataArray type="Float32" Name="Points" NumberOfComponents="3" format="ascii" RangeMin="9.6120050431" RangeMax="280.36424584">
      -125.24104309 -89.596336365 -1.1750589609 -125.65530396 -90.083702087 -1.0257738829
      -125.50458527 -89.881782532 -0.69502741098 -128.09567261 -90.325027466 -0.7647203207
      -127.84518433 -90.226806641 -0.64571094513 -128.24607849 -90.475311279 -0.61999017
      (etc...)
    </DataArray>
</Points>

I need my code to work for when the files come in ASCII or binary, so I need to be able to convert the base64 string in the first case to the ASCII format in the second case.

Right now I have:

string pointString = nodeList[0].ChildNodes.Item(0).InnerText.Trim();
if(format.Equals("binary", StringComparison.InvariantCultureIgnoreCase))
{
    byte[] bytes = Convert.FromBase64String(pointString);
    pointString = Encoding.ASCII.GetString(bytes);
}

aaand my string is coming out all wrong:

pointString: ?$

I feel like I'm missing something simple here. Where am I going wrong?

GSerg
user430481
  • Your "binary" byte array contains the bytes making up `float` numbers (`float` aka `System.Single` = 32-bit floating point numbers; every 4 bytes in your byte array thus constituting a `float` value). What makes you think you have to decode this byte array as ASCII string if all those bytes are just the components of `float` numbers? Clearly the 4 bytes of a 32-bit float number have no relationship with any ASCII character. Where did you get that idea from? –  Jun 07 '19 at 20:20
  • This is interesting. Can you post a complete base64 string? The answer might be proprietary. This might have nothing at all to do with .NET binary serialization. There are some nuget packages for VTK. – Scott Hannen Jun 07 '19 at 20:26

1 Answer


Try converting the bytes directly to floats and check whether that produces valid results:

// Requires: using System; using System.Linq;
byte[] bytes = Convert.FromBase64String(pointString);
float[] dataArray = Enumerable.Range(0, bytes.Length / 4).Select(i => BitConverter.ToSingle(bytes, i * 4)).ToArray();
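
To get from the float array back to the space-separated text that the ASCII variant uses, the values can be joined using the invariant culture. The following is only a rough sketch: `VtkBase64Sketch`, `DecodeFloats`, `ToAsciiText` and the `headerBytes` parameter are names made up for illustration, and it assumes Float32 data in the machine's byte order. One thing worth checking against your own files: the first four bytes of the sample string in the question decode to the 32-bit integer 431232 rather than to a coordinate, while the next four decode to roughly -125.24, which matches the first value in the ASCII listing. That looks like a leading byte-count header, so the sketch lets you skip it. The header size of 4 is an assumption drawn from that one sample.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper for illustration; not part of any VTK library.
public static class VtkBase64Sketch
{
    // Decodes a base64 payload into floats, skipping headerBytes bytes first.
    // BitConverter uses the machine's byte order (little-endian on x86/x64).
    public static float[] DecodeFloats(string base64, int headerBytes = 0)
    {
        byte[] bytes = Convert.FromBase64String(base64);
        int count = Math.Max(0, (bytes.Length - headerBytes) / sizeof(float));
        var result = new float[count];
        for (int i = 0; i < count; i++)
        {
            result[i] = BitConverter.ToSingle(bytes, headerBytes + i * sizeof(float));
        }
        return result;
    }

    // Joins the values into the space-separated form used by format="ascii".
    // The invariant culture keeps '.' as the decimal separator on every machine.
    public static string ToAsciiText(float[] values)
    {
        return string.Join(" ", values.Select(v => v.ToString("R", CultureInfo.InvariantCulture)));
    }
}
```

Something like `VtkBase64Sketch.ToAsciiText(VtkBase64Sketch.DecodeFloats(pointString, 4))` would then give text you can feed to the same parser you use for the ASCII files. With `headerBytes` left at 0, the first element would be the header bytes reinterpreted as a float.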
pakeha_by
  • You can replace that `Enumerable.Range.Select` mess with a single `Buffer.BlockCopy` call. It'll be faster as well. – Ben Voigt Jun 07 '19 at 21:59
  • @BenVoigt That doesn't seem like an improvement - you would need to preallocate the `dataArray` and copy along some index still. Not any less messy. – NetMage Jun 07 '19 at 22:02
  • @NetMage: Preallocating is much better than streaming from an enumerable, extending the buffer a bunch of times. Just because the code `.ToArray()` is short doesn't mean it is efficient. Even if it checks that the Range has a Length known in advance and does preallocate, it still will be calculating indexes using a lambda and calling BitConverter hundreds of times which is not efficient either. – Ben Voigt Jun 07 '19 at 22:10
  • @BenVoigt `ToArray` is inefficient compared to `ToList` but hundreds of times seems like an exaggeration. Even if not, the time is hardly going to compare to converting from Base64 to binary reading an XML file. It all seems like premature optimization to me. – NetMage Jun 07 '19 at 22:28
  • @NetMage: You might notice that in my original comment the improved performance was a footnote. The main advantage is how much simpler `Buffer.BlockCopy` makes the code. It's not "Calculate the end index, generate the indexes, find the corresponding bytes in the source array, convert them, and stream into a new array" but just "Calculate the number of items, allocate new array, copy data". `var dataArray = new float[bytes.Length / sizeof(float)]; Buffer.BlockCopy(bytes, 0, dataArray, 0, bytes.Length);` Shorter code, conceptually simpler, lots fewer operations. – Ben Voigt Jun 07 '19 at 22:33
  • @BenVoigt Perhaps that should be its own answer? But how could you handle wrong endian? – NetMage Jun 07 '19 at 23:02
  • @NetMage: The conceptually simplest way to handle big endian inputs is (1) reverse the whole bytes array. (2) Buffer.BlockCopy (3) reverse the floats array. Only slightly more code (and less work for the computer) is a loop walking the byte array four-at-a-time swapping in place (0 <-> 3 and 1 <-> 2) and then the endianness problem is gone, so Buffer.BlockCopy works as normal. – Ben Voigt Jun 07 '19 at 23:06
  • This worked. Gonna have to make a trillion cases for all of the different types. – user430481 Jun 10 '19 at 16:39
  • @user430481 Perhaps update your question or ask a different question explaining more about the different types? – NetMage Jun 10 '19 at 17:30
  • @NetMage probably not worth it, it has more to do with VTK. The files can take on any type that the user wants: They can use Int32, Int64, Float32, Float64, Int8, etc. Totally up to the user. I think I just need to read in the data type and then make a trillion cases for the different cases that will use BitConverter.{Appropriate Converter Method}. – user430481 Jun 10 '19 at 19:12
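
The `Buffer.BlockCopy` approach and the byte-swapping idea from the comments above can be combined, and since `Buffer.BlockCopy` accepts any array of primitives, the "trillion cases" may shrink to a lookup table. This is a minimal sketch under assumptions, not a tested VTK reader: `VtkArraySketch`, `SwapByteOrder` and `Decode` are hypothetical names, the type table covers only the names mentioned in this thread plus their unsigned counterparts, and `bytes` is assumed to hold element data only, with any header already removed.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration; names are not from any VTK library.
public static class VtkArraySketch
{
    // VTK "type" attribute -> .NET primitive type (extend as needed).
    private static readonly Dictionary<string, Type> ElementTypes = new Dictionary<string, Type>
    {
        { "Int8", typeof(sbyte) },   { "UInt8", typeof(byte) },
        { "Int16", typeof(short) },  { "UInt16", typeof(ushort) },
        { "Int32", typeof(int) },    { "UInt32", typeof(uint) },
        { "Int64", typeof(long) },   { "UInt64", typeof(ulong) },
        { "Float32", typeof(float) }, { "Float64", typeof(double) },
    };

    // Reverses each elementSize-byte group in place (byte-order conversion).
    public static void SwapByteOrder(byte[] bytes, int elementSize)
    {
        for (int i = 0; i + elementSize <= bytes.Length; i += elementSize)
        {
            Array.Reverse(bytes, i, elementSize);
        }
    }

    // Copies raw element bytes into a typed array chosen by the VTK type name.
    public static Array Decode(byte[] bytes, string vtkType, bool dataIsBigEndian = false)
    {
        Type elementType = ElementTypes[vtkType];
        int elementSize = Marshal.SizeOf(elementType);

        // Work on a copy so the caller's buffer is left untouched.
        byte[] work = (byte[])bytes.Clone();

        // Swap only when the data's byte order differs from the machine's.
        bool machineIsBigEndian = !BitConverter.IsLittleEndian;
        if (dataIsBigEndian != machineIsBigEndian)
        {
            SwapByteOrder(work, elementSize);
        }

        int count = work.Length / elementSize;
        Array result = Array.CreateInstance(elementType, count);
        Buffer.BlockCopy(work, 0, result, 0, count * elementSize);
        return result;
    }
}
```

The caller casts the result to the concrete array type, for example `(float[])VtkArraySketch.Decode(bytes, "Float32")`. Returning `Array` keeps the sketch short; a generic method per element type would be a reasonable alternative if you want compile-time typing.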