I wrote the following to convert a byte array data
to a string array hex
containing 32 bytes as hex string per entry to write them to a file.
byte[] data = new byte[4*1024*1024];
string[] hex = data.Select((b) => b.ToString("X2")).ToArray();
hex = Enumerable.Range(0, data.Length / 32).Select((r) => String.Join(" ", hex.Skip(r * 32).Take(32))).ToArray(); // <= This line takes forever
The problem was that it took minutes(!) to finish although the resulting file was less than 20MB. So I tried to optimize it and came up with the following:
byte[] data = new byte[4*1024*1024];
string[] hex = new string[4*1024*1024/32];
for (var i = 0; i <= hex.Length - 1; i++)
{
var sb = new System.Text.StringBuilder();
sb.Append(data[i * 32].ToString("X2"));
for (var k = 1; k <= 32 - 1; k++)
{
sb.Append(' ');
sb.Append(data[i * 32 + k].ToString("X2"));
}
hex[i] = sb.ToString();
}
This version does the same but is several orders of magnitude faster (133 ms vs 8 minutes).
My problem is that I don't really understand why the original version is so slow. I looked at the source of String.Join()
and it looks pretty similar to my improved version.
I like to use LINQ for these kind of thinks, because you can solve all kinds of problems pretty easily and I thought it was efficient in most cases because of it's lazy evaluation. So I would like to know what I am missing here to improve my future usage of LINQ.
As a side not I know that it could probably be written even faster but this is really not the point here, because the second version is fast enough for a function only used for debugging purposes.