Is there a way to get a string out of an IReadOnlyList<byte>, given a specific Encoding?

To be more precise, is there a way that doesn't copy the content of the collection before passing it to the Encoding object?

My main concern is performance, followed by memory usage.

miniBill

2 Answers


First, you would have to test whether you are using a single-byte or a two-byte Encoding (Encoding.IsSingleByte tells you).

If you are using a single-byte encoding, you could simply use a LINQ Select to turn each byte into a string with Encoding.GetString(byte[]) and join the results.

If you are using a two-byte encoding, you could enumerate two bytes at a time into a buffer. Since you would be overwriting a value type (byte) in an array element, you would only ever use storage for two bytes during the process, although you would be copying each byte out.

I think it would look something like this, but BEWARE: I don't have a compiler on this machine so I cannot verify the syntax (this is C#-ish code :) )

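using System.Collections.Generic;
using System.Linq;
using System.Text;

public string example(IReadOnlyList<byte> someListIGotSomewhere, Encoding e)
{
 string retVal = null;
 if (e.IsSingleByte)
 {
     // One byte per character: decode each byte on its own and join the pieces.
     retVal = string.Join("", someListIGotSomewhere.Select(b => e.GetString(new byte[] { b })));
 }
 else
 {
   // Only correct when every character occupies exactly two bytes
   // (e.g. UTF-16 text without surrogate pairs); UTF-8 and UTF-32 will break.
   StringBuilder sb = new StringBuilder(someListIGotSomewhere.Count / 2);
   var buffer = new byte[2]; // reused for every pair, so only two bytes of scratch storage
   using (var enumerator = someListIGotSomewhere.GetEnumerator())
   {
     while (enumerator.MoveNext())
     {
       buffer[0] = enumerator.Current;
       // An odd trailing byte is padded with zero.
       buffer[1] = enumerator.MoveNext() ? enumerator.Current : (byte)0;
       sb.Append(e.GetString(buffer));
     }
   }
   retVal = sb.ToString();
 }
 return retVal;
}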
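
The small-buffer idea above can be generalized with a stateful Decoder (from Encoding.GetDecoder()), which remembers partial characters between calls, so it should cope with UTF-8 and UTF-32 as well. This is only a sketch: the class name, method name and chunkSize parameter are made up for illustration, and I have not measured it against simply copying the whole list into a byte[].

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ReadOnlyListDecoding
{
    // Hypothetical helper (not from the original answer): decodes the list
    // in fixed-size chunks instead of copying it into one big array.
    public static string DecodeInChunks(IReadOnlyList<byte> bytes, Encoding encoding, int chunkSize = 4096)
    {
        if (bytes == null) throw new ArgumentNullException(nameof(bytes));
        if (encoding == null) throw new ArgumentNullException(nameof(encoding));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        // The decoder keeps the trailing bytes of an incomplete character
        // until the next call, so a chunk may end in the middle of a character.
        Decoder decoder = encoding.GetDecoder();
        byte[] byteBuffer = new byte[Math.Min(chunkSize, Math.Max(bytes.Count, 1))];
        char[] charBuffer = new char[encoding.GetMaxCharCount(byteBuffer.Length)];
        var result = new StringBuilder();

        int offset = 0;
        while (offset < bytes.Count)
        {
            // Copy the next chunk out of the list into the reusable buffer.
            int count = Math.Min(byteBuffer.Length, bytes.Count - offset);
            for (int i = 0; i < count; i++)
            {
                byteBuffer[i] = bytes[offset + i];
            }
            offset += count;

            // Flush on the last chunk so incomplete trailing bytes are
            // handled by the encoding's fallback instead of being dropped.
            bool flush = offset == bytes.Count;
            int charCount = decoder.GetChars(byteBuffer, 0, count, charBuffer, 0, flush);
            result.Append(charBuffer, 0, charCount);
        }

        return result.ToString();
    }
}
```

Usage would be something like ReadOnlyListDecoding.DecodeInChunks(someListIGotSomewhere, Encoding.UTF8). Every byte is still copied once, but the scratch memory stays at chunkSize bytes regardless of the list length. Like Encoding.GetString, the decoder does not strip a preamble (BOM), so that would still have to be skipped by hand.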
RJ Programmer
  • Just thought of something -- you might have to factor out the encoding preamble... It's been a while since I played in this "encoding" space... – RJ Programmer Jul 24 '13 at 22:51
  • Well that down vote came quick, and without even a comment... If the answer is bad, can you tell me why? – RJ Programmer Jul 24 '13 at 22:54
  • Although I haven't tested it, I can bet this code is MUCH slower than simply copying the list to a byte array... (downvote isn't mine btw) – miniBill Jul 24 '13 at 22:59
  • Not the downvoter, but I don't believe this code would work for the UTF32 encoding. – Nicole DesRosiers Jul 24 '13 at 23:00
  • @miniBill - Your question didn't indicate that performance was the concern. The question was "Is there a way to ..." It's my mistake for assuming you want to optimize memory utilization with a very large ReadOnly list, rather than reduce the performance cost of the byte copy. – RJ Programmer Jul 24 '13 at 23:05
  • I'm sorry, I'll make that explicit – miniBill Jul 25 '13 at 09:32

We now have someone working on high performance and zero copy parsing of strings and byte sequences.

https://github.com/dotnet/corefxlab/blob/master/docs/specs/parsing.md
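
In the meantime, one way to avoid the copy in the common cases is to check whether the list is really a byte[] or an ArraySegment<byte> underneath, since Encoding.GetString can read from those directly. This is a rough sketch under that assumption; the class and method names are hypothetical, and any other IReadOnlyList<byte> implementation still falls back to a single copy.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ReadOnlyListFastPath
{
    // Hypothetical helper: decodes without copying when the list is backed
    // by an array, and falls back to one copy for everything else.
    public static string DecodeWithFastPath(IReadOnlyList<byte> bytes, Encoding encoding)
    {
        if (bytes == null) throw new ArgumentNullException(nameof(bytes));
        if (encoding == null) throw new ArgumentNullException(nameof(encoding));

        // Arrays implement IReadOnlyList<T>, so no copy is needed here.
        if (bytes is byte[] array)
        {
            return encoding.GetString(array);
        }

        // ArraySegment<byte> also implements IReadOnlyList<byte> and
        // exposes its backing array, offset and count.
        if (bytes is ArraySegment<byte> segment)
        {
            return segment.Array == null
                ? string.Empty
                : encoding.GetString(segment.Array, segment.Offset, segment.Count);
        }

        // Fallback for any other implementation: copy once, then decode.
        var copy = new byte[bytes.Count];
        for (int i = 0; i < copy.Length; i++)
        {
            copy[i] = bytes[i];
        }
        return encoding.GetString(copy);
    }
}
```

Whether the type checks pay off depends on how often the list really is an array or a segment, so it is worth profiling with real data first.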

miniBill