11

I have this code:

Int32 i1 = 14000000;
byte[] b = BitConverter.GetBytes(i1);
string s = System.Text.Encoding.UTF8.GetString(b);
byte[] b2 = System.Text.Encoding.UTF8.GetBytes(s);
Int32 i2 = BitConverter.ToInt32(b2, 0);

i2 is equal to -272777233. Why isn't it equal to the input value (14000000)?

EDIT: What I am trying to do is append it to another string, which I'm then writing to a file using WriteAllText.

mcmillab
  • Did you know that string is not just an array of bytes? – John Saunders Jan 05 '13 at 02:44
  • In C and C++, a string is just an array of bytes with a zero at the end. That's not true in C# and .NET. – John Saunders Jan 05 '13 at 02:47
  • BTW, what were the `Length` properties of `b`, `s`, and `b2`? – John Saunders Jan 05 '13 at 02:48
  • so what's the best way to convert an int to a string and back? Ideally with the string being fixed length, and as short as possible? – mcmillab Jan 05 '13 at 02:48
  • It's unclear what you mean. Do you mean to convert the integer to a string, or to interpret the string as an integer, or to interpret the reference pointer as an int? What? – Cole Tobin Jan 05 '13 at 02:57
  • I want the 4 bytes representing the int, but as a string not byte array – mcmillab Jan 05 '13 at 03:07
  • Not sure why I've been downvoted here? For not knowing something? – mcmillab Jan 05 '13 at 03:19
  • @mcmillab Because your question isn't clear. You need to write what you want to do. Although you already made comments on that you didn't edit the question, so... (And BTW please check my updated answer) – Alvin Wong Jan 05 '13 at 03:23
  • You can't turn bytes into text like that. A character is represented using two bytes, so four bytes would be two characters, but not all character codes are used, so turning bytes into characters would result in characters that can't be encoded into bytes and written to a file. So, what exactly are you trying to do? – Guffa Jan 05 '13 at 03:41

5 Answers

15

Because an Encoding class will not work on arbitrary bytes. If a byte sequence (a "character" may span several bytes in UTF-8) is not valid in that particular encoding (in your case UTF-8), the decoder substitutes a replacement character.

a single QUESTION MARK (U+003F)

(Source: http://msdn.microsoft.com/en-us/library/ms404377.aspx#FallbackStrategy)

In some cases it is just a ?, for example in ASCII/CP437/ISO 8859-1 (the UTF-8 decoder substitutes U+FFFD instead), but you can choose how such cases are handled. (See the link above.)

For example if you try to convert (byte)128 to ASCII:

string s = System.Text.Encoding.ASCII.GetString(new byte[] { 48, 128 }); // s = "0?"

Then convert it back:

byte[] b = System.Text.Encoding.ASCII.GetBytes(s); // b = new byte[] { 48, 63 }

You will not get the original byte array.
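The same thing happens with the UTF-8 code in the question. The following is only a sketch that traces the bytes through the round trip; it assumes a little-endian machine and a runtime that substitutes U+FFFD for each invalid byte, which is consistent with the value reported in the question.

```csharp
using System;
using System.Text;

// Sketch only: follow the question's value through Encoding.UTF8 and back.
int i1 = 14000000;                          // 0x00D59F80
byte[] b = BitConverter.GetBytes(i1);       // 80 9F D5 00 on a little-endian machine
string s = Encoding.UTF8.GetString(b);      // 0x80 and 0x9F are not valid UTF-8 lead bytes,
                                            // so the decoder substitutes U+FFFD for them
byte[] b2 = Encoding.UTF8.GetBytes(s);      // each U+FFFD is encoded as EF BF BD
int i2 = BitConverter.ToInt32(b2, 0);       // reads EF BF BD EF, not the original bytes

Console.WriteLine(BitConverter.ToString(b));
Console.WriteLine(BitConverter.ToString(b2));
Console.WriteLine(i2);                      // -272777233, as in the question
```

Note that b2 is longer than b: the replacement characters take three bytes each, so the original four bytes are gone for good once GetString has run.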

For reference, see the related question "Check if character exists in encoding".


I can't imagine why you would need to convert a byte array to a string this way; it doesn't make sense. If you are going to write to a stream, you can write the byte[] directly. If you need a text representation, it makes perfect sense to convert the integer with yourIntegerVar.ToString() and use int.TryParse to get it back.


Edit:

You can write a byte array to a file, but you cannot "concatenate" the byte array to a string and use the lazy method File.WriteAllText, because it performs an encoding conversion and you will probably end up with question marks (?) all over your file. Instead, open a FileStream and use FileStream.Write to write the byte array directly. Alternatively, you can use a BinaryWriter to write an integer in its binary form (and also a string), and use its counterpart BinaryReader to read it back.

Example:

// Write the integer and the string in binary form.
using (FileStream fs = File.OpenWrite(@"C:\blah.dat"))
using (BinaryWriter bw = new BinaryWriter(fs, Encoding.UTF8))
{
    bw.Write((int)12345678);
    bw.Write("This is a string in UTF-8 :)"); // Note that BinaryWriter also prefixes the string with its length...
}

// Read both values back in the same order they were written.
using (FileStream fs = File.OpenRead(@"C:\blah.dat"))
using (BinaryReader br = new BinaryReader(fs, Encoding.UTF8))
{
    int myInt = br.ReadInt32();
    string blah = br.ReadString(); // ...so that it can read it back.
}

This example code will result in a file which matches the following hexdump:

00  4e 61 bc 00 1c 54 68 69 73 20 69 73 20 61 20 73  Na¼..This is a s  
10  74 72 69 6e 67 20 69 6e 20 55 54 46 2d 38 20 3a  tring in UTF-8 :  
20  29                                               )   

Note that BinaryWriter.Write(string) also prefixes the string with its length, and BinaryReader.ReadString depends on that prefix when reading back, so it is not appropriate to edit the resulting file in a text editor. (Well, you are writing an integer in its binary form, so I expect this is acceptable?)
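For the FileStream.Write route mentioned earlier, a minimal sketch could look like the following. The file name and the sample text are hypothetical, and the layout (4 bytes of integer followed by UTF-8 text running to the end of the file) is an assumption made only for this illustration.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical path and payload, chosen only for this sketch.
string path = Path.Combine(Path.GetTempPath(), "blah-sketch.dat");
byte[] intBytes = BitConverter.GetBytes(14000000);
byte[] textBytes = Encoding.UTF8.GetBytes("some text");

// Write the raw integer bytes first, then the encoded text.
using (FileStream fs = File.Create(path))
{
    fs.Write(intBytes, 0, intBytes.Length);
    fs.Write(textBytes, 0, textBytes.Length);
}

// Read back: the first 4 bytes are the integer, the rest is UTF-8 text.
byte[] all = File.ReadAllBytes(path);
int readBack = BitConverter.ToInt32(all, 0);
string text = Encoding.UTF8.GetString(all, 4, all.Length - 4);

Console.WriteLine(readBack);
Console.WriteLine(text);
```

Only the text portion ever passes through an Encoding here; the integer bytes are written and read untouched.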

Alvin Wong
  • because I'm appending it to another string which I'm then writing to file using WriteAllText. Anyhow your answer has explained why I can't do this, thanks – mcmillab Jan 05 '13 at 03:02
12

You shouldn't use Encoding.GetString to convert arbitrary binary data into a string. That method is only intended for text that has been encoded to binary data using a specific encoding.

Instead, you want to use a text representation which is capable of representing arbitrary binary data reversibly. The two most common ways of doing that are base64 and hex. Base64 is the simplest in .NET:

string base64 = Convert.ToBase64String(originalBytes);
...
byte[] recoveredBytes = Convert.FromBase64String(base64);

A few caveats to this:

  • If you want to use this string as a URL parameter, you should use a web-safe version of base64; I don't know of direct support for that in .NET, but you can probably find solutions easily enough
  • You should only do this at all if you really need the data in string format. If you're just trying to write it to a file or similar, it's simplest to keep it as binary data
  • Base64 isn't very human-readable; use hex if you want humans to be able to read the data in its text form without extra tooling. (There are various questions specifically about converting binary data to hex and back.)
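As a rough illustration of the hex option for a single Int32 (the case in the question), the "X8" format specifier gives a fixed-length, 8-character string. This is a sketch, not the only way to do it; the variable names are made up for the example.

```csharp
using System;
using System.Globalization;

int i1 = 14000000;

// "X8" pads to 8 hex digits, so every Int32 yields a string of the same length.
string hex = i1.ToString("X8", CultureInfo.InvariantCulture);
int i2 = int.Parse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture);

// For comparison: 4 bytes in base64 also take 8 characters, including padding.
string base64 = Convert.ToBase64String(BitConverter.GetBytes(i1));

Console.WriteLine(hex);            // 00D59F80
Console.WriteLine(i2);             // 14000000
Console.WriteLine(base64.Length);  // 8
```

For a lone 4-byte integer, hex and base64 come out the same length, so hex costs nothing extra and stays readable.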
Jon Skeet
5

It's not working because you are using encoding backwards.

Encoding is used to turn text into bytes, and then back into text again. You can't take arbitrary bytes and turn them into text. Every character has a corresponding byte pattern, but not every byte pattern translates into a character.

If you want a compact way to represent bytes as text, use base-64 encoding:

Int32 i1 = 14000000;
byte[] b = BitConverter.GetBytes(i1);
string s = Convert.ToBase64String(b);

byte[] b2 = Convert.FromBase64String(s);
Int32 i2 = BitConverter.ToInt32(b2, 0);
Guffa
  • This would write a base64 string, but the OP wants to write 4-bytes directly in its binary form, so my answer will be more appropriate. – Alvin Wong Jan 05 '13 at 03:19
  • @AlvinWong: Converting an integer into four bytes doesn't make it text, so your answer is not appropriate at all. – Guffa Jan 05 '13 at 03:38
3

If your goal here is to store an integer as a string and then convert it back to an integer, then unless I am missing something, wouldn't the following suffice?

Int32 i1 = 14000000;
string s = i1.ToString();
Int32 i2 = Int32.Parse(s);
James
1

To make a long story short:

You need an encoding that maps each byte value to a unique char and vice versa. A UTF-8 character can be from 1 to 4 bytes long, so you won't achieve that mapping; you need a more basic encoding like ASCII. Unfortunately, the original ASCII doesn't do that: it is just a 7-bit encoding and only defines the lower 128 codes, while the upper half (extended codes) is code-page specific. To get the full-range translation, you need a complete 8-bit encoding such as code page 437 or 850:

Int32 i1 = 14000000;
byte[] b = BitConverter.GetBytes(i1);
string s = System.Text.Encoding.GetEncoding(437).GetString(b);
byte[] b2 = System.Text.Encoding.GetEncoding(437).GetBytes(s);
Int32 i2 = BitConverter.ToInt32(b2,0);
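The same idea can be checked for every possible byte value. The sketch below swaps in ISO-8859-1 instead of code page 437, on the assumption that it is available without extra setup (on newer .NET runtimes, code page 437 may require registering an additional encoding provider first). It is an illustration, not a recommendation.

```csharp
using System;
using System.Linq;
using System.Text;

// Assumption: ISO-8859-1 (Latin-1) maps bytes 0..255 to U+0000..U+00FF one-to-one.
Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

// Check that all 256 byte values survive bytes -> string -> bytes.
byte[] all = Enumerable.Range(0, 256).Select(v => (byte)v).ToArray();
byte[] back = latin1.GetBytes(latin1.GetString(all));
bool roundTrips = all.SequenceEqual(back);

// The question's integer, taken through the same round trip.
string s = latin1.GetString(BitConverter.GetBytes(14000000));
int i2 = BitConverter.ToInt32(latin1.GetBytes(s), 0);

Console.WriteLine(roundTrips);
Console.WriteLine(s.Length);
Console.WriteLine(i2);
```

The resulting string contains control characters (including a NUL), and writing it with File.WriteAllText under the default UTF-8 encoding would still turn every char above U+007F into two bytes, so the round trip only holds if the same 8-bit encoding is used when the file is written and read.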