
I have a text file that I am converting to Unicode, and I then want to save the content to a file. I want to save the file in 2 formats:

  1. In Unicode
  2. In English-like characters (as file.doc)

UnicodeEncoding u = new UnicodeEncoding();
byte[] filebytes = u.GetBytes("C:/file.doc");
File.WriteAllBytes(@"C:/uni.doc", filebytes); // unicode
File.WriteAllBytes(@"C:/ori.doc", filebytes); // As the Original file
Scott Chamberlain
Sharon Watinsan
  • Since when are `.doc` files text files? I think you may be confused, and have a Microsoft Word document there instead. There's a _lot_ more going on in a Word document than just text or Unicode. – Joel Coehoorn Sep 17 '13 at 21:04
  • You use `.doc` as the file extension. Is this a Word file? Also, the path delimiter in Windows is \ , not /. – Andrew Morton Sep 17 '13 at 21:04
  • @AndrewMorton actually, Windows supports both directions. – Scott Chamberlain Sep 17 '13 at 21:05
  • @ScottChamberlain Isn't it potentially unreliable to use that? [Get directory separator char on Windows? ('\', '/', etc.)](http://stackoverflow.com/questions/7314606/get-directory-separator-char-on-windows-etc). And the MS document [Naming Files, Paths, and Namespaces](http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx) specifies using \ as the directory separator character. – Andrew Morton Sep 17 '13 at 21:13
  • @AndrewMorton it is only unsafe if you use the `\\?\ ` prefix. From the same document you linked: "*These prefixes are not used as part of the path itself. They indicate that the path should be passed to the system with minimal modification, which means that **you cannot use forward slashes to represent path separators**, or a period to represent the current directory...*" So as long as you are not using `\\?\ `, it should be fine. – Scott Chamberlain Sep 17 '13 at 21:23

2 Answers


Bytes are bytes: just 8-bit binary numbers.

Encodings apply only to text, which you've not got if you've done a binary read.

If you want to read a text file in one encoding and write it in another, you can do something like this:

Encoding sourceEncoding = Encoding.UTF8  ; // or whatever encoding the source file is encoded with
Encoding targetEncoding = Encoding.UTF32 ; // or whatever destination encoding you desire
string   data           = File.ReadAllText( @"C:\original.txt" , sourceEncoding ) ;
File.WriteAllText( @"C:\different-encoding.txt" , data , targetEncoding ) ;

You should bear in mind that strings are internally represented in the CLR infrastructure as a UTF-16 encoding of Unicode text.
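To illustrate the point about encodings applying only to text, here is a minimal sketch showing the same in-memory string turning into different byte sequences depending on the encoding. The sample string and the class name are made up for illustration; only standard `System.Text.Encoding` members are used.

```csharp
using System;
using System.Text;

class EncodingSizes
{
    static void Main()
    {
        // Hypothetical sample text: 5 characters, one of them outside ASCII (U+00E9).
        string text = "h\u00E9llo";

        // UTF-16 little-endian ("Unicode" in .NET terms): 2 bytes per character here.
        byte[] utf16 = Encoding.Unicode.GetBytes(text);

        // UTF-8: U+00E9 takes 2 bytes, the other four characters take 1 byte each.
        byte[] utf8 = Encoding.UTF8.GetBytes(text);

        // ASCII cannot represent U+00E9, so the default fallback substitutes '?'.
        byte[] ascii = Encoding.ASCII.GetBytes(text);

        Console.WriteLine(utf16.Length);                    // 10
        Console.WriteLine(utf8.Length);                     // 6
        Console.WriteLine(Encoding.ASCII.GetString(ascii)); // h?llo
    }
}
```

Note that converting to ASCII is lossy for anything outside the 7-bit range, which matters if the source text is not plain English.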

Nicholas Carey

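`GetBytes` converts a string to bytes; it does not take a file path as input, so the code in the question encodes the path string itself rather than the file's contents. You have to read the file's text first, for example with a `StreamReader`. To get the UTF-16 bytes, pass the string you read to `System.Text.Encoding.Unicode.GetBytes(stringIJustReadFromFile)`.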

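For ASCII, use `System.Text.Encoding.ASCII.GetBytes(stringIJustReadFromFile)`. You can then write the bytes to the other files, for example with `File.WriteAllBytes`. A `StreamWriter` writes text rather than raw bytes, so if you prefer one, construct it with the target encoding and write the string instead.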
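Putting those steps together, a rough sketch might look like the following. It assumes the source really is a plain text file (not a Word document) encoded as UTF-8; the file names and the temp-directory location are hypothetical, and the sample file is created only so the snippet can run on its own.

```csharp
using System;
using System.IO;
using System.Text;

class SaveTwoEncodings
{
    static void Main()
    {
        // Hypothetical locations; substitute your own paths.
        string dir = Path.GetTempPath();
        string source = Path.Combine(dir, "original.txt");
        string unicodeCopy = Path.Combine(dir, "uni.txt");
        string asciiCopy = Path.Combine(dir, "ascii.txt");

        // Create a small sample file so the sketch is self-contained.
        File.WriteAllText(source, "Hello, world", Encoding.UTF8);

        // Read the file's text (not its path) into a string.
        // Assumption: the source file is UTF-8.
        string text = File.ReadAllText(source, Encoding.UTF8);

        // Encode the string and write the raw bytes.
        // GetBytes does not add a byte order mark.
        File.WriteAllBytes(unicodeCopy, Encoding.Unicode.GetBytes(text)); // UTF-16 LE
        File.WriteAllBytes(asciiCopy, Encoding.ASCII.GetBytes(text));     // 7-bit ASCII

        Console.WriteLine(new FileInfo(unicodeCopy).Length); // 24
        Console.WriteLine(new FileInfo(asciiCopy).Length);   // 12
    }
}
```

If you want the UTF-16 file to start with a byte order mark so editors can detect its encoding, `File.WriteAllText(path, text, Encoding.Unicode)` is an alternative to encoding the bytes yourself.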

fahadash