I'm trying to modify specific lines in a 6 GB text file (a SQL script). I read it line by line with IO.StreamReader.ReadLine and write each line to a new file with IO.StreamWriter.WriteLine. If a line matches a certain condition, I modify it before writing it.

The problem is that the resulting file is almost exactly half the size of the original (the original is 1.999582... times larger).

I'm trying to make sure the encoding is the same using:

sw = New IO.StreamWriter(NewFilepath, False, sr.CurrentEncoding)

But it doesn't make a difference: the new file is still half the size of the old one.

John
  • Hmm, I would first look at the start of each file with a hex viewer. Second, try defining an explicit encoding for both reading and writing. Start with ASCII or, given the half size, try UTF-8 and then UTF-16. Maybe the reader changes its encoding when it notices that something differs, but these are just assumptions that may be wrong. – Amegon Apr 09 '13 at 22:22
  • Could you post your code? Have you compared the files? A look through the first few lines should reveal something. Open them in a raw hex editor to see what has changed. Are you missing characters, bytes, or lines? What about testing on a smaller file, something short for debugging? – J... Apr 10 '13 at 00:47
  • It was definitely an encoding thing... When I ran the SQL script, I noticed some characters didn't render correctly (for instance the o with the double dots on top rendered as a different character). I'd just like to know how to preserve encoding in this situation from the source file to the destination file... – John Apr 11 '13 at 17:28

1 Answer

Where are you setting the encoding for your StreamReader, sr? If you are not setting it explicitly, and you are creating the StreamWriter before you perform any reads on the file (my best guess), then the CurrentEncoding of the StreamReader may change afterwards, because the reader autodetects the encoding from the source file.

From MSDN on StreamReader.CurrentEncoding:

The current character encoding used by the current reader. The value can be different after the first call to any Read method of StreamReader, since encoding autodetection is not done until the first call to a Read method.

To determine the encoding, you can read the first line of the file with the StreamReader and only then create the writer:

sw = New IO.StreamWriter(NewFilepath, False, sr.CurrentEncoding)
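
As a fuller illustration of that ordering, here is a minimal sketch. It assumes the reader is opened with the default constructor, and the paths, the match condition, and the replacement are hypothetical placeholders, not taken from the question. The only point it tries to show is that the first ReadLine happens before the StreamWriter is created.

```vb
Imports System.IO

Module CopyWithSameEncoding
    Sub Main()
        ' Hypothetical paths; substitute your own.
        Dim OldFilepath As String = "input.sql"
        Dim NewFilepath As String = "output.sql"

        Using sr As New StreamReader(OldFilepath)
            ' The first read triggers the reader's encoding autodetection.
            Dim line As String = sr.ReadLine()

            ' Only now does sr.CurrentEncoding reflect the detected encoding.
            Using sw As New StreamWriter(NewFilepath, False, sr.CurrentEncoding)
                While line IsNot Nothing
                    ' Hypothetical condition and modification, for illustration only.
                    If line.StartsWith("INSERT") Then
                        line = line.Replace("old_table", "new_table")
                    End If
                    sw.WriteLine(line)
                    line = sr.ReadLine()
                End While
            End Using
        End Using
    End Sub
End Module
```

Note that the autodetection relies on a byte order mark at the start of the file. If the source file has none, the reader falls back to the encoding it was constructed with (UTF-8 by default), so in that case it is probably safer to pass the known encoding explicitly to both the StreamReader and the StreamWriter.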
J...
  • Ohhh, the encoding detection isn't done until AFTER the first read... That explains it. Thanks! – John Apr 15 '13 at 13:17