22

I open a FileStream(filename,FileMode.Open,FileAccess.Read,FileShare.ReadWrite) and then wrap it in a StreamReader(stream,true).

Is there a way I can check whether the stream started with a UTF8 BOM? I have noticed that files without the BOM are also read as UTF8 by the StreamReader.

How can I tell them apart?

bookclub

3 Answers

17

Rather than hardcoding the BOM bytes, it is cleaner to get them from the API via UTF8Encoding.GetPreamble():

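// Requires: using System; using System.Linq; using System.Text;
public string ConvertFromUtf8(byte[] bytes)
{
  var enc = new UTF8Encoding(true);
  var preamble = enc.GetPreamble();
  // Reject input that is shorter than the BOM or does not start with it.
  if (bytes.Length < preamble.Length || preamble.Where((p, i) => p != bytes[i]).Any())
    throw new ArgumentException("Not utf8-BOM");
  // Decode everything after the BOM without copying the array.
  return enc.GetString(bytes, preamble.Length, bytes.Length - preamble.Length);
}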
Carlo V. Dango
  • @carlo-v-dango, I'd recommend adding some kind of length check, since bytes may be empty if the file is empty: `if (preamble.Where((p, i) => bytes.Length > i && p != bytes[i]).Any())` or whatever floats your boat. – Martin Oct 11 '19 at 08:06
8

You can detect whether the StreamReader encountered a BOM by initializing it with a BOM-less UTF8 encoding and checking to see if CurrentEncoding changes after the first read.

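// file is a path or an open Stream.
var utf8NoBom = new UTF8Encoding(false);
using (var reader = new StreamReader(file, utf8NoBom))
{
    // The first read triggers BOM detection.
    reader.Read();
    if (Equals(reader.CurrentEncoding, utf8NoBom))
    {
        Console.WriteLine("No BOM");
    }
    else
    {
        // A UTF-16 or UTF-32 BOM changes CurrentEncoding as well.
        Console.WriteLine("BOM detected");
    }
}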
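Since any BOM changes CurrentEncoding, the same idea can be narrowed to the UTF8 BOM specifically. What follows is only a minimal sketch: the class and method names (Utf8BomReaderCheck, ReaderSawUtf8Bom) are made up for illustration, and it assumes that disposing the stream along with the reader is acceptable.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8BomReaderCheck
{
    // Hypothetical helper: true only when the StreamReader detected a UTF8 BOM.
    // Note: disposing the reader also disposes the stream passed in.
    public static bool ReaderSawUtf8Bom(Stream stream)
    {
        var utf8NoBom = new UTF8Encoding(false);
        // BOM detection is enabled by default for this constructor.
        using (var reader = new StreamReader(stream, utf8NoBom))
        {
            reader.Read(); // the first read triggers BOM detection
            Encoding current = reader.CurrentEncoding;
            // After a UTF8 BOM the reader switches to a UTF8 encoding that has
            // a preamble; without a BOM it keeps utf8NoBom, whose preamble is empty.
            return current is UTF8Encoding && current.GetPreamble().Length > 0;
        }
    }
}
```

With this check a file starting with a UTF-16 BOM is reported as "no UTF8 BOM" instead of being lumped in with "BOM detected".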
Nathan Baulch
  • I never would have thought that this would work. Thanks! It is really too bad that the opposite isn't true. You can't pass in UTF8Encoding(true) and have it return UTF8Encoding(false). – Cameron Taggart Jun 30 '15 at 00:13
8

Does this help? You check the first three bytes of the file:

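public class Program
{
    public static void Main(string[] args)
    {
        // Dispose the stream when done; open read-only.
        using (FileStream fs = new FileStream("spork.txt", FileMode.Open, FileAccess.Read))
        {
            byte[] bits = new byte[3];
            // Read can return fewer bytes than requested, e.g. for a short file.
            int read = fs.Read(bits, 0, 3);

            // UTF8 byte order mark is: 0xEF,0xBB,0xBF
            if (read == 3 && bits[0] == 0xEF && bits[1] == 0xBB && bits[2] == 0xBF)
            {
                Console.WriteLine("UTF8 BOM found");
            }
        }

        Console.ReadLine();
    }
}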
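
If the same stream is handed to a StreamReader afterwards, as in the question, the check has to put the stream position back. Here is a rough sketch of that, assuming a seekable stream positioned at the start of the file. Utf8BomSniffer and StartsWithUtf8Bom are hypothetical names, not part of .NET.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8BomSniffer
{
    // Hypothetical helper: reports whether a seekable stream has the UTF8 BOM
    // at its current position, then restores that position.
    public static bool StartsWithUtf8Bom(Stream stream)
    {
        byte[] preamble = new UTF8Encoding(true).GetPreamble(); // 0xEF,0xBB,0xBF
        byte[] head = new byte[preamble.Length];
        long start = stream.Position;
        try
        {
            // Read may return fewer bytes than requested, so loop until the
            // buffer is full or the stream ends.
            int total = 0;
            while (total < head.Length)
            {
                int n = stream.Read(head, total, head.Length - total);
                if (n == 0) break;
                total += n;
            }
            if (total < preamble.Length) return false;
            for (int i = 0; i < preamble.Length; i++)
            {
                if (head[i] != preamble[i]) return false;
            }
            return true;
        }
        finally
        {
            // Rewind so a StreamReader created afterwards sees the whole file.
            stream.Position = start;
        }
    }
}
```

You would call Utf8BomSniffer.StartsWithUtf8Bom(stream) on the FileStream first and only then create the StreamReader(stream, true), which still skips the BOM on its own.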