
My goal is to take a file of sentences, apply some basic filtering, and write the remaining sentences to both a file and the terminal. I'm using the Hunspell library (through the NHunspell wrapper) for spellchecking.

Here's how I get sentences from the file:

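    // Requires: using System; using System.IO; using System.Text.RegularExpressions;
    public static string[] sentencesFromFile_old(string path)
    {
        string s = "";
        using (StreamReader rdr = File.OpenText(path))
        {
            s = rdr.ReadToEnd();
        }
        // Collapse line breaks and runs of whitespace into single spaces.
        s = s.Replace(Environment.NewLine, " ");
        s = Regex.Replace(s, @"\s+", " ");
        // Strip (parenthesized), [bracketed] and {braced} asides plus any whitespace before them.
        s = Regex.Replace(s, @"\s*?(?:\(.*?\)|\[.*?\]|\{.*?\})", String.Empty);
        // Split after ". " or after one or more '!'/'?' followed by a space.
        string[] sentences = Regex.Split(s, @"(?<=\. |[!?]+ )");
        return sentences;
    }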

Here's the code that writes to file:

        List<string> sentences = new List<string>(Checker.sentencesFromFile_old(path));
        StreamWriter w = new StreamWriter(outFile);
        foreach(string x in sentences)
            if(Checker.check(x, speller))
            {
                w.WriteLine("[{0}]", x);
                Console.WriteLine("[{0}]", x);
            }

Here's the checker:

    // Requires: using System; using System.Linq; (System.Linq provides punctuation.Contains)
    public static bool check(string s, NHunspell.Hunspell speller)
    {
        char[] punctuation = {',', ':', ';', ' ', '.'};
        bool upper = false;
        // Check the string length.
        if(s.Length <= 50 || s.Length > 250)
            return false;
        // Check if the string contains only allowed punctuation and letters.
        // Also disallow words with multiple consecutive caps.
        for(int i = 0; i < s.Length; ++i)
        {
            if(punctuation.Contains(s[i]))
                continue;
            if(Char.IsUpper(s[i]))
            {
                if(upper)
                    return false;
                upper = true;
            }
            else if(Char.IsLower(s[i]))
            {
                upper = false;
            }
            else return false;
        }
        // Spellcheck each word.
        string[] words = s.Split(' ');
        foreach(string word in words)
            if(!speller.Spell(word))
                return false;
        return true;
    }

The sentences are printed to the terminal just fine, but the text file is cut off mid-sentence at 2015 characters. What's up with that?

EDIT: When I remove some parts of the check method, the file is cut off at various lengths, somewhere around either 2000 or 4000 characters. Removing the spellcheck eliminates the cutoff entirely.

Isaac G.

2 Answers


You need to flush the stream before closing it.

    w.Flush();
    w.Close();

The using statement (which you should also use) will Close the stream automatically, but it will not flush it.

    using( var w = new StreamWriter(...) )
    {
      // Do stuff
      w.Flush();
    }

Jon Skeet
Paul Alexander
  • Np, personally I think the natural behavior of Closing the stream explicitly should also flush it. Not sure why they decided not to do it that way. – Paul Alexander Apr 12 '11 at 20:03
  • You *don't* need to explicitly flush - but you *do* need to explicitly close. Just closing will flush it automatically. The problem is that nothing is closing the writer at all at the moment. – Jon Skeet Apr 12 '11 at 20:05
  • @Paul Checking in ILSpy for StreamWriter, it looks like Flush will be called on Dispose. Anyway, calling Flush before close certainly won't hurt anything. – rsbarro Apr 12 '11 at 20:06
  • Wish that were true. I've walked through dozens of bugs where the stream had to be flushed before it was closed. – Paul Alexander Apr 12 '11 at 20:07
  • @rsbarro: It will hurt the readability of the code, IMO... especially if you want to flush even if an exception is thrown. It's a shame to see an (unintentionally) misleading answer being accepted :( Just a `using` statement is a better approach. – Jon Skeet Apr 12 '11 at 20:08
  • @Paul: Well `StreamWriter` *definitely* flushes before closing, and that's what we're dealing with here. In .NET 1.1 `CryptoStream` *didn't*, and I raised a bug - it was fixed in .NET 2.0. Care to cite any other specific examples? `FileStream` definitely flushes... it's not clear to me whether `NetworkStream` does or not, admittedly. – Jon Skeet Apr 12 '11 at 20:10
  • @Jon True, but I see where Paul is coming from though. It looks like the StreamWriter implementation does call Flush, but not every stream implementation may do that. – rsbarro Apr 12 '11 at 20:11
  • @Jon - I agree that simply Disposing the stream as via the `using` statement is the natural way. I've just seen _several_ cases where it just didn't get the stream flushed. – Paul Alexander Apr 12 '11 at 20:12
  • @rsbarro: StreamWriter isn't an implementation of Stream though... it's things like this that mean people end up setting variables to null pointlessly all over the place because they've seen *one* place where it was actually necessary. – Jon Skeet Apr 12 '11 at 20:12
  • @Paul: If you could enumerate those cases (in terms of the types involved) that would certainly be useful. For simple `FileStream` it's not necessary, and personally I think it *shouldn't* be necessary in any well-written stream class... – Jon Skeet Apr 12 '11 at 20:13
  • @Jon - I absolutely agree in principle. I don't have any ready-made code handy, but most of the cases were related to reading/writing from secure streams and network streams. There have also been a few occasions, when doing exactly what the OP is attempting, where the console flushed just fine and the stream writer didn't. Next time I come across this I'll post a gist for reference. – Paul Alexander Apr 12 '11 at 20:16
  • @Jon Fair enough point and I agree. Personally, I just go with the using statement. – rsbarro Apr 12 '11 at 20:16
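The point settled in the comments (closing or disposing a `StreamWriter` flushes it) can be checked with a small sketch. It assumes a modern .NET runtime; `FlushDemo` and `Measure` are hypothetical names used only for illustration. With `AutoFlush` left at its default of `false`, a short write stays in the writer's buffer, so the file size seen while the writer is open should be smaller than the size seen after the `using` block disposes it. The exact sizes at which data starts reaching the disk depend on internal buffer sizes, so only a short string is used here.

```csharp
using System.IO;

// Hypothetical demo class: shows that StreamWriter holds short writes in its
// buffer and that Dispose() (what a using block calls) flushes them to disk.
public static class FlushDemo
{
    // Returns the file size observed while the writer is still open and the
    // file size observed after the writer has been disposed.
    public static (long WhileOpen, long AfterDispose) Measure(string path, string text)
    {
        long whileOpen;
        using (var w = new StreamWriter(path)) // AutoFlush defaults to false
        {
            w.Write(text);
            // A fresh FileInfo reads the size the operating system currently reports.
            whileOpen = new FileInfo(path).Length;
        }
        long afterDispose = new FileInfo(path).Length;
        return (whileOpen, afterDispose);
    }
}
```

If the writer is never closed or disposed, as in the question's original loop, whatever is still in its buffer when the program ends is simply never written, because `StreamWriter` does not flush itself when it is garbage-collected.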

Are you closing the StreamWriter after you are done writing? You could try something like this:

    using(StreamWriter w = new StreamWriter(outFile))
    {
        foreach(string x in sentences)
        {
            if(Checker.check(x, speller))
            {
                w.WriteLine("[{0}]", x);
                Console.WriteLine("[{0}]", x);
            }
        }
    }

The using statement will close the StreamWriter (by calling its Dispose method) when the code inside the block finishes, even if an exception is thrown. Disposing a StreamWriter flushes any buffered text before closing the underlying file.
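To make the guarantee concrete, here is a minimal sketch of the same loop with the `using` block written out as the `try`/`finally` it roughly expands to. `FilteredWriter`, `Write`, and the `check` delegate are hypothetical; the delegate stands in for `Checker.check(x, speller)` so the example runs without NHunspell.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: the using block above, expanded into try/finally.
// `check` is a stand-in for Checker.check(x, speller).
public static class FilteredWriter
{
    // Writes every sentence that passes `check` to outFile and the console,
    // and returns how many sentences were written.
    public static int Write(string outFile, IEnumerable<string> sentences, Func<string, bool> check)
    {
        int written = 0;
        StreamWriter w = new StreamWriter(outFile);
        try
        {
            foreach (string x in sentences)
            {
                if (check(x))
                {
                    w.WriteLine("[{0}]", x);
                    Console.WriteLine("[{0}]", x);
                    written++;
                }
            }
        }
        finally
        {
            // Runs on normal exit and when an exception escapes the loop;
            // Dispose flushes the buffered text and closes the file.
            w.Dispose();
        }
        return written;
    }
}
```

In practice the `using` form is shorter and behaves the same way; the expanded version only makes explicit that the file is flushed and closed even if the filter throws partway through.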

rsbarro