I have a text file that contains a fixed-length table that I am trying to parse. However, the beginning of the file contains general information about when the table was generated (i.e. time, date, etc.).

To read this, I open a FileStream and read the first part of the file with a StreamReader. I parse out what I need from the top part of the document and stop reading once the stream should be positioned at the first line of the structured data.

Then I attach a TextFieldParser to the stream (with appropriate settings for the fixed-length table) and attempt to read the file. It fails on the first row, and its ErrorLine property contains the second half of the third row of the table. Stepping through the code, I confirmed it was on the first row to read, yet ErrorLine suggests otherwise.

While debugging, I found that if I call my StreamReader's ReadLine() method after attaching the TextFieldParser to the stream, the first 2 rows come back fine. When I read the third row, however, it returns a line that starts with the first half of the third row (stopping right where the text in ErrorLine begins) and then continues with text from much later in the document. If I do this before attaching the TextFieldParser, it reads all 3 rows fine.

I suspect this has to do with tying two readers to the same stream. I'm not sure how to read a file that has a structured part and an unstructured part without tokenizing the lines myself. I can do that, but I assume I am not the first person who wants to read one part of a stream one way and a later part another way.

Why is it skipping like this, and how would you read a text file with different formats?

Example:

Date: 3/1/2013
Time: 3:00 PM
Sensor:  Awesome Thing

Seconds   X        Y          Value
0         5.1      2.8        55
30        4.9      2.5        33
60        5.0      5.3        44

Code tailored for this simplified example:

using System;
using System.Data;
using System.IO;
using Microsoft.VisualBasic.FileIO;

Boolean setupInfo = true;
DataTable result = new DataTable();
String[] fields;
Double[] dFields;

FileStream stream = File.Open(filePath,FileMode.Open);

StreamReader reader = new StreamReader(stream);

String tempLine;

for(int j = 1; j <= 7; j++)
{
   result.Columns.Add(("Column" + j));
}

//Parse the unstructured part
while(setupInfo)
{
   tempLine = reader.ReadLine();
   if (tempLine == null)
   {
       break; // end of file reached before the table header
   }
   if( tempLine.StartsWith("Date:"))
   {
       result.Rows.Add(tempLine);
   }
   else if (tempLine.StartsWith("Time:"))
   {
       result.Rows.Add(tempLine);
   }
   else if (tempLine.StartsWith("Seconds"))
   {
      //break out of this loop because the
      //next line to be read is the structured part
      setupInfo =  false;
   }
}

//Parse the structured part
TextFieldParser parser = new TextFieldParser(stream);
parser.TextFieldType = FieldType.FixedWidth;
parser.HasFieldsEnclosedInQuotes = false;
parser.SetFieldWidths(10, 10, 10, 10);

while (!parser.EndOfData)
{
   if (reader.Peek() == '*')
   {
       break;
   }
   else
   {
       fields = parser.ReadFields();

       if (parseStrings(fields, out dFields))
       {
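       // DataRowCollection.Add takes object[]; a double[] would be stored as one value
       result.Rows.Add(Array.ConvertAll(dFields, d => (object)d));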
       }
   }
}
return result;
Xantham
  • Can you post your code? It will help identify the problem. – VladL Mar 01 '13 at 22:58
  • @VladL Okay, I added code tailored to this example. One thing to note is that I am adding the data to a `DataTable` and returning it from this function. – Xantham Mar 01 '13 at 23:09

3 Answers

The reason it's skipping is that the StreamReader reads blocks of data from the FileStream rather than reading character by character. For example, the StreamReader might read 4 kilobytes from the FileStream and then parse out lines as required to respond to ReadLine() calls. So when you attach the TextFieldParser to the FileStream, it reads from the stream's current position, which is wherever the StreamReader's last block read left it, not where the last line returned by ReadLine() ended.
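
To see this buffering directly, the sketch below reads one line through a StreamReader and then checks the underlying stream's Position. A MemoryStream stands in for the FileStream so nothing touches the disk (an assumption; StreamReader buffers the same way over any Stream), and BufferingDemo is a hypothetical name. The exact position depends on StreamReader's internal buffer size, so the sketch only compares it against the length of the first line.

```csharp
using System;
using System.IO;
using System.Text;

public static class BufferingDemo
{
    // Returns the byte length of the first line (including '\n') and the
    // underlying stream's position after a single ReadLine() call.
    public static (int firstLineBytes, long positionAfterReadLine) Run()
    {
        // Build 100 short lines so the content is much larger than one line.
        var sb = new StringBuilder();
        for (int i = 0; i < 100; i++)
            sb.Append("line ").Append(i).Append('\n');
        byte[] bytes = Encoding.ASCII.GetBytes(sb.ToString());

        using var stream = new MemoryStream(bytes);
        using var reader = new StreamReader(stream, Encoding.ASCII);

        string first = reader.ReadLine();      // "line 0"
        int firstLineBytes = first.Length + 1; // +1 for the '\n'

        // StreamReader filled its internal buffer in one read, so the
        // stream has advanced far past the end of the first line.
        return (firstLineBytes, stream.Position);
    }
}
```

Anything that reads from `stream` at this point (such as a TextFieldParser constructed over it) starts at that advanced position, skipping the lines still sitting in the StreamReader's buffer.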

The solution should be pretty simple: just connect the TextFieldParser to the StreamReader:

TextFieldParser parser = new TextFieldParser(reader);

See the TextFieldParser(TextReader reader) constructor.
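
A self-contained sketch of this fix, assuming the sample layout from the question. A StringReader stands in for the StreamReader over the FileStream (both are TextReaders, so the parser sees the same thing). The widths 10, 9, 11, -1 are read off the sample header rather than taken from the question's 10, 10, 10, 10; the final -1 lets the last column run to the end of the line. MixedFormatReader and Parse are hypothetical names. The sketch also peeks through the parser (PeekChars) instead of calling reader.Peek(), because TextFieldParser reads ahead into its own buffer, so the reader's position no longer matches the parser's next line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class MixedFormatReader
{
    // Reads "Key: value" header lines until the "Seconds" column title,
    // then hands the SAME TextReader to TextFieldParser for the table.
    public static (List<string> header, List<string[]> rows) Parse(TextReader reader)
    {
        var header = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.StartsWith("Seconds", StringComparison.Ordinal))
                break; // the next line is the first data row
            if (line.StartsWith("Date:", StringComparison.Ordinal) ||
                line.StartsWith("Time:", StringComparison.Ordinal))
                header.Add(line);
        }

        var rows = new List<string[]>();
        // Attach the parser to the reader, not to the underlying stream, so it
        // continues with the characters the reader has already buffered.
        // Note: disposing the parser also closes the reader.
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.FixedWidth;
            parser.SetFieldWidths(10, 9, 11, -1);

            while (!parser.EndOfData)
            {
                // Stop at a '*' marker line, checked via the parser itself.
                if (parser.PeekChars(1) == "*")
                    break;
                rows.Add(parser.ReadFields()); // fields are trimmed by default
            }
        }
        return (header, rows);
    }
}
```

With a real file, the call would be along the lines of `using var reader = new StreamReader(filePath); var (header, rows) = MixedFormatReader.Parse(reader);`.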

Jim Mischel
  • That does seem to fix it. Tell me if I understand this correctly: the `StreamReader` leaves off (text-wise) at the first row of the table, which (let's say) is part way through block 3. `TextFieldParser.ReadFields()` then starts reading block 4, since that is the next block. It then fails because it is trying to parse only half of a line with the widths I specified. If I pass in the `StreamReader`, does that force it to start at the next character rather than at the next block of memory? – Xantham Mar 02 '13 at 00:37
  • @Xantham: Yes, you have the concept down. `StreamReader` put some characters in its pocket. By attaching your `TextFieldParser` to the `StreamReader`, you're reading those characters. As the parser continues to read, it requests characters from the `StreamReader`, which in turn gets data from the `FileStream` and passes it on to the parser. – Jim Mischel Mar 02 '13 at 05:12

Generally speaking, most streams are consuming: once data has been read, it is no longer available to other readers. You could fork off to multiple streams by writing an intermediary class that derives from Stream and either raises an event or republishes the data to other streams.
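
A minimal sketch of such an intermediary, assuming the "republish" option: every byte read from the source stream is also written to a second stream, so another consumer can read the same data later. TeeStream is a hypothetical name, and the sketch is read-only and non-seekable for simplicity. On its own it does not fix the offset problem in the question, since each consumer still needs its own read position.

```csharp
using System;
using System.IO;

// Hypothetical read-only wrapper: copies every byte it reads from `source`
// into `copy`, so the consumed data is republished to a second stream.
public sealed class TeeStream : Stream
{
    private readonly Stream source;
    private readonly Stream copy;

    public TeeStream(Stream source, Stream copy)
    {
        this.source = source ?? throw new ArgumentNullException(nameof(source));
        this.copy = copy ?? throw new ArgumentNullException(nameof(copy));
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int n = source.Read(buffer, offset, count);
        if (n > 0)
            copy.Write(buffer, offset, n); // republish what was just consumed
        return n;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() => copy.Flush();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Raising an event per block instead of writing to `copy` would follow the same shape: the override of Read is the single place where consumed data can be observed.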

JerKimball

In your case you don't need the StreamReader. A better choice is to read the file with the File.ReadLines method instead. It enumerates the file lazily, so it does not load the whole file content, only the lines up to the point where you've found everything you need:

foreach (string line in File.ReadLines(filePath))
{
    if( line.StartsWith("Date:"))
    {
        result.Rows.Add(line);
    }
    else if (line.StartsWith("Time:"))
    {
        result.Rows.Add(line);
    }
    else if (line.StartsWith("Seconds"))
    {
       break;
    }
}

EDIT

You can do it even more simply using LINQ:

var dates = from line in File.ReadLines(filePath) where line.Contains("Date:") select line;
foreach (var date in dates)
{
    result.Rows.Add(date); // add each matching line as its own row
}
VladL
  • But how does that help him parse the second part of the file? – Jim Mischel Mar 01 '13 at 23:31
  • @JimMischel as far as I understood, he has no problem there; using the stream twice is the only problem – VladL Mar 01 '13 at 23:33
  • My point being that, unless I misunderstood, he's trying to read the first N lines of the file as raw lines, then read the next part of the file with the `TextFieldParser`. The problem he's having is how to start the `TextFieldParser` at the proper position in the file. – Jim Mischel Mar 01 '13 at 23:37
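
To address the question raised in these comments, the File.ReadLines approach can also feed the table part to TextFieldParser. A sketch, assuming the sample layout (table starts after the line beginning with "Seconds", widths 10, 9, 11, -1); ReadLinesApproach and ParseTable are hypothetical names. Because TextFieldParser needs a TextReader, the remaining lines are re-joined into one string, which trades the lazy enumeration for holding the table part in memory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.VisualBasic.FileIO;

public static class ReadLinesApproach
{
    // Returns the fixed-width rows that follow the "Seconds" title line.
    public static List<string[]> ParseTable(IEnumerable<string> lines)
    {
        // Skip the free-form header, then the column-title line itself.
        var tableLines = lines
            .SkipWhile(l => !l.StartsWith("Seconds", StringComparison.Ordinal))
            .Skip(1);

        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(string.Join("\n", tableLines))))
        {
            parser.TextFieldType = FieldType.FixedWidth;
            parser.SetFieldWidths(10, 9, 11, -1); // -1: last column runs to end of line
            while (!parser.EndOfData)
                rows.Add(parser.ReadFields());
        }
        return rows;
    }
}
```

With a file, the call would be `ReadLinesApproach.ParseTable(File.ReadLines(filePath))`.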