
I have written a program that processes text files one at a time and extracts relevant information. My program works well on some of the text files but not on others, and there is no obvious difference between the files that run through it seamlessly and those that don't.

As far as the problematic files are concerned:

  1. the program opens the file
  2. it reads in and processes a good chunk of the lines one at a time as it should
  3. but then it reaches a problem line and gives the error message:

    "Debug Assertion Failed File: f:/dd/vctools/crt_bld/self_x86/src/isctype.c Line: 56 Expression: (unsigned)(c+1) <= 256"

When I enter the debugger, the problem seems to arise from the while loop over the token scanner in my code below. I pulled up the content of the problem line being processed, compared it across a couple of problem files, and found that the assertion failure pops up at </li>, where the last token being processed is ">". It's not clear to me why this is a problem. This particular token is contiguous with <li in the original text file, in the form </li><li, so the scanner is apparently having trouble halfway through that string. Any thoughts on why this happens and how I can fix it? Any advice would be much appreciated!

Here is the relevant portion of my code:

#include <string>
#include <iostream>
#include <fstream> //to get data from files
#include "filelib.h"
#include "console.h"
#include "tokenScanner.h"
#include "vector.h"
#include "ctype.h"
#include "math.h"

using namespace std;


/*Prototype Function*/
void evaluate(string expression);
Vector<string> myVectorOfTokens; //will store the tokens
Vector<string> myFileNames;

/*Main Program*/
int main() {

    /*STEP1 : Creating a vector of the list of file names 
              to iterate over for processing*/
    ifstream infile; //declaring variable to refer to file list
    string catchFile = promptUserForFile(infile, "Input file:");
    string line; //corresponds to the lines in the master file containing the list files
    while(getline(infile, line)){
        myFileNames.add(line);
    }

    /* STEP 2: Iterating over the file names contained in the vector*/
    int countFileOpened=0; //keeps track of number of opened files

    for (int i = 1; i < myFileNames.size(); i++){ //note: starts at 1, so the first name in the list is skipped
        myVectorOfTokens.clear(); //resetting the vector of tokens for each new file

        string fileName;
        string line2;
        ifstream inFile;
        fileName= myFileNames[i];

        inFile.open(fileName.c_str());   //open file convert c_str

        if (inFile){
            countFileOpened++; //only count files that actually opened
            while(getline(inFile, line2)){
                evaluate(line2);
            }
            inFile.close();
        }
    }
    return 0;
}

/*Function for Extracting the Biographer Name*/
void evaluate(string line){
    /*Creating a Vector of Tokens From the Text*/
    TokenScanner scanner(line); //the constructor
    while (scanner.hasMoreTokens()){
        string token=scanner.nextToken();
        myVectorOfTokens.add(token);
    }
}
  • The problem Dieter points out is important, and since the actual tokenizing code is *not* in this non-reproducible, non-compilable sample, it's really all you're likely to get. I suggest you fix that first, and if you still have a problem post an [SSCCE](http://www.sscce.org) that we can actually work with. – WhozCraig Sep 18 '13 at 19:34
  • Hello, thanks for your note. Where would I add the text files to create the SSCCE you suggest? – user2792668 Sep 19 '13 at 00:10
  • Add it as an update to your question; *don't* paste a wall of code here in a comment. Edit the question. And I would use a *local*-scope `std::ifstream` for the token-reads, but that is minor. Your description of the error message indicates your problem is *not* in the code as posted (beyond the problem Dieter found). Forgo the tokenization and just print the lines as you read them to verify the files are reading correctly. – WhozCraig Sep 19 '13 at 00:17
  • Hello again, forgive me, but I am a first-time user of this forum and I don't understand what you are saying. I described the problem in as much detail as I could, as you can see from my post, and I've included the relevant code with comments. Then you proposed an SSCCE post, which based on the description entails me adding example text files alongside my code. Am I missing something here? – user2792668 Sep 19 '13 at 00:23
  • An SSCCE means your *posted code* can literally be copied, pasted into a text file, compiled, and it exhibits *the problem you're having.* The code should be short, self-contained, **compilable** (in *our* environments, not just yours), and exemplary of the problem you're describing. – WhozCraig Sep 19 '13 at 00:26
  • Thanks for your reply. I have checked whether the lines are reading correctly, and they are. When I have the program simply print the lines of the text files, there are no problems there. – user2792668 Sep 19 '13 at 00:26
  • Then the problem is in code you never posted. It is in your tokenizer itself. Look there, and start debugging. – WhozCraig Sep 19 '13 at 00:26
  • I'm also having this when the tokenizer encounters a "°" char. In fact its character code is 176 (0xb0) in my string... and not 248 as it should be...? I'll continue looking for a solution... – Pedro Ferreira Aug 28 '15 at 11:03
  • I found a working solution; I don't know if it is the best one. Tracing inside the tokenizer source code, I found that we only go into the asserting code if BOOST_NO_CWCTYPE is not defined, so I added "#define BOOST_NO_CWCTYPE" to my code and now it works OK. I guess it is something to do with wide chars and the corresponding character codes (see the sketch after these comments)... – Pedro Ferreira Aug 28 '15 at 15:43
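
To illustrate what the last two comments are describing: the assertion in isctype.c fires when one of the <ctype.h>/<cctype> classification functions is handed a value that is neither EOF nor representable as an unsigned char. If the tokenizer (whose source is not posted here) passes a plain char straight to something like isalpha(), a byte above 0x7F, such as 0xB0 for "°", is sign-extended to a negative int on platforms where char is signed, which is exactly what the check (unsigned)(c+1) <= 256 rejects. Below is a minimal sketch of the safe pattern, assuming the tokenizer relies on those functions; the helper name isWordChar and the sample string are made up for illustration.

#include <cctype>
#include <iostream>
#include <string>

//hypothetical helper: classify a character safely even when char is signed
bool isWordChar(char ch) {
    //cast to unsigned char first; passing a negative value (other than EOF)
    //to a <cctype> function is exactly what the MSVC debug assertion
    //(unsigned)(c+1) <= 256 is checking for
    return std::isalpha(static_cast<unsigned char>(ch)) != 0;
}

int main() {
    std::string line = "25\xB0 C"; //0xB0 is the Latin-1 byte for the degree sign
    for (char ch : line) {
        //calling std::isalpha(ch) directly here would trip the assertion
        //in an MSVC debug build when ch is '\xB0' and char is signed
        std::cout << ch << " -> " << (isWordChar(ch) ? "letter" : "other") << '\n';
    }
    return 0;
}

With the cast in place the 0xB0 byte classifies cleanly instead of asserting; defining BOOST_NO_CWCTYPE, as described above, presumably just steers the tokenizer onto a code path that avoids the unchecked call.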

1 Answer


while(!inFile.eof())

is just wrong (in almost any case): eof() only becomes true after a read has already failed, so the loop body typically runs one extra time on stale or empty data.

while(getline(inFile, line2))
      evaluate(line2);

is better, because the body only executes when a line was actually read.
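
For completeness, a minimal, self-contained version of the recommended loop; the file name data.txt is only a placeholder, and printing stands in for evaluate():

#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream inFile("data.txt"); //placeholder file name
    std::string line;

    //std::getline() is the loop condition, so the body runs only when a
    //line was actually read; eof() reports end-of-file only after a read
    //has already failed, which is why the !eof() version misbehaves
    while (std::getline(inFile, line)) {
        std::cout << line << '\n'; //stands in for evaluate(line)
    }
    return 0;
}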

  • Thanks for your note, but this is the only way to get it to run through the entire file since it is not well formatted. I don't believe that is where my problem resides, however. Thanks again though. – user2792668 Sep 18 '13 at 19:33
  • I made the change you recommended and my code is still behaving the same way. – user2792668 Sep 19 '13 at 00:09