
I have an XML file from which I want to extract some specific fields.

Sample test file

<ScheduleTask ID="ZmwnBtczlnYOBLyK" Code="000-000.0.0.0" Name="SETTLEMENT.BRIDGE" ParentSummaryTaskID="">
    <Quantities>
        <STQuantity MethodID="416" RecipeID=""/>
    </Quantities>
    <Timings>
        <Timing LocationID="$$$$$&gt;C1&gt;NB" CompletionRate="1">
            <Planned Begin="2012-03-08T07:14:57" End="2012-03-28T07:14:57"/>
            <Actual Begin="2012-03-31T06:00:00" End="2012-05-01T14:00:00"/>
        </Timing>
        <Timing LocationID="$$$$$&gt;C1&gt;SB" CompletionRate="0">
            <Planned Begin="2012-12-04T06:07:29" End="2012-12-24T06:07:29"/>
            <Forecast Begin="2013-04-18T09:16:37" End="2013-06-04T12:06:02"/>
        </Timing>
    </Timings>
</ScheduleTask>

So I have this function, which looks for a line consisting of the closing tag </ScheduleTask> (including the <):

//xml file into ontology
list<stack<string> > addWordsFromFile(string filename, int changeLevel)
{
    ifstream ontology;
    ontology.open(filename.c_str());
    string ontTemp, line;

    list<stack<string> > control_list;

    while (true) {
        getline(ontology, line); //read line
        if (ontology.fail()) break; //boilerplate check for error
        line.erase(remove(line.begin(), line.end(), '\t'), line.end()); //remove tabs

        if (line == "</ScheduleTasks>") break; //check for end of document

        if (line == "</ScheduleTask>") {
            ontTemp.clear(); //clear memory
        }

        //look for activity
        if (line.substr(1, 15) == "ScheduleTask ID") {
            int i = 41;
            while (line[++i] != '"') { ontTemp += line[i]; }
        }

        if (ontTemp != "") { //ready to add
            stack<string> tempMem;

            tempMem.push(ontTemp);

            control_list.push_back(tempMem);
        }
    }
    ontology.close();
    return control_list;
}

On Windows this function works fine and the </ScheduleTask> line is found. On Linux it is not found, though other fields are found just fine with if(line.substr(1, 15) == "ScheduleTask ID").

Compiled with Visual Studio 2008 and g++.

My question: 1) why doesn't this work in Linux and 2) how does it work?

forest.peterson
  • I opened the file in a hex viewer. After the carriage return (0D) of the previous line comes a line feed (0A), then three horizontal tabs (09, 09, 09), giving 0A 09 09 09. I remove the horizontal tabs with 'line.erase(remove(line.begin(), line.end(), '\t'), line.end())', so is it the 0A line feed that Linux still sees, making '0A' != ''? – forest.peterson Feb 01 '13 at 03:51
  • Also, the file is saved as ANSI – forest.peterson Feb 01 '13 at 04:04

1 Answer


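This comes down to the definition of the end of a line. Windows ends a line with \r\n, while Linux ends it with \n. When a file is read in text mode, the Windows runtime translates \r\n to \n before getline sees it; Linux does no translation, so getline strips only the \n. So the line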

</ScheduleTask>\r\n

comes back from getline as </ScheduleTask> on Windows but as </ScheduleTask>\r on Linux, so the comparison line == "</ScheduleTask>" fails on Linux.

Chris G
  • I will try if(line == "" || line == "\r"). This seems sketchy, but after reading the Wikipedia article on Newline, the development of these codes was sketchy too; Teletype...? – forest.peterson Feb 01 '13 at 04:18