4

All I want is a C++ program that reads a text file, stores each row in an array, and then writes a duplicate copy to another text file. Here's my code:

    #include <iostream>
    #include <fstream>
    #include <string>
    using namespace std;

    int main ()
    {
        string STRING = "";
        string list[10000];
        int i = 0;
        ifstream infile;
        infile.open ("C:/Users/Ryan/Desktop/data.txt");
        ofstream myfile;
        myfile.open ("C:/Users/Ryan/Desktop/data-2.txt");

        while(!infile.eof()) // To get you all the lines.
        {
            getline(infile,STRING);
            list[i]=STRING;
            myfile<<list[i];
            ++i;
        }

        infile.close();
        myfile.close();

        return 0;
    }

For some reason, though, every other line comes out as a bunch of funky Chinese symbols. Here's my data.txt:

BPC 20101206    V   0.13    0.13    0.13    0
BPC 20101207    V   0.13    0.13    0.13    6500
BPC 20101208    V   0.13    0.13    0.13    0
BPC 20101209    V   0.13    0.125   0.125   117000
BPC 20101210    V   0.125   0.125   0.125   0
BPC 20101213    V   0.125   0.125   0.125   0
BPC 20101214    V   0.13    0.13    0.13    5000
BPC 20101215    V   0.13    0.13    0.13    290
BPC 20101216    V   0.125   0.115   0.115   24000

And here's the output data-2.txt...

BPC 20101206    V   0.13    0.13    0.13    0
䈀倀䌀ऀ㈀ ㄀ ㄀㈀ 㜀ऀ嘀ऀ ⸀㄀㌀ऀ ⸀㄀㌀ऀ ⸀㄀㌀ऀ㘀㔀  ഀ BPC 20101208    V   0.13    0.13    0.13    0
䈀倀䌀ऀ㈀ ㄀ ㄀㈀ 㤀ऀ嘀ऀ ⸀㄀㌀ऀ ⸀㄀㈀㔀ऀ ⸀㄀㈀㔀ऀ㄀㄀㜀   ഀ BPC 20101210    V   0.125   0.125   0.125   0
䈀倀䌀ऀ㈀ ㄀ ㄀㈀㄀㌀ऀ嘀ऀ ⸀㄀㈀㔀ऀ ⸀㄀㈀㔀ऀ ⸀㄀㈀㔀ऀ ഀ BPC 20101214    V   0.13    0.13    0.13    5000
䈀倀䌀ऀ㈀ ㄀ ㄀㈀㄀㔀ऀ嘀ऀ ⸀㄀㌀ऀ ⸀㄀㌀ऀ ⸀㄀㌀ऀ㈀㤀 ഀ BPC  20101216    V   0.125   0.115   0.115   24000

Any ideas?

  • I just ran your code (on Linux), except that you need to include a new-line command at `myfile< – Unapiedra Dec 06 '11 at 11:41
  • Not completely related to the problem, but prefer "while (getline(...))" over "while(!infile.eof())", and be careful with declaring large buffers on the stack (string list[10000]); prefer "std::vector<std::string> list(10000)" (you are on stackoverflow, after all) – stefaanv Dec 06 '11 at 11:45
  • If I use ANSI encoding and open it in WordPad, it looks a little better, but this doesn't explain why every other line is messed up. It looks like this: BPC 20101206 V 0.13 0.13 0.13 0 ?????? ? ?? ???? ???? ???? ?????? ?? BPC 20101208 V 0.13 0.13 0.13 0 ?????? ? ?? ???? ???? ????? ???????? ?? BPC 20101210 V 0.125 0.125 0.125 0 ?????? ? ??????? ????? ????? ????? ?? BPC 20101214 V 0.13 0.13 0.13 5000 ?????? ? ??????? ???? ???? ?????? ?? BPC 20101216 V 0.125 0.115 0.115 24000 – user1083385 Dec 06 '11 at 12:55
  • @user1083385: I copied your code and compiled it with gcc (Code::Blocks). It works fine. Try opening the output in an editor; it should look fine. Do let me know the reason for your particular output. – Imposter Dec 06 '11 at 14:05

4 Answers

4

To address your original problem: it looks like you are outputting some unformatted characters (that happen to be Chinese). I don't see you inserting newlines (yet there seem to be newlines in the output), so something is missing from the code you are showing us. Please CUT/PASTE the REAL code.

  1. What is the real code?
  2. How do you view the output file (was it with cat, or did you open it in an editor)?

Main thing to note:

Never ever ever do this:

    while(!infile.eof())

You should read and test the line before using it.
This can be done in a single line by putting the read into the condition:

    while(getline(infile,STRING))
    {
        list[i]=STRING;
        myfile<<list[i];
        ++i;
    }

Other things to watch for:

  1. Format your code nicely!!!!!!
  2. Don't use all caps identifiers (these are usually reserved for macros).
  3. Don't use arrays. Use a std::vector instead.
  4. Declare and open file in one line

    ifstream infile("C:/Users/Ryan/Desktop/data.txt");
    
  5. Don't test for EOF as the loop condition.

    • It can cause an infinite loop if there is another problem.
    • If you don't check that the read worked, you repeat the processing of the last line.
  6. Don't manually close the file (let the destructor do it)

Martin York
  • Just one correction: you have to close the output file manually in order to ensure that the close worked, and there was no error. Returning `0` (or `EXIT_SUCCESS`) from main if there was a write error is a serious program error. – James Kanze Dec 06 '11 at 11:58
  • @James: OK I can agree with that to an extent (context depending). – Martin York Dec 06 '11 at 12:01
3

If I'm not mistaken, your OS is Windows?

The reason for these "Chinese" symbols is that your .txt file is encoded in Unicode. Open it in Notepad, click Save As, and in the Encoding drop-down list (near the bottom of the dialog box) choose ANSI, then save. That should fix the "Chinese" problem :)

Check other answers to fix the problems with your code. Hope that helps.

jrok
  • @johnathon: Personally I am not convinced it is an encoding issue yet. I think more likely the code provided is not what the OP is running and the real problem is hidden (as there are definitely garbage characters in the file). – Martin York Dec 06 '11 at 16:36
  • @LokiAstari agreed. However, from his output, it's much more likely that he's got a buffer overflow of some sort going on, which would generate those nifty garbage characters. And if the file was opened with an editor that's displaying Unicode characters, he'll get those nifty symbols he's seeing. It's also possible that he's writing the whole string object to the file, instead of the C string that the string class contains. – johnathan Dec 06 '11 at 17:01
  • I would agree that he has some overflow or memory overwrite that is causing more than the specified string to be printed (which includes some garbage characters that cause Unicode interpretation by the following application). But the code above does not exhibit that problem (it uses string), and for the input file size it does not overflow the array. Thus the OP is not sharing some critical piece of information, or the code above is not identical to the code he is running. – Martin York Dec 06 '11 at 17:06
  • @johnathon @LokiAstari I was just about to write the same thing when I found this answer. I used Firefox when I first saw this post (I'm now on Chrome on another computer), and could see the block characters containing the character codes. There are things like 0900 3100, which just correspond to `'\t'` and `'1'`. And having checked the Unicode value of the first Chinese character in the output (the same as `'B''\0'`), I am convinced that this is caused by a Unicode file. As others have answered, the OP dropped one `'\n'` per line, which shifts the next line by one byte and causes the bad output. – fefe Dec 06 '11 at 17:20
  • @LokiAstari The reason I suspected encoding is the first comment to OP, where Unapiedra says the output on Linux looks fine. – jrok Dec 06 '11 at 18:23
1

When you write the data back you are not including the newline, so you are not creating a duplicate file, if that was your intention.

There seems to be no reason to have the list[] array, since you are writing each line straight back anyway; instead you could do:

    getline(infile,STRING);
    myfile << STRING << endl;

By the way, STRING is not a good variable name; choose something more descriptive.

AndersK
1

Here is a nice article detailing alternatives to while(!eof)... here

Carl Winder