string size() returns 1 too large value in evaluation system

Question

If I have a very simple piece of code like:

string myvariable;
getline(cin, myvariable);
cout << myvariable.size();

and if I run that program locally, it returns the expected value (exactly the number of characters in the given string, including spaces).

But if I upload that program to an automated evaluation system (something like a programming olympiad judge or spoj.com), size() returns a value that is 1 too large.

For example if myvariable value is "test", then:

locally:

size() == 4

in evaluation system:

size() == 5

I tried .length() but the result is exactly the same. What is the reason for that? Thank you for your answers!

If you do not trim whitespace explicitly, it might be something like a different line ending ('\n' vs. '\r\n'). — Peter H, Oct 21 '22 at 07:51
the reason is most probably that the value of `myvariable` is not what you think it is. — 463035818_is_not_an_ai, Oct 21 '22 at 07:51
you can for example print ascii codes of individual characters to see also non printable ones — 463035818_is_not_an_ai, Oct 21 '22 at 07:52
@463035818_is_not_a_number I verified the value of my string, it is what it should be - and additionally, as I wrote, locally it's all good. — Kentucker, Oct 21 '22 at 07:53
@PeterH I tried cin.ignore() which should help in that situation, but didn't work — Kentucker, Oct 21 '22 at 07:54
did you loop through the characters and inspected their ascii code? What do you get? If `size` says it is `5` then you should get `5` characters and their ascii code, not `4` — 463035818_is_not_an_ai, Oct 21 '22 at 07:54
how do you check that it is only 4 characters? A loop `for (const auto& c : myvariable) .. ` has only 4 iterations? cannot be if `size == 5` . Or your program has undefined behavior, then anything is possible. Please try to create a [mcve] — 463035818_is_not_an_ai, Oct 21 '22 at 07:55
cin.ignore() can help clearing the buffer (removing obsolete whitespace) _before_ reading the line. Depending on how you use it, it would not help with a spurious '\r' at the end. Without being able to inspect the actual characters on the "evaluation system", it will be hard to diagnose. — Peter H, Oct 21 '22 at 07:58
@Kentucker *I verified the value of my string,* -- How did you do that? There is nothing wrong in `size()` returning 5. It is telling you the truth. Did you verify this by trying to output the string? If you did, that is not verification -- there could be invisible control characters at the end of your string. The proper way to verify this is to either print out the value (hex, decimal, etc.) of each character in the string, or get the memory view of what that string holds. Using the console to verify what the string actually has in it is not verification. — PaulMcKenzie, Oct 21 '22 at 08:02
@PaulMcKenzie Ok, understood, but why does it work differently locally and on that remote system? I checked, and for the string "test test" I got hex: 74 65 73 74 20 74 65 73 74 d, with that "d" at the end, on the remote system. Why? — Kentucker, Oct 21 '22 at 08:19
@Kentucker `for (char c : myvariable) std::cout << (int)c << "\n" ` -- That will reveal what is going on. — PaulMcKenzie, Oct 21 '22 at 08:19
@PaulMcKenzie Thank you, it actually helped, could you please verify my previous comment and suggest why I get that 'd' at the end? Thank you a lot! — Kentucker, Oct 21 '22 at 08:20
Hex 'd' is carriage return in ASCII, aka '\r'. I venture the guess that the evaluation system uses Windows line endings. — Peter H, Oct 21 '22 at 08:22
`d` is not a base 10 integer. If it were hex, then that is the carriage return. Maybe you are uploading Windows text files to a Unix system, or vice-versa, and not using the proper tools to ensure that the text files have been translated properly? — PaulMcKenzie, Oct 21 '22 at 08:23
Thank you a lot! :) That explains the problem. Is there any way to make this more universal, so that it returns the same value regardless of the context? Is it a problem with the encoding of the input files, then? — Kentucker, Oct 21 '22 at 08:25
The problem is not a C++ one. It is a systems level problem -- you need to contact whoever runs it and explain the issue. If the system expects Windows text files, then you have to supply those types of files to it. How you do that, again, is a system's issue. To Windows text interpretation, a Unix text file is alien, as it does not have the requisite line endings, or even the correct end-of-file marker. — PaulMcKenzie, Oct 21 '22 at 08:26
Depending on what the problem is you want to solve, you can just trim the whitespace (or just any type of new line characters) at the end of your string after reading it in. See also https://stackoverflow.com/questions/216823/how-to-trim-an-stdstring?answertab=trending#tab-top on how to do that in C++. — Peter H, Oct 21 '22 at 08:27
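
The diagnostic suggested in the comments above (printing the numeric value of every character instead of the string itself) could be sketched roughly as follows. The helper name `hex_dump` is hypothetical and not from the thread; it only illustrates one way to make an invisible trailing '\r' show up.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not from the thread): returns the bytes of `s` as
// two-digit hex values separated by single spaces, so that non-printable
// characters such as '\r' (0d) become visible.
std::string hex_dump(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        // Format one byte as two lowercase hex digits.
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

Printing `hex_dump(myvariable)` right after the `getline` call on the evaluation system would then show a final `0d` whenever the input line ended in "\r\n".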

Peter H · Accepted Answer · 2022-10-21T14:12:53.650

After the discussion in the comments, it is clear that the issue involves different line ending encodings from different operating systems. Windows uses \r\n and Linux/Unix use \n. The same content may be represented as

"Hello World!\n" // in a Linux file

or

"Hello World!\r\n" // in a Windows file

The function getline by default uses \n as delimiter on Linux/Unix, so for the Windows file it yields a string that is one character longer: the unwanted \r, hex value 0D, stays at the end of the string.
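
The effect can be reproduced locally without the evaluation system. The sketch below assumes a string stream as a stand-in for the judge's input, since a string stream performs no line-ending translation; `first_line_size` is a hypothetical name used only for illustration.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: reads the first line of `raw` with std::getline and
// returns its size. The string stream passes every byte through unchanged,
// which mimics reading a Windows-style file on Linux/Unix.
std::size_t first_line_size(const std::string& raw) {
    std::istringstream in(raw);
    std::string line;
    std::getline(in, line);  // stops at '\n' and discards it
    return line.size();
}
```

With this, `first_line_size("test\n")` gives 4, while `first_line_size("test\r\n")` gives 5, because only the '\n' is consumed as the delimiter.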

In order to fix this, you can either convert the line endings in your file, or trim the string after reading it in. For example:

string myvariable;
getline(cin, myvariable);
// Remove any trailing '\r' or '\n' characters left at the end of the line.
myvariable.erase(myvariable.find_last_not_of("\n\r") + 1);

See also "How to trim an std::string?" for more ways to trim a string for different types of whitespace.

edited Oct 21 '22 at 14:12

answered Oct 21 '22 at 08:44

Peter H


The function `getline` does not "by default [use] `\n` as delimiter". It uses the delimiter that's standard for the platform that the compiler is targeting. On a platform where the convention is that `\n` marks a line ending, `getline` will swallow that one character. On a platform where the convention is that `\r\n` marks a line ending, `getline` will swallow both characters. The real answer is to make sure that your text files use the appropriate line ending convention for your operating system. – Pete Becker Oct 21 '22 at 12:51
@PeteBecker I added the qualification to the statement, specifying that getline uses '\n' as delimiter on Linux/Unix. Regardless, I feel that it is not necessarily always wise to rely on textfiles having the expected line endings for your current system. – Peter H Oct 21 '22 at 14:04
You're absolutely right that if you don't pay attention to the conventions of text files for your operating system you can't rely on standard facilities to handle them correctly. The solution is to not blindly copy files and hope that things will work out, but to use file transfer tools that apply the appropriate conventions. – Pete Becker Oct 21 '22 at 14:22
