How to optimize printing out the difference between the greater and the lesser of two integers?

Question

UVA Problem no. 10055, Hashmat the Brave Warrior, probably the easiest problem there. The input consists of a series of pairs of unsigned integers ≤ 2^32 (thus mandating the use of 64bit integers…) For each pair the task is to print out the difference between the greater and the lesser integer.

According to the statistics, the fastest solutions run in below 0.01 sec. However, all my attempts to solve this typically run in 0.02 sec, with probably random deviations of ± 0.01 sec.

I tried:

#include <cstdint>
#include <iostream>
using namespace std;

int main()
{
  ios_base::sync_with_stdio(false);
  cin.tie(nullptr);  

  uint_fast64_t i, j;
  while(cin >> i >> j) {
    if(i > j)
      cout << i-j << '\n';
    else
      cout << j-i << '\n';
  }
}

And also:

#include <cstdlib>
#include <cstdint>
#include <iostream>
using namespace std;

int main()
{
  ios_base::sync_with_stdio(false);
  cin.tie(nullptr);  

  int_fast64_t i, j;
  while(cin >> i >> j) {
    cout << abs(i-j) << '\n';
  }
}

And also:

#include <algorithm>
#include <cstdint>
#include <iostream>
using namespace std;

int main()
{
  ios_base::sync_with_stdio(false);
  cin.tie(nullptr);  

  uint_fast64_t i, j;
  while(cin >> i >> j) {
    cout << max(i,j)-min(i,j) << '\n';
  }
}

All with same results.

I also tried using printf()/scanf() instead of cin/cout, still with same results (besides, my benchmarks were showing that cin/cout preceded by cin.tie(nullptr) can be even a little faster than printf()/scanf() – at least unless there are some ways to optimize the performance of cstdio I’m not aware of).

Is there any way to optimize this down to below 0.01 sec., or should I assume that guys who’ve achieved this time are either extremely lucky or cheaters printing out a precomputed answer to the judge’s input?

The programs are compiled with C++11 5.3.0 - GNU C++ Compiler with options: -lm -lcrypt -O2 -std=c++11 -pipe -DONLINE_JUDGE.

EDIT: This is my attempt to combine the advice of @Sorin and @MSalters:

#include <stdio.h>
#include <stdint.h>

unsigned long long divisors[] = {
  1000000000,
  1000000000,
  1000000000,
  1000000000,
  100000000,
  100000000,
  100000000,
  10000000,
  10000000,
  10000000,
  1000000,
  1000000,
  1000000,
  1000000,
  100000,
  100000,
  100000,
  10000,
  10000,
  10000,
  1000,
  1000,
  1000,
  1000,
  100,
  100,
  100,
  10,
  10,
  10,
  1,
  1,
  1
};


int main()
{
  unsigned long long int i, j, res;

  unsigned char inbuff[2500000]; /* To be certain there's no overflow here */
  unsigned char *in = inbuff;
  char outbuff[2500000]; /* To be certain there's no overflow here */
  char *out = outbuff;

  int c = 0;

  /* Read the whole input once, leaving room for the terminating '\0' */
  inbuff[fread(inbuff, 1, sizeof inbuff - 1, stdin)] = '\0';

  while(1) {
    i = j = 0;

    /* Skip whitespace before first number and check if end of input */
    do {
      c = *(in++);
    } while(c != '\0' && !(c >= '0' && c <= '9'));

    /* If end of input, print answer and return */
    if(c == '\0') {
      *(--out) = '\0';
      puts(outbuff);
      return 0;
    }

    /* Read first integer */
    do {
      i = 10 * i + (c - '0');
      c = *(in++);
    } while(c >= '0' && c <= '9');

    /* Skip whitespace between first and second integer */
    do {
      c = *(in++);
    } while(!(c >= '0' && c <= '9'));

    /* Read second integer */
    do {
      j = 10 * j + (c - '0');
      c = *(in++);
    } while(c >= '0' && c <= '9');

    if(i > j)
      res = i-j;
    else
      res = j-i;



    /* Buffer answer */
    if(res == 0) {
      *(out++) = '0';
    } else {
      unsigned long long divisor = divisors[__builtin_clzll(res)-31];
      /* Skip leading 0 */
      if(res < divisor) {
        divisor /= 10;
      }
      /* Buffer digits */
      while(divisor != 0) {
        unsigned long long digit = res / divisor;
        *(out++) = digit + '0';
        res -= divisor * digit;
        divisor /= 10;
      }
    }
    *(out++) = '\n';
  }
}

Still 0.02sec.

Simply `while(cin >> i >> j) cout << j-i << '\n';`? From the description it seems, first is never greater than second. — Henrik, May 22 '17 at 09:11
@Henrik "These two numbers in each line denotes the number soldiers in Hashmat’s army and his opponent’s army **or vice versa**." — Rakete1111, May 22 '17 at 09:14
OK, I see. What about `max(i,j)-min(i,j)` with unsigned 32 bit integers? — Henrik, May 22 '17 at 09:26
@Henrik, as I said, "The input consists of a series of pairs of unsigned integers ≤ 2^32 (thus mandating the use of 64bit integers…)". From the problem description: "The input numbers are not greater than 2^32". I have confirmed that the input does indeed contain the number 2^32. The maximum number in range of an unsigned 32bit int is 2^32-1. — , May 22 '17 at 09:28
@Henrik besides, I think that on UVA `uint_fast32_t` binds to a 64bit integer. — , May 22 '17 at 09:29
*"thus mandating the use of 64bit integers"* Not necessary. if min possible value is `1`. (*"Hashmat’s soldier number is never greater than his opponent."*, and now is Hashmat is a soldier of his army ?) — Jarod42, May 22 '17 at 09:54
@Jarod42 I have checked and confirmed that number `0` does appear in the input as well. — , May 22 '17 at 10:33
@Jarod42 Besides, I have checked and confirmed that on UVA, `uint_fast32_t` binds to the same type as `uint_fast64_t`. Correct me if I’m wrong, but does this mean that there are no possible advantages in using a 32bit integer in lieu of a 64bit integer? — , May 22 '17 at 10:40
This whole exercise makes zero sense. 0.02±0.01 is mostly noise. — n. m. could be an AI, May 22 '17 at 13:15
@n.m. That may be correct, but since apparently some ppl managed to make this run below it, I’m curious how — , May 22 '17 at 13:34
Some [asm results](https://godbolt.org/g/nti4QD) of different ways to get the diff: `std::abs` has the smallest number of instructions. — Jarod42, May 23 '17 at 08:42

score 2 · Answer 1 · answered May 22 '17 at 12:17


I would try to eliminate IO operations. Read one block of data (as big as you can), compute the outputs, write them to another string, and then write that string out.

You can use sscanf or the stringstream equivalents to read from and write to your memory blocks.

IO usually needs to go through the kernel, so there's a small chance that you would lose the CPU for a bit. There's also some time cost associated with it. It's small, but you are trying to run in less than 10 ms.
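As a rough illustration of the block-IO idea above, here is a minimal sketch. The helper names `read_all` and `solve_block` are made up for this example and do not come from the thread, and the parser assumes the input contains only non-negative decimal integers separated by whitespace. Whether this is actually faster on the judge is not something the sketch can show.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: read everything from `f` into one string,
// so the parsing loop never has to call into stdio again.
std::string read_all(std::FILE* f) {
    std::string data;
    char chunk[1 << 16];
    std::size_t got;
    while ((got = std::fread(chunk, 1, sizeof chunk, f)) > 0)
        data.append(chunk, got);
    return data;
}

// Hypothetical helper: parse pairs of unsigned integers from `in` and
// return one output line per complete pair with their absolute difference.
std::string solve_block(const std::string& in) {
    std::string out;
    std::size_t pos = 0;
    const std::size_t n = in.size();

    // Reads the next unsigned integer at `pos`; returns false at end of input.
    auto read_uint = [&](std::uint64_t& value) -> bool {
        while (pos < n && (in[pos] < '0' || in[pos] > '9')) ++pos;  // skip separators
        if (pos == n) return false;
        value = 0;
        while (pos < n && in[pos] >= '0' && in[pos] <= '9')
            value = value * 10 + static_cast<std::uint64_t>(in[pos++] - '0');
        return true;
    };

    std::uint64_t a, b;
    while (read_uint(a) && read_uint(b)) {
        out += std::to_string(a > b ? a - b : b - a);
        out += '\n';
    }
    return out;
}
```

A `main` would then call `solve_block(read_all(stdin))` once and pass the resulting string to a single `std::fwrite` on `stdout`, so the whole run performs one logical read and one write.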

answered May 22 '17 at 12:17

Sorin


I tried doing as You suggested. See my edited question or http://ideone.com/CAE0Yd . Still 0.02 sec :( – May 22 '17 at 13:22
See http://ideone.com/a18qoi and my edited question, this is my attempt to combine yours and MSSalters' advice, did I do this right? – May 22 '17 at 16:06

user3811082 · Answer 2 · 2017-05-22T17:11:48.123


Here is my variant with assembler routines.

#include <iostream>
#include <string>

using namespace std;

int main()
{
   unsigned long long i, j;
   string outv;   
   while(cin >> i >> j) {
     // i = |i - j|: subtract, then negate if the result went negative.
     // Operands are forced into registers; "+r" marks i as read and written.
     asm("subq %1, %0\n\t"
         "jns 1f\n\t"
         "negq %0\n"
         "1:"
         : "+r"(i) : "r"(j) : "cc");
     string str = to_string(i);
     outv += str + "\n";     
    }
    cout << outv;   
 }

edited May 22 '17 at 17:11

answered May 22 '17 at 12:01

user3811082


code.cpp: Assembler messages: code.cpp:14: Error: too many memory references for `mov' code.cpp:14: Error: too many memory references for `sub' code.cpp:14: Error: no instruction mnemonic suffix given and no register operands; can't size instruction code.cpp:14: Error: too many memory references for `add' – May 22 '17 at 12:08
command line for gcc `-O2 -std=c++11 -pipe -DONLINE_JUDG -masm="intel"`. VS version compiled on VS2010 no compilation errors – user3811082 May 22 '17 at 12:10
Fine, but on UVA we’re stuck with `C++11 5.3.0 - GNU C++ Compiler with options: -lm -lcrypt -O2 -std=c++11 -pipe -DONLINE_JUDGE`. Or `C++ 5.3.0 - GNU C++ Compiler with options: -lm -lcrypt -O2 -pipe -DONLINE_JUDGE`. Or `ANSI C 5.3.0 - GNU C Compiler with options: -lm -lcrypt -O2 -pipe -ansi -DONLINE_JUDGE`. The compiler options are out of users’ control, so unless You can modify Your solution to work with one of those command lines, I sadly cannot use Your solution :( – May 22 '17 at 12:14
I add AT&T version of assembler. no need for `-masm="intel"` directive – user3811082 May 22 '17 at 12:33
code.cpp: Assembler messages: code.cpp:18: Error: unsupported instruction `mov' code.cpp:18: Error: unsupported instruction `mov' – May 22 '17 at 13:18
i delete all versions except AT&T instructions for GCC – user3811082 May 22 '17 at 13:20
Wrong Answer. I suppose this is because of Your use of `unsigned int`, whereas the problem requires the use of 64bit integers. Throws the aforementioned compile errors when I manually substitute `unsigned int` with `unsigned long long int`. Could You kindly modify Your assembler to work with 64bit integers? If using assembly is the solution to get this working faster, I think I’ll start learning assembly… – May 22 '17 at 13:34
`The input consists of a series of pairs of unsigned integers ≤ 2^32`. This code should work in all cases. If you need 64bit i'll write the code – user3811082 May 22 '17 at 13:40
Unfortunately, the range of an unsigned 32bit integer, which `unsigned int` seems to be, is up to 2^32-1 inclusive. So, to my best understanding, 2^32 sadly overflows `unsigned int`. I tested this with an assembler-less C++ program. The judge returns Wrong Answer when I use `unsigned int`, and Accepted when I use `unsigned long long int`. As of now, the online judge sadly returns Wrong Answer for Your code :( – May 22 '17 at 13:45
WOW! This is beyond my understanding. Your 64bit asm code runs in… 0.15 sec. Checked twice to make sure. :( – May 22 '17 at 13:58
I wasn’t trying to be ironic. Honestly, I’m sincerely thankful for Your effort to help me. – May 22 '17 at 14:01
No, wait, I’m really sorry. Your 64bit asm code before Your edit to the answer run in 0.15 sec because of my mistake. I forgot to add `ios_base::sync_with_stdio(false); cin.tie(nullptr);` to the beginning of `main`. With this fixed, Your 64bit asm code runs in 0.02 sec. – May 22 '17 at 14:20
Try the fixed code with mixed 64 bit - 32 bit directives. Unfortunately i have no compiler now then may be errors. – user3811082 May 22 '17 at 14:24
code.cpp: Assembler messages: code.cpp:16: Error: unsupported instruction `mov' – May 22 '17 at 14:26
Quack. For some reason the judge returns Wrong Answer :( – May 22 '17 at 16:17
) The variant `4294967296 0` does not allow to simplify the addition operator. Changed – user3811082 May 22 '17 at 16:25
Still Wrong Answer :( – May 22 '17 at 16:29
It's strange. My mingw64 allows such instructions. Retreated to the original version. – user3811082 May 22 '17 at 16:44
Wrong Answer means the program compiles correctly and runs without crashing, but what it prints out is different to what the judge expects it to print. – May 22 '17 at 16:48

score 0 · Answer 3 · answered May 22 '17 at 14:05


printf is a Swiss army knife. It knows many ways to format its arguments, and there can be any number of them. In this case, you want a single dedicated function, so you don't waste time scanning for the single occurrence of %d. (BTW, this is a speed benefit of std::cout <<: the compiler sorts out the overloading at compile time.)

Once you have that single formatting function, make it output to a single char[] and call puts on that. As puts does no formatting of its own, it can be much faster than printf.
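To make the idea of a single dedicated formatting function concrete, here is a minimal sketch. The name `append_decimal` is invented for this example, and the caller is assumed to provide a buffer with at least 21 free bytes per number (20 digits plus the newline).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: write `value` in decimal at `out`, followed by '\n',
// and return a pointer just past the newline.
char* append_decimal(char* out, std::uint64_t value) {
    char tmp[20];               // a 64-bit value has at most 20 decimal digits
    std::size_t len = 0;
    do {                        // digits come out least significant first
        tmp[len++] = static_cast<char>('0' + value % 10);
        value /= 10;
    } while (value != 0);
    while (len > 0)             // copy them back in reading order
        *out++ = tmp[--len];
    *out++ = '\n';
    return out;
}
```

Each result is appended to one large `char[]` by calling `out = append_decimal(out, res);`. Note that `puts` adds a newline of its own, so either the last `'\n'` is overwritten with `'\0'` before calling it, or the buffer is written with `fputs` or `fwrite` instead.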

answered May 22 '17 at 14:05

MSalters


http://ideone.com/X8utoK – my try to implement Your idea. Is it OK, or can it be improved? 0.02 sec. still. – May 22 '17 at 15:07
See http://ideone.com/a18qoi and my edited question, this is my attempt to combine Yours and Sorin's advice, did I do this right? – May 22 '17 at 16:06

Mohamed El-Nakeep · Answer 4 · 2017-06-10T00:47:22.583

The trick is to use:

Unsafe input: see https://www.quora.com/What-is-the-fastest-input-output-method-in-C++ . On Windows, use the Microsoft thread-unsafe version https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/getchar-nolock-getwchar-nolock , as in this Codeforces submission: http://codeforces.com/contest/339/submission/27533017 . On Linux and Mac OS, with GCC and clang, use the POSIX thread-unsafe versions (unlocked stdio): https://linux.die.net/man/3/unlocked_stdio .

Custom input (sometimes called naive input), which is faster than the standard functions. It reads characters from the input and converts them to an integer. To optimize reading from the console, read: http://stackoverflow.com/questions/705303/faster-i-o-in-c/705378 . To optimize string-to-integer conversion, read the article: http://tinodidriksen.com/2010/02/16/cpp-convert-string-to-int-speed/ , and the code: http://tinodidriksen.com/uploads/code/cpp/speed-string-to-int.cpp . For a speed comparison, read: http://codeforces.com/blog/entry/5217 and the code: https://bitbucket.org/andreyv/cppiotest/src/tip/iotest.cpp?fileviewer=file-view-default

This solution, which runs in less than 0.001 seconds, is based on the UVa Online Judge submission http://ideone.com/ca8sDu by http://uhunt.felix-halim.net/id/779215 ; however, this version is abridged and modified:

#include <stdio.h>

#define pll(n) printf("%lld ",(n))
#define plln(n) printf("%lld\n",(n))
typedef long long ll;

#if defined(_WINDOWS) // On Windows GCC, use the slow thread safe version
inline int getchar_unlocked() {
    return getchar();
}
#elif defined  (_MSC_VER)// On   Visual Studio
inline int getchar_unlocked(){
    return _getchar_nolock(); // use Microsoft Thread unsafe version
}
#endif 

inline int  scn( ll & n){
     n = 0;
     int  c = getchar_unlocked(),t=0;
    while(c < '0' || c > '9') {
        if (c == EOF)   // end of input, possibly after trailing whitespace
            return 0;
        if(c=='-')      // remember a leading minus sign
            t=1;
        c = getchar_unlocked(); 
    }
    while(c >= '0' && c <= '9'){
        n = n *10+ c - '0';       
        c = getchar_unlocked();
    }
    if(t!=0)
        n *=-1;
    return 1;
}

int main(){
    ll n, m;
    while(scn(n) && scn(m)){ // && guarantees n is read before m
        if (n>m)
            plln(n - m);
        else
            plln(m - n);
    }
    return 0;
}
