Why is functional-styled code so much faster than imperative code in C?

Question

I've been working on an algorithm for calculating maximum depth of an expression (i.e. how many nested parentheses there are) in various languages just for fun/practice.

I noticed a huge performance discrepancy between the functional-styled C code and the imperative-styled C code, and I was wondering why that is.

Given the string "(1+(2*3)+((8)/4))+1", the imperative code consistently finishes in about 10-13us, but the functional code takes 2-3us, more than twice as fast. Both are compiled with gcc at -O2, so I found this extremely surprising, but I don't know enough about the compiler's implementation to understand why.

So can anyone tell me why the functional code is so significantly faster?

Functional code (note that the _ERR names are just #defines for integers):

const int max_depth_functional(
        const char *expr, const int sum, const int max) {
    switch(*expr) {
        case '\0':
            return sum == 0 ? max : UNTERM_PARENTH_ERR;
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
        case '+': case '-': case '*': case '/': case '^':
            return max_depth_functional(expr + 1, sum, max);
        case '(':
            return max_depth_functional(
                expr + 1, sum + 1, sum + 1 > max ? sum + 1 : max
            );
        case ')':
            return max_depth_functional(expr + 1, sum - 1, max);
        default:
            return INVALID_EXPR_ERR;
    }
}

Imperative code:

const int max_depth_imperative(const char *expr) {
    int curr_sum = 0, curr_max = 0;
    while(*expr != '\0') {
        switch(*expr++) {
            case '0': case '1': case '2': case '3': case '4':
            case '5': case '6': case '7': case '8': case '9':
            case '+': case '-': case '*': case '/': case '^':
                break;
            case '(':
                curr_sum++;
                curr_max = curr_sum > curr_max ? curr_sum : curr_max;
                break;
            case ')':
                curr_sum--;
                break;
            default:
                return INVALID_EXPR_ERR;
        }
    }
    return curr_sum == 0 ? curr_max : UNTERM_PARENTH_ERR;
}

Both are called like:

const clock_t start = clock();
const int func_result = max_depth_func(args[1]);
const clock_t end = clock();

Also, I'm building and running on Linux x86_64.

0) it is not important 1) locality of reference & cache thrashing x) generalise the program to also accept {} and []. xx) maybe even recognise "" and '' strings? — wildplasser, Oct 15 '20 at 19:32
Const doesn't add anything to a return by value. And [watch this talk](https://youtu.be/koTf7u0v41o) — JHBonarius, Oct 15 '20 at 19:35
A single run time measuring is really not representative. Run several thousands of times (on different inputs) and time that. — Eugene Sh., Oct 15 '20 at 19:35
Also: check the generated code. Possibly (doubtful) the tail-recursion is detected and removed. [and please don't use microsecond benchmarks, or run them repeatedly] — wildplasser, Oct 15 '20 at 19:39
1) To over-generalize your specific example to "imperative vs. functional style" is simply *wrong*. 2) Per Eugene Sh: a single run is really not representative. 3) If you still see a significant difference with many runs, then: a) Try different compilers (e.g. gcc vs MSVS), b) try different optimization levels (e.g. -O0 vs -O3) and c) Generate assembly output (e.g. gcc/-S, or MSVS/Fa) — paulsm4, Oct 15 '20 at 19:39
@EugeneSh. I guess you're right. I just put the calls in a for loop from 0 to 100000 and averaged their times. Result was average of 0.002412ms for imperative and average of 0.002421ms for functional — Dylan Turner, Oct 15 '20 at 19:43
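
On the tail-recursion point raised above: every recursive call in the functional version is in tail position, and GCC enables -foptimize-sibling-calls at -O2, so it is plausible that the recursion compiles down to a loop. Inspecting the output of `gcc -O2 -S` is the way to confirm that for a given build. The sketch below only illustrates what such a transformation amounts to. The error-code values and the names `depth_recursive`, `depth_loop`, and `check_same` are assumptions made for the example, not taken from the question.

```c
#include <assert.h>

/* Assumed values: the question only says these are integer #defines. */
#define UNTERM_PARENTH_ERR (-1)
#define INVALID_EXPR_ERR   (-2)

/* Recursive form, as in the question: each recursive call is a tail call. */
int depth_recursive(const char *expr, int sum, int max) {
    switch (*expr) {
        case '\0':
            return sum == 0 ? max : UNTERM_PARENTH_ERR;
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
        case '+': case '-': case '*': case '/': case '^':
            return depth_recursive(expr + 1, sum, max);
        case '(':
            return depth_recursive(
                expr + 1, sum + 1, sum + 1 > max ? sum + 1 : max);
        case ')':
            return depth_recursive(expr + 1, sum - 1, max);
        default:
            return INVALID_EXPR_ERR;
    }
}

/* Hand-written equivalent: each tail call becomes "update the
   parameters, then jump back to the top of the function". */
int depth_loop(const char *expr, int sum, int max) {
    for (;;) {
        switch (*expr) {
            case '\0':
                return sum == 0 ? max : UNTERM_PARENTH_ERR;
            case '0': case '1': case '2': case '3': case '4':
            case '5': case '6': case '7': case '8': case '9':
            case '+': case '-': case '*': case '/': case '^':
                break;
            case '(':
                sum++;
                if (sum > max) max = sum;
                break;
            case ')':
                sum--;
                break;
            default:
                return INVALID_EXPR_ERR;
        }
        expr++; /* corresponds to passing expr + 1 in the recursive form */
    }
}

/* Hypothetical helper: aborts if the two forms disagree on an input. */
void check_same(const char *expr) {
    assert(depth_recursive(expr, 0, 0) == depth_loop(expr, 0, 0));
}
```

If the compiler does perform this rewrite, both versions end up as essentially the same loop, which would be consistent with the near-identical averages reported once the calls were repeated many times.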

score 0 · Answer 1 · answered Oct 15 '20 at 19:46

As per comments, I ran the code using:

double imperative_time_sum = 0, functional_sum_time = 0;
for(int i = 0; i < 100000; i++) {
    const clock_t start_imp = clock();
    max_depth(args[1]);
    const clock_t end_imp = clock();
    max_depth_functional_fast(args[1], 0, 0);
    const clock_t end_func = clock();

    imperative_time_sum +=
        1000 * (double) (end_imp - start_imp) / CLOCKS_PER_SEC;
    functional_sum_time +=
        1000 * (double) (end_func - end_imp) / CLOCKS_PER_SEC;
}
printf("Average imperative: %fms\n", imperative_time_sum / 100000);
printf("Average functional: %fms\n", functional_sum_time / 100000);

Which produced results:

Average imperative: 0.002412ms
Average functional: 0.002421ms

Although I reran the program upwards of 100 times before, I hadn't run anywhere close to 100000 times. After that, the times were very close to each other.
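
A possible refinement, offered as a sketch rather than a definitive harness: on Linux CLOCKS_PER_SEC is 1000000, so clock() cannot resolve anything finer than 1us, and a single call that takes a couple of microseconds sits right at that limit. Timing the whole batch with one pair of clock() calls avoids that, and storing each result keeps the compiler from treating the calls as dead code. The names `average_ms_per_call`, `benchmark_sink`, and `max_depth_loop`, as well as the error-code values, are assumptions for illustration.

```c
#include <assert.h>
#include <time.h>

/* Assumed values: the question only says these are integer #defines. */
#define UNTERM_PARENTH_ERR (-1)
#define INVALID_EXPR_ERR   (-2)

/* Stand-in for the function under test (same logic as the imperative version). */
int max_depth_loop(const char *expr) {
    int curr_sum = 0, curr_max = 0;
    while (*expr != '\0') {
        switch (*expr++) {
            case '0': case '1': case '2': case '3': case '4':
            case '5': case '6': case '7': case '8': case '9':
            case '+': case '-': case '*': case '/': case '^':
                break;
            case '(':
                curr_sum++;
                if (curr_sum > curr_max) curr_max = curr_sum;
                break;
            case ')':
                curr_sum--;
                break;
            default:
                return INVALID_EXPR_ERR;
        }
    }
    return curr_sum == 0 ? curr_max : UNTERM_PARENTH_ERR;
}

typedef int (*depth_fn)(const char *);

/* Results are stored here so the calls cannot be discarded as unused. */
volatile int benchmark_sink;

/* Times `iterations` calls as one batch and returns the mean in milliseconds. */
double average_ms_per_call(depth_fn fn, const char *expr, long iterations) {
    assert(iterations > 0);

    /* The pointer is re-read through a volatile object on every iteration,
       so the compiler cannot hoist the call out of the loop. */
    const char *volatile input = expr;

    const clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        benchmark_sink = fn(input);
    }
    const clock_t end = clock();

    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC
        / (double)iterations;
}
```

Called as `average_ms_per_call(max_depth_loop, args[1], 100000)`, this pays the clock() overhead once per batch instead of on every iteration. A three-argument function such as the functional version would need a small one-argument wrapper to fit the `depth_fn` signature.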

I'll accept this as an answer in two days when it lets me. — Dylan Turner, Oct 15 '20 at 19:50
