Efficient way of approaching the Subset Sum Problem with very large input sets

Question

The problem I am facing: I need to find a way to deal with very large sets (3 to 10000000) of positive and negative ints, this seemed relatively impossible based off of previous experiments. However, I received hope when I found a Algorithm on github that is really efficient.

However, I really need to adjust it to work with positive and negative numbers... but I am struggling, I know the unsigned int's should be int. but that's all I've got so far.

#include<stdio.h>
#include<stdlib.h>
#include<time.h>

short* flag;
int N, K, correspond = 0;
unsigned int* check, X = 0;
clock_t t1, t2;

void init() {
    int i, j;

    printf("N=");
    scanf_s("%d", &N);

    check = malloc(sizeof(unsigned int) * N);
    if (check == NULL) {
        perror("Out of memory");
        exit(-1);
    }

    srand((unsigned)time(NULL));
    printf("\n///check list///\n");
    for (i = 0; i < N; i++) {
        check[i] = rand() % 1000000 + 1;
        printf("%uyen ", check[i]);
    }
    printf("\n");

    K = rand() % N;

    flag = malloc(sizeof(short) * N);
    for (i = 0; i < N; i++)flag[i] = 0; 
    i = 0;
    while (i <= K) {
        j = rand() % N;
        if (flag[j] == 0) {
            flag[j] = 1;
            X = X + check[j];
            i++;
        }
    }
    printf("\nX=%uyen\n", X);
}

void swap(int j, int k) {
    unsigned int tmp;

    tmp = check[j];
    check[j] = check[k];
    check[k] = tmp;
}

int partition(int left, int right) {
    int j = left, k = right;
    unsigned int v;

    v = check[(left + right) / 2];
    do {
        while (check[j] > v) j++;
        while (v > check[k]) k--;
        swap(j, k);
    } while (check[j] != check[k]);

    return j;
}

void quicksort(int left, int right) {
    int j;

    if (left < right) {
        j = partition(left, right);
        quicksort(left, j - 1);
        quicksort(j + 1, right);
    }
}

void func(unsigned int sum, int i) {
    int j, k, t = 0;

    if (sum == X) {
        correspond = 1;
        t2 = clock();
        double record = (double)(t2 - t1) / CLOCKS_PER_SEC;

        printf("\nAnswer : ");
        for (k = 0; k < N; k++) {
            if (flag[k] == 1) {
                if (t == 0) t = 1;
                else if (t == 1) printf("+");
                printf("%u", check[k]);
            }
        }
        printf("\n\nThinking time : %f sec . \n", record);
        if (record <= 60) printf("Triumph!\n");
        else printf("Failure...\n");
        return;
    }
    else if (sum < X) {
        for (j = i + 1; (j <= N) && (correspond == 0); j++) {
            flag[j] = 1;
            func(sum + check[j], j);
        }
    }
    flag[i] = 0;
    return;
}

int main() {
    int i;

    init();
    t1 = clock();
    for (i = 0; i < N; i++)flag[i] = 0;
    quicksort(0, N);
    func(0, 0);

    return 0;
}

EDITS: Thanks for all of your inputs, it does help to get some constructive criticism. To start off here is the to the Github Repo https://github.com/parthnan/SubsetSum-BacktrackAlgorithm credit goes to Parth Shirish Nandedkar. The name of this Algorithm is "Amortized O(n) algorithm based on Recursive Backtracking" I am not really sure why it would be called "Amortized" as this would mean it divides the input array into multiple subset and use polynomial-time algorithm on each one.

**I have tried to fix the issues mentioned by ** "chux - Reinstate Monica"... please let me know if I did it incorrectly.

Asides: a) why don't you use array index `[0]`? b) you can use `calloc` instead of iterating through the allocation. — Weather Vane, Feb 03 '23 at 22:55
What problem exactly are you trying to solve? Can you describe what do you mean by "approaching the Subset Sum Problem"? — Franciszek Malinka, Feb 03 '23 at 23:33
There is no question in your post, you did not say what algorithm you found on GitHub or provide a URL to it, and there are no comments in the code you posted. Edit the post to give a URL to the origin of this code (and credit the author), to state what this code is attempting to do (not just that it is supposed to solve the Subset Sum Problem but what the algorithm is and what the parts of the code are doing specifically), and to ask a specific question. — Eric Postpischil, Feb 04 '23 at 00:19
Talon Van Vuuren, Tip: rather than spend effort to coordinate the pointer and type as with `flag = malloc(sizeof(short) * N);`, consider the easier to code, review and maintain `flag = malloc(sizeof flag[0] * N);` — chux - Reinstate Monica, Feb 04 '23 at 13:50

chux - Reinstate Monica · Answer 1 · 2023-02-04T00:50:48.443

seemed relatively impossible based off of previous experiments.

First fix known problems.

At least these problems:

Out of range access

flag = malloc(sizeof(short) * (N + 1)); allocates such that code can access flag[0] to flag[N].

for (k = 1; k <= N + 1; k++) { if (flag[k] == 1) { attempts to access flag[N+1]. Result undefined behavior (UB).

Mis-matched printf

warning: format '%ld' expects argument of type 'long int', but argument 2 has type 'unsigned int' [-Wformat=]

More undefined behavior (UB).

printf("%ldyen ", check[i]);
printf("\nX=%ldyen\n", X);

Allocation success

Since the goal is "very large sets (3 to 10000000)", code definitely should check allocation success to save debug time.

check = malloc(sizeof(unsigned int) * (N + 1));
if (check == NULL) {
  perror("Out of memory");
  exit -1;
}

Side issues:

Code uses non-idiomatic array access

Code indexes [1...N]. More common to use [0...N-1].

Heavy use of global variables

More common to local variables, passing data as needed on function arguments.

"deal with ... positive and negative ints"

Fix that before posting - or if not important, no need to mention it here.

Please see my edits on the post, I hope it is a little better now. — Talon Van Vuuren, Feb 04 '23 at 08:28

Efficient way of approaching the Subset Sum Problem with very large input sets

1 Answers1