
I have written a program that counts how many times each string in a list occurs, prints that count on the screen, and stores it in an int *arr. However, when the same string appears twice in the list, its count is printed and stored twice as well. My question is this: can I check whether a word has already been found and, if so, free that memory block and use realloc() to reallocate memory for the whole int *arr? Here is my sortedCount() function, which does what I described above so far:

void sortedCount(int N) {
    int *wordCount;
    int i = 0;
    wordCount = malloc(N * sizeof(int));
    for(i = 0; i < N; i++) {
        wordCount[i] = count(N,wordList[i],1);
    }
    /* free mem */
    free(wordCount);
    return;
}
Paul R
Stelios Papamichail
  • I'm sorry, but I am not able to match your description and code. – Sourav Ghosh Jan 22 '19 at 10:46
  • @SouravGhosh Sorry, English is not my strongest skill. Let me try again: forget about the example above and let's say that I have an array like this: `int *ptrArray;`. Now, let's say that it was allocated with `malloc(N * sizeof(int));` but I now want it to be smaller by one int (removing one element), e.g. `malloc((N-1) * sizeof(int));`. Can I do that using `realloc()`? – Stelios Papamichail Jan 22 '19 at 11:10
  • @SteliosPapamichail short answer: yes. This may be helpful: https://en.cppreference.com/w/c/memory/realloc – woz Jan 22 '19 at 11:26
  • @woz OK, thank you. And if I were to do `free(ptrArray[i])` and then use `realloc()`, would that free up the memory for the `i`th element? I mean, is there a danger of memory leaks? Should I be careful of something? – Stelios Papamichail Jan 22 '19 at 11:28
  • @SteliosPapamichail, regarding the memory leak possibility, I think this [question](https://stackoverflow.com/questions/9071566/is-it-safe-to-use-realloc) explains how you can use `realloc()` safely. – woz Jan 22 '19 at 11:40
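
The shrinking step asked about in these comments can be sketched as follows. This is only an illustration under stated assumptions: `remove_at` is a hypothetical helper name, and the array is assumed to hold plain `int` values obtained from `malloc()`. Note that `free(ptrArray[i])` would not be valid for such an array, because `free()` takes a pointer previously returned by `malloc()`/`realloc()`, not an individual element; instead, the tail is shifted down and the whole block is resized.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: removes arr[index] from a heap-allocated array of
 * *n ints and shrinks the allocation by one element.
 * Returns the (possibly moved) array, or NULL if the array became empty.
 * If realloc() fails, the original (larger) block is kept and returned. */
int *remove_at(int *arr, size_t *n, size_t index)
{
    assert(n != NULL);

    if (arr == NULL || index >= *n)
        return arr;                 /* nothing to remove */

    /* Close the gap by shifting the tail one slot to the left. */
    memmove(&arr[index], &arr[index + 1],
            (*n - index - 1) * sizeof arr[0]);
    (*n)--;

    if (*n == 0) {
        free(arr);                  /* avoid relying on realloc(ptr, 0) */
        return NULL;
    }

    /* Assign to a temporary so the original pointer is not lost on failure. */
    int *smaller = realloc(arr, *n * sizeof arr[0]);
    return smaller != NULL ? smaller : arr;
}
```

A caller would write `ptrArray = remove_at(ptrArray, &N, i);` with `N` stored as a `size_t`. Assigning the result of `realloc()` to a temporary first is what avoids the leak discussed in the linked question.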

1 Answer

Let's say you have a dynamically allocated array, word, containing words strings:

char  **word;
size_t  words;

If you want to know the number of unique words, and the number of times they repeat in the array, you can use a simplified version of a disjoint-set data structure and an array of counts.

The idea is that we have two arrays, each with words elements:

size_t *rootword;
size_t *occurrences;

For each word i, rootword[i] contains the index of the first occurrence of that word. The occurrences array contains the number of occurrences at the index of each first occurrence, and zero at the index of every duplicate.

For example, if words = 5, and word = { "foo", "bar", "foo", "foo", "bar" }, then rootword = { 0, 1, 0, 0, 1 } and occurrences = { 3, 2, 0, 0, 0 }.

To fill in the rootword and occurrences arrays, you first initialize the two arrays to the "all words are unique and occur exactly once" state:

    for (i = 0; i < words; i++) {
        rootword[i] = i;
        occurrences[i] = 1;
    }

Next, you use a double loop. The outer loop iterates over the unique words, skipping the duplicates. We mark duplicates by setting their occurrence count to zero. The inner loop iterates over the later words whose uniqueness is not yet known, and picks off the duplicates of the current unique word:

    for (i = 0; i < words; i++) {

        if (occurrences[i] < 1)
            continue;

        for (j = i + 1; j < words; j++)
            if (occurrences[j] == 1 && strcmp(word[i], word[j]) == 0) {
                /* word[j] is a duplicate of word[i]. */
                occurrences[i]++;
                rootword[j] = i;
                occurrences[j] = 0;
            }
    }

In the inner loop, we ignore words that are already known to be duplicates (j only iterates over words where occurrences[j] is still either 0 or 1). This also speeds up the inner loop for later root words, because we only call strcmp() on candidate words, not on words we've already found a root word for.
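
Putting the initialization and the double loop together, a minimal sketch could look like the following. The function name `count_words` and its output-parameter interface are assumptions made for illustration, not part of the original program; the loops themselves are the ones shown above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: allocates and fills *rootword_out and
 * *occurrences_out (each `words` elements long) as described above.
 * Returns 0 on success, -1 if allocation fails.
 * The caller is responsible for freeing both arrays. */
int count_words(char **word, size_t words,
                size_t **rootword_out, size_t **occurrences_out)
{
    size_t i, j;
    /* Request at least one element so malloc(0) is never called. */
    size_t n = words > 0 ? words : 1;
    size_t *rootword = malloc(n * sizeof rootword[0]);
    size_t *occurrences = malloc(n * sizeof occurrences[0]);

    if (rootword == NULL || occurrences == NULL) {
        free(rootword);
        free(occurrences);
        return -1;
    }

    /* Initial state: every word is unique and occurs exactly once. */
    for (i = 0; i < words; i++) {
        rootword[i] = i;
        occurrences[i] = 1;
    }

    for (i = 0; i < words; i++) {
        if (occurrences[i] < 1)
            continue;               /* word[i] is a known duplicate */

        for (j = i + 1; j < words; j++) {
            if (occurrences[j] == 1 && strcmp(word[i], word[j]) == 0) {
                /* word[j] is a duplicate of word[i]. */
                occurrences[i]++;
                rootword[j] = i;
                occurrences[j] = 0;
            }
        }
    }

    *rootword_out = rootword;
    *occurrences_out = occurrences;
    return 0;
}
```

The double loop performs on the order of words² string comparisons in the worst case (when all words are distinct), which is usually fine for small lists; sorting the words first or using a hash table would be the usual alternatives for large inputs.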

Let's examine what happens in the loops with word = { "foo", "bar", "foo", "foo", "bar" } input.

 i ╷ j ╷ rootword  ╷ occurrences ╷ description
───┼───┼───────────┼─────────────┼──────────────────
   │   │ 0 1 2 3 4 │ 1 1 1 1 1   │ initial values
───┼───┼───────────┼─────────────┼──────────────────
 0 │ 1 │           │             │ "foo" != "bar".
 0 │ 2 │     0     │ 2   0       │ "foo" == "foo".
 0 │ 3 │       0   │ 3     0     │ "foo" == "foo".
 0 │ 4 │           │             │ "foo" != "bar".
───┼───┼───────────┼─────────────┼──────────────────
 1 │ 2 │           │             │ occurrences[2] == 0.
 1 │ 3 │           │             │ occurrences[3] == 0.
 1 │ 4 │         1 │   2     0   │ "bar" == "bar".
───┼───┼───────────┼─────────────┼──────────────────
 2 │   │           │             │ j loop skipped, occurrences[2] == 0.
───┼───┼───────────┼─────────────┼──────────────────
 3 │   │           │             │ j loop skipped, occurrences[3] == 0.
───┼───┼───────────┼─────────────┼──────────────────
 4 │   │           │             │ j loop skipped, occurrences[4] == 0.
───┼───┼───────────┼─────────────┼──────────────────
   │   │ 0 1 0 0 1 │ 3 2 0 0 0   │ final state after loops.  
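
Because every duplicate ends up with an occurrence count of zero, the unique words and their counts can be gathered into a compact array in a single pass over occurrences. The sketch below is one possible way to do that; `struct word_count` and `collect_unique` are hypothetical names introduced here for illustration. It counts the unique words first and allocates exactly that many entries, so no `realloc()` is needed.

```c
#include <stdlib.h>

/* Hypothetical result type: one entry per unique word. */
struct word_count {
    const char *text;   /* points into the original word array */
    size_t      count;  /* number of times the word occurred   */
};

/* Collects every word whose occurrence count is nonzero.
 * Returns the number of entries stored in *out. If there are no unique
 * words, or if allocation fails, *out is set to NULL and 0 is returned.
 * The caller frees *out; the strings themselves are not copied. */
size_t collect_unique(char **word, const size_t *occurrences,
                      size_t words, struct word_count **out)
{
    size_t i, unique = 0, k = 0;

    *out = NULL;

    /* First pass: count the unique words. */
    for (i = 0; i < words; i++)
        if (occurrences[i] > 0)
            unique++;

    if (unique == 0)
        return 0;

    struct word_count *list = malloc(unique * sizeof list[0]);
    if (list == NULL)
        return 0;

    /* Second pass: copy the word pointer and count of each unique word. */
    for (i = 0; i < words; i++) {
        if (occurrences[i] > 0) {
            list[k].text = word[i];
            list[k].count = occurrences[i];
            k++;
        }
    }

    *out = list;
    return unique;
}
```

The resulting array can then be passed to `qsort()` with a comparison function of your choice, for example to order the words by count.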
Nominal Animal
  • Thank you so much for this answer, I think this is what I want to recreate. One question before I give your answer another look: shouldn't the inner loop be `for(j = i+1; j < words; j++) {}`? – Stelios Papamichail Jan 22 '19 at 11:51
  • @SteliosPapamichail: Good catch! Yes, indeed the inner loop should be `for (j = i + 1; j < words; j++) { ... }`. Now fixed. Thanks for noticing that! – Nominal Animal Jan 22 '19 at 11:54
  • Awesome. I also have another question regarding the use of the final state of `rootword` and `occurrences` in my program. In order to put those unique words in a separate array, e.g. for sorting, would I have to loop over the elements of `rootword` and `occurrences` and check 1) that I haven't seen the `rootword[i]` element before, adding `word[rootword[i]]` to the list of unique words if so, and 2) that `occurrences[i] > 0` at the same time? (Let me know if I didn't state my question well enough so that I can try again.) – Stelios Papamichail Jan 22 '19 at 12:02
  • @SteliosPapamichail: No, you'd just look at `occurrences[i]`. If it is zero, word `i` is a duplicate. – Nominal Animal Jan 22 '19 at 12:52
  • Got it, thank you so much. Your answer was straightforward and easy to understand, thank you once again. By the way, can I ask you another question via chat about a problem that I encountered when recreating the program? – Stelios Papamichail Jan 22 '19 at 12:58
  • @SteliosPapamichail: I do not do chats, sorry. You can ask me via email, though, if you like; the address is shown on my [home page](https://www.nominal-animal.net/). – Nominal Animal Jan 22 '19 at 13:12
  • No problem, just sent you an email :). Thanks for everything! – Stelios Papamichail Jan 22 '19 at 13:17