How to check for repeated characters within a string in c

Question

I'm trying to create a program that checks for repeated characters within the command line argument's string. The string is suppose to contain only 26 characters, and all characters have to be alphabetical. However, there cannot be any repeated characters within the string, each alphabetical character must appear only once. I figured out the first two sections of the program but I cant figure out how to check for repeated characters. I could really use some help as well as explanations.

#include <stdio.h>
#include <cs50.h>
#include <string.h>
#include <ctype.h>

int main (int argc, string argv[])
{
    if (argc != 2)
    {
        printf("Usage: ./substitution key\n");
        return 1;
    }
    else
    {
        int len = strlen(argv[1]);
        if (len != 26)
        {
            printf("Key must contain 26 characters.\n");
            return 1;
        }
        else
        {
            for (int i = 0; i < len; i++)
            {
                if (!isalpha(argv[1][i]))
                {
                    printf("Usage: ./substitution key\n");
                    return 1;
                }
            }
        }
    }
}

You need 26 counters, one for each letter of the alphabet, in order to count how often each letter occured. After performing this counting, you must verify that all 26 counters have the value 1 (which means that every letter occured exactly once). In order to allocate these 26 counters, you can either declare 26 individual local counter variables or one single array of 26 elements. — Andreas Wenzel, Jul 03 '20 at 23:23
Keep a Map and initialize it with zero. For every character check if it is previously found or not. If not found increase the value to 1, otherwise this char is repeated. — Asif Mujtaba, Jul 03 '20 at 23:24
@AsifMujtaba: If you are referring to `std::map`, please note that this question is tagged C, not C++. Of course, it is possible to implement a map in C too, but that seems like an excessive amount of work for this simple problem. — Andreas Wenzel, Jul 03 '20 at 23:27
It can be done by an array actually. My logic is to use hashMap. Thanks for the correction. — Asif Mujtaba, Jul 03 '20 at 23:31
@Patrick Dankyi Are upper case and lower case letters as for example 'a' and 'A' different letters? — Vlad from Moscow, Jul 03 '20 at 23:39

score 5 · Answer 1 · answered Jul 04 '20 at 01:09

Here are the basics of a solution:

When starting, initialize a table of flags: char Seen[UCHAR_MAX+1] = {0};. In this array, Seen[c] will be true (non-zero) if and only if character c has been seen already. (To get UCHAR_MAX, #include <limits.h>.)

When processing each character, copy it to an unsigned char: unsigned char c = argv[1][i];. Then convert it to uppercase: c = toupper(c);. Then test whether it has been seen already:

if (Seen[c])
    Report error...

If it is new, remember that it has been seen: Seen[c] = 1;.

That is all that is necessary.

Notes:

If it is known that A to Z are contiguous in the character set, as they are in ASCII, then char Seen[UCHAR_MAX+1] can be reduced to char Seen['Z'-'A'+1] and indexed using Seen[c-'A']. (Actually, a weaker condition suffices: A is the lowest uppercase character in value and Z is the greatest.)
Do not be tempted to use unsigned char c = toupper(argv[1][i]), because toupper is defined for unsigned char values, and a char value may be out of bounds (negative).
In an esoteric C implementation where a char is as wide as an int, none of which are known to exist, UCHAR_MAX+1 would evaluate to zero. However, the compiler should warn if this happens.

I just coded this up to see. Very elegant. Reminds me of python dictionaries. Thumbsup! Thanks for sharing! — ChumbiChubaGo, Jul 04 '20 at 04:51

score 5 · Answer 2 · answered Jan 24 '21 at 20:18

I'm assuming this was from cs50, I'm on this problem right now... What I did was I used a simple boolean function. I used two loops, the first loop was to represent the first character of reference and inside the nested loop I looped through all the other chars and returned false if i == k.

bool no_repeat(string arg_letters)
{
   for (int i = 0, j = strlen(arg_letters); i < j; i++)
   {
      for (int k = i+1; k < j; k++)
      {
         if (arg_letters[i] == arg_letters[k])
         {
            return false;
         }

      }
   }
   return true;
}

nolongtail · Answer 3 · 2021-01-20T11:24:32.403

1

From the question, the expected input is 26 characters with each appears only 1 time.

According to Andreas Wenzel's idea, i add counters for each character. in case any character missing or duplicate will be spotted. Here's my solution:

int main(int argc, char *argv[])
{
    const int OFFSET = 'A';

    if (argc != 2)
    {
        return 1;
    }
    if (strlen(argv[1]) != 26)
    {
        printf("Key must contain 26 characters.\n");
        return 1;
    }

    unsigned char *key = argv[1];
    int count[26] = {0}; // array of all zero

    // Check for any invalid character 
    for (int i = 0, n = strlen(key); i < n; i++)
    {
        if (!isalpha(key[i]))
        {
            printf("Key contains invalid characters.\n");
            return 1;
        }
        count[toupper(key[i]) - OFFSET]++;
    }
    // check for duplicate character of 
    for (int i = 0; i < 26; i++)
    {
        if (count[i] != 1) 
        {
            printf("Key shouldn't contain duplicate characters\n");
            return 1;
        }
    }
)

edited Jan 20 '21 at 11:24

answered Jan 13 '21 at 10:38

nolongtail

11
3

Note that for non-default [locales](https://en.cppreference.com/w/c/locale/setlocale), the function `isalpha` may return nonzero for characters that are not in the range `'a'` to `'z'` or `'A'` to `Z`'. For example, when the locale is set to some European languages, `isalpha` will return nonzero for the character code for `'ü'`, which has the value `252` in the character set I am using. The function `toupper` will probably change this character to a `'Ü'`, which has the value `220` in the character set I am using. With this value, you will be accessing `count` out of bounds. – Andreas Wenzel Jan 13 '21 at 15:29
It is also worth noting that the ISO C standard does not guarantee that the letters `'A'` to `'Z'` are stored contiguously. It only makes such a guarantee for the numbers `'0'` to `'9'`. Therefore, your program will work with character sets such as ASCII, which stores the letters contiguously, but it won't work with for example [EBCDIC](https://en.wikipedia.org/wiki/EBCDIC), which does not do this. – Andreas Wenzel Jan 13 '21 at 17:09
@AndreasWenzel Thanks a lot for your comments!! I have edited the answer according to your suggestions. – nolongtail Jan 20 '21 at 11:25
I have now deleted my comments that you have addressed, because they are no longer relevant (because you have fixed the issues). I did not delete two comments, because they are still relevant. However, as I already stated in my deleted comments, these remaining issues are probably not worth fixing. It is probably safe to assume that your program will only be used in ASCII-compatible character encodings, so you can assume that `'A'` to `'Z'` is stored contiguously. The issue with the locale is probably also not worth fixing, if you are sure that you are using the default locale. – Andreas Wenzel Jan 25 '21 at 08:14

chqrlie · Answer 4 · 2021-05-04T16:55:54.147

You can use strchr() to detect if a given character is duplicated in the rest of the string:

#include <cs50.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
    if (argc != 2) {
        printf("Usage: ./substitution key\n");
        return 1;
    } else {
        char *key = argv[1];
        int len = strlen(key]);
        if (len != 26) {
            printf("Key must contain 26 characters.\n");
            return 1;
        } else {
            for (int i = 0; i < len; i++) {
                unsigned char c = key[i];
                if (!isalpha(c)) {
                    printf("The key must contain only letters.\n");
                    return 1;
                }
                if (strchr(key + i + 1, c) {
                    printf("The key must not contain duplicate letters.\n");
                    return 1;
                }
            }
        }
    }
    return 0;
}

Alternately, you can use an array of indicators to perform the check in a single pass. This method is preferred if you want to ignore the character case.

#include <ctype.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    if (argc != 2) {
        printf("Usage: ./substitution key\n");
        return 1;
    } else {
        char *key = argv[1];
        unsigned char seen['Z' + 1 - 'A'] = { 0 };
        int i;
        for (i = 0; key[i] != '\0'; i++) {
            unsigned char c = key[i];
            if (!isalpha(c)) {
                printf("The key must contain only letters.\n");
                return 1;
            }
            if (seen[toupper(c) - 'A']++ != 0) {
                printf("The key must not contain duplicate letters.\n");
                return 1;
            }
        }
        if (i != 26) {
            printf("Key must contain all 26 letters.\n");
            return 1;
        }
    }
    return 0;
}

This solution has a [time complexity](https://en.wikipedia.org/wiki/Time_complexity) of `O(n^2)` whereas it can be solved in `O(n)`, as [this solution](https://stackoverflow.com/a/65700191/12149471) demonstrates. — Andreas Wenzel, May 04 '21 at 09:24
@AndreasWenzel: The time complexity is not really quadratic: the total number of instructions is bounded as the input string has exactly 26 characters: more elaborate solutions that may use fewer instructions are possible, but not necessarily worth the effort... Note that the question is somewhat ambiguous regarding case and the solution you are pointing to is not portable to systems where letters are not consecutive in the execution character set. On a system that uses EBCDIC this code has undefined behavior. — chqrlie, May 04 '21 at 15:32
Yes, you are right that the solution I pointed to does not work with EBCDIC (actually I had already pointed this out myself in a comment to that answer). However, [this other solution](https://stackoverflow.com/a/62724288/12149471) would work with EBCDIC and also has a time complexity of `O(n)`, but it requires 256 bytes of data that must be initialized to 0. Therefore, you are probably right that using a `O(n)` algorithm is probably not worth it, when `n` is limited to 26. — Andreas Wenzel, May 04 '21 at 16:12
@AndreasWenzel: added an alternate portable method with a single pass. Definitely better :) — chqrlie, May 04 '21 at 16:57
@AndreasWenzel: I use a small array, still making assumptions about the target character set, but I'm afraid the C Standard does not mandate that `A` be the smallest uppercase letter and `Z` the largest... the code works for ASCII and EBCDIC, but a full 256 byte array seems required for exotic character sets, which would not fit the OP's requirements anyway. — chqrlie, May 04 '21 at 17:04

eejakobowski · Answer 5 · 2020-07-05T21:36:02.510

0

Your can first sort the characters in the string so that any repeated letters will be right next to each other. So after sorting, scan through your sorted string with a variable holding your currently scanned character. If the next character you scan is the same as the last one that you scanned, you have a repeat character

edited Jul 05 '20 at 21:36

answered Jul 03 '20 at 23:18

eejakobowski

105
5

3

"Characters have to be in alphabetical order." - What are you referring to? The first command line argument? Or the ASCII table? – Andreas Wenzel Jul 04 '20 at 00:00
1

Sorting takes O(n log n) time but the problem in the question can be solved in O(n) time. – Eric Postpischil Jul 04 '20 at 00:10
eejakobowski - Where does the OP say they have to be sorted in alphabetical order? He simply states that it the input must be alphabetical and nonrepeating. – ChumbiChubaGo Jul 04 '20 at 04:57
Eric - Yes I saw your answer above, tried it myself, and am pleasantly surprised. – ChumbiChubaGo Jul 04 '20 at 04:58
@ChumbiChubaGo I misread "all characters have to be alphabetical" as "output must be in alphabetical order." Regardless, this is still a valid solution and would require the least amount of work – eejakobowski Jul 05 '20 at 21:34

score 0 · Answer 6 · answered Jul 03 '20 at 23:28

You can consider to use strchr(). I get an example from https://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_72/rtref/strchr.htm

#include <stdio.h>
#include <string.h>

#define SIZE 40

int main(void) {
  char buffer1[SIZE] = "computer program";
  char * ptr;
  int    ch = 'p';

  ptr = strchr(buffer1, ch);
  printf("The first occurrence of %c in '%s' is '%s'\n",
          ch, buffer1, ptr );

}

/*****************  Output should be similar to:  *****************
The first occurrence of p in 'computer program' is 'puter program'
*/

Using `strchr` is wasteful; it takes O(n^2) time, but the problem in the question can be solved in O(n) time. — Eric Postpischil, Jul 04 '20 at 00:11

score 0 · Answer 7 · answered Jul 03 '20 at 23:30

0

There is many ways to do it, but the one I like is to first sort the string and then go through it and delete or copy the string to another string/array and u can do that by

s = "bedezdbe"

after u sort it

s = "bbddeeez"

And to check it

if(s[i-1]!=s[i])

answered Jul 03 '20 at 23:30

Loai

40
8

1

Sorting takes O(n log n) time but the problem in the question can be solved in O(n) time. – Eric Postpischil Jul 04 '20 at 00:11
As I said "There is many ways to do it, but the one I like is" – Loai Jul 04 '20 at 00:45

ChumbiChubaGo · Answer 8 · 2020-07-04T04:26:01.410

I'm assuming you are not yet in Datastructures and Algorithms. If so I would do a binary insertion sort and when you find a repeat you just printf your error message and return a 1 like you have done for the rest of the errors.

That said, that might be over your head right now. So in that case, just go with a bruteforce method without regard for optimization.

Start with the first character, loop through the rest of the string to see if it matches.
If it matches, error message
if there is no match, then repeat for the next character.

The code below checks if there are any repeating characters by this brute force method mentioned. You'll need to adapt it to your code of course.

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

bool itrepeats(char c, char *restofstring, int sublen){
    for(int j=0; j<sublen; j++){
        if(c==*restofstring) return true;
        restofstring++;
    }
    return false;
}

int main (int argc, char *argv[])
{
    int len = strlen(argv[1]);
    char *substr = argv[1];
    
    //loop through all characters
    for(int i=0;i<len;i++){
        //get a character
        char c = argv[1][i];
        //start of substring after c
        substr++;
        //length of that substring
        int substrlen = len-(i+1);
        //check for c in rest of string
        if(itrepeats(c, substr, substrlen)){
            printf("%c repeats, not allowed!\n",c);
            return 1;
        }
        else printf("all good!\n");
    }
}

score 0 · Answer 9 · answered May 04 '21 at 05:00

I've Done that before so here is my solution to this problem: Every time we found the current argv[1]'s character in the string we gonna add 1 to the counter

in normal cases where the argv[1] is normal, counter should be 26 at the end.

but if there is a repeated character, counter should be more than 26 in the end.

int counter = 0;
for (int i = 0; i < len; i++)
        {
            char argu = argv[1][i];
            // make sure that the key doesnt accept duplicated characters in key.
            for (int k = 0; k < len; k++)
            {
                if (argu == argv[1][k])
                {
                    counter++;
                }
                if (counter > 26)
                {
                    printf("Opps! no duplicated characters!\n");
                    return 1;
                }
            }
        }

The question states that no duplicated characters is the desired outcome. Therefore, it does not seem appropriate that to print `"Oops"` in that case, as that implies than an error occurred. Is `"Opps"` supposed to mean `"Oops"`? — Andreas Wenzel, May 04 '21 at 08:59
This solution is rather inefficient, as it requires a total of 676 (26^2) comparisons ([time complexity](https://en.wikipedia.org/wiki/Time_complexity) of `O(n^2)`), whereas [this solution](https://stackoverflow.com/a/65700191/12149471) only requires 26 comparisons (time complexity of `O(n)`). — Andreas Wenzel, May 04 '21 at 09:28

score 0 · Answer 10 · answered May 22 '22 at 02:03

0

#include<stdio.h>
int main(){
char  string[3] = {'c','j','j'};
for(int i = 0;i<3;i++){
    for(int j= i+1;j<3;j++){
        if(string[i] == string[j]){
            printf("position %d char %c\n",i,string[i]);
        }
    }
}

}

answered May 22 '22 at 02:03

Clerdson Juca

1
1

1

Your answer would be of higher quality if you added some text to your answer, in order to explain it. – Andreas Wenzel May 22 '22 at 03:54

How to check for repeated characters within a string in c

10 Answers10

Linked