I'm a beginner in programming, and I want to solve this problem:

(Spam Scanner) Spam (or junk e-mail) costs U.S. organizations billions of dollars a year in spam-prevention software, equipment, network resources, bandwidth, and lost productivity. Research online some of the most common spam e-mail messages and words, and check your own junk e-mail folder. Create a list of 30 words and phrases commonly found in spam messages. Write a program in which the user enters an e-mail message. Read the message into a large character array and ensure that the program does not attempt to insert characters past the end of the array. Then scan the message for each of the 30 keywords or phrases. For each occurrence of one of these within the message, add a point to the message’s “spam score.” Next, rate the likelihood that the message is spam, based on the number of points it received.

I tried writing my code like this:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

void find_string(char *emailSearch);
const char spam[][30] = {
"congratulation",
"free",
"100%",
"earn",
"million",
"click",
"here",
"instant",
"limited",
"urgent",
"winner",
"selected",
"bargain",
"deal",
"debt",
"lifetime",
"cheap",
"easy",
"bonus",
"credit",
"bullshit",
"scam",
"junk",
"spam",
"passwords",
"invest",
"bulk",
"exclusive",
"win",
"sign"};

int main(){
char email[1000];
    printf("Enter your short email message: \n");
    fgets(email, 80, stdin);
    email[strlen(email)-1] = '\0';
    find_string(email);
    return 0;
    }

void find_string(char *emailSearch){
int i = 0;
    while(emailSearch[i]){
        (tolower(emailSearch[i]));
        i++;
    }
    if(strstr(emailSearch,spam)){
        printf("Your email message is considered spam!");
    }
    else{
        printf("Your email is not spam!");
    }
}

I tried entering words from the spam array, but the program still prints "Your email is not spam!". Can anyone fix this?

MJee
  • Give example of input – TheRyuTeam Oct 11 '22 at 07:33
  • "spam" is not a `char*` It is a `char**`... You are badly attempting to find a haystack in a needle... You need to test each "forbidden word" against the message, counting how many times you find them... (There's no hope for a message that is "easy, easy, easy" returning 3 occurrences, is there...) One or two "forbidden" words does not make the message a strong candidate as being spam... – Fe2O3 Oct 11 '22 at 07:49
  • You only read at most 79 bytes of your email (but allocate 1000). #define some constants. – Allan Wind Oct 11 '22 at 07:50
  • Actually, a pretty superficial definition, merely finding presence of a word or two... "urgent, winner, selected" are spam-speak. `strstr()` will find "sign" in the word "signature" and may flag an email from your lawyer as spam... Too superficial... – Fe2O3 Oct 11 '22 at 08:08
  • Please note that for a professional program, you would sort the spam strings in alphabetic order. And rather than using `strstr`, you'd take each word from the mail and do a binary search (`bsearch()` for example) on the sorted strings. Alternatively you could also use a hash table, in case the number of words is huge. – Lundin Oct 11 '22 at 08:39
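
The buffer-size point raised in the comments above (1000 bytes allocated, at most 79 read) could be handled roughly as follows. This is only a sketch, not code from the question or from any commenter: the names `EMAIL_MAX` and `read_message` are hypothetical, and it assumes the message ends at end-of-input rather than at the first newline.

```c
#include <stdio.h>

#define EMAIL_MAX 1000  /* hypothetical capacity, including the terminating '\0' */

/* Read characters from `in` until EOF or until the buffer is full.
 * Never writes past buf[cap - 1] and always null-terminates when cap > 0.
 * Returns the number of characters stored (excluding the '\0'). */
size_t read_message(FILE *in, char *buf, size_t cap) {
    size_t n = 0;
    int c;
    if (cap == 0) {
        return 0;
    }
    /* The capacity check comes first, so no character is consumed
     * from `in` once the buffer is full. */
    while (n + 1 < cap && (c = fgetc(in)) != EOF) {
        buf[n++] = (char)c;
    }
    buf[n] = '\0';
    return n;
}
```

A caller would declare `char email[EMAIL_MAX];` and call `read_message(stdin, email, sizeof email);`, so the array size and the read limit come from the same constant.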

2 Answers

The main issue is that you need to iterate over each of your spam words and search for it in your text. If you have strcasestr() (a non-standard extension available on GNU and BSD systems), use that instead of strtolower(email):

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define LEN 79

const char *spam[] = {
    "congratulation",
//  ...
};

char *strtolower(char *s) {
    // tolower() expects a value representable as unsigned char (or EOF)
    for(size_t i = 0; s[i] != '\0'; i++) {
        s[i] = (char)tolower((unsigned char)s[i]);
    }
    return s;
}

void find_string(char *emailSearch){
    for(size_t i = 0; i < sizeof spam / sizeof *spam; i++) {
        if(strstr(emailSearch, spam[i])) {
            printf("Your email message is considered spam!\n");
            return;
        }
    }
    printf("Your email is not spam!\n");
}

int main(){
    char email[LEN+1];
    printf("Enter your short email message: \n");
    if(fgets(email, sizeof email, stdin) == NULL) {
        return 1; // no input was read
    }
    find_string(strtolower(email));
    return 0;
}
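
If strcasestr() is not available on your platform, a case-insensitive search can be written by hand. The following is a minimal sketch under that assumption; `ci_strstr` is a hypothetical name, not a library function. Unlike strtolower(email), it leaves the email text unmodified.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: case-insensitive substring search, similar in spirit
 * to the non-standard strcasestr(). Returns a pointer to the first match in
 * `haystack`, or NULL if `needle` does not occur. */
const char *ci_strstr(const char *haystack, const char *needle) {
    if (*needle == '\0') {
        return haystack;
    }
    for (; *haystack != '\0'; haystack++) {
        size_t i = 0;
        /* Compare character by character, ignoring case. The loop stops at
         * the haystack's terminator because '\0' never equals a needle char. */
        while (needle[i] != '\0' &&
               tolower((unsigned char)haystack[i]) ==
               tolower((unsigned char)needle[i])) {
            i++;
        }
        if (needle[i] == '\0') {
            return haystack;  /* every needle character matched */
        }
    }
    return NULL;
}
```

With this, the loop in find_string() could call `ci_strstr(emailSearch, spam[i])` and main() would no longer need to lowercase the input first.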

The next step would be to split your email into words so that the spam word "here" will not cause the unrelated email word "there" to be treated as spam. Once the email is split into words, you can use strcmp() to compare each email word against each word in the spam list. If you sort your spam list, you could use bsearch() instead of a linear search. Alternatively, consider using a hash table for your spam list.
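
A rough sketch of that word-by-word approach is below. The list is abbreviated and hypothetical (`sorted_spam`), as are the names `cmp_word` and `count_spam_words`; it also assumes a word is a run of letters and digits, so an entry like "100%" would need different handling.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Abbreviated, hypothetical list; bsearch() requires it to be sorted
 * in strcmp() order. */
static const char *const sorted_spam[] = {
    "bonus", "click", "free", "here", "urgent", "winner"
};

/* bsearch() passes a pointer to the key and a pointer to an array element. */
static int cmp_word(const void *key, const void *elem) {
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Count the words in `text` that exactly match a list entry.
 * Words longer than the local buffer are truncated, which is harmless
 * here because no list entry is that long. */
int count_spam_words(const char *text) {
    int score = 0;
    char word[32];
    while (*text != '\0') {
        while (*text != '\0' && !isalnum((unsigned char)*text)) {
            text++;  /* skip separators */
        }
        size_t n = 0;
        while (isalnum((unsigned char)*text)) {
            if (n + 1 < sizeof word) {
                word[n++] = (char)tolower((unsigned char)*text);
            }
            text++;
        }
        word[n] = '\0';
        if (n > 0 && bsearch(word, sorted_spam,
                             sizeof sorted_spam / sizeof sorted_spam[0],
                             sizeof sorted_spam[0], cmp_word) != NULL) {
            score++;
        }
    }
    return score;
}
```

For the text "Click here, there is a FREE bonus!" this counts "click", "here", "free" and "bonus" but not "there", giving a score of 4.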

The step after that is to implement some type of stemming so that "congratulations" would again be considered spam, because its root word "congratulation" is on the spam list.
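
As a very crude illustration of the idea, the sketch below accepts a word if it is a listed root plus one of a few common English suffixes. The name `matches_root` and the suffix list are assumptions for illustration; this is plain suffix stripping, not a real stemming algorithm, and it will miss many word forms.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if `word` is `root` itself or `root` followed by one of a
 * few common English suffixes. Both strings are expected in lower case. */
bool matches_root(const char *word, const char *root) {
    static const char *const suffixes[] = { "", "s", "es", "ed", "ing" };
    size_t root_len = strlen(root);
    if (strncmp(word, root, root_len) != 0) {
        return false;  /* word does not start with root */
    }
    const char *rest = word + root_len;  /* whatever follows the root */
    for (size_t i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        if (strcmp(rest, suffixes[i]) == 0) {
            return true;
        }
    }
    return false;
}
```

Here "congratulations" matches the root "congratulation" and "investing" matches "invest", while "winner" does not match "win" because "ner" is not in the suffix list.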

Allan Wind
  • Dear Allan, I'd like to give you **credit** for this lovely answer, but this comment will be flagged as spam and sent off to the bit bucket. (A little harsh, no? One word and it's bye-bye message from Gramma talking about her **win** at Bingo at the Lodge...) `:-)` – Fe2O3 Oct 11 '22 at 08:02

For my critique of the OP code, please refer to comments below the OP question.

As I pointed out in those comments, this is a weak spam detection scheme. strstr() is pretty indiscriminate, happy to match any sequence of characters if it can. E.g., it will find the word "town" in the word "boatowner". There'll be a lot of false positives.
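
One possible way to cut down on those false positives is to accept a strstr() hit only when it is not embedded in a longer word. This is a sketch, not part of the code further down; `count_whole_word` is a hypothetical name, and "word boundary" is assumed to mean "not a letter or digit".

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Count occurrences of `word` in `text` that are not embedded in a longer
 * word, i.e. the characters on both sides are not alphanumeric. */
int count_whole_word(const char *text, const char *word) {
    size_t len = strlen(word);
    int count = 0;
    if (len == 0) {
        return 0;
    }
    for (const char *p = text; (p = strstr(p, word)) != NULL; p++) {
        /* p[-1] is only read when p is not the start of the text;
         * p[len] is at worst the terminating '\0'. */
        int starts_word = (p == text) || !isalnum((unsigned char)p[-1]);
        int ends_word = !isalnum((unsigned char)p[len]);
        if (starts_word && ends_word) {
            count++;
        }
    }
    return count;
}
```

With this check, "town" is counted once in "the boatowner left town", and "win" is not counted at all in "showing and renewing".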

Anyway, since @Allan and I have such a good time at this, here's an adaptation of a search routine written for an SO question just a few hours ago (https://stackoverflow.com/a/74022127/17592432). You be the judge.

#include <ctype.h>
#include <stdio.h>
#include <string.h>

const char *blacklist[] = {
    "congratulation",
    "free",     "100%",     "earn",     "million",  "click",    "here",
    "instant",  "limited",  "urgent",   "winner",   "selected", "bargain",
    "deal",     "debt",     "lifetime", "cheap",    "easy",     "bonus",
    "credit",   "bullshit", "scam",     "junk",     "spam",     "passwords",
    "invest",   "bulk",     "exclusive","win",      "sign"
};

int rate( char *str ) {
    for( char *p = str; *p; p++ ) *p = (char)tolower( (unsigned char)*p );

    int cnt = 0;
    for( size_t i = 0; i < sizeof blacklist/sizeof blacklist[0]; i++ )
        for( const char *p=str, *bl=blacklist[i]; (p = strstr(p, bl) ) != NULL; p++, cnt++ )
            printf( "'%s' ", bl );

    return cnt;
}

int main(void) {
    char email_1[] =
        "dear gramma,\n"
        "today i selected a puppy and a fish for my birthday\n"
        "how are you? are your investments showing signs of improving?\n"
        "and what' the deal with my instant gratification?\n"
        "i don't want to have to earn my million during my lifetime.\n"
        "i want yours. it's kinda urgent!\n"
        "love, your kid's kid\n";
    char email_2[] =
        "Dear ex-Subscriber,\n"
        "We want you back as a valued customer.\n"
        "Changing your mind and renewing now, in response to this email,\n"
        "will allow us to alert you to more opportunities \n"
        "to purchase crap from us and our suppliers.\n"
        "Don't waste a moment. We can click again if you'll just click reply.\n";

    char *emails[] = {email_1, email_2};

    for( size_t i = 0; i < sizeof emails/sizeof emails[0]; i++ ) {
        puts( "-----------------------------------");
        puts( emails[i] );
        int rating = rate( emails[i] );
        printf( "\n***Rating %d - ", rating );

        if( rating > 8 )
            puts( "Email message is considered spam\n" );
        else
            puts("Email is not spam!\n");
    }

    return 0;
}
-----------------------------------
dear gramma,
today i selected a puppy and a fish for my birthday
how are you? are your investments showing signs of improving?
and what' the deal with my instant gratification?
i don't want to have to earn my million during my lifetime.
i want yours. it's kinda urgent!
love, your kid's kid

'earn' 'million' 'instant' 'urgent' 'selected' 'deal' 'lifetime' 'invest' 'win' 'sign'
***Rating 10 - Email message is considered spam

-----------------------------------
Dear ex-Subscriber,
We want you back as a valued customer.
Changing your mind and renewing now, in response to this email,
will allow us to alert you to more opportunities
to purchase crap from us and our suppliers.
Don't waste a moment. We can click again if you'll just click reply.

'click' 'click' 'win'
***Rating 3 - Email is not spam!

The two blacklisted occurrences of "win" come from "showing" in the first message and "renewing" in the second. The matching done by strstr() is not sophisticated. Improving this is left as an exercise for the reader.

Toby Speight
Fe2O3
  • You forgot to convert `*p` to `unsigned char` before passing it to `tolower()` - that can blow up when used with a negative value. – Toby Speight Oct 11 '22 at 10:55
  • Oh, you also forgot to include `<ctype.h>`, too. – Toby Speight Oct 11 '22 at 10:55
  • @TobySpeight wrt: tolower( unsigned char )... If you examine the OP code, you will see that this is beginner level and not likely to encounter anything but 7-bit ASCII... Yes, the parameter is/should-be unsigned... Baby steps. wrt: ``, I forgot nothing. The functions prototypes are also in ``... Next time frame your comments as questions and I'll be glad to explain things to you in a nicer tone... – Fe2O3 Oct 11 '22 at 10:59
  • @TobySpeight The focus on this project uses `strstr()` as its _sniffer dog_. The parameters to `strstr()` are `(signed) char*`... What do you make of this??? – Fe2O3 Oct 11 '22 at 11:06
  • @Fe203, why do you think that `tolower()` is (portably) defined by including ``? I find nothing to support that in either section 7.4.2.1 or section 7.24; would you kindly point me to the location that specifies that the prototype must also be in `` as you claim? – Toby Speight Oct 11 '22 at 12:03
  • @TobySpeight You are always free to write your own answer to any SO question, quoting from standards and other references (to the befuddled beginners who won't understand most of the terminology.) – Fe2O3 Oct 12 '22 at 02:09
  • Sadly, it seems your promise to explain your reasoning when asked direct questions was somewhat exaggerated. Goodbye. – Toby Speight Oct 15 '22 at 07:08
  • @TobySpeight Let me refresh your memory, The OP begins, "Im a beginner in programming and stuff..." **THAT** is my reasoning, including an invitation for you to post an answer full of jargon that will be gibberish to the OP... Do you object to every answer that uses gcc/clang extensions as being "non-portable"... Take your pedantry elsewhere, please... – Fe2O3 Oct 15 '22 at 23:15
  • @TobySpeight So your "Goodbye" was not final. I'm leaving your (harmless) edits and look forward to seeing you "cleaning up" the answers of others to meet your requirements. The edit comment "Fix the bugs" is odd. Sprinkling a few 'const' and 'size_t' declarations indicates the changes you made have nothing to do with "bug fixing", and were made because of some other motivation... Who can say... – Fe2O3 Oct 24 '22 at 20:24
  • Au contraire. Those things fix the compiler diagnostics (one error and many warnings - including attempts to modify const objects). Which are especially important to get right when writing code that may be copied by beginners. – Toby Speight Oct 25 '22 at 05:37