I apologize in advance if the title seems a bit off; I had trouble deciding what exactly to name it. I am currently completing homework that deals with low-level I/O. For one assignment, I have been given two .txt files: one that contains a list of email addresses, and another that contains a list of members who no longer want to be on the email list. What I have to do is delete the emails of the members on the second list. Additionally, there may be some nasty surprises in the .txt files. I have to clean up the emails and take out any unwanted punctuation after them, such as semicolons, commas and spaces. Furthermore, I need to lowercase all of the text. I'm struggling with this problem in more ways than one (I'm not entirely sure how to get my file to write what I need it to in my output), but right now my main concern is outputting the unsubscribe messages in the correct order. `sortrows` doesn't seem to work.

Here are some test cases:

Test Cases
unsubscribe('Grand Prix Mailing List.txt', ...
              'Unsubscribe from Grand Prix.txt')
     => output file named 'Grand Prix Mailing List_updated.txt' that looks
        like 'Grand Prix Mailing List_updated_soln.txt'
     => output file named 'Unsubscribe from Grand Prix_messages.txt' that 
        looks like 'Unsubscribe from Grand Prix_messages_soln.txt'

The original mailing list

Grand Prix Mailing List:
MPLUMBER3@gatech.edu, 
lplumber3@gatech.edu 
Ttoadstool3@gatech.edu;
bkoopa3@gatech.edu
ppeach3@gatech.edu,
ydinosaur3@gatech.edu
kBOO3@gatech.edu
WBadguy3@gatech.edu;
FKong3@gatech.edu
dkong3@gatech.edu
dbones3@gatech.edu

People who are like nope:

MARIO PLUMBER; 
bowser koopa 
Luigi Plumber,
Donkey Kong 
King BOO;
Princess Peach

What it's supposed to look like afterwards:

ttoadstool3@gatech.edu
ydinosaur3@gatech.edu
wbadguy3@gatech.edu
fkong3@gatech.edu
dbones3@gatech.edu

My file output:

Mario, you have been unsubscribed from the Grand Prix mailing list.
Luigi, you have been unsubscribed from the Grand Prix mailing list.
Bowser, you have been unsubscribed from the Grand Prix mailing list.
Princess, you have been unsubscribed from the Grand Prix mailing list.
King, you have been unsubscribed from the Grand Prix mailing list.
Donkey, you have been unsubscribed from the Grand Prix mailing list.

So Amro has been kind enough to provide a solution, though it's a little above what I know right now. My main issue now is that when I output the unsubscribe message, I need it to be in the same order as the original email list. For instance, while Bowser was on the complaining list before Luigi, in the unsubscribe message, Luigi needs to come before him.

Here is my original code:

function[] = unsubscribe(email_ids, member_emails)
    Old_list = fopen(email_ids, 'r'); %// opens my email list
    Old_Members = fopen(member_emails, 'r'); %// Opens up the names of people who want to unsubscribe
    emails = fgets(Old_list); %// Reads first line of emails
    member_emails = [member_emails]; %// Creates an array to populate
while ischar(emails) %// Starts my while loop
%// Pulls out a line in the email
    emails = fgets(Old_list);
%// Quits when it sees this jerk
    if emails == -1
        break;
    end

%// I go in to clean stuff up here, but it doesn't do any of it. It's still in the while loop though, so I am not sure where the error is
proper_emails = lower(member_emails); %// This is supposed to lowercase the emails, but it's not working
unwanted = findstr(member_emails, ' ,;');
member_emails(unwanted) = '';
member_emails = [member_emails, emails];
end

while ischar(Old_Members) %// Does the same for the members who want to unsubscribe
    names = fgetl(member_emails);
    if emails == -1
        break
    end
proper_emails = lower(names); %// Lowercases everything
unwanted = findstr(names, ' ,;');
names(unwanted) = '';
end

Complainers = find(emails);

New_List = fopen('Test2', 'w'); %// Creates a file to be written to
fprintf(New_List, '%s', member_emails); %// Writes to it
Sorry_Message = fopen('Test.txt', 'w');
fprintf(Sorry_Message, '%s', Complainers);

%// Had an issue with these, so I commented them out temporarily
%// fclose(New_List);
%// fclose(Sorry_Message);
%// fclose(email_ids); 
%// fclose(members);

end
Jessica Marie
  • Do you have a list of names that go with each e-mail address? Your solution addresses each person by their first name, but the sample input you have provided does not include them. – rayryeng Oct 14 '14 at 02:41
  • @rayryeng I knew I was forgetting something. Added it in sorry! – Jessica Marie Oct 14 '14 at 03:45
  • Still a bit unclear. What is the format of the text file? Does the name appear first, followed by the e-mail address? Your sample input is still confusing. – rayryeng Oct 14 '14 at 05:12
  • There's the email list, then a separate list of email addresses. – Jessica Marie Oct 14 '14 at 17:24

1 Answer

Below is my implementation for the problem. The code is commented at each step and should be easy to understand. I'm using regular expressions when I can because this is the sort of thing they're good at... Also note that I don't have any loops in the code :)

unsubscribe.m

function unsubscribe(mailinglist_file, names_file)

    %%
    % read list of names of those who want to unsubscribe
    names = read_file(names_file);

    % break names into first/last parts
    first_last = regexp(names, '(\w+)\s+(\w+)', 'tokens', 'once');
    first_last = vertcat(first_last{:});

    % build email handles (combination of initials + name + domain)
    emails_exclude = strcat(cellfun(@(str) str(1), first_last(:,1)), ...
        first_last(:,2), '3@gatech.edu');

    %%
    % read emails in mailing list
    emails = read_file(mailinglist_file);

    % update emails by removing those who wish to unsubscribe
    emails(ismember(emails, emails_exclude)) = [];

    %%
    % write updated mailing list
    [~,fName,fExt] = fileparts(mailinglist_file);
    fid = fopen([fName '_updated' fExt], 'wt');
    fprintf(fid, '%s\n', emails{:});
    fclose(fid);

    % write list of names removed
    % capitalize first letter of first name
    first_names = cellfun(@(str) [upper(str(1)) str(2:end)], ...
        first_last(:,1), 'UniformOutput',false);
    msg = strcat(first_names, ...
        ', you have been unsubscribed from the mailing list.');
    fid = fopen([fName '_messages' fExt], 'wt');
    fprintf(fid, '%s\n', msg{:});
    fclose(fid);

end

function C = read_file(filename)
    % read lines from file into a cell-array of strings
    fid = fopen(filename, 'rt');
    C = textscan(fid, '%s', 'Delimiter','');
    fclose(fid);

    % clean up lines by removing trailing punctuation
    C = lower(regexprep(C{1}, '[,;\s]+$', ''));
end
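
To make the name-splitting and handle-building steps easier to follow, here is a minimal sketch on two sample names. It assumes the names have already been lowercased and stripped of trailing punctuation, as `read_file` would return them, and it builds the handles with a per-row `cellfun` rather than the mixed-input `strcat` call above. The `3@gatech.edu` suffix is taken from the sample data and is an assumption about the address format, not a general rule.

```matlab
% Sample lines as read_file would return them (lowercased, punctuation stripped)
names = {'mario plumber'; 'bowser koopa'};

% With 'tokens' and 'once', each line yields a 1x2 cell {first, last}
first_last = regexp(names, '(\w+)\s+(\w+)', 'tokens', 'once');
first_last = vertcat(first_last{:});   % N-by-2 cell array of strings

% Build each handle explicitly: first initial + last name + domain suffix
emails_exclude = cellfun(@(f, l) [f(1) l '3@gatech.edu'], ...
    first_last(:,1), first_last(:,2), 'UniformOutput', false);

disp(emails_exclude)
```

Inspecting `first_last` in the debugger at this point should show one row per person, with first names in column 1 and last names in column 2.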

Given the following text files:

list.txt

MPLUMBER3@gatech.edu, 
lplumber3@gatech.edu 
Ttoadstool3@gatech.edu;
bkoopa3@gatech.edu
ppeach3@gatech.edu,
ydinosaur3@gatech.edu
kBOO3@gatech.edu
WBadguy3@gatech.edu;
FKong3@gatech.edu
dkong3@gatech.edu
dbones3@gatech.edu

names.txt

MARIO PLUMBER; 
bowser koopa 
Luigi Plumber,
Donkey Kong 
King BOO;
Princess Peach

Here is what I get when running the code:

>> unsubscribe('list.txt', 'names.txt')

list_messages.txt

Mario, you have been unsubscribed from the mailing list.
Bowser, you have been unsubscribed from the mailing list.
Luigi, you have been unsubscribed from the mailing list.
Donkey, you have been unsubscribed from the mailing list.
King, you have been unsubscribed from the mailing list.
Princess, you have been unsubscribed from the mailing list.

list_updated.txt

ttoadstool3@gatech.edu
ydinosaur3@gatech.edu
wbadguy3@gatech.edu
fkong3@gatech.edu
dbones3@gatech.edu
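
The messages above follow the order of the names file. If they need to follow the order of the mailing list instead, one possible approach is to use the second output of `ismember`, which reports where each mailing-list address was found among the addresses to exclude. The sketch below uses a shortened, hand-written sample of the data; the variable `ordered_names` is introduced here for illustration only.

```matlab
% Cleaned mailing list (in file order) and the handles built from the names file
emails = {'mplumber3@gatech.edu'; 'lplumber3@gatech.edu'; ...
          'ttoadstool3@gatech.edu'; 'bkoopa3@gatech.edu'};
first_names    = {'Mario'; 'Bowser'; 'Luigi'};
emails_exclude = {'mplumber3@gatech.edu'; 'bkoopa3@gatech.edu'; ...
                  'lplumber3@gatech.edu'};

% loc(k) is the row of emails_exclude that matches emails{k}, or 0 if none
[tf, loc] = ismember(emails, emails_exclude);

% Keeping only the matches, in order, walks the names in mailing-list order
ordered_names = first_names(loc(tf));

msg = strcat(ordered_names, ...
    ', you have been unsubscribed from the mailing list.');
fprintf('%s\n', msg{:});
```

In the function above, this lookup would have to run before the matching addresses are removed from `emails`, since the removal discards the positions that the ordering relies on.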
Amro
  • That looks like a whole bunch of stuff I don't know how to use. 'Regex' being one of them. I don't know what the '\w+' thingies are. I've never used fileparts either. We've only been taught like fprintf or just to open and write to it. I tested it anyway to see if I could possibly use the structure of it and change parts of it into what I know how to do, but it errors out on line twelve " emails_exclude = strcat(cellfun(@(str) str(1), first_last(:,1)), first_last(:,2), '3@gatech.edu');" It says the matrix dimensions are exceeded. I appreciate the help btw. – Jessica Marie Oct 14 '14 at 14:29
  • @JessicaMarie: `regexp` or regular expressions are one of those things you really want to learn. They can be very useful when dealing with text in semi-free form. There's a nice tutorial online you could try: http://regexone.com/. The line you've highlighted builds the email address from first/last name: I extract the first letter from the first name, then append the last name and the domain part. `strcat` can handle mixed input with both cell arrays and plain characters. I suggest you run the code in the debugger and step through it line by line, while inspecting the variables at each step. – Amro Oct 14 '14 at 15:16
  • `fileparts` is a function used to separate the path/name/extension of a full file name. So given an input like `C:\path\to\file.txt`, it would separate it into `C:\path\to` as path, `file` as name, and `.txt` as extension. I've used this function to get the original mailing list filename and append the `_updated` part to it. – Amro Oct 14 '14 at 15:21
  • +1 - Always love your posts Amro :). Also, thanks for that link to `regexone`. It's time that I brush up on my `regex` as a lot of the questions here can be eloquently solved using it. – rayryeng Oct 14 '14 at 16:54
  • @Amro I was debugging it and attempting to alter it using what I know, but the line I highlighted, the emails_exclude line, errors out. Somehow you indexed out of bounds, but I am not entirely sure how you did, primarily because I don't understand the format. – Jessica Marie Oct 14 '14 at 17:23
  • @JessicaMarie: if I had to guess, I would say you probably have a list of names that didn't respect the format of `firstName LastName` (plus any trailing punctuation), one per line. If that's the case, you'll have to adjust the regular expression to match all possible cases... Besides, the regexp used here is really simple, and you could replace it with something else; the idea is to break the line into first and last name parts, which are space delimited (note that I'm applying it to a cell-array of strings, so you have to deal with that with a loop if you need) – Amro Oct 14 '14 at 17:33
  • Nope .-. It's the exact file I copy-pasted into here. So I'm not too sure what's going on. I tried breaking it up with an strtok, but that complained as well. Maybe I missed something. And I vaguely get the regex expression, it's the strcat that looks weird to me. – Jessica Marie Oct 14 '14 at 17:42
  • @rayryeng: thanks. if you want to test your regex skills, I remember seeing some interesting problems on [Cody](http://www.mathworks.com/matlabcentral/cody/) with really clever solutions using regular expressions. – Amro Oct 14 '14 at 17:42
  • @JessicaMarie: I just read your question edit, and I see that you included a header line at the beginning of mailing list file (like `Grand Prix Mailing List:`). Is that part of the file as well? – Amro Oct 14 '14 at 17:44
  • No. I just did that to explain what text I had beneath the header. They're all just a list of names. no extra spaces in-between them or anything. One first and last name per line. – Jessica Marie Oct 14 '14 at 17:46
  • @JessicaMarie: it's hard to tell what's wrong in your case.. I suggest you inspect the variables involved just before the offending line (for example, `first_last` should be a Nx2 cell array of strings, where the 1st column are the first names, and the 2nd column contains the last names).. – Amro Oct 14 '14 at 17:52
  • I found the error. Our input order is different. It appears to work correctly, except it doesn't write to the 'Unsubscribe from Grand Prix_messages.txt' file, but I think I can fix that myself. Thank you very much :) – Jessica Marie Oct 14 '14 at 17:54
  • Apparently I can't figure it out. Any suggestions/tips? I feel like this solution is just too over my head to properly manipulate. – Jessica Marie Oct 14 '14 at 18:10
  • @JessicaMarie: ok how about now? – Amro Oct 14 '14 at 18:18
  • I swear I did that same thing, just wrong somewhere. That works, thank you :) Just one more question before I debug the heck out of the code: how do I change the message solution so that it returns the "...X has been unsubscribed" in the same order as the original mailing list? For instance, it should say that Luigi has unsubscribed from the list before it says it to Bowser. I was thinking something along the lines of cell indexing. I swear I hate the homework assignments they give us. We barely go over half the crap we need for the problems. – Jessica Marie Oct 14 '14 at 18:27
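
As a small illustration of the `fileparts` usage discussed in the comments, the sketch below derives both output file names from the two input file names given in the question's test case. Naming the messages file after the names file is an assumption based on that test case; the variable names are chosen here for illustration.

```matlab
% Input file names taken from the test case in the question
mailinglist_file = 'Grand Prix Mailing List.txt';
names_file       = 'Unsubscribe from Grand Prix.txt';

% fileparts returns the path, the base name, and the extension (with the dot)
[~, listName, listExt]   = fileparts(mailinglist_file);
[~, namesName, namesExt] = fileparts(names_file);

% Append the suffix between the base name and the extension
updated_file  = [listName  '_updated'  listExt];
messages_file = [namesName '_messages' namesExt];

disp(updated_file)
disp(messages_file)
```

Each resulting name can then be passed to `fopen` with `'wt'` in the same way as in the answer.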