9

I'm working on an application where you're able to subscribe to a newsletter and choose which categories you want to subscribe to. There's two different sets of categories: cities and categories.

Upon sending emails (which is a scheduled tast), I have to look at which cities and which categories a subscriber has subscribed to, before sending the email. I.e., if I have subscribed to "London" and "Manchester" as my cities of choice and have chosen "Food", "Cloth" and "Electronics" as my categories, I will only get the newsletters that relates to these.

The structure is as follows:

On every newsitem in Umbraco CMS there is a commaseparated string of cities and categories (effectively, these are stored as node ids since cities and categories are nodes in Umbraco aswell) When I subscribe to one or more city and one or more category, I store the city and category nodeids in the database in custom tables. My relational mapping looks like this:

enter image description here

And the whole structure looks like this:

enter image description here

To me, this seems like two sets of 1 - 1..* relations (one subscriber to one or many cities and one subscriber to one or many categories)

To find which emails to send who which subscriber, my code looks like this:

private bool shouldBeAdded = false;

// Dictionary containing the subscribers e-mail address and a list of news nodes which should be sent
Dictionary<string, List<Node>> result = new Dictionary<string, List<Node>>();

foreach(var subscriber in subscribers)
{
    // List of Umbraco CMS nodes to store which nodes the html should come from
    List<Node> nodesToSend = new List<Node> nodesToSend();

    // Loop through the news
    foreach(var newsItem in news)
    {
        // The news item has a comma-separated string of related cities
        foreach (string cityNodeId in newsItem.GetProperty("cities").Value.Split(','))
        {
            // Check if the subscriber has subscribed to the city
            if(subscriber.CityNodeIds.Contains(Convert.ToInt32(cityNodeId)))
            {
                 shouldBeAdded = true;
            }
        }

        // The news item has a comma-separated string of related categories
        foreach (string categoryNodeId in newsItem.GetProperty("categories").Value.Split(','))
        {
            // Check if the subscriber has subscribed to the category
            if(subscriber.CategoryNodeIds.Contains(Convert.ToInt32(categoryNodeId)))
            {
                shouldBeAdded = true;
            }
        }
    }

    // Store in list
    if (shouldBeAdded)
    {
        nodesToSend.Add(newsItem);
    }

    // Add it to the dictionary
    if (nodesToSend.Count > 0)
    {
        result.Add(subscriber.Email, nodesToSend);
    }
}

// Ensure that we process the request only if there are any subscribers to send mails to
if (result.Count > 0)
{
    foreach (var res in result)
    {
        // Finally, create/merge the markup for the newsletter and send it as an email.
    } 
}

While this works, I'm a bit concerned about performance when a certain amount of subscribers is reached since we're into three nested foreach loops. Also, remembering my old teachers preaches: "for every for loop there is a better structure"

So, I would like your oppinion on the above solution, is there anything that can be improved here with the given structure? And will it cause performance problems over time?

Any help/hint is greatly appreciated! :-)

Thanks in advance.

Solution

So after a few good hours of debugging and fumblin' around I finally came up with something that works (initially, it looked like my original code worked, but it didn't)

Sadly, I couldn't get it to work with any LINQ queries I tried, so I went back to the "ol' school' way of iterating ;-) The final algorithm looks like this:

private bool shouldBeAdded = false;

// Dictionary containing the subscribers e-mail address and a list of news nodes which should be sent
Dictionary<string, List<Node>> result = new Dictionary<string, List<Node>>();

foreach(var subscriber in subscribers)
{
    // List of Umbraco CMS nodes to store which nodes the html should come from
    List<Node> nodesToSend = new List<Node> nodesToSend();

    // Loop through the news
    foreach(var newsItem in news)
    {
         foreach (string cityNodeId in newsItem.GetProperty("cities").Value.Split(','))
         {
             // Check if the subscriber has subscribed to the city
             if (subscriber.CityNodeIds.Contains(Convert.ToInt32(cityNodeId)))
             {
                 // If a city matches, we have a base case
                 nodesToSend.Add(newsItem);
             }
         }

         foreach (string categoryNodeId in newsItem.GetProperty("categories").Value.Split(','))
         {
             // Check if the subscriber has subscribed to the category
             if (subscriber.CategoryNodeIds.Contains(Convert.ToInt32(categoryNodeId)))
             {
                 shouldBeAdded = true;

                 // News item matched and will be sent. Stop the loop.
                 break;
             }
             else
             {
                 shouldBeAdded = false;
             }
         }

         if (!shouldBeAdded)
         {
             // The news item did not match both a city and a category and should not be sent
             nodesToSend.Remove(newsItem);
         }
    }

    if (nodesToSend.Count > 0)
    {
        result.Add(subscriber.Email, nodesToSend);
    }  
}

// Ensure that we process the request only if there are any subscribers to send mails to
if (result.Count > 0)
{
    foreach (var res in result)
    { 
        // StringBuilder to build markup for newsletter
        StringBuilder sb = new StringBuilder();

        // Build markup
        foreach (var newsItem in res.Value)
        {
            // build the markup here
        }

        // Email logic here
    }
}
bomortensen
  • 3,346
  • 10
  • 53
  • 74
  • 1
    I have to say I don't know anything about Umbraco but I marked this question up as it is a *model* of how to go about asking such a question. – deadlyvices Mar 08 '12 at 09:24
  • Thanks deadlyvices :) I'm aware that the above code example can (and will!) be refactored to more than one method. – bomortensen Mar 08 '12 at 09:30

2 Answers2

2

First you can break you inner foreach as soon as shouldBeAdde = true.

You also could use LINQ, but I'm not sure if it will be faster (but you could use .AsParallel to make it multithreaded easily):

var nodesToSend = from n in news
                 where n.GetProperties("cities").Value.Split(',')
                     .Any(c => subscriber.CityNodeIds.Contains(Convert.ToInt32(c)) &&
                 n.GetProperties("categories").Value.Split(',')
                     .Any(c => subscriber.CategoryNodeIds.Contains(Convert.ToInt32(c))
                 select n;

The complete think would then come down to (incl. parallel):

Dictionary<string, IEnumerable<Node>> result = new Dictionary<string, IEnumerable<Node>>();
foreach(var subscriber in subscribers)
{
    var nodesToSend = from n in news.AsParallel()
        where n.GetProperties("cities").Value.Split(',')
                .Any(c => subscriber.CityNodeIds.Contains(Convert.ToInt32(c)) &&
            n.GetProperties("categories").Value.Split(',')
                .Any(c => subscriber.CategoryNodeIds.Contains(Convert.ToInt32(c))
        select n;

    if (nodesToSend.Count > 0)
        result.Add(subscriber.Email, nodesToSend);
}

if (result.Count > 0)
{
    foreach (var res in result)
    {
        // Finally, create/merge the markup for the newsletter and send it as an email.
    } 
}
Christoph Fink
  • 22,727
  • 9
  • 68
  • 113
  • Hi chrfin, thanks a lot! This approach looks solid. I tried it with the asparallel method, but unfortunately got a nullpointer exception on the first check: where n.GetProperties("cities").Value.Split(',') .Any(c => subscriber.CityNodeIds.Contains(Convert.ToInt32(c)) However, without the asparallel method, everything works as it should! :) Thanks again! – bomortensen Mar 08 '12 at 10:15
  • Hi chrfin, tried to fool a bit more around with your linq query and it turns out that it's the categories check that decides which news to send. Is there any way to only get the news items where cities and categories matches? :) – bomortensen Mar 08 '12 at 11:24
  • Yes, actually it should already do that as it can be logicaly translated in "where any of the cities are in city nodes AND any of the categories in the category nodes". You can just fiddle around with the query if you need different results (think of it as a SQL query)... – Christoph Fink Mar 08 '12 at 11:55
  • Just checked your code again: You have a "OR" there, so maybe change the `&&` to `||` to get ""where any of the cities are in city nodes OR any of the categories in the category nodes". – Christoph Fink Mar 08 '12 at 11:59
0

I don't think you will run into performance problems anytime soon. I would leave it like you have it now and only try to optimize after you ran into a real performance problem and have used a profiler to verify that these loops are the problem. Currently, it looks like you are doing premature optimization.

Having said that, the following might be a possible optimization:

You could store the relation from city to subscriber in a dictionary with the city as the key and the subscribers for this city as the value of the dictionary stored as a HashSet<T>. And you can do the same for category to subscriber.

Now when you send your newsletter you can iterate over the news items you can retrieve the subscribers for the cities using the dictionary and you can retrieve the subscribers for the categories using the dictionary. Now you need to intersect the city subscribers HashSet with the category subscribers HashSet and as a result you will have all matching subscribers for the news item.

Daniel Hilgarth
  • 171,043
  • 40
  • 335
  • 443
  • Hi Daniel, thanks a lot for your input! :) I'll definitely try your suggestion aswell! The more suggestions the merrier ;) – bomortensen Mar 08 '12 at 10:14