
I'm working on an app that uses RavenDB on the back end. It's my first time using Raven, and I'm struggling with Map/Reduce.

I have been reading the docs, but unfortunately I'm not getting anywhere.

Basically I have thousands of documents like this.

{
  .....
  "Severity": {
    "Code": 6,
    "Data": "Info"
  },
  "Facility": {
    "Code": 16,
    "Data": "Local Use 0 (local0)"
  },
  .....
}

And out of it, I need to make a single query with output that looks like this.

{"Severity": [
    {"Emergency":0},
    {"Alert":0},
    {"Critical":0},
    {"Error":0},
    {"Warning":0},
    {"Notice":0},
    {"Info":2711},
    {"Debug":410}
],
"Facility": [
    {"Kernel Messages":0},
    {"User-Level Messages":0},
    {"Mail System":0},
    {"System Daemons":0},
    {"Security/Authorization Messages":0},
    {"Internal Syslogd Messages":0},
    {"Line Printer Subsystem":2711},
    {"Network News Subsystem":410},
    ....
    {"Local Use 0 (local0)": 2574},
    ...
]}

Here the "key" in each Severity/Facility array is the Data portion of the JSON above, and the "value" is the number of documents with that Code.

Example:
Using the above data as a guideline,

There are 2711 documents in my database with an Info severity.
There are 410 documents in my database with a Debug severity.
There are 2574 documents in my database with a local0 facility.
etc...


What I'd like to do is generate the appropriate indexes when the app starts up (or check if they already exist), but I don't even know where to begin.

note: the app needs to generate the index, it's not enough to just manually write it into the RavenDB Web UI.

Chase Florell
  • Do the `Code` properties need to have any effect on the index? – Matt Johnson-Pint Feb 25 '13 at 02:50
  • The `Code` and the `Data` always match. IE: `Code:6` = `Data:Info` every time. – Chase Florell Feb 25 '13 at 03:03
  • Ok, but you're not including them in the output, so they're essentially irrelevant to this task, right? Or is there risk of collision that two different codes will have the same data string? – Matt Johnson-Pint Feb 25 '13 at 03:07
  • Also, you are showing zeros in your sample results. That wouldn't be possible unless you have some third document or set of documents that list all of the different codes. In other words, if it's not in any of the documents, how would you get it into the results? – Matt Johnson-Pint Feb 25 '13 at 03:09
  • I am working up a solution. If you DO have a doc or docs that have all of the code/data pairs - let me know. I can optimize on that. – Matt Johnson-Pint Feb 25 '13 at 03:13
  • It's all Syslog data. http://en.wikipedia.org/wiki/Syslog#Facility_Levels. the upper data is what I currently have in my database. The lower block is what I'd like to get out. The value should always be a document.count of the key. – Chase Florell Feb 25 '13 at 03:15
  • If you look at the question edit history, you can see the full version of my document (if it helps for any reason) – Chase Florell Feb 25 '13 at 03:17
  • Do you need zero-count results like `{"Kernel Messages":0}` - and if so, where will the data strings come from? Can I just return codes and counts and you can populate the data strings later? – Matt Johnson-Pint Feb 25 '13 at 03:22
  • I believe that will work. Every pie chart will have a legend of every data string(8 for severity, and 24 for facility), but if the count is zero, it won't plot on the chart anyways. – Chase Florell Feb 25 '13 at 03:24

1 Answer

You will need to combine several techniques to achieve this, but it is quite doable.

Here is an index that should work well for you.

public class MyIndex : AbstractMultiMapIndexCreationTask<MyIndex.ReduceResult>
{
    public class ReduceResult
    {
        public string Source { get; set; }
        public string Code { get; set; }
        public string Data { get; set; }
        public int Count { get; set; }
    }

    public MyIndex()
    {
        AddMap<MyDoc>(docs => from doc in docs
                              select new
                                     {
                                         Source = "Severity",
                                         doc.Severity.Code,
                                         doc.Severity.Data,
                                         Count = 1
                                     });

        AddMap<MyDoc>(docs => from doc in docs
                              select new
                                     {
                                         Source = "Facility",
                                         doc.Facility.Code,
                                         doc.Facility.Data,
                                         Count = 1
                                     });

        Reduce = results => from result in results
                            group result by new { result.Source, result.Code }
                            into g
                            select new
                            {
                                g.Key.Source,
                                g.Key.Code,
                                g.First().Data,
                                Count = g.Sum(x => x.Count)
                            };

        TransformResults = (database, results) =>
                           from result in results
                           group result by 0
                           into g
                           select new
                           {
                               Severity = g.Where(x => x.Source == "Severity")
                                           .ToDictionary(x => x.Data, x => x.Count),
                               Facility = g.Where(x => x.Source == "Facility")
                                           .ToDictionary(x => x.Data, x => x.Count)
                           };
    }
}
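To make the three stages easier to follow, here is a rough in-memory sketch of the same logic using plain LINQ to Objects, with no RavenDB involved. The types `LogDoc`, `CodeData`, `Entry`, and `Counts` and the method `IndexSketch.Compute` are hypothetical stand-ins invented for this illustration; only the grouping and summing mirror the index above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the document shape shown in the question.
public class CodeData { public int Code { get; set; } public string Data { get; set; } }
public class LogDoc { public CodeData Severity { get; set; } public CodeData Facility { get; set; } }

// One mapped/reduced entry, mirroring the shape of ReduceResult.
public class Entry
{
    public string Source { get; set; }
    public int Code { get; set; }
    public string Data { get; set; }
    public int Count { get; set; }
}

public class Counts
{
    public Dictionary<string, int> Severity { get; set; }
    public Dictionary<string, int> Facility { get; set; }
}

public static class IndexSketch
{
    public static Counts Compute(IEnumerable<LogDoc> docs)
    {
        var list = docs.ToList();

        // Map: each document emits one entry per source, each with Count = 1.
        var mapped = list
            .Select(d => new Entry { Source = "Severity", Code = d.Severity.Code, Data = d.Severity.Data, Count = 1 })
            .Concat(list.Select(d => new Entry { Source = "Facility", Code = d.Facility.Code, Data = d.Facility.Data, Count = 1 }));

        // Reduce: group by (Source, Code) and sum the counts.
        var reduced = mapped
            .GroupBy(e => new { e.Source, e.Code })
            .Select(g => new Entry
            {
                Source = g.Key.Source,
                Code = g.Key.Code,
                Data = g.First().Data,
                Count = g.Sum(x => x.Count)
            })
            .ToList();

        // Transform: collapse all reduced entries into one object with two dictionaries.
        return new Counts
        {
            Severity = reduced.Where(e => e.Source == "Severity").ToDictionary(e => e.Data, e => e.Count),
            Facility = reduced.Where(e => e.Source == "Facility").ToDictionary(e => e.Data, e => e.Count)
        };
    }
}
```

Like the index, this sketch assumes each Code always carries the same Data string; two codes sharing one Data string would make `ToDictionary` throw on the duplicate key.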

You also need a container class for the transformed result:

public class MyDocCounts
{
    public IDictionary<string, int> Severity { get; set; }
    public IDictionary<string, int> Facility { get; set; }
}

You would query it like this:

var result = session.Query<MyIndex.ReduceResult, MyIndex>()
                    .As<MyDocCounts>()
                    .ToList().First();

The .ToList() may seem redundant, but it's necessary because we are grouping in the transform.

A complete unit test is here; its output looks like this:

{
  "Severity": {
    "AAA": 20,
    "BBB": 20,
    "CCC": 20,
    "DDD": 20,
    "EEE": 20
  },
  "Facility": {
    "FFF": 20,
    "GGG": 20,
    "HHH": 20,
    "III": 20,
    "JJJ": 20
  }
}
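Keep in mind that the index can only report values that occur in at least one document, so zero-count entries such as `{"Emergency":0}` have to be filled in on the client. A minimal sketch of that step, assuming the severity names listed in the question; the `LegendFill` class and its `WithZeros` method are hypothetical helpers, not part of RavenDB:

```csharp
using System.Collections.Generic;

public static class LegendFill
{
    // Severity names in the order given in the question.
    public static readonly string[] SeverityNames =
    {
        "Emergency", "Alert", "Critical", "Error",
        "Warning", "Notice", "Info", "Debug"
    };

    // Returns one entry per known name, in legend order,
    // using 0 for any name the index did not return.
    public static List<KeyValuePair<string, int>> WithZeros(
        IDictionary<string, int> counts, IEnumerable<string> allNames)
    {
        var filled = new List<KeyValuePair<string, int>>();
        foreach (var name in allNames)
        {
            int n;
            filled.Add(new KeyValuePair<string, int>(
                name, counts.TryGetValue(name, out n) ? n : 0));
        }
        return filled;
    }
}
```

You would call it as `LegendFill.WithZeros(result.Severity, LegendFill.SeverityNames)`; the same approach works for the facilities with their own list of names.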
Matt Johnson-Pint
  • Ok, so I've had a chance to try this out, however, I'm wondering how I call it so that the index is built on application start. Currently running the query throws the following exception `There is no index named: MyIndex` – Chase Florell Feb 25 '13 at 22:14
  • nm, I got it... `IndexCreation.CreateIndexes(typeof(SyslogDocumentCountIndex).Assembly, DataDocumentStore.DocumentStore);` – Chase Florell Feb 25 '13 at 22:23
  • That's fine, but it will create all indexes that you have defined anywhere in your assembly. See the unit test I referenced if you want to create just the single index. – Matt Johnson-Pint Feb 25 '13 at 22:28
  • So what is the difference between what I posted above and this `documentStore.ExecuteIndex(new SyslogDocumentCountIndex());`? – Chase Florell Feb 25 '13 at 22:51
  • When unit testing, you typically just add the one index. You might have several defined in your assembly that aren't related to the test you're doing. In production, you typically do it your way. It makes no difference to the raven server. – Matt Johnson-Pint Feb 25 '13 at 23:46