Binning information based on criteria using C++

Question

My data looks like this:

No.      ID        DATE_EVENT   TIME_EVENT    EVENT   CODE
102995   018159871 07/08/2014   09:01:57      9008    1111
20398    018159871 07/08/2014   09:01:58      1000    1402
105541   018159871 07/08/2014   09:01:58      9210    1111
63492    018253609 07/08/2014   09:54:26      9008    905
37552    018253609 07/08/2014   09:54:45      9008    1111
9627     018253609 07/08/2014   09:54:48      9210    1111
112700   018253609 07/08/2014   09:54:48      1000    1402
50555    018253609 07/08/2014   09:55:56      1000    1401
63634    018253609 07/08/2014   09:55:56      9210    1111 
34551    018330948 07/08/2014   09:21:51      9008    905
47252    018330948 07/08/2014   09:22:15      9008    1111
3975     018330948 07/08/2014   09:22:17      1000    1402
24196    018330948 07/08/2014   09:22:17      9210    1111
111150   018342571 07/08/2014   09:40:08      9008    905
17119    018342571 07/08/2014   09:40:19      9008    1111
18658    018342571 07/08/2014   09:40:21      9210    1111
25654    018342571 07/08/2014   09:40:21      1000    1402

As you can see, the data is sorted by ID and then by time. I would like to count the amount of time spent on 9008 905 and 9008 1111 before moving on to whatever event comes next.

I am reading it in like this:

#include <cstdlib>   // system
#include <iostream>
#include <fstream>
#include <iterator>  // istreambuf_iterator
#include <sstream>
#include <string>
#include <vector>

using namespace std;

vector<string> &SplitString(const string &s, char delim, vector<string> &elems)
{
    stringstream ss(s);
    string item;

    while (getline(ss, item, delim))
    {
        elems.push_back(item);
    }

    return elems;
}

int main(int argc, const char * argv[])
{

    ifstream CustJ("/Users/Rr/p/B/Sample 1.txt");

    string str;
    string elements;

    CustJ.seekg(0, ios::end);
    str.reserve(CustJ.tellg());
    CustJ.seekg(0, ios::beg);

    str.assign((istreambuf_iterator<char>(CustJ)),
               istreambuf_iterator<char>());    

    if (str.length() > 0)
    {

        vector<string> lines;
        SplitString(str, '\n', lines);

        vector<vector<string> > LineElements;

        for (auto it : lines)
        {

            vector<string> elementsInLine;

            SplitString(it, ',', elementsInLine);

            LineElements.push_back(elementsInLine);
         }

        //this displays each element in an organized fashion

        //for each line
        for (auto it : LineElements)
        {
            //for each element IN that line
            for (size_t i = 0; i < it.size(); ++i)
            {
                std::cout << it[i];
                //every element except the last is followed by a comma
                //(comparing by position, so a repeated value cannot lose its comma)
                if (i + 1 < it.size())
                    std::cout << ',';
            }
            //the end of the line
            std::cout << '\n';
        }
    }
    else
    {
        std::cout << "File Is empty" << std::endl;
        return 1;
    }



    system("PAUSE");
    return 0;
}

I am not sure if this is the best way to approach this problem.

Thanks.

Define what "bin the rows" means. You should know best whether you read the information correctly, because you have the original text file and the output of the program; we have neither. — nwp, Aug 22 '14 at 10:49
Once we know what you really want to do, do you really need to use C++? Check out Python, Ruby, or even C# with LINQ for that task. — Dmitry Ledentsov, Aug 22 '14 at 14:54
Thanks @Dmitry. I edited my question, hopefully to make things clearer, and am looking into the possibility of using Ruby. — Taylrl, Aug 24 '14 at 12:41
@Taylrl, this is not a simple question then. You need to think how to partition your problem into workable steps. The code doesn't matter, really. I'll show some code soon, but it's basically making a full solution to your problem. — Dmitry Ledentsov, Aug 24 '14 at 12:56
I'll **assume** the duration is the date-time difference to the _next_ event, i.e. `25654 018342571 07/08/2014 09:40:21 1000 1402` is unterminated — Dmitry Ledentsov, Aug 24 '14 at 13:02

Dmitry Ledentsov · Answer 1 · 2014-08-25T07:47:48.100

You've reformulated the question, which made it much more understandable. The code is not the most important thing here, in my view. What you have to do is decompose the whole task into workable items, which would make the task solvable.

There might be a super elegant answer in languages other than C++ - in Perl, Python, Ruby. I'll write an answer in C#, since the typical infrastructure (IDE) might be of help to you, and LINQ (language integrated query) is your friend in such tasks.

No guarantees on the correctness of the code, since there are too many parts of the answer to your question. The code is not robust as it will throw exceptions in many places if the input is inappropriate, etc. It's up to you to define the error handling strategy. In any case, you might want to reimplement that in a different language.

The first component is the input from file. In declarative form:

var lines = File
    .ReadAllLines("input.txt", Encoding.Default)
    .Skip(1);
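If you stay in C++, the same input step might look roughly like the following. This is a minimal sketch, not a drop-in replacement: `read_data_lines` is a made-up name, and dropping blank lines and trailing `\r` characters is an assumption about what the input file may contain.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>
#include <vector>

// Reads all non-empty lines from `in`, skipping the first (header) line.
// Hypothetical helper, roughly mirroring File.ReadAllLines(...).Skip(1).
std::vector<std::string> read_data_lines(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    std::getline(in, line);              // discard the header row
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();             // tolerate Windows line endings
        if (!line.empty())
            lines.push_back(line);       // assumption: blank lines carry no data
    }
    return lines;
}
```

Taking a `std::istream&` rather than a file name means the function works with a `std::ifstream` for the real file and with a `std::istringstream` for quick experiments.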

We'll need to calculate time-spans from adjacent date-times, hence we pair them:

var event_tuples = lines
    .Zip(lines.Skip(1), (start, end) => new { Start = start, End = end });
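Standard C++ before C++23 has no direct counterpart to `Zip`, but pairing each row with its successor is a short loop. A sketch, with `adjacent_pairs` as a made-up name:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Pairs each element with its successor: {a, b, c} -> {(a, b), (b, c)}.
// The last element has no successor and therefore starts no pair, which is
// also what zipping a sequence with itself shifted by one produces.
template <typename T>
std::vector<std::pair<T, T>> adjacent_pairs(const std::vector<T>& items)
{
    std::vector<std::pair<T, T>> pairs;
    for (std::size_t i = 0; i + 1 < items.size(); ++i)
        pairs.emplace_back(items[i], items[i + 1]);
    return pairs;
}
```

Note that this kind of pairing also joins the last row of one ID with the first row of the next ID. In the sample data every 9008 row is followed by a row with the same ID, so the query below is unaffected, but with other data you may want to drop pairs whose IDs differ.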

Then we can structure the data so that later queries are clearer:

var entries = event_tuples
    .Select(x => {
        var start_data = x.Start.ParseColumns();
        var end_data = x.End.ParseColumns();
        var duration = end_data.ToDateTime() - start_data.ToDateTime();

        return new
        {
            No=start_data[0],
            Id=start_data[1],
            Duration = duration,
            Event = start_data[4],
            Code = start_data[5]
        };
    })
;

Here you can see the use of the previous structured query output: .Start and .End. More on ParseColumns and ToDateTime later.

Now to your example query:

"count the amount of time spent on 9008 905 & 9008 1111"

First, find the corresponding events:

var query = entries
    .Where(x => x.Event == "9008"
                && new[] { "905", "1111" }.Contains(x.Code))
;

Console.WriteLine("{0} events found",query.Count());

and then calculate the total duration:

var total_duration = query
    .Select(x=>x.Duration)
    // seed with zero so that an empty query yields 00:00:00 instead of throwing
    .Aggregate(TimeSpan.Zero, (a, b) => a + b);

Console.WriteLine("total duration: {0}", total_duration);
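For comparison, the filter and the sum could be written with standard algorithms in C++. This is only a sketch: the `Entry` struct and the function names are made up for illustration, and the durations are assumed to have been computed already as whole seconds.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical C++ counterpart of the structured entries queried above.
struct Entry {
    std::string no, id;
    std::chrono::seconds duration;   // time until the next event
    std::string event, code;
};

// True for the rows the question asks about: 9008 905 and 9008 1111.
bool is_wanted(const Entry& e)
{
    return e.event == "9008" && (e.code == "905" || e.code == "1111");
}

// Number of matching events.
std::size_t count_wanted(const std::vector<Entry>& entries)
{
    return static_cast<std::size_t>(
        std::count_if(entries.begin(), entries.end(), is_wanted));
}

// Sums the durations of all matching entries. Starting from zero means an
// empty or non-matching input yields 0s rather than an error.
std::chrono::seconds total_duration(const std::vector<Entry>& entries)
{
    return std::accumulate(
        entries.begin(), entries.end(), std::chrono::seconds{0},
        [](std::chrono::seconds sum, const Entry& e) {
            return is_wanted(e) ? sum + e.duration : sum;
        });
}
```

Keeping the predicate in its own function makes the criteria easy to change without touching the counting or summing code.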

As you see, there are quite a lot of concerns here: file input, parsing strings, date-time parsing, querying, aggregating. Each requires special care. What you most definitely don't want to do is spend time on low-level detail, such as end-line handling. Consider working with appropriate tools at the highest sufficient level of abstraction.

Back to ParseColumns and ToDateTime. I've written them as extension methods, which are the basis of LINQ and help with writing declarative code, even though their use might be debatable here. Other languages have other mechanisms that allow similar readability.

Example, problem-specific implementations:

static class Extensions {
    public static string[] ParseColumns(this String line)
    {
        return line.Split(new char[] { ' ' },
                          StringSplitOptions.RemoveEmptyEntries);
    }

    public static DateTime ToDateTime(this String[] line)
    {
        const string datetime_format = "dd/MM/yyyy H:mm:ss";
        return DateTime.ParseExact(
            line[2] + " " + line[3], 
            datetime_format, 
            CultureInfo.InvariantCulture
        );
    }
}
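A rough C++ equivalent of these two helpers, as a sketch under a few assumptions: the function names are made up, the date is read as day/month/year as in the format string above, and the timestamp is interpreted as local time because `std::mktime` works that way.

```cpp
#include <ctime>
#include <iomanip>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Splits a line on runs of whitespace, dropping empty fields.
std::vector<std::string> parse_columns(const std::string& line)
{
    std::istringstream in(line);
    std::vector<std::string> columns;
    for (std::string field; in >> field; )
        columns.push_back(field);
    return columns;
}

// Parses columns 2 and 3 (e.g. "07/08/2014" and "09:01:57") into a time_t.
// Throws std::runtime_error on missing columns or a malformed date-time.
std::time_t to_time(const std::vector<std::string>& columns)
{
    if (columns.size() < 4)
        throw std::runtime_error("expected at least 4 columns");
    std::tm tm = {};
    std::istringstream in(columns[2] + " " + columns[3]);
    in >> std::get_time(&tm, "%d/%m/%Y %H:%M:%S");
    if (in.fail())
        throw std::runtime_error("bad date-time: " + in.str());
    tm.tm_isdst = -1;   // let mktime work out whether DST applies
    return std::mktime(&tm);
}
```

The duration between two rows is then `std::difftime(to_time(end), to_time(start))`, in seconds.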

This partly hides some uglier parts of the code that 'just make it work' for this example. If the software you're writing is going to be used and later extended, such parts will find their way somewhere else in the code, preferably behind abstractions.

If you stick to C++, you'll probably want to take a look at cpplinq.

The code is runnable at rextester.

Extra reading: Martin Fowler: Collection Pipeline
