0

I have n excel files in a directory that stores employees' activities per day in a month. They have three columns: date, activity and category. I don't need the category.

I want to basically read the n excel files and output a word document that effectively puts all activities per employee ordered by date, for example:

Day 1 to day 5:

Day one:
Employee #1:
-Task a
-Task b
-Task c

Employee #2:
-Task a
-Task b
-Task c
...

Day 2: ...
...
...
Day 7 to day 11:
... ...

I want to know what data structure i could use to effectively hold this information so i can easily write the document i want. Right now, i am using an array of string to hold each excel line, and store all of them in a List, which then i store in a dictionary for each employee, the key being the username of each employee.

While i think these data structures are efficient themselves, they aren't too friendly for my main objective which is to print the data ordered per day for each employee, so maybe the key itself should be the date.

To recap: current data structure being used:

Dictionary<string,List<string[]>> dictActividades = new     Dictionary<string,List<string[]>>();

All lines in each employee's excel file are stored in a List. I really don't need the dictionary features because i could be reading everything back in order. I could be printing as soon as i am reading a employee's excel file but i would have to read the n excel files at once (although n is very small)

Thoughts?

EDIT: here's what i currently have:

        string directorioActividades = @"\\mar-fp01\mar_tecnologia$\Coordinacion de Apoyo a Usuarios\Informes\" + 
            fechaInicio.Year.ToString() + "\\" + fechaInicio.Year.ToString() + "-" + 
            fechaInicio.Month.ToString().PadLeft(2, '0');

        string[] archivos = Directory.GetFiles(directorioActividades, "*.xlsx");
        Dictionary<string,List<string[]>> dictActividades = new Dictionary<string,List<string[]>>();
        for (int j = 0; j < archivos.Length; j++)
        {
            List<string[]> actividades = new List<string[]>();
            string nombreArchivo = Path.GetFileNameWithoutExtension(archivos[j]);
            String excelConnectionString = @"Provider=Microsoft.ACE.OLEDB.12.0;" +
                "Data Source=" + archivos[j] + ";" +
                "Extended Properties=Excel 8.0; ";
            using (OleDbConnection con = new OleDbConnection(excelConnectionString))
            {
                OleDbCommand command = new OleDbCommand("Select * From [Actividades$]", con);
                con.Open();

                OleDbDataReader dr = command.ExecuteReader();
                int cantidadcolumnas = dr.FieldCount;

                string tipodatos = null;
                string[] filaDatos = new string[cantidadcolumnas];
                while (dr.Read())
                {
                    for (int k = 0; k < cantidadcolumnas; k++)
                    {
                        tipodatos = dr.GetFieldType(k).ToString();
                        if (tipodatos == "System.Int32")
                        {
                            filaDatos[k] = dr.GetInt32(k).ToString();
                        }
                        if (tipodatos == "System.String")
                        {
                            filaDatos[k] = dr.GetString(k);
                        }
                        if (tipodatos == "System.DateTime")
                        {
                            filaDatos[k] = dr.GetDateTime(k).ToShortDateString();
                        }
                    }
                    actividades.Add(filaDatos);
                }//while dr.read
            }
            dictActividades.Add(nombreArchivo, actividades);
        }//for archivos

While this code is quite short and uses the fewest data structures i can think of, printing is quite difficult because the key is the employee's username and not the date and the code should print every activity by every employee by date as the example in the format posted above

fjleon
  • 341
  • 1
  • 3
  • 10

2 Answers2

2

Performing logic on this data is going to be very unintuitive if you don't organize it semantically and include the logic in the models. Consider the quote:

"Smart data structures and dumb code works a lot better than the other way around."
- Eric Raymond, The Cathedral and the Bazaar

Your described structure contains all of the information you need:

Day one:
Employee #1:
-Task a
-Task b
-Task c

So you start with a Task object:

class Task
{
    // fields which describe a Task
}

(You might want to name it something slightly different to avoid confusion with the built-in Task type. Maybe Job or JobTask or something like that?)

An Employee has a collection of Tasks:

class Employee
{
    public IList<Task> Tasks { get; set; }
}

And a Day has a collection of Employees:

class Day
{
    public IList<Employee> Employees { get; set; }
}

(If this domain extends beyond just this one operation then you might even name things a bit differently. Technically these are "Employees" so much as they are "EmployeeTaskLogs" for example. The size and complexity of the domain will guide any such naming. But definitely put some thought into the most sensible names, that's the whole point of this exercise.)

Then your top-level consuming code would just have a collection of Days:

var days = new List<Day>();

You can further refine the business logic by creating custom collection types. For example, if you want to take advantage of a Dictionary of days:

class DayDictionary<T> : IDictionary<T> where T : Day
{
    // implement IDictionary<T> here
}

This is where you'd include logic for things like ensuring that any given collection has exactly 5 Day objects, for example.

Once you've defined your model structures, using them becomes trivial and the semantics of them become obvious.

David
  • 208,112
  • 36
  • 198
  • 279
  • while i understand the basis of your answer (thank you!), my feeling is that it's a bit complex for what i am trying to do. Remember that my data input are excel files, one per employee and each one has rows with a date, description of activity and category (which i don't need). I already have the logic set that prints the week titles and week days, the only thing missing would be to print the activities from all employees daily in a efficient manner – fjleon Nov 13 '15 at 19:44
  • @fjleon: Then perhaps your top-level model is Employee and each one has a list of Days which contains lists of Activities. Structure it in whatever way makes sense for the domain being modeled. But once you have this structure, querying it becomes extremely simple. For example, if you have a list of Employee objects with Days properties and want to get all of the activities for a given Day: `employees.SelectMany(e => e.Days).Where(d => d.Day = someDateValue)` Or if you need the Employee data as well, loop over the Employees list manually and individually query the Days collections. – David Nov 13 '15 at 19:50
2

I would suggest a simple class like this

class EmployeeActivity
{
    public string Employee { get; set; }
    public DateTime Date { get; set; }
    public string Activity { get; set; }
}

an no special data structure - just List<EmployeeActivity>. Once populated, you can use LINQ to perform ordering/grouping needed.

Imagine instead of you dictionary you populate what I'm suggesting

var empoyeeActivies = new List<EmployeeActivity>();
// Iterate excel files like in your code and populate the list

Now you can transform it to the way shown in your example with

var result = employeeActivities
    .GroupBy(a => a.Date, (date, dateActivities) => new
    {
        Date = date,
        DateActivities = dateActivities
            .GroupBy(a => a.EmployeeName, (employeeName, employeeActivities) => new
            {
                EmployeeName = employeeName,
                Activities = empoyeeActivities.OrderBy(a => a.Activity)
            })
            .OrderBy(a => a.EmployeeName)
    })       
    .OrderBy(a => a.Date);
Ivan Stoev
  • 195,425
  • 15
  • 312
  • 343
  • that class only would store the excel's rows which i am basically already doing in a string[3] (i don't need the date stored as date since i am not calculating anything date related). I will be updating my question with my current implementation to clarify – fjleon Nov 13 '15 at 19:47
  • As you wish. My point was that your excel file structure does not contain the data the way you need it for this report of whatever it is, so no special data structure is needed - just a flat list with values. – Ivan Stoev Nov 13 '15 at 19:51
  • @fjleon The update just proves my point. First, strongly typed class is better than array (which is also a class FYI). Second, whether you are calculating something or not, dates are better to be stored as dates, not strings. That helps sorting, grouping etc. And the reports are easily produced with LINQ queries - in fact this what LINQ is designed for, not for fancy things that you can do with normal code. – Ivan Stoev Nov 13 '15 at 20:13
  • i have always thought that LINQ is kinda cheating, i don't know any language that can query its data and apply sql-like methods to it. The solution to this issue would be indeed much harder with LINQ. Anyway, i just tried your solution and it isn't sorting correctly - it outputs every activity by employee-date instead of date-employee (meaning i want to print all activities done on day A, by all employees in order). I will try messing with the LINQ query myself – fjleon Nov 17 '15 at 16:17
  • @fjleon If you take a look at some other my posts, you'll see that I'm strongly against using LINQ for the tasks that can be accomplished with a normal code. But here it perfectly fits. Anyway, can I cooperate further - I mean, by idea I'm sorting first by Date, then inside the date by Employee Name, then inside employee - by activity. How it should be? – Ivan Stoev Nov 17 '15 at 16:24
  • Ok, so i had to basically invert your lines concerning EmployeeName and Activities and now it sorts correctly first by date then by employee name. Thanks! – fjleon Nov 17 '15 at 18:22
  • @fjleon Nice. So you see how easy it is to get whatever best suits your needs w/o rewriting the whole code - you might become soon a LINQ fan :-) You welcome, and happy coding! – Ivan Stoev Nov 17 '15 at 18:32
  • also had to remove the orderby(a => a.Activity) because it was ordering activities and i wanted them to appear in the order there were created – fjleon Nov 17 '15 at 20:32
  • i'm having a small issue, since the LINQ variable is type var, i cannot pass it to another method. Searching stackoverflow i learn that the proper way to solve this is to actually create a new class and actually make the LINQ query create the result as a type of this class, however i don't know how to code the class itself. The debugger says that var result is of type: { Fecha = {01-10-2015 12:00:00 a.m.}, FechaActividades = {System.Linq.OrderedEnumerable<<>f__AnonymousType1, string>, string>} }. – fjleon Nov 19 '15 at 12:55
  • @fjleon I'll help you doing that, but need somehow to move the conversation to chat in order to be able to post code. – Ivan Stoev Nov 19 '15 at 13:04
  • @fjleon Please post another question with your current code adjustments and a requirement of not using anonymous projection. – Ivan Stoev Nov 19 '15 at 13:11
  • i created http://stackoverflow.com/questions/33805790/how-to-pass-a-linq-result-to-a-method from what i understand the solution doesn't necessarily need to remove anonymous projection so i don't really care, the only thing i need is to be able to pass the LINQ result to another method so the second method can work with it just as if there were no additional methods. Thanks! – fjleon Nov 19 '15 at 13:53