I have a csv file with contents in the following format:

CSE110, Mon, 1:00 PM, Fri, 1:00 PM
CSE114, Mon, 8:00 AM, Wed, 8:00 AM, Fri, 8:00 AM

Each line is basically a course name followed by its meeting times.

What's the best data structure to parse and store this data?

I tried using named tuples as follows:

CourseTimes = namedtuple('CourseTimes', 'course_name, day, start_time ')

But a single course can be scheduled on multiple days and times, as shown for CSE114 above, and the number of slots is only known at run time. How can I handle this?

Alternatively, can I use a dictionary or a list?

I am trying to solve a scheduling problem that assigns TAs to courses, so I may later have to compare times to check for collisions.

To complicate things further, the input file contains other data that I also need to parse. The overall format is as follows.

//Course times
CSE110, Mon, 1:00 PM, Fri, 1:00 PM
CSE114, Mon, 8:00 AM, Wed, 8:00 AM, Fri, 8:00 AM
....

//Course recitation times
CSE306, Mon, 2:30 PM
CSE307, Fri, 4:00 PM
...

//class strength
CSE101, 44, yes
CSE101, 115, yes
...

I suppose I need to store all of this in separate data structures. What would be the right regex pattern for each category?

raghu
  • You need to structure your data to suit what you are going to *do* with it. – Martijn Pieters Mar 18 '15 at 21:18
  • The right data structure depends on what you want to do with the data. If you just want to print the data, then one big string is all you need. If you need to sort or count or do something else, then those operations inform your choice of data structure. – unutbu Mar 18 '15 at 21:18
  • Why not use a `dictionary`? – Mazdak Mar 18 '15 at 21:19
  • I am trying to solve a scheduling problem to assign TAs to courses. I might have to compare times to check for any collisions in the future. – raghu Mar 18 '15 at 21:23
  • So given what as input (e.g. a course name? a date?), what information do you need to retrieve? It pays to be thorough when enumerating the operations needed here. – unutbu Mar 18 '15 at 21:26

2 Answers

Start by noting a few things about your data:

  1. You have a number of unique strings (the courses)
  2. After each course, there are a number of strings (the times the class meets each week)

With that, you have a series of unique keys that each have a number of values.

Sounds like a dictionary to me.

To get that data into a dictionary, start by reading the file. Next, either use regular expressions to select each [day], [hour]:[minutes] [AM/PM] section, or use plain old str.split() to break the line into fields at the commas. The course string becomes the dictionary key, with the rest of the line stored as a tuple or list of values. Then move on to the next line.
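As a minimal sketch of the split-based approach, assuming every course-time line is a course name followed by alternating day and time fields, the parsing could look like this. The function name `parse_course_times` and the choice of `(day, time)` tuples are illustrative, not something given in the question.

```python
def parse_course_times(lines):
    """Map each course name to a list of (day, time) tuples."""
    schedule = {}
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            continue  # blank or incomplete line: nothing to store
        course, rest = fields[0], fields[1:]
        # Assumption: fields after the course name alternate day, time, day, time, ...
        slots = list(zip(rest[0::2], rest[1::2]))
        schedule.setdefault(course, []).extend(slots)
    return schedule


sample_lines = [
    "CSE110, Mon, 1:00 PM, Fri, 1:00 PM",
    "CSE114, Mon, 8:00 AM, Wed, 8:00 AM, Fri, 8:00 AM",
]
schedule = parse_course_times(sample_lines)
print(schedule["CSE110"])
```

Because the number of slots per course is only known at run time, a list per key handles any count without changing the structure.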

Celeo
{
    'CSE114': {'Mon': ['8:00 AM'], 'Wed': ['8:00 AM'], 'Fri': ['8:00 AM']},
    'CSE110': {'Mon': ['1:00 PM'], 'Fri': ['1:00 PM']}
}

Use a dictionary of this form. Each day maps to a list because a course can have multiple slots on the same day.

When you read the CSV file, create an entry for the course and day (if it doesn't already exist) and assign it a single-element list holding the time. If an entry for that course and day is already present, just append to the existing list; that means the course meets more than once on that day.
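The create-or-append step can be sketched with `collections.defaultdict`, which builds the missing course and day entries on first access. The helper `add_slot` and the second Monday slot are hypothetical and only there to show the append case.

```python
from collections import defaultdict


def add_slot(schedule, course, day, time):
    """Record one meeting time, creating the course/day entries if needed."""
    schedule[course][day].append(time)


# course -> day -> list of times
schedule = defaultdict(lambda: defaultdict(list))

add_slot(schedule, "CSE114", "Mon", "8:00 AM")
add_slot(schedule, "CSE114", "Wed", "8:00 AM")
add_slot(schedule, "CSE114", "Mon", "2:30 PM")  # hypothetical second Monday slot

print(schedule["CSE114"]["Mon"])
```

With a plain `dict`, the same effect is available through `schedule.setdefault(course, {}).setdefault(day, []).append(time)`.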

You don't need a regex to determine the category of an input line. Course-time lines (whether they list a single day or multiple days) can be told apart from class-strength lines like this:

fields = line.strip().split(', ')
try:
    n = int(fields[1])  # second field is an integer: n = class strength
except ValueError:
    # second field is not an integer, so this is a course-time line;
    # continue adding it to the dictionary
    pass
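The integer check above could be wrapped in a small classifier that also handles the `//` section headers and the `...` placeholder lines from the question. This is only a sketch: the function name `classify_line` and the category labels are made up for illustration.

```python
def classify_line(line):
    """Return 'skip', 'header', 'strength', 'times', or 'unknown' for one input line."""
    line = line.strip()
    if not line or line.startswith("..."):
        return "skip"  # blank line or elision marker
    if line.startswith("//"):
        return "header"  # e.g. "//Course times"
    fields = [field.strip() for field in line.split(",")]
    if len(fields) < 2:
        return "unknown"
    try:
        int(fields[1])
    except ValueError:
        return "times"  # second field is a day name, not a number
    return "strength"  # second field is the class size


for text in ["//class strength", "CSE101, 44, yes", "CSE306, Mon, 2:30 PM"]:
    print(text, "->", classify_line(text))
```

Since course times and recitation times share the same line shape, the most recent header seen is what would tell those two categories apart.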
hyades
  • Sounds good. Also, what would be the right regex pattern to match lines like these? Please check the edit in the main post. Thanks – raghu Mar 18 '15 at 21:54