I am currently working on a small project (Python 3.6.2) for automated timetable creation based on input I crawl from my college website. For creating the timetable and storing the appointments (and later reorganizing them, to find "better" schedules) I currently use Pandas DataFrame structure and am not fully satisfied. I want appointments to occupy multiple cells.

08:55:00    
09:40:00    
09:50:00    Mod, Spez, Sem
11:30:00    
11:40:00    
13:20:00    
13:30:00    Systemnahe und parallele Programmierung - Ü 01
15:10:00    
15:20:00    
16:00:00    
16:05:00    
16:15:00    Modellierung, Spezifikation und Semantik - Ü 02
17:00:00    
17:55:00    

The first appointment for example takes place from 09:50 to 11:40 and should therefore occupy these slots. A quick solution would be to just put the name of this appointment in every timeslot it occupies but this wouldn't feel and look very clean. I want to export it to Excel when I'm finished. So my question is, how do I accomplish this or am I wrong with using DataFrames all along?

salah2
  • Your question is rather unclear. You present something you're not satisfied with, but don't give much guidance what you would be satisfied with, nor do you give much information about the application. Also, you seem to be using Series rather than DataFrame. If you repeat things in each applicable row, after you export it to Excel you can probably create a macro to merge cells, but that's more of an Excel question than Pandas. – Acccumulation Nov 21 '17 at 19:32
  • @Paul H That seems to be assuming that every time slot will be filled, and that every blank time slot should be filled by the most recent filled time slot. – Acccumulation Nov 21 '17 at 19:34
  • Seems like a safe assumption: *The first appointment for example takes place from 09:50 to 11:40 and should therefore occupy these slots* – Paul H Nov 21 '17 at 19:37
  • I don't think you can do this with Dataframes. As far as I know, each row needs to be meaningful in its own right. – jbcoe Nov 21 '17 at 19:47

1 Answer

You need to first forward-fill your data, then group by the appointment, and finally summarize the time column:

from io import StringIO
import pandas

raw = StringIO("""\
08:55:00;
09:40:00;
09:50:00;Mod, Spez, Sem
11:30:00;
11:40:00;
13:20:00;
13:30:00;Systemnahe und parallele Programmierung - Ü 01
15:10:00;
15:20:00;
16:00:00;
16:05:00;
16:15:00;Modellierung, Spezifikation und Semantik - Ü 02
17:00:00;
17:55:00;
""")

df = (
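    pandas.read_csv(raw, sep=';', header=None, names=['time', 'appt'], parse_dates=['time'])
        .ffill()  # carry each appointment name down into the empty slots below it
        .assign(offset=lambda df: df['appt'].shift(-1))  # appointment found in the next slot
        .query('appt == offset')  # keep slots whose following slot has the same appointment
        .groupby('appt')['time']
        .agg(['first', 'last'])  # first and last remaining slot per appointment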
        .rename(columns={'first': 'begin', 'last': 'end'})
        .sort_values(by=['begin'])
        .reset_index()
)

And that gives me:

                                              appt                begin                  end
0                                   Mod, Spez, Sem  2017-11-21 09:50:00  2017-11-21 11:40:00
1   Systemnahe und parallele Programmierung - Ü 01  2017-11-21 13:30:00  2017-11-21 16:00:00
2  Modellierung, Spezifikation und Semantik - Ü 02  2017-11-21 16:15:00  2017-11-21 17:00:00
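To see why the chain produces these begin and end times, it can help to run the steps one at a time on a smaller table. The following is only a sketch: the slot times and the appointment names 'A' and 'B' are made-up placeholders, and a plain boolean mask stands in for the `.query('appt == offset')` filter.

```python
import pandas as pd

# Hypothetical miniature timetable; 'A' and 'B' are placeholder names.
slots = pd.DataFrame({
    'time': ['09:40', '09:50', '11:30', '11:40', '13:20', '13:30', '15:10'],
    'appt': [None, 'A', None, None, None, 'B', None],
})

# Step 1: forward-fill, so each name runs until the next appointment starts.
filled = slots.ffill()

# Step 2: put the name of the following slot next to each slot.
filled['offset'] = filled['appt'].shift(-1)

# Step 3: keep only slots whose successor carries the same name. This drops
# the still-empty leading slot and the final slot of every run.
kept = filled[filled['appt'] == filled['offset']]

# Step 4: first and last remaining slot per appointment.
spans = kept.groupby('appt')['time'].agg(['first', 'last'])
print(spans)
```

Here 'A' ends up spanning 09:50 to 11:40: the 13:20 slot is dropped because the slot after it belongs to 'B'. 'B' keeps only 13:30, since its second slot is the last row of the table and has no successor to compare against, so the final appointment of a day loses its last slot with this approach.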
Paul H
  • Thank you, that helped me a lot. Seems like I was thinking about the wrong solution for my problem, rather than merging slots arranging them like you did seems to be the better solution. – salah2 Nov 21 '17 at 20:17