0

First, I would like to apologize in advance. I just started learning this a few months ago, so I need stuff broken down completely. I have a project using python and datajoint (makes sql code shorter to write) where I need to create an airport that has at least 7 airports, different planes and what not. Then I need to populate the tables with passenger reservations. Here is what I have so far.

        @schema 
        class Seat(dj.Lookup):
            definition = """
            aircraft_seat : varchar(25)
            """
            contents = [["F_Airbus_1A"],["F_Airbus_1B"],["F_Airbus_2A"],["F_Airbus_2B"],["F_Airbus_3A"], 
            ["F_Airbus_3B"],["F_Airbus_4A"],["F_Airbus_4B"],["F_Airbus_5A"],["F_Airbus_5B"], 
            ["B_Airbus_6A"],["B_Airbus_6B"],["B_Airbus_6C"],["B_Airbus_6D"],["B_Airbus_7A"], 
            ["B_Airbus_7B"],["B_Airbus_7C"],["B_Airbus_7D"],["B_Airbus_8A"],["B_Airbus_8B"], 
            ["B_Airbus_8C"],["B_Airbus_8D"],["B_Airbus_9A"],["B_Airbus_9B"],

This keeps going leaving me with a total of 144 seats on each plane.

        @schema
        class Flight(dj.Manual):
            definition = """
            flight_no  : int
            ---
            economy_price : decimal(6,2)
            departure : datetime
            arrival : datetime 
            ---
            origin_code : int
            dest_code : int
            """ 
        @schema
        class Passenger(dj.Manual):
            definition = """
            passenger_id : int
            ---
            full_name : varchar(40)
            ssn : varchar(20)
            """
        @schema
        class Reservation(dj.Manual):
            definition = """
            -> Flight
            -> Seat
            ---
            -> Passenger
    
            """

Then I populate flights and passengers:

        Flight.insert((dict(flight_no = i,
                    economy_price = round(random.randint(100, 1000), 2), 
                    departure = faker.date_time_this_month(),
                    arrival = faker.date_time_this_month(),
                    origin_code = random.randint(1,7),
                    dest_code = random.randint(1,7)))
             for i in range(315))
        Passenger.insert(((dict(passenger_id=i, full_name=faker.name(), 
                   ssn = faker.ssn()))
             for i in range(10000)), skip_duplicates = True)

Lastly I create the transaction:

        def reserve(passenger_id, origin_code, dest_code, departure):
            with dj.conn().transaction:
             available_seats = ((Seat * Flight - Reservation) & Passenger & 
             {'passenger_id':passenger_id}).fetch(as_dict=True)
        try:
            choice = random.choice(available_seats)
        except IndexError:
            raise IndexError(f'Sorry, no seats available for {departure}')
        name = (Passenger & {'passenger_id': passenger_id}).fetch1('full_name')
        print('Success. Reserving seat {aircraft_seat} at ticket_price {economy_price} for 
        {name}'.format(name=name, **choice))
        Reservation.insert1(dict(choice, passenger_id=passenger_id), ignore_extra_fields=True)
  
     
       reserve(random.randint(1,1000), random.randint(1,7), 
       random.randint(1,7),random.choice('departure'))
       
       Output[]: Success. Reserving seat E_Yak242_24A at ticket_price 410.00 for Cynthia Erickson

       Reservation()
       Output[]: flight_no      aircraft_seat      passenger_id

             66           B_Yak242_7A           441

So I am required to have 10.5 flights a day with the planes at least 75% full which leaves me needing over 30000 reservations. Is there a way to do this like 50 at a time? I have been searching for an answer and have not been able to find a solution. Thank you.

  • The most efficient way to do this would be to use multi-threading, but it is a rather complex topic that cannot be included in just a single answer. Maybe you can start by looking the [multiprocessing](https://docs.python.org/3/library/multiprocessing.html) module first, the idea would be you initialize multiple processes, that each will process a transaction at once. – kennysliding Dec 19 '20 at 04:31

1 Answers1

1

One of the maintainers for DataJoint here. First off, I'd like to say thanks for trying out DataJoint; curious as to how you found out about the project.

Forewarning, this will be a long post but I feel it is a good opportunity to clear up a few things. Regarding the problem in question, not sure if I fully understand the nature of your problem but let me follow on several points. I recommend reading this answer in its entirety before determining how best to proceed for your case.

TL;DR: Compute tables are your friend.

Multi-threading

Since it has come up in the comments it is worth addressing that as of 2021-01-06, DataJoint is not completely thread-safe (at least from the perspective of sharing connections). It is unfortunate but it is mainly due to a standing issue with PyMySQL which is a principal dependency of DataJoint. That said, if you initiate a new connection on each thread or process you should not run into any issues. However, this is an expensive workaround and can't be combined with transactions since they require that operations be conducted within a single connection. Speaking of which...

Compute Tables and Job Reservation

Compute tables is one noticeable omission from your above attempt at a solution. Compute tables provide a mechanism to associate its entities to those in an upstream parent table with addional processing prior to insert (defined in a make method in your Compute table class) where it may be inoked by calling the populate method which calls the make method for each new entry. Calls to your make method are transaction-constrained and should achieve what you are looking for. See here in the docs for more details in its use.

Also, for additional performance gains, there is another feature called Job Reservation which provides a means to pool together multiple workers to process large data sets (using populate) in an organized, distributed manner. I don't feel it is required here but worth mentioning and ultimately up to how you view the results below. You may find out more on this feature here in our docs.

Schema Design

Based on my understanding of your initial design, I have some suggestions how we can improve the flow of the data to increase clarity, performance, and also to provide specific examples on how we can use the power of Compute tables. Running as illustrated below on my local setup, I was able to process your requirement of 30k reservations in 29m54s with 2 different plane model types, 7 airports, 10k possible passengers, 550 available flights. Minimum 75% seating capacity was not verified only because I didn't see you attempt this yet, though if you see how I am assigning seats you will notice that it is almost there. :)

Disclaimer: I should note that the below design is still a large oversimplification of the actual real-world challenge to orchestrate proper travel reservations. Considerable assumptions were taken mainly for the benefit of education as opposed to submitting a full, drop-in solution. As such, I have explicitly chosen to avoid using longblob for the below solution so that it is easier to follow along. In reality, a proper solution would likely include more advanced topics for further performance gains e.g. longblob, _update, etc.

That said, let's begin by considering the following:

import datajoint as dj # conda install -c conda-forge datajoint or pip install datajoint
import random
from faker import Faker # pip install Faker
faker = Faker()
Faker.seed(0) # Pin down randomizer between runs

schema = dj.Schema('commercial_airtravel') # instantiate a workable database

@schema 
class Plane(dj.Lookup):
    definition = """
    # Defines manufacturable plane model types
    plane_type    : varchar(25) # Name of plane model
    ---
    plane_rows    : int         # Number of rows in plane model i.e. range(1, plane_rows + 1)
    plane_columns : int         # Number of columns in plane model; to extract letter we will need these indices
    """
    contents = [('B_Airbus', 37, 4), ('F_Airbus', 40, 5)] # Since new entries to this table should happen infrequently, this is a good candidate for a Lookup table

@schema 
class Airport(dj.Lookup):
    definition = """
    # Defines airport locations that can serve as origin or destination
    airport_code : int         # Airport's unique identifier
    ---
    airport_city : varchar(25) # Airport's city
    """
    contents = [(i, faker.city()) for i in range(1, 8)] # Also a good candidate for Lookup table

@schema 
class Passenger(dj.Manual):
    definition = """
    # Defines users who have registered accounts with airline i.e. passenger
    passenger_id : serial      # Passenger's unique identifier; serial simply means an auto-incremented, unsigned bigint
    ---
    full_name    : varchar(40) # Passenger's full name
    ssn          : varchar(20) # Passenger's Social Security Number
    """

Passenger.insert((dict(full_name=faker.name(),
                       ssn = faker.ssn()) for _ in range(10000))) # Insert a random set of passengers

@schema
class Flight(dj.Manual):
    definition = """
    # Defines specific planes assigned to a route
    flight_id            : serial                      # Flight's unique identifier
    ---
    -> Plane                                           # Flight's plane model specs; this will simply create a relation to Plane table but not have the constraint of uniqueness
    flight_economy_price : decimal(6,2)                # Flight's fare price
    flight_departure     : datetime                    # Flight's departure time
    flight_arrival       : datetime                    # Flight's arrival time
    -> Airport.proj(flight_origin_code='airport_code') # Flight's origin; by using proj in this way we may rename the relation in this table
    -> Airport.proj(flight_dest_code='airport_code')   # Flight's destination
    """

plane_types = Plane().fetch('plane_type') # Fetch available plane model types
Flight.insert((dict(plane_type = random.choice(plane_types),
                    flight_economy_price = round(random.randint(100, 1000), 2), 
                    flight_departure = faker.date_time_this_month(),
                    flight_arrival = faker.date_time_this_month(),
                    flight_origin_code = random.randint(1, 7),
                    flight_dest_code = random.randint(1, 7))
               for _ in range(550))) # Insert a random set of flights; for simplicity we are not verifying that flight_departure < flight_arrival

@schema
class BookingRequest(dj.Manual):
    definition = """
    # Defines one-way booking requests initiated by passengers
    booking_id : serial                                # Booking Request's unique identifier
    ---
    -> Passenger                                       # Passenger who made request
    -> Airport.proj(flight_origin_code='airport_code') # Booking Request's desired origin
    -> Airport.proj(flight_dest_code='airport_code')   # Booking Request's desired destination
    """

BookingRequest.insert((dict(passenger_id = random.randint(1, 10000),
                            flight_origin_code = random.randint(1, 7),
                            flight_dest_code = random.randint(1, 7))
                       for i in range(30000))) # Insert a random set of booking requests

@schema
class Reservation(dj.Computed):
    definition = """
    # Defines booked reservations 
    -> BookingRequest              # Association to booking request
    ---
    flight_id        : int         # Flight's unique identifier
    reservation_seat : varchar(25) # Reservation's assigned seat
    """
    def make(self, key):
        # Determine booking request's details
        full_name, flight_origin_code, flight_dest_code = (BookingRequest * Passenger & key).fetch1('full_name',
                                                                                                    'flight_origin_code',
                                                                                                    'flight_dest_code')
        # Determine possible flights to satisfy booking
        possible_flights = (Flight * Plane *
                            Airport.proj(flight_dest_city='airport_city',
                                         flight_dest_code='airport_code') &
                            dict(flight_origin_code=flight_origin_code,
                                 flight_dest_code=flight_dest_code)).fetch('flight_id',
                                                                           'plane_rows',
                                                                           'plane_columns',
                                                                           'flight_economy_price',
                                                                           'flight_dest_city',
                                                                           as_dict=True)
        # Iterate until we find a vacant flight and extract details
        for flight_meta in possible_flights:
            # Determine seat capacity
            all_seats = set((f'{r}{l}' for rows, letters in zip(*[[[n if i==0 else chr(n + 64) 
                                                                    for n in range(1, el + 1)]]
                                                                  for i, el in enumerate((flight_meta['plane_rows'],
                                                                                          flight_meta['plane_columns']))])
                             for r in rows
                             for l in letters))
            # Determine unavailable seats
            taken_seats = set((Reservation & dict(flight_id=flight_meta['flight_id'])).fetch('reservation_seat'))
            try:
                # Randomly choose one of the available seats
                reserved_seat = random.choice(list(all_seats - taken_seats))
                # You may uncomment the below line if you wish to print the success message per processed record
                # print(f'Success. Reserving seat {reserved_seat} at ticket_price {flight_meta["flight_economy_price"]} for {full_name}.')
                # Insert new reservation
                self.insert1(dict(key, flight_id=flight_meta['flight_id'], reservation_seat=reserved_seat))
                return
            except IndexError:
                pass
        raise IndexError(f'Sorry, no seats available departing to {flight_meta["flight_dest_city"]}')
    
Reservation.populate(display_progress=True) # This is how we process new booking requests to assign a reservation; you may invoke this as often as necessary

Syntax and Convention Nits

Lastly, just some minor feedback in your provided code. Regarding table definitions, you should only use --- once in the definition to identify a clear distinction between primary key attributes and secondary attributes (See your Flight table). Unexpectedly, this did not throw an error in your case but should have done so. I will file an issue since this appears to be a bug.

Though transaction is exposed on dj.conn(), it is quite rare to need to invoke it directly. DataJoint provides the benefit of handling this internally to reduce the management overhead of this from the user. However, the option is still available should it be needed for corner-cases. For your case, I would avoid invoking it directly and reccomend using Computed (or also Imported) tables instead.

Raphael Guzman
  • 220
  • 2
  • 7