3

I need do display a distance matrix on my web-page for all the nearby locations for a city.

I would like to fetch all this data from web-service and save in my DB in advance. I am trying to figure out the best relational DB design to save such a data.

I want to avoid redundant data and also a design which gives optimal performance.

I know relation DB is not the best option for this but that is something I can not help at this point.

Question: So what is the best DB schema design to store such info. I would need to query DB providing just one city and I would have to display a matrix of 5 or 10 closest cities.

Travel time is not that important, I am concerned about distance mainly.

A matrix of this kind minus the duration

AJ.
  • 2,561
  • 9
  • 46
  • 81
  • I think you're going to have to store the distance for each pair of cities. It's not possible to calculate travel time between cities from any simple data point since that depends on the roads between them. – Ramon Sep 11 '12 at 05:26
  • It's very hard to answer your type of question AJ. It's very subjective, doesn't really have a perfect answer and depends on a lot of factors, etc... – Erik Sep 11 '12 at 05:31
  • I am looking for the best approach or the most logical one. I am just worried about scalability aspect. – AJ. Sep 11 '12 at 05:34

2 Answers2

3

For the sake of performance, and assuming you are using InnoDB, I'd probably denormalize the data a bit, like this:

CREATE TABLE CITY (
    CITY_ID INT PRIMARY KEY
);

CREATE TABLE CITY_DISTANCE (
    CITY1_ID INT,
    CITY2_ID INT,
    DISTANCE NUMERIC NOT NULL,
    PRIMARY KEY (CITY1_ID, DISTANCE, CITY2_ID),
    FOREIGN KEY (CITY1_ID) REFERENCES CITY (CITY_ID),
    FOREIGN KEY (CITY2_ID) REFERENCES CITY (CITY_ID)
);

Each pair of cities has 2 rows in CITY_DISTANCE containing the same DISTANCE (one for each direction). This could obviously make it very big and could lead to data inconsistencies (the database will not defend itself from non-matching DISTANCE values between same cities), and the DISTANCE doesn't logically belong to the PK, but bear with me...

InnoDB tables are clustered, which means that by declaring the PK in this particular way we put the whole table in a B-Tree that is particularly suited for a query like this:

SELECT CITY2_ID, DISTANCE
FROM CITY_DISTANCE
WHERE CITY1_ID = 1
ORDER BY DISTANCE
LIMIT 5

This query returns the closest 5 cities to the city identified by 1, and can be satisfied by a simple range scan on the B-Tree mentioned above:

id  select_type table           type    possible_keys   key     key_len ref     rows    Extra
1   SIMPLE      CITY_DISTANCE   ref     PRIMARY         PRIMARY 4       const   6       "Using where; Using index"

BTW, the InnoDB will automatically create one more index (on CITY2_ID) because of the second FK, which will also include the CITY1_ID and DISTANCE because secondary indexes in clustered tables must cover PK. You might be able to exploit that to avoid duplicated DISTANCEs (explicitly create index on {CITY2_ID, DISTANCE, CITY1_ID} and let FK reuse it, and CHECK (CITY1_ID < CITY2_ID)), but MySQL query optimizer is probably not smart enough to deal with the query that would be required on such a structure.

Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
0

The easiest way would be to store a pair of cities along with the distance and any other data you want to be able to display. I'd store the cities themselves in a separate table, and only store two keys and the distance information in a distance table.

I you're sure you only want to display the 5 or 10 closest at most, you can start with only adding those records. That means for N cities you will only get N*10 records in the database which should be quite scalable.

Even with larger number of records, the performance should be good if you add proper indexes.

GolezTrol
  • 114,394
  • 18
  • 182
  • 210
  • What is the benefit of storing cities in different DB? Wouldn't this mean I will have to use join or nested query while fetching data. I will first have to get the city index and then fetch the distances using the index. Also I need to be able to just query on the basis of one city. I need 10X10 matrix data based on one city name. – AJ. Sep 11 '12 at 06:17
  • 1
    Different table, not different DB. Yes, you have to join. It's called normalization. http://en.wikipedia.org/wiki/Database_normalization – GolezTrol Sep 11 '12 at 06:36