Here is how these two approaches will be physically represented in the database:

Let us analyze both approaches.

Approach 1 (both directions stored in the table):
- PRO: Simpler queries.
- CON: Data can be corrupted by inserting/updating/deleting only one direction.
- MINOR PRO: Doesn't require additional constraints to ensure a friendship cannot be duplicated.
- Further analysis needed:
  1. TIE: One index covers both directions, so you don't need a secondary index.
  2. TIE: Storage requirements.
  3. TIE: Performance.
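To make approach 1 concrete, here is a minimal sketch using Python's standard `sqlite3` module. SQLite stands in for MySQL/InnoDB (a `WITHOUT ROWID` table is clustered on its primary key, much like an InnoDB table), and the `befriend` and `friends_of` helpers are hypothetical. It shows both the simple query and how deleting only one direction silently leaves the data inconsistent.

```python
import sqlite3

# Approach 1: every friendship is stored twice, once per direction.
# SQLite stands in for MySQL/InnoDB; WITHOUT ROWID clusters the table on its PK.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Friendship (
        UID      INTEGER NOT NULL,
        FriendID INTEGER NOT NULL,
        PRIMARY KEY (UID, FriendID)
    ) WITHOUT ROWID
""")

def befriend(uid, friend_id):
    # Both directions must be written together, in one transaction.
    with conn:
        conn.executemany(
            "INSERT INTO Friendship (UID, FriendID) VALUES (?, ?)",
            [(uid, friend_id), (friend_id, uid)],
        )

def friends_of(uid):
    # Simple query: the primary key alone serves lookups by UID.
    rows = conn.execute(
        "SELECT FriendID FROM Friendship WHERE UID = ? ORDER BY FriendID", (uid,)
    )
    return [r[0] for r in rows]

befriend(1, 2)
befriend(3, 1)
print(friends_of(1))  # [2, 3]

# The weakness: nothing stops a statement from touching only one direction.
with conn:
    conn.execute("DELETE FROM Friendship WHERE UID = 1 AND FriendID = 2")
print(friends_of(1), friends_of(2))  # [3] [1] -> asymmetric, i.e. corrupted
```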
Approach 2 (only one direction stored in the table):
- CON: More complicated queries.
- PRO: Can't corrupt the data by forgetting to handle the opposite direction, since there is no opposite direction.
- MINOR CON: Requires `CHECK(UID < FriendID)`, so the same friendship can never be represented in two different ways, and the key on {UID, FriendID} can do its job.
- Further analysis needed:
  1. TIE: Two indexes are necessary to cover both directions of querying (a composite index on {UID, FriendID} and a composite index on {FriendID, UID}).
  2. TIE: Storage requirements.
  3. TIE: Performance.
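A matching sketch of approach 2, under the same assumptions (SQLite in place of MySQL, hypothetical `friends_of` helper). It shows the `CHECK` rejecting the reversed form of a pair and the `UNION ALL` query that reads both indexes. Note that MySQL only enforces `CHECK` constraints from version 8.0.16 on; older versions parse and silently ignore them.

```python
import sqlite3

# Approach 2: each friendship is stored once, with the smaller ID first.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Friendship (
        UID      INTEGER NOT NULL,
        FriendID INTEGER NOT NULL,
        PRIMARY KEY (UID, FriendID),  -- composite index {UID, FriendID}
        CHECK (UID < FriendID)        -- one canonical representation per pair
    ) WITHOUT ROWID;
    CREATE INDEX FriendshipReverse ON Friendship (FriendID, UID);  -- {FriendID, UID}
""")

with conn:
    conn.executemany(
        "INSERT INTO Friendship (UID, FriendID) VALUES (?, ?)", [(1, 2), (1, 3)]
    )

# The reversed form (2, 1) violates the CHECK, so (1, 2) cannot be duplicated.
try:
    with conn:
        conn.execute("INSERT INTO Friendship (UID, FriendID) VALUES (2, 1)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)

def friends_of(uid):
    # Each half of the UNION ALL is served by a different index.
    rows = conn.execute(
        """
        SELECT FriendID FROM Friendship WHERE UID = ?
        UNION ALL
        SELECT UID FROM Friendship WHERE FriendID = ?
        ORDER BY 1
        """,
        (uid, uid),
    )
    return [r[0] for r in rows]

print(friends_of(1), friends_of(3))  # [2, 3] [1]
```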
Point 1 is of special interest. MySQL/InnoDB always clusters data, and secondary indexes can be expensive in clustered tables (see "Disadvantages of clustering" in this article), so it might seem as if the secondary index in approach 2 would eat up all the advantages of having fewer rows. However, the secondary index contains exactly the same fields as the primary key (only in the opposite order), so there is no storage overhead in this particular case: an InnoDB secondary index stores the primary key columns as its row locator, and here those columns are already part of the index itself. There is also no pointer to a table heap (since there is no table heap), so it's probably even cheaper storage-wise than a normal heap-based index. And assuming the query is covered by the index, there won't be the double lookup normally associated with a secondary index in a clustered table either. So this is basically a tie (neither approach 1 nor approach 2 has a significant advantage).
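Whether a query is covered can be checked with the query planner. In this sketch (again SQLite, not InnoDB, with a hypothetical index name `FriendshipReverse`), `EXPLAIN QUERY PLAN` should report the reverse lookup as using a covering index, i.e. no second lookup into the clustered table; the exact wording varies between SQLite versions. In MySQL, the equivalent signal would be `Using index` in the `Extra` column of `EXPLAIN`.

```python
import sqlite3

# Approach 2 schema: clustered PK {UID, FriendID} plus secondary {FriendID, UID}.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Friendship (
        UID      INTEGER NOT NULL,
        FriendID INTEGER NOT NULL,
        PRIMARY KEY (UID, FriendID),
        CHECK (UID < FriendID)
    ) WITHOUT ROWID;
    CREATE INDEX FriendshipReverse ON Friendship (FriendID, UID);
""")

# Reverse-direction lookup: only UID is needed, and UID is in the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT UID FROM Friendship WHERE FriendID = ?", (1,)
).fetchall()

# The last column of each plan row is the human-readable description.
for row in plan:
    print(row[-1])  # something like: SEARCH Friendship USING COVERING INDEX FriendshipReverse (FriendID=?)
```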
Point 2 is related to point 1: it doesn't matter whether we have one B-Tree of N values or two B-Trees of N/2 values each. So this is also a tie: both approaches use up approximately the same amount of storage.
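In numbers, assuming F friendships (F is introduced here only for illustration), so that the N above equals 2F:

```latex
\begin{aligned}
\text{Approach 1:}\quad & 1 \text{ B-Tree} \times 2F \text{ entries} = 2F = N \text{ entries of } (\mathit{UID}, \mathit{FriendID}) \\
\text{Approach 2:}\quad & 2 \text{ B-Trees} \times F \text{ entries} = 2F = N \text{ entries of } (\mathit{UID}, \mathit{FriendID}) \text{ or } (\mathit{FriendID}, \mathit{UID})
\end{aligned}
```

This ignores per-page overhead and fill factor, which should be comparable for both layouts.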
The same reasoning applies to point 3: whether we search one larger B-Tree or two smaller ones doesn't make much of a difference, since the cost of a B-Tree lookup grows only logarithmically with the number of entries. So this is also a tie.
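As a rough worked estimate, take the height of a B-Tree holding n entries to be about the ceiling of its logarithm base b, where b is the fan-out (b = 500 and F = 10^6 below are assumed, illustrative values):

```latex
\begin{aligned}
h(n) &\approx \lceil \log_b n \rceil \\
\log_b(2F) &= \log_b F + \log_b 2, \qquad \log_b 2 \le 1 \text{ for } b \ge 2 \\
b = 500,\ F = 10^6:\quad \lceil \log_{500} 10^6 \rceil &= \lceil 2.22 \rceil = 3, \qquad
\lceil \log_{500} (2 \cdot 10^6) \rceil = \lceil 2.33 \rceil = 3
\end{aligned}
```

So the single larger tree of approach 1 is at most one level taller than either tree of approach 2, and with realistic fan-outs it is often the same height.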
So, for robustness, and despite somewhat uglier queries and the need for an additional `CHECK`, I'd go with approach 2.
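Since the `CHECK` puts the burden of ordering each pair on whoever writes to the table, it can help to funnel all writes through one helper. A minimal sketch, again with SQLite standing in for MySQL; `add_friendship` is a hypothetical name, and `INSERT OR IGNORE` is SQLite syntax (MySQL's nearest equivalent is `INSERT IGNORE`).

```python
import sqlite3

def add_friendship(conn: sqlite3.Connection, a: int, b: int) -> None:
    """Hypothetical helper: store the friendship {a, b} once, smaller ID first."""
    if a == b:
        raise ValueError("a user cannot be friends with themselves")
    uid, friend_id = min(a, b), max(a, b)  # always satisfies CHECK (UID < FriendID)
    with conn:  # commit on success, roll back on error
        # Re-adding an existing friendship is a no-op instead of a key violation.
        conn.execute(
            "INSERT OR IGNORE INTO Friendship (UID, FriendID) VALUES (?, ?)",
            (uid, friend_id),
        )

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Friendship (
        UID      INTEGER NOT NULL,
        FriendID INTEGER NOT NULL,
        PRIMARY KEY (UID, FriendID),
        CHECK (UID < FriendID)
    ) WITHOUT ROWID
""")

add_friendship(conn, 5, 2)
add_friendship(conn, 2, 5)  # same friendship in the other order: ignored
print(conn.execute("SELECT UID, FriendID FROM Friendship").fetchall())  # [(2, 5)]
```

Callers never need to know which ID is "smaller"; the ordering rule lives in one place.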