Avoid data redundancy in a friends table

Question

I am trying to design a very simple table that stores the data of friends in a community.

Therefor I store the userId of 2 friends respectively.

Goal

User wants to load his/her friends list.

t_friends Option 1:

Query

SELECT * FROM t_friend WHRE user_id = 10

10 is the current userId of the user which is looking for his friends list and he has 1 friend userId(20)

This way userId (10) finds his friend (20) BUT what if userId(20) were looking for his friends? The query is coupled with userId.

That leads me to another design that contains redundant data:

t_friends option 2:

userId (10) loads now:

SELECT * FROM t_friend WHRE user_id=10

Similiar to that the query for userId(20) would then be:

SELECT * FROM t_friend WHRE user_id=20

But what about the redundancy? That leads me then to that query using table design option 1:

SELECT * FROM t_friend WHERE user_id=10 OR friend_id=10

I have a feeling that there is a smarter way to solve that. Do You have any experiences with that structure?

Thanks

What about redundancy? What you did is perfectly fine and there isn't a smarter way, no matter what anyone will write about it. You query for friends of a user. Your data is normalized and everything is just perfect. If you can achieve anything by attempting to "optimize" it is breaking the whole thing, creating unnecessary complex queries and making a system that's difficult to maintain. No matter who or what experience they have with SQL, when people see the table you created - everything makes perfect sense at a first glance. TL;DR: don't change anything. — Mjh, Aug 28 '15 at 13:46
Thank you @Mjh. but this way we have then double as much data. Instead of having e.g. 10GB you will have 20GB to scan and so on. — B001ᛦ, Aug 28 '15 at 13:49
@bub The final query you used is perfectly fine, and it seems to me the problem is being over-thought. One row per relationship, use the OR query, done. You could throw an IF in there to avoid it in the php: `SELECT IF(user_id=10, friend_id, user_id) AS target_id FROM t_friend WHERE user_id=10 OR friend_id=10` — Chris Baker, Aug 28 '15 at 14:02
You're solving a problem of having too much data by implementing a complex process of fetching the relation through a "witty" system - it just can't be done. Also, if you take into account the storage requirement - 10 or 20 gb are really small amounts. If your columns are 4 byte integers, then per record you spend 12 bytes which equals to `833, 333, 333` rows, before any compression kicks in. Don't solve the storage problem by breaking the logic and ruining how databases work. Also @ChrisBaker provided an excellent solution that you can implement to remedy the redundancy. — Mjh, Aug 28 '15 at 14:05
Another thing, especially if you are concerned about the physical data size of the table, remove the ID field and make the primary key be a composite of user_id and friend_id. There's no need for an arbitrary autoincrement number there, people cannot be friends with one another more than once, so the primary key is naturally a combination of the two user ids. — Chris Baker, Aug 28 '15 at 14:32
@ChrisBaker - due to how InnoDB works, your suggestion is actually very, very bad suggestion. InnoDB organizes the physical location of records by primary key - or if you will, it clusters by PK. To efficiently do that, each PK has to be a number larger than previous, hence `auto_increment` which fits the role perfectly. Doing a compound key from two mentioned columns will force tree rebalancing and physical record reordering - and that kills performance heavily. Every table should have an auto_increment. MySQL even creates a hidden one if its omitted. — Mjh, Aug 28 '15 at 14:40
To add on @MjH s comment I use the primary key for a chronical ordering when I want the whole list of friends. Surely a timestamp could do the same. — B001ᛦ, Aug 28 '15 at 14:46
@Mjh With InnoDB that may be the case. You'd have a larger BTREE, but if you aren't using that composite for primary, you'd probably index both those columns anyway, more indexes mean poorer insert performance. Range scans over the primary would be a little faster with the clustering, so less disk I/O. I am not so sure about the "virtual" autoincrement, it is my understanding that only happens in InnoDB when there's no PK and no unique. On the disk space, you're right: I suppose I assumed MyISAM, since InnoDB wasn't mentioned. InnoDb *is* the default > 5.5, so maybe that was a bad assumption. — Chris Baker, Aug 28 '15 at 15:32

score 2 · Answer 1 · answered Aug 28 '15 at 13:54

2

I think this is the only way to store data about the relationships. When you are storing the relationship try to store the min value as the userId and max value as the friendId. make both values altogether unique and you will not get any duplicate values. When you search for users use something like below

SELECT * FROM t_friend WHERE user_id=10 OR friend_id=10

answered Aug 28 '15 at 13:54

Chatura Dilan

1,502
1
16
29

Thanks. Why the min and max separation? – B001ᛦ Aug 28 '15 at 13:55
Form that you do not want to do additional search when inserting values – Chatura Dilan Aug 28 '15 at 13:56

score 2 · Accepted Answer · edited Jun 20 '20 at 09:12

Add field friendship_key:

ALTER TABLE t_friend ADD friendship_key decimal(22,11);

CREATE UNIQUE INDEX friendship_key_unique ON t_friend (friendship_key);

And php part:

$friends = [$userId, $friendId];
$key = min($friends).'.'.max($friends);

$q = "SELECT * FROM t_friend WHERE friendship_key = ".$key;

insert:

$friends = [$userId, $friendId];
$key = min($friends).'.'.max($friends);

$q = "INSERT INTO t_friend (friendship_key, userId, friendId) VALUES (".implode(',', [$key, $userId, $friendId]).")";

Instead of using VARCHAR for friendship key I've used decimal to minimize data for relation key.

To keep it simple just create functions:

function insertFriendship($user1, $user2) {
    $key = min([$user1, $user2]).'.'.max([$user1, $user2]);
    $q = "INSERT INTO t_friend (friendship_key, userId, friendId) VALUES (".implode(',', [$key, $user1, $user2]).")";
    mysql_query($q);
}

function getFriendsOf($user) {
    $q = "SELECT * FROM t_friends WHERE ".$user." IN (userId, friendId)";
    return mysql_query($q);
}

function areFriends($user1, $user2) {
    $key = min([$user1, $user2]).'.'.max([$user1, $user2]);
    $q = "SELECT 1 FROM t_friends WHERE friendship_key = ".$key." LIMIT 1";
    $q = mysql_query($q);
    return (mysql_num_rows($q)>0);
}

Thank you for your effort :) I have to implement your solution and once it works I will accept. Thank you — B001ᛦ, Aug 28 '15 at 14:31

score -1 · Answer 3 · answered Aug 28 '15 at 13:44

-1

You may want to use the following query which verify if your user is not already a friend of another first :

INSERT INTO t_friend (userId, friendId)
SELECT 10, 20
WHERE NOT EXISTS (  SELECT userId
                    FROM t_friend 
                    WHERE userId = 20
                    AND friendId = 10)

Thanks to this (french) topic about redundancy verification here.

answered Aug 28 '15 at 13:44

Anwar

4,162
4
41
62

Thanks. But this doesn't help since you don't know the friends ID. But your query assumes to know a userid with 20. – B001ᛦ Aug 28 '15 at 13:46
So you should add another precision about how do you are inserting your values, I guess you know the id of both of the users before inserting the row, right ? – Anwar Aug 28 '15 at 13:49
Yes I know both of the ids when I insert.But what do you mean by another precision? – B001ᛦ Aug 28 '15 at 13:51
1

Your solution is prone to concurrency and therefore not safe for use. In SQL world, we don't check if something exists before insert. We just insert and we place `unique` constraints if we don't want duplicates for particular column value combinations. MySQL even lets us use `INSERT IGNORE` and `INSERT INTO ... ON DUPLICATE KEY UPDATE`. The website you linked should not be trusted either, too many inexperienced people with no clue suggesting solutions they can't test. – Mjh Aug 28 '15 at 14:13

Avoid data redundancy in a friends table

3 Answers3