ideal way to split up multiple values field into rows in database?

Question

I have a song database consist of fields like song title, artist, lyrist. Some time a song can have multiple artists or multiple lyrists.

I've learned from database normalization that I should only have one value for one field and I should split them up into multiple table like song-artist table, song-lyrist table with song_id being the primary key connecting them.

song table

song_id   |   title   |   date
  1            abc    |   2017

song-artist table

song_id   |   artist
  1            John
  1            Joe

But with this setup it feels like my whole song table has been disjointed. Is there a prettier way to split them?

It depends. Is your "songs" table read- or write-heavy? How is it queried? — johnsyweb, Feb 07 '17 at 08:39
you want to put all the information about song in one table? Are you OK with the table being large and having redudant data in it? What is your plan to handle changes such as if song title changes or artists name is changed? — Chetan, Feb 07 '17 at 08:45
@ChetanRanpariya With the current setup, an artist name change can already be troublesome. — Marc Compte, Feb 07 '17 at 08:50
Correct @MarcCompte The best way is to have separate tables for Song, Artists and Lyricists and have mapping table to link them. This will follow the normalization rule, reduce redundancy and changing values will be easier. — Chetan, Feb 07 '17 at 09:01
I'm using phpmyadmin, and they are for read mostly. The song tile and artist name are unlikely to change, maybe the artist name, but there's only a few of them who changes their name through the years, so I can live with that. — reddy, Feb 07 '17 at 09:11

score 1 · Accepted Answer · edited May 23 '17 at 12:17

But with this setup it feels like my whole song table has been disjointed. Is there a prettier way to split them?

Pretty is a very subjective term.

SETUP 1

In terms of the traditional Database Relational Model, the "prettier" setup for an N:M relationship such as this would be a normalized one, such as:

SONG (id, title, date)
PERSON (id, name)
SONG_ARTIST (song, person)
SONG_LYRICIST (song, person)

Example:

SONG
 ID       |   title   |   date
  1       |    abc    |   2017

PERSON
 ID       |    name
  1       |    John
  2       |    Mary

SONG_ARTIST
 SONG     |   person
  1       |     1
  1       |     2

SONG_LYRICIST
 SONG     |   person
  1       |     1

This is the traditional setup for an N:M relationship, which reduces 1) the size required to store the data, 2) the risks of redundancy and 3) makes it easier to ensure the referential integrity.

1) If artist John writes many songs, in your setup you type John as many times. That field cell is a string field. It really depends on the length of the field, but usually a string field requires more bytes in the disk than an integer, so repeating text fields usually requires more disk space than repeating integer fields.

2) One of the risks of redundancy has to do with data input. If you have to enter a string many times, at some point you may be misspelling it and, thus, creating a "new" artist. Another risk has to do with data maintenance. Say, for instance, you realize you typed the name of an artist wrong. That artist wrote 10 songs and his/her name appears 10 times in your database. You will have to change it 10 times, and in most cases this job will need to be done manually (more time and risks).

With the traditional relational setup you only type he name of the artist once. If you spelled it wrong it will be spelled wrong everywhere, but if you change it it will automatically change for all of them.

3) a rigid structure has its difficulties, but the relation between 1 person and his/her songs is not prone to interpretations. It may have been entered wrong, but there is no doubt which songs wrote which artist. The system will even be able to differentiate between two artists named the same. Thanks to this you can apply rules to ensure the referential integrity (such as "delete in SONG_ARTIST any reference to a particular person when I remove it from the table PERSON")

Even though you say you can live with name changes, I strongly suggest you to have the people in their own table and reference them when relating them to songs.

SETUP 1.1

From the example above, if you want to add information for instance about the bands/groups (or any other information) the first thing you need to do is analyze the relationship between this entity and every other entity in your database.

Assuming an initial basic definition of the table BAND such as this:

BAND
 ID       |   title 
  1       |  TheBand

Let's start with the most simple part:

SONG. 1 song belongs to 1 band but 1 band may have many songs (1:N)

To relate a band with its songs (1:N) we only need to add the band_id as a foreign key in the table song.

SONG
 ID       |   title   |   date    |    band
  1       |    abc    |   2017    |      1

Only by doing this you can list all songs from a band.

SELECT song.id, song.title FROM song, band 
WHERE song.band=band.id AND band.id = 1

And, since we know the musicians for each song, we can also list all musicians or lyricists involved in a band.

SELECT person.id, person.name, song.title 
FROM song, band, song_artist, person 
WHERE song.band=band.id AND song_artist.song=song.id 
AND person.id=song_artist.person AND band.id = 1

You may decide this is all your application needs to know: "who has ever been involved in any song from band X".

You may, otherwise, want to take into account that bands often invite other musicians to play a particular song, but those are not really part of the band. If you think your application needs to be able to differentiate between who is just collaborating in a band and who belongs to the core of the band, then you need to define a direct relationship between people and band.

PERSON. 1 person may be a core component in many bands and 1 band may have many core components (N:M).

As you know, an N:M relationship, in the Relational Model, must be implemented by using a third table, which will put together the bands and the people acting as core components.

Another issue arises, as the core components of a particular band is not something static and may vary with time. You could solve this by adding a start date and end date to the table BAND_CORE_COMPONENT, so you know, for each person in the band, when did he/she start and when he/she finished and you can ask the database questions such as: "who were the core components of band X in january 2012?".

BAND
 ID       |   title 
  1       |  TheBand

SONG
 ID       |   title   |   date    |    band
  1       |    abc    |   2017    |      1

PERSON
 ID       |    name
  1       |    John
  2       |    Mary

SONG_ARTIST
 SONG     |   person
  1       |     1
  1       |     2

SONG_LYRICIST
 SONG     |   person
  1       |     1

BAND_CORE_COMPONENTS
 BAND     |   person   |   started    |    ended
  1       |     2      |  2010-01-01  |  2016-06-01
  1       |     1      |  2012-01-01  |    *null*

Here you know that Mary used to be a core component of TheBand from early 2010 until mid 2016. We also know John entered later (in 2012) and is still part of TheBand. We also know that John participated as a lyricist and muscian in the song abc from TheBand and did it as a core component (because the song dates from 2017 and John is currently still a core component). In the same song Mary participated as a collaborator because the song dates from 2017 and she was not a core component of TheBand by that time.

SETUP 2

That being said, the most popular and current Relational DB Systems, such as MySQL or PostgreSQL in their latest versions, incorporate some new types that help you deal with N:M relations in a different way and reduce the amount of tables needed in your setup.

The JSON type (MySQL 5.7.8 and above, PostgreSQL 9.2 and above) could be used to store the relationships in the SONG table.

SONG
 ID       |   title   |   date    |              artists
  1       |    abc    |   2017    |  {"lyrics": [1], "music": [1,2]}

PERSON
 ID       |    name
  1       |    John
  2       |    Mary

Or even:

SONG
 ID       |   title   |   date    |              artists
  1       |    abc    |   2017    |  {"lyrics": [1], "music": {"voice": [1], "guitar": [2]}}

PERSON
 ID       |    name
  1       |    John
  2       |    Mary

This has similar advantages as the other setup (reducing redundancy and keeping referential integrity, not so sure about disk usage) but seems a bit easier to read.

It introduces a new risk to manage: as you can see the field artists allows you to store any JSON in it, so the JSON structure may be different in different rows and if that happens then the structural integrity of the data will be broken and your application will have to deal with this.

The following examples store the same information, but use completely different JSON structures.

SONG
 ID       |   title   |   date    |              artists
  1       |    abc    |   2017    |  {"lyrics": [1], "music": {"voice": [1], "guitar": [2]}}
  2       |    def    |   2016    |  {"lyrics": [1], "music": [{"person": 1, "instrument": "voice"}, {"person": 2, "instrument": "guitar"}]}

More on the JSON type in MySQL: Native JSON support in MYSQL 5.7 : what are the pros and cons of JSON data type in MYSQL?

Thanks for the explanation, I think I'll stick with setup 1 then to stay with the rules, and since also my host provider isn't on MySQL 5.7. — reddy, Feb 08 '17 at 03:33
ok I have more questions popping up while splitting the values. The song is going to have "groups" being the singing artist as well as individual singers. So I created a "group" table with `group_id` , `group_name` , `person_id` , and the "song_artist" table is going to have `song_id` `person_id` and `group_id` , with one of the latter 2 being NULL depending on the song data. Is this okay? Or should I split them even further into 2 tables "song_artist" and "song_artist_group"? — reddy, Feb 08 '17 at 04:09
Not really. You need first define a clear statement of what do you want the database to store. Do you want to know who has ever participated in any song from band A? or do you also want to know who are the core components of each band? And then analyse the relationship between band and the other entities involved. Check `SETUP 1.1` in my answer as an example. — Marc Compte, Feb 08 '17 at 11:40

ideal way to split up multiple values field into rows in database?

1 Answers1