But with this setup it feels like my whole song table has been
disjointed. Is there a prettier way to split them?
Pretty is a very subjective term.
SETUP 1
In terms of the traditional Database Relational Model, the "prettier" setup for an N:M relationship such as this would be a normalized one, such as:
SONG (id, title, date)
PERSON (id, name)
SONG_ARTIST (song, person)
SONG_LYRICIST (song, person)
Example:
SONG
ID | title | date
1 | abc | 2017
PERSON
ID | name
1 | John
2 | Mary
SONG_ARTIST
SONG | person
1 | 1
1 | 2
SONG_LYRICIST
SONG | person
1 | 1
This is the traditional setup for an N:M relationship, which reduces 1) the size required to store the data, 2) the risks of redundancy and 3) makes it easier to ensure the referential integrity.
1) If artist John writes many songs, in your setup you type John as many times. That field cell is a string field. It really depends on the length of the field, but usually a string field requires more bytes in the disk than an integer, so repeating text fields usually requires more disk space than repeating integer fields.
2) One of the risks of redundancy has to do with data input. If you have to enter a string many times, at some point you may be misspelling it and, thus, creating a "new" artist. Another risk has to do with data maintenance. Say, for instance, you realize you typed the name of an artist wrong. That artist wrote 10 songs and his/her name appears 10 times in your database. You will have to change it 10 times, and in most cases this job will need to be done manually (more time and risks).
With the traditional relational setup you only type he name of the artist once. If you spelled it wrong it will be spelled wrong everywhere, but if you change it it will automatically change for all of them.
3) a rigid structure has its difficulties, but the relation between 1 person and his/her songs is not prone to interpretations. It may have been entered wrong, but there is no doubt which songs wrote which artist. The system will even be able to differentiate between two artists named the same. Thanks to this you can apply rules to ensure the referential integrity (such as "delete in SONG_ARTIST any reference to a particular person when I remove it from the table PERSON")
Even though you say you can live with name changes, I strongly suggest you to have the people in their own table and reference them when relating them to songs.
SETUP 1.1
From the example above, if you want to add information for instance about the bands/groups (or any other information) the first thing you need to do is analyze the relationship between this entity and every other entity in your database.
Assuming an initial basic definition of the table BAND such as this:
BAND
ID | title
1 | TheBand
Let's start with the most simple part:
- SONG. 1 song belongs to 1 band but 1 band may have many songs (1:N)
To relate a band with its songs (1:N) we only need to add the band_id as a foreign key in the table song.
SONG
ID | title | date | band
1 | abc | 2017 | 1
Only by doing this you can list all songs from a band.
SELECT song.id, song.title FROM song, band
WHERE song.band=band.id AND band.id = 1
And, since we know the musicians for each song, we can also list all musicians or lyricists involved in a band.
SELECT person.id, person.name, song.title
FROM song, band, song_artist, person
WHERE song.band=band.id AND song_artist.song=song.id
AND person.id=song_artist.person AND band.id = 1
You may decide this is all your application needs to know: "who has ever been involved in any song from band X".
You may, otherwise, want to take into account that bands often invite other musicians to play a particular song, but those are not really part of the band. If you think your application needs to be able to differentiate between who is just collaborating in a band and who belongs to the core of the band, then you need to define a direct relationship between people and band.
- PERSON. 1 person may be a core component in many bands and 1 band may have many core components (N:M).
As you know, an N:M relationship, in the Relational Model, must be implemented by using a third table, which will put together the bands and the people acting as core components.
Another issue arises, as the core components of a particular band is not something static and may vary with time. You could solve this by adding a start date and end date to the table BAND_CORE_COMPONENT, so you know, for each person in the band, when did he/she start and when he/she finished and you can ask the database questions such as: "who were the core components of band X in january 2012?".
BAND
ID | title
1 | TheBand
SONG
ID | title | date | band
1 | abc | 2017 | 1
PERSON
ID | name
1 | John
2 | Mary
SONG_ARTIST
SONG | person
1 | 1
1 | 2
SONG_LYRICIST
SONG | person
1 | 1
BAND_CORE_COMPONENTS
BAND | person | started | ended
1 | 2 | 2010-01-01 | 2016-06-01
1 | 1 | 2012-01-01 | *null*
Here you know that Mary used to be a core component of TheBand from early 2010 until mid 2016. We also know John entered later (in 2012) and is still part of TheBand. We also know that John participated as a lyricist and muscian in the song abc from TheBand and did it as a core component (because the song dates from 2017 and John is currently still a core component). In the same song Mary participated as a collaborator because the song dates from 2017 and she was not a core component of TheBand by that time.
SETUP 2
That being said, the most popular and current Relational DB Systems, such as MySQL or PostgreSQL in their latest versions, incorporate some new types that help you deal with N:M relations in a different way and reduce the amount of tables needed in your setup.
The JSON type (MySQL 5.7.8 and above, PostgreSQL 9.2 and above) could be used to store the relationships in the SONG table.
SONG
ID | title | date | artists
1 | abc | 2017 | {"lyrics": [1], "music": [1,2]}
PERSON
ID | name
1 | John
2 | Mary
Or even:
SONG
ID | title | date | artists
1 | abc | 2017 | {"lyrics": [1], "music": {"voice": [1], "guitar": [2]}}
PERSON
ID | name
1 | John
2 | Mary
This has similar advantages as the other setup (reducing redundancy and keeping referential integrity, not so sure about disk usage) but seems a bit easier to read.
It introduces a new risk to manage: as you can see the field artists
allows you to store any JSON in it, so the JSON structure may be different in different rows and if that happens then the structural integrity of the data will be broken and your application will have to deal with this.
The following examples store the same information, but use completely different JSON structures.
SONG
ID | title | date | artists
1 | abc | 2017 | {"lyrics": [1], "music": {"voice": [1], "guitar": [2]}}
2 | def | 2016 | {"lyrics": [1], "music": [{"person": 1, "instrument": "voice"}, {"person": 2, "instrument": "guitar"}]}
More on the JSON type in MySQL: Native JSON support in MYSQL 5.7 : what are the pros and cons of JSON data type in MYSQL?