I have an array of song titles, coming from this subreddit, looking like this:

[
  "Lophelia -- MYTCH [Acoustic Prog-Rock/Jazz] (2019)",
  "Julia Jacklin - Pressure to Party [Rock] (2019)",
  "The Homeless Gospel Choir - I'm Going Home [Folk-Punk] (2019) cover of Pat the Bunny | A Fistful of Vinyl",
  "Lea Salonga and Simon Bowman - The last night of the world [musical] (1990)",
  "$uicideboy$ - Death",
  "SNFU -- Joni Mitchell Tapes [Punk/Alternative] (1993)",
  "Blab - afdosafhsd (2000)",
  "Something strange and badly formatted without any artist [Classical]",
  "シロとクロ「ミッドナイトにグッドナイト」(Goodnight to Midnight - Shirotokuro) - (Official Music Video) [Indie/Alternative]",
  "Victor Love - Irrationality (feat. Spiritual Front) [Industrial Rock/Cyberpunk]"
  ...
]

I am trying to parse the title and artist from them but am really struggling with regex.

I tried splitting on "-", but extracting only the artist afterwards is really awkward.

I tried using regex too but I can't really get something working properly. This is what I had for the artist: /(?<= -{1,2} )[\S ]*(?= \[|\( )/i and this for the title: /[\S ]*(?= -{1,2} )/i.

Every entry contains a song title. The title may be preceded by the song's artist followed by one or two (or maybe three?) dashes. After the title, the genres may appear in square brackets and/or the release date in parentheses. I do not expect perfect accuracy, since some formats are weird; in those cases I would rather have artist be undefined than get some strange parsing.

For example:

[
  { title: "MYTCH", artist: "Lophelia" },
  { title: "Pressure to Party", artist: "Julia Jacklin" },
  { title: "I'm Going Home", artist: "The Homeless Gospel Choir" },
  { title: "The last night of the world", artist: "Lea Salonga and Simon Bowman" },
  { title: "Death", artist: "$uicideboy$" },
  { title: "Joni Mitchell Tapes", artist: "SNFU" },
  { title: "afdosafhsd", artist: "Blab" },
  { title: "Something strange and badly formatted without any artist" },
  { title: "Goodnight to Midnight", artist: "Shirotokuro" }, // Probably impossible without some kind of AI
  { title: "Irrationality", artist: "Victor Love" }
]
Hugo
  • Would using an object be better for this? – NewToJS May 10 '19 at 20:44
  • Where is your data from? Part of this process could be sanitizing the data, which is honestly like 90% of a data scientist's job. – Adam LeBlanc May 10 '19 at 20:44
  • You say `I expect everything before the dashes to be the title`, but it looks like some of these lines start with artists. For example: `Julia Jacklin - Pressure to Party` Is `Julia Jacklin` the title? `Pressure to Party` seems more like a title here. – Mark May 10 '19 at 20:48
  • Try starting by breaking down the structure of each line in natural language... "An entry is a Song-Title followed by one or two dashes followed by an Artist, optionally followed by genre (in brackets), ..." Is any entry only present along with some other entry? And why one or two dashes? Is there some meaning that can be distinguished by `-` vs `--`? Once the structure is stated clearly it's usually easy to turn that into a regex. – Stephen P May 10 '19 at 22:02
  • @MarkMeyer You're right, I hadn't noticed that. That changes a lot of things. – Hugo May 11 '19 at 14:52

3 Answers

To get the expected result, use the two steps below.

  1. For titles, use substring starting at the index of '- ' (a dash followed by a space) and ending at the index of ' ['. If ' [' is not found, use the string length as the end.

    v.substring(v.indexOf('- ')+1, v.indexOf(' [') !== -1? v.indexOf(' [') : v.length).trim()

  2. For artists, use substr from position 0 up to the index of '-'.

    v.substr(0, v.indexOf('-')).trim()

Working code for reference

let arr = [
  "Lophelia -- MYTCH [Acoustic Prog-Rock/Jazz] (2019)",
  "Julia Jacklin - Pressure to Party [Rock] (2019)",
  "The Homeless Gospel Choir - I'm Going Home [Folk-Punk] (2019) cover of Pat the Bunny | A Fistful of Vinyl",
  "Lea Salonga and Simon Bowman - The last night of the world [musical] (1990)",
  "$uicideboy$ - Death",
  "SNFU -- Joni Mitchell Tapes [Punk/Alternative] (1993)",
  "Blab - afdosafhsd (2000)",
  "Something strange and badly formatted without any artist [Classical]"
]

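// Build one { title, artist } object per entry.
let result = arr.map(v => {
  // The title ends at the first ' [' or, failing that, at the first ' ('.
  const bracket = v.indexOf(' [')
  const paren = v.indexOf(' (')
  const end = bracket !== -1 ? bracket : (paren !== -1 ? paren : v.length)
  return {
    title: v.substring(v.indexOf('- ') + 1, end).trim(),
    artist: v.substr(0, v.indexOf('-')).trim()
  }
})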

console.log(result)

codepen - https://codepen.io/nagasai/pen/zQKRXj?editors=1010
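
One thing to watch for with the indexOf approach: when an entry has no dash, artist comes back as an empty string rather than undefined, and a hyphen inside an artist name would be treated as the separator. A minimal sketch of a stricter variant, assuming the separator is always ' - ' or ' -- ' with a space on each side; `parseSong` and `firstOf` are hypothetical names, not part of the code above:

```javascript
// Hypothetical helper: returns the smallest index at which any needle
// occurs in s, or -1 when none of them occurs.
function firstOf(s, needles) {
  const hits = needles.map((n) => s.indexOf(n)).filter((i) => i !== -1);
  return hits.length ? Math.min(...hits) : -1;
}

// Assumption: the artist/title separator is " - " or " -- ".
function parseSong(v) {
  const sepStart = firstOf(v, [' - ', ' -- ']);
  // No separator found: leave artist undefined instead of guessing.
  const artist = sepStart === -1 ? undefined : v.substring(0, sepStart).trim();
  // Drop the separator's leading spaces and dashes from the remainder.
  const rest = sepStart === -1 ? v : v.substring(sepStart).replace(/^[\s-]+/, '');
  // Genre brackets or parentheses mark the end of the title.
  const metaStart = firstOf(rest, [' [', ' (']);
  const title = (metaStart === -1 ? rest : rest.substring(0, metaStart)).trim();
  return { title, artist };
}

console.log(parseSong("Lophelia -- MYTCH [Acoustic Prog-Rock/Jazz] (2019)"));
```

Checking for the surrounding spaces is what keeps hyphenated genres such as "Prog-Rock" from being mistaken for the separator.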

Naga Sai A

You can do something like this:

const songs = [
  "Lophelia -- MYTCH [Acoustic Prog-Rock/Jazz] (2019)",
  "Julia Jacklin - Pressure to Party [Rock] (2019)",
  "The Homeless Gospel Choir - I'm Going Home [Folk-Punk] (2019) cover of Pat the Bunny | A Fistful of Vinyl",
  "Lea Salonga and Simon Bowman - The last night of the world [musical] (1990)",
  "Lophelia -- MYTCH [Acoustic Prog-Rock/Jazz]",
  "Death - $uicideboy$",
  "SNFU -- Joni Mitchell Tapes [Punk/Alternative] (1993)",
  "Title - Aritst (2000)",
  "Something strange and badly formatted without any artist [Classical]",
];

const trailingRgx = /\s*((\[[^\]]+\])|(\(\d+\))).*$/;

const details = songs.map(song => {
  const splitted = song.split(/\s+\-+\s+/);
  let title = splitted[0];
  let artist = splitted[1];
  if (splitted.length >= 2) {
    artist = artist.replace(trailingRgx, '');
  } else {
    title = title.replace(trailingRgx, '');
  }
  return {
    title,
    artist
  }
});
console.log(details);
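
Since the question was clarified so that the artist comes before the dashes, the same split-and-strip idea can be applied with the two fields swapped. A minimal sketch, assuming the first dash run surrounded by whitespace is the separator; `parseEntry` and `trailingMeta` are hypothetical names:

```javascript
// Strips everything from the first "[genre]" or "(digits)" group to the end.
const trailingMeta = /\s*(\[[^\]]+\]|\(\d+\)).*$/;

function parseEntry(song) {
  const parts = song.split(/\s+-+\s+/);
  // With no separator there is only a title, so artist stays undefined.
  const artist = parts.length >= 2 ? parts.shift() : undefined;
  // Re-join the remainder in case the title itself contained a spaced dash.
  const title = parts.join(' - ').replace(trailingMeta, '');
  return { title, artist };
}

console.log(parseEntry("Blab - afdosafhsd (2000)"));
// title: "afdosafhsd", artist: "Blab"
```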
Titus

You can use this regex that captures the title and artist part as you described in your post.

^([^-[\]()\n]+)-* *([^[\]()\n]*)

Regex Demo (deliberately shown in the PCRE flavor to preserve the group colors, but it also works in the JavaScript flavor)

JS code demo:

const songs = ["Lophelia -- MYTCH [Acoustic Prog-Rock/Jazz] (2019)",
"Julia Jacklin - Pressure to Party [Rock] (2019)",
"The Homeless Gospel Choir - I'm Going Home [Folk-Punk] (2019) cover of Pat the Bunny | A Fistful of Vinyl",
"Lea Salonga and Simon Bowman - The last night of the world [musical] (1990)",
"Lophelia -- MYTCH [Acoustic Prog-Rock/Jazz]",
"Death - $uicideboy$",
"SNFU -- Joni Mitchell Tapes [Punk/Alternative] (1993)",
"Title - Aritst (2000)",
"Something strange and badly formatted without any artist [Classical]"]

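songs.forEach(song => {
  // Group 1: text before the dashes; group 2: text after them, up to the first bracket or parenthesis.
  const m = /^([^-[\]()\n]+)-* *([^[\]()\n]*)/.exec(song)
  console.log("Title: " + m[1] + ", Artist: " + m[2])
})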
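
If an undefined artist is preferred when no separator is present, named capture groups with an optional artist part are one way to express that. A minimal sketch, assuming the artist comes first and the separator is a dash run with whitespace on both sides; `songPattern` and `parseWithGroups` are hypothetical names:

```javascript
// Hypothetical pattern: an optional "artist, whitespace, dashes, whitespace"
// prefix, then the title up to the first bracket or parenthesis.
const songPattern = /^(?:(?<artist>.+?)\s+-+\s+)?(?<title>[^\[\]()]+)/;

function parseWithGroups(song) {
  const m = songPattern.exec(song);
  // No match at all (for example, a line starting with a bracket).
  if (!m) return { title: undefined, artist: undefined };
  // groups.artist is undefined when the optional prefix did not match.
  return { title: m.groups.title.trim(), artist: m.groups.artist };
}

console.log(parseWithGroups("Julia Jacklin - Pressure to Party [Rock] (2019)"));
```

Named groups need an ES2018-capable engine, the same requirement as the lookbehind used in the question.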
Pushpesh Kumar Rajwanshi