extract chords from a tab using PHP

Question

I'm losing my hair trying to figure out how to parse a music (text) tab using preg_match_all and PREG_OFFSET_CAPTURE. Here is an example input:

[D#] [G#] [Fm] 
[C#] [Fm] [C#] [Fm] [C#] [Fm] 

[C]La la la la la la [Fm]la la la la [D#]

[Fm]I made this song Cause I [Bbm]love you 
[C]I made this song just for [Fm]you [D#]
[Fm]I made this song deep in [Bbm]my heart

The output I'm trying to get:

D# G# Fm 
C# Fm C# Fm C# Fm 

C                 Fm          D#
La la la la la la la la la la

Fm                       Bbm     
I made this song Cause I love you 

C                     Fm  D#
I made this song just for you 

Fm                       Bbm
I made this song deep in my heart

In the end, I want to wrap the chords in HTML tags.

Note that the spacing between chords should exactly match the positions of those chords in the original input.

I started parsing the input line by line, detecting the chords and getting their positions, but my code is not working. Something is wrong in my line_extract_chords function; it does not behave as it should.

Any ideas?

<style>
body{
        font-family: monospace;
        white-space: pre;
}
</style>

<?php 

function parse_song($content){
    $lines = explode(PHP_EOL, $content); //explode lines

    foreach($lines as $key=>$line){
        $chords_line = line_extract_chords($line);
        $lines[$key] = implode("\n\r",(array)$chords_line);
    }

    return implode("\n\r",$lines);
}

function line_extract_chords($line){

    $line_chords = null; //text line with chords, used to compute offsets
    $line_chords_html = null; //line with chords links
    $found_chords = array();

    $line = html_entity_decode($line); //remove special characters (would make offset problems)

    preg_match_all("/\[([^\]]*)\]/", $line, $matches, PREG_OFFSET_CAPTURE);

    $chord_matches = array();

    if ( $matches[1] ){
        foreach($matches[1] as $key=>$chord_match){

            $chord = $chord_match[0];


            $position = $chord_match[1];
            $offset= $position;
            $offset-= 1; //left bracket
            $offset-=strlen($line_chords); //already filled line

            //previous matches
            if ($found_chords){
                $offset -= strlen(implode('',$found_chords));
                $offset -= 2*(count($found_chords)); //brackets for previous chords
            }

            $chord_html = '<a href="#">'.$chord.'</a>';

            //add spaces
            if ($offset>0){
                $line_chords.= str_repeat(" ", $offset);
                $line_chords_html.= str_repeat(" ", $offset);
            }

            $line_chords.=$chord;
            $line_chords_html.=$chord_html;
            $found_chords[] = $chord;

        }

    }

    $line = htmlentities($line); //revert html_entity_decode()

    if ($line_chords){
        $line = preg_replace('/\[([^\]]*)\]/', '', $line);
        return array($line_chords_html,$line);
    }else{
        return $line;
    }

}
?>

Stefan Dochow · Answer 1 · 2015-11-18T15:21:55.787

I would like to propose a much simpler approach. It is based on the assumption that the input data is actually as regular (and therefore as easy to parse) as you described here.

<style>
.line{
    font-family: monospace;
    white-space: pre;
    margin-bottom:0.75rem;
}

.group{
    display: inline-block;
    margin-right: 0.5rem;
}
.group .top,
.group .bottom{
    display: block;
}
</style>
<?php

$input = "[D#] [G#] [Fm] 
[C#] [Fm] [C#] [Fm] [C#] [Fm] 

[C]La la la la la la [Fm]la la la la [D#]

[Fm]I made this song Cause I [Bbm]love you 
[C]I made this song just for [Fm]you [D#]
[Fm]I made this song deep in [Bbm]my heart";

$output = '';

$inputLines = explode(PHP_EOL,$input);

foreach($inputLines as $line){
    $output .='<div class="line">';

    if (!strlen($line)){
        $output .= '&nbsp;';
    }
    else{
        $inputWords = explode(' ',$line);

        foreach($inputWords as $word){
            if (preg_match('/^\[(.+)\](.+)$/', $word, $parts)){
                $output .='<span class="group"><span class="top">'.$parts[1].'</span><span class="bottom">'.$parts[2].'</span></span>';
            }
            elseif(preg_match('/^\[(.+)\]$/', $word, $parts)){
                $output .='<span class="group"><span class="top">'.$parts[1].'</span><span class="bottom">&nbsp;</span></span>';
            }
            else{
                $output .='<span class="group"><span class="top">&nbsp;</span><span class="bottom">'.$word.'</span></span>';
            }
        }
    }

    $output .='</div>';

}
die ($output);

What is done here is quite simple. The script only gives meaning to the chord data by wrapping it in HTML; the positioning and presentation are done with CSS.

It also shows that there is a small error in how your example chords translate into your example output: Fm and D# in line 5 seem to be one position off. At least I hope so.

ADD:

Why your code didn't work

Well, it actually did. What did not work was its presentation. You counted characters in one line and replaced them with spaces in the other. Two things do not work here as you might expect:

In plain HTML, multiple consecutive whitespace characters are collapsed into a single space in the browser view.
The default font of most browsers is not monospaced, so there is no easy way to replace a character with a space of the same width.

So what do you do about that?

By using a non-breaking space (&nbsp;) instead of a plain space, you could make sure that all your empty spaces are actually rendered in the browser view. Doing it properly, though, means setting white-space: pre; as a style, so that the spaces are actually preserved.
Set a monospaced font (font-family: monospace;) to make sure your replacements line up.

Here it is:

<style>
body{
        font-family: monospace;
        white-space: pre;
}
</style>

<?php 


function parse_song($content){
    $lines = explode(PHP_EOL, $content); //explode lines

    foreach($lines as $key=>$line){
        $chords_line = line_extract_chords($line);
        $lines[$key] = implode("\n",(array)$chords_line); //"\n" rather than "\n\r": a browser turns "\n\r" into two line breaks
    }

    return implode("\n",$lines);
}

function line_extract_chords($line){

    $line_chords = null; //text line with chords, used to compute offsets
    $line_chords_html = null; //line with chords links
    $found_chords = array();

    $line = html_entity_decode($line); //remove special characters (would make offset problems)

    preg_match_all("/\[([^\]]*)\]/", $line, $matches, PREG_OFFSET_CAPTURE);

    $chord_matches = array();

    if ( $matches[1] ){
        foreach($matches[1] as $key=>$chord_match){

            $chord = $chord_match[0];


            $position = $chord_match[1];
            $offset= $position;
            $offset-= 1; //left bracket
            $offset-=strlen($line_chords); //already filled line

            //previous matches
            if ($found_chords){
                $offset -= strlen(implode('',$found_chords));
                $offset -= 2*(count($found_chords)); //brackets for previous chords
            }

            $chord_html = '<a href="#">'.$chord.'</a>';

            //add spaces
            if ($offset>0){
                $line_chords.= str_repeat(" ", $offset);
                $line_chords_html.= str_repeat(" ", $offset);
            }

            $line_chords.=$chord;
            $line_chords_html.=$chord_html;
            $found_chords[] = $chord;

        }

    }

    $line = htmlentities($line); //revert html_entity_decode()

    if ($line_chords){
        $line = preg_replace('/\[([^\]]*)\]/', '', $line);
        return array($line_chords_html,$line);
    }else{
        return $line;
    }

}

$input = "[D#] [G#] [Fm] 
[C#] [Fm] [C#] [Fm] [C#] [Fm] 

[C]La la la la la la [Fm]la la la la [D#]

[Fm]I made this song Cause I [Bbm]love you 
[C]I made this song just for [Fm]you [D#]
[Fm]I made this song deep in [Bbm]my heart";



die(parse_song($input));

I removed the self:: reference to make it work standalone.
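The fix above styles the whole body. A minimal sketch of the same idea scoped to a single chord/lyric pair, which also wraps each chord name in a link as the question asks, assuming both lines are already aligned column for column; `render_chord_pair()` is a hypothetical name, and the inline style could just as well live in a stylesheet:

```php
<?php
// Hypothetical helper: renders one already-aligned chord line and lyric line as HTML.
// Chord names are wrapped in links; the spaces between them are copied verbatim, so
// the tags add no visible width and the columns still line up.
function render_chord_pair(string $chords, string $lyrics): string
{
    $chordHtml = preg_replace_callback('/\S+/', function (array $m): string {
        return '<a href="#">' . htmlspecialchars($m[0], ENT_QUOTES) . '</a>';
    }, $chords);

    // white-space: pre stops the browser from collapsing runs of spaces;
    // monospace makes every character, space included, exactly one column wide.
    return '<div style="font-family: monospace; white-space: pre;">'
        . $chordHtml . "\n"
        . htmlspecialchars($lyrics, ENT_QUOTES)
        . '</div>';
}

echo render_chord_pair('C     Fm', 'La la la la'), PHP_EOL;
```

The &nbsp; route would instead run str_replace(' ', '&nbsp;', ...) on each escaped line and join the lines with <br>; the CSS route keeps the markup shorter.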

So you did not actually code anything wrong here. You just messed up the presentation of your results.
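For comparison, the counting itself can be written more compactly: split the line on its bracketed chords with preg_split() and track the current lyric column. This is a minimal sketch, assuming chords are always written as [Name], brackets never nest, and one byte equals one column (ASCII lyrics); `split_chord_line()` is a hypothetical name. It also adds one rule of its own: a chord always stays at least one space after the previous one, because otherwise this sketch would collapse a chord-only line such as [D#] [G#] [Fm] into D#G#Fm.

```php
<?php
// Hypothetical helper: turns "[C]La la [Fm]la" into a chord line and a lyric line whose
// columns line up in a monospaced font. Assumes [Name] chords, no nested brackets and
// single-byte (ASCII) lyrics, because strlen() counts bytes, not visible characters.
function split_chord_line(string $line): array
{
    // With PREG_SPLIT_DELIM_CAPTURE the captured chord names are kept:
    // odd indexes hold chord names, even indexes the lyric text between them.
    $parts = preg_split('/\[([^\]]*)\]/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);

    $chords = '';
    $lyrics = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $lyrics .= $part; // lyric text: extend the lyric line
            continue;
        }
        // Chord: place it at the current lyric column...
        $column = strlen($lyrics);
        // ...but at least one space after the previous chord, so names never merge.
        if ($chords !== '' && strlen($chords) >= $column) {
            $column = strlen($chords) + 1;
        }
        $chords = str_pad($chords, $column) . $part;
    }
    return [rtrim($chords), rtrim($lyrics)];
}

[$chords, $lyrics] = split_chord_line('[Fm]I made this song Cause I [Bbm]love you ');
echo $chords, PHP_EOL, $lyrics, PHP_EOL;
```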

Still, you end up with a meaningless piece of text that is virtually impossible to parse again (for further interpretation, say). Parsing the input should focus on giving the data meaning. Whether you do that with HTML or XML markup, or even JSON, does not matter; but you should turn the plain text into structured data.

That way you can style it easily, and you can identify individual parts of the whole structure or filter them out.
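To make the structured-data idea concrete, here is a minimal sketch, assuming chords are always written as [Name] and never nested. `parse_line()` and `parse_tab()` are hypothetical names, and JSON is just one of the formats mentioned above; a template or stylesheet can then decide how to lay the segments out.

```php
<?php
// Hypothetical parser: turns one tab line into a list of segments, each pairing an
// optional chord with the lyric text that follows it (up to the next chord).
function parse_line(string $line): array
{
    // Odd indexes hold chord names, even indexes the lyric text around them.
    $parts = preg_split('/\[([^\]]*)\]/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);

    $segments = [];
    // Text before the first chord has no chord attached.
    if ($parts[0] !== '') {
        $segments[] = ['chord' => null, 'lyric' => $parts[0]];
    }
    for ($i = 1; $i < count($parts); $i += 2) {
        $segments[] = ['chord' => $parts[$i], 'lyric' => $parts[$i + 1]];
    }
    return $segments;
}

function parse_tab(string $tab): array
{
    // \R matches any newline sequence (\n, \r\n or \r), unlike explode(PHP_EOL, ...).
    return array_map('parse_line', preg_split('/\R/', $tab));
}

echo json_encode(parse_tab("[C]La la [Fm]la\nplain words"), JSON_PRETTY_PRINT), PHP_EOL;
```

Because each segment pairs a chord with the text that follows it, a word such as [Em]underst[A]and simply becomes two segments.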

I started doing it like you, but (I'm not sure I remember why) I decided to do it in pure code without CSS. It seemed cleaner in a way. I still wonder why my code is wrong... Anyway, nice approach. — gordie, Nov 18 '15 at 08:59
I don't know your actual use case here, but why would you choose doing it in pure code over transforming it into structured data and leaving the presentation layer to presentation tools? — Stefan Dochow, Nov 18 '15 at 14:51
I added an explanation of what was wrong with your code... basically nothing. Have a look at my updated answer. — Stefan Dochow, Nov 18 '15 at 15:23
Hey, thanks for your answers and comments. I guess you are right here. I should "leave the presentation layer to presentation tools". Well, thanks for making me think about it! — gordie, Nov 18 '15 at 18:28
...But if I understand correctly, your code splits every line, then every word, and checks whether a chord is attached to a word? What if there are several chords on a single word? E.g. [Em]underst[A]and? — gordie, Nov 23 '15 at 09:47
Then the code would have to be adapted. But such a use case was not part of your example. This is why I started my answer with the hint that this code assumes the syntax to be as simple as you presented it. To parse such words, you would either have to add another branch to the `if` clause that catches that scenario, or you could interpret "[x]abc" as just a specific case of a generalized structure like "[x]abc[y]def[z]ghi...". preg_split could be the right approach for that, so you end up with a foreach evaluating those "syllables" inside the current foreach. — Stefan Dochow, Nov 23 '15 at 12:21
I would like to avoid having too many HTML tags in my page, and exploding sentences like this makes the HTML quite ugly, no? Well, I tried again using my functions from the first post (with your CSS added), and I can assure you it is not working. That thing is so weird! — gordie, Nov 24 '15 at 01:00
This is not about having pretty HTML but about having semantically meaningful HTML. I thought I pointed that out. If you have a complex context, chances are that displaying it requires a certain complexity as well. — Stefan Dochow, Nov 24 '15 at 01:50
Hi Stefan, I'm really amazed that I can't make this work :) I did try to adapt your code but split chords instead of words. I guess this should be the way to achieve it then, but... I got sentences repeated. What do you think of this? http://pastie.org/10583082 — gordie, Nov 27 '15 at 00:18
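Stefan's preg_split suggestion from the comments could look roughly like this: each word from the first answer's loop is split into "syllables", and each syllable becomes its own group. This is a sketch, assuming the .group/.top/.bottom markup and CSS from the first answer; `render_word()` and `group_html()` are hypothetical names. Note that the first answer's CSS gives every .group a right margin, so the syllables of one word would appear slightly apart unless that margin is adjusted.

```php
<?php
// Hypothetical helper: splits a word such as "[Em]underst[A]and" into syllables and
// renders each syllable as one group (chord on top, text below).
function render_word(string $word): string
{
    // Odd indexes are chord names, even indexes the text between them.
    $parts = preg_split('/\[([^\]]*)\]/', $word, -1, PREG_SPLIT_DELIM_CAPTURE);

    $html = '';
    // Leading text with no chord above it (e.g. "under" in "under[A]stand").
    if ($parts[0] !== '') {
        $html .= group_html('', $parts[0]);
    }
    for ($i = 1; $i < count($parts); $i += 2) {
        $html .= group_html($parts[$i], $parts[$i + 1]);
    }
    return $html;
}

// Hypothetical helper producing one group; &nbsp; keeps an empty half from collapsing.
function group_html(string $chord, string $text): string
{
    $top = $chord === '' ? '&nbsp;' : htmlspecialchars($chord);
    $bottom = $text === '' ? '&nbsp;' : htmlspecialchars($text);
    return '<span class="group"><span class="top">' . $top . '</span>'
        . '<span class="bottom">' . $bottom . '</span></span>';
}

echo render_word('[Em]underst[A]and'), PHP_EOL;
```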

score 1 · Answer 2 · edited May 23 '17 at 11:59

OK, I finally found a way to make it work, based on Stefan's answer but tweaked to split each line wherever the boundary between chords and lyrics is reached.

<style>
.ugs-song-line{
    font-family: monospace;
    white-space: pre;
    margin-bottom:0.75rem;
}

.ugs-song-chunk{
    display: inline-block;
}
.ugs-song-chunk .top,
.ugs-song-chunk .bottom{
    display: block;
}
</style>

<?php

function parse_song($content){

    $input_lines = explode(PHP_EOL, $content); //explode lines

    $chunks_pattern = '~ \h*
    (?|        # open a "branch reset group"
        ( \[ [^]]+ ] (?: \h* \[ [^]]+ ] )*+ ) # one or more chords in capture group 1

        ( [^[]* (?<=) )  # optional lyrics (group 2)
      |                      # OR
        ()                   # no chords (group 1)
        ( [^[]* [^[] )   # lyrics (group 2)
    )          # close the "branch reset group"
    ~x';

    $chords_pattern = '/\[([^]]*)\]/';

    //get line chunks
    $all_lines_chunks = null;
    $all_lines_html = array(); //initialised here so implode() below never receives null

    foreach ((array)$input_lines as $key=>$input_line){
        if (preg_match_all($chunks_pattern, $input_line, $matches, PREG_SET_ORDER)) {
            $all_lines_chunks[$key] = array_map(function($i) { return [$i[1], $i[2]]; }, $matches);
        }
    }

    foreach ((array)$all_lines_chunks as $key=>$line_chunks){
        $line_html = null;

        foreach ((array)$line_chunks as $key=>$single_line_chunk){

            $chords_html = null;
            $words_html = null;

            if ($chords_content = $single_line_chunk[0]){

                if (preg_match_all($chords_pattern, $chords_content, $matches, PREG_SET_ORDER)) {

                    $chords_content = null; //reset it

                    foreach ((array)$matches as $match){
                        $chord_str = $match[1];
                        $chords_content.= sprintf('<a class="ugs-song-chord" href="#">%s</a>',$chord_str);
                    }
                }
            }

            if (!$chords_content) $chords_content = "&nbsp;"; //force content if empty !
            $chords_html = sprintf('<span class="top">%s</span>',$chords_content);


            if (!$words_content = $single_line_chunk[1]) $words_content = "&nbsp;"; //force content if empty !
            $words_content = preg_replace('/\s(?=\S*$)/',"&nbsp;",$words_content); //replace last space by non-breaking space (span would trim a regular space)


            $words_html = sprintf('<span class="bottom">%s</span>',$words_content);

            $line_html.= sprintf('<div class="ugs-song-chunk">%s</div>',$chords_html.$words_html);
        }

        $all_lines_html[]=sprintf('<div class="ugs-song-line">%s</div>',$line_html);
    }

    return implode(PHP_EOL,$all_lines_html);

}

$input = "[C]Hush me, tou[C]ch me
[Gm]Perfume, the wind and the lea[C]ves
[C]Hush me, tou[C]ch me
[Gm]The burns, the holes in the she[C]ets";

echo parse_song($input);
?>
