I am trying to capture the lines of a text block that contain a colon. I have tried this:

preg_match_all("/^(.+):(.+)/im", $input_lines, $output_array);

on this input data:

last_name, first_name
bjorge philip: hello world    
bjorge:world
kardashian, kim
some http://hi.com ok
jim https://hey.com yes
same http://www.vim.com:2018 why
it's about 20/08/2018 1:23 pm
time is 01:20:24 now
capture my name : my name is micky mouse
mercury, freddie
I need to be:
captured
  capture me :    
 if you can
 where is  : freddie
freddie is not:
 home 

I need to capture the following lines. When a line ends at its colon, the line after it belongs to the capture:

bjorge philip: hello world
bjorge:world
capture my name : my name is micky mouse
I need to be: captured
capture me : if you can
where is : freddie
freddie is not: home

Any line that contains a time or a URL must be excluded.

Kal

1 Answer

<?php
$input_lines="last_name, first_name
bjorge philip: hello world    
bjorge:world
kardashian, kim
some http://hi.com ok
jim https://hey.com yes
same http://www.vim.com:2018 why
it's about 20/08/2018 1:23 pm
time is 01:20:24 now
capture my name : my name is micky mouse
mercury, freddie
I need to be:
captured
  capture me :    
 if you can
 where is  : freddie
freddie is not:
 home ";

preg_match_all('/(?:^|\n)(?![^:]*$|.*?https?:|.*\d:\d+)(.*?:\s*\r?\n.*|.*?:\s?.+)/', $input_lines, $output_array);
// ^|\n is grouped on purpose: ungrouped, a bare ^ would match on its own,
// yield an empty first result, and miss a colon line at the very start of the input.
// \r? tolerates CRLF line endings; \s* already absorbs a stray \r, so it can be omitted.

// Capture group 1 holds each match without the leading newline.
foreach ($output_array[1] as $output) {
    echo $output, "<br>";
}

Regex pattern breakdown:

(?:^|\n)                 #start at the beginning of $input_lines or right after any newline
    (?!                  #begin negative lookahead group
        [^:]*$           #skip when no colon remains anywhere in the rest of the input
        |                #OR
        .*?https?:       #skip lines with http: or https:
        |                #OR
        .*\d:\d+         #skip lines with digit colon digit (times, port numbers)
    )                    #end negative lookahead group
    (                    #begin capture group
        .*?:\s*\r?\n.*   #capture 2 lines if the 1st line has a colon followed only by
                         # optional whitespace before the newline
        |                #OR
        .*?:\s?.+        #capture 1 line when it contains a colon followed by an optional
                         # whitespace character, then 1 or more further characters
    )                    #end capture group
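
For comparison, the same rules can also be applied one line at a time instead of in a single pattern. This is only a sketch under stated assumptions: collectColonLines is a hypothetical name, and the URL and time checks are simplified stand-ins for the lookahead above. Behaviour may differ on edge cases, such as a colon on the very last line of the input.

```php
<?php
// Hypothetical line-by-line alternative; the function name and the exact
// URL/time tests are illustrative assumptions, not the answer's own code.
function collectColonLines(string $text): array
{
    $lines  = preg_split('/\r?\n/', $text);
    $result = [];

    for ($i = 0, $n = count($lines); $i < $n; $i++) {
        $line = $lines[$i];

        // Skip lines with no colon, with a URL scheme, or with a time such as 1:23.
        if (strpos($line, ':') === false
            || preg_match('/https?:/', $line)
            || preg_match('/\d:\d/', $line)) {
            continue;
        }

        // A line that ends at its colon takes the following line as its value.
        if (preg_match('/:\s*$/', $line) && $i + 1 < $n) {
            $line .= ' ' . $lines[++$i];
        }

        // Collapse runs of whitespace so each result is a single tidy line.
        $result[] = trim(preg_replace('/\s+/', ' ', $line));
    }

    return $result;
}

print_r(collectColonLines("bjorge:world\nsome http://hi.com ok\nI need to be:\ncaptured"));
```

The loop is longer than the single pattern, but each rule sits on its own line, which can make later changes to the exclusion rules easier to review.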

This returns (shown with whitespace collapsed, as a browser renders it):

bjorge philip: hello world 
bjorge:world 
capture my name : my name is micky mouse 
I need to be: captured 
capture me : if you can 
where is : freddie 
freddie is not: home 
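
The raw matches still carry trailing spaces, and the two-line captures contain an embedded newline; the tidy list above is what remains once whitespace is collapsed. If the cleaned strings are needed in PHP itself, a sketch along these lines may help. normalizedMatches is a hypothetical helper name, and grouping the start alternation as (?:^|\n) is an assumption made here so that a colon line at the very beginning of the input can match too.

```php
<?php
// Hypothetical helper (name is illustrative): applies the pattern, then trims
// each match and collapses internal whitespace, including the embedded
// newline of two-line captures.
function normalizedMatches(string $text): array
{
    // Assumption: ^|\n is grouped so the first line of the input can match.
    $pattern = '/(?:^|\n)(?![^:]*$|.*?https?:|.*\d:\d+)(.*?:\s*\r?\n.*|.*?:\s?.+)/';
    preg_match_all($pattern, $text, $matches);

    return array_map(
        function (string $match): string {
            return trim(preg_replace('/\s+/', ' ', $match));
        },
        $matches[1] // capture group 1 excludes the leading newline
    );
}

// Shortened sample: one plain match, one excluded URL line, one two-line capture.
$sample = "bjorge philip: hello world    \nsome http://hi.com ok\nI need to be:\ncaptured";

foreach (normalizedMatches($sample) as $line) {
    echo $line, PHP_EOL;
}
```

Working from capture group 1 rather than the full match keeps the leading newline out of the results before any trimming happens.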

I have spent a considerable amount of time writing this solution for you. If there are no further extensions to the sample set, I hope it earns your tick of approval.

mickmackusa