
How should I use a regex to split this string into an array?

$input = '1254033577 2009-09-27 06:39:37 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_4_11; en) AppleWebKit/525.27.1 (KHTML, like Gecko) Version/3.2.1 Safari/525.27.1" 44.12.96.2      Duncan  OK  US  Hot Buys    http://www.esshopzilla.com/hotbuys/     http://www.google.com/search?hl=en&client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&hs=Zk5&q=ipod&aq=f&oq=&aqi=g-p1g9';

array (
  1254033577, 
  2009-09-27 06:39:37, 
  Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_4_11; en) AppleWebKit/525.27.1 (KHTML, like Gecko) Version/3.2.1 Safari/525.27.1, 
  44.12.96.2, 
  Duncan, 
  OK,
  US, 
  Hot Buys,
  http://www.esshopzilla.com/hotbuys/, 
  http://www.google.com/search?hl=en&client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&hs=Zk5&q=ipod&aq=f&oq=&aqi=g-p1g9
)
Adi Inbar
Aakash Shah
  • Format your code or no one will take a second look at this problem. – Tobias Golbs Jul 27 '13 at 19:53
  • Actually, I am new to Stack Overflow and not sure how to format it :( – Aakash Shah Jul 27 '13 at 19:54
  • Even though you are new I'm pretty sure you must have seen the preview. – PeeHaa Jul 27 '13 at 19:54
  • I just formatted the code. It's the input and the output I require. – Aakash Shah Jul 27 '13 at 19:57
  • Don't split, but match. – user2180613 Jul 27 '13 at 20:09
  • Is it a variable number of spaces, or a tab character? It looks like a tab to me, so use [str_getcsv()](http://www.php.net/manual/en/function.str-getcsv.php) with a "\t" separator argument. – Mark Baker Jul 27 '13 at 20:17
  • @AakashShah I think "format your code" implies proper indentation, not just putting the string on the same line as the variable you're assigning it to. I formatted it for you. BTW, I was going to say that what you're asking is not realistic because there's no fixed set of criteria for which spaces delimit tokens and which are part of the token, but then Casimir et Hippolyte went overboard and wrote a full set of criteria for everything you're trying to match. Wow! (Forget doing it with a split, though, for the aforementioned reason.) – Adi Inbar Jul 27 '13 at 21:04
  • possible duplicate of [Code to parse user agent string?](http://stackoverflow.com/questions/2122786/code-to-parse-user-agent-string) – Andy Lester Jul 28 '13 at 01:29
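
Mark Baker's str_getcsv() suggestion in the comments could be sketched roughly as follows. This is only a sketch and it assumes the fields really are separated by single tab characters (the question as posted shows runs of spaces, so that is a guess); the sample line is rebuilt from the values in the question with tabs inserted between them.

```php
<?php
// Assumption: each field is separated by exactly one tab character.
// The sample line is rebuilt from the question's values for illustration.
$line = implode("\t", [
    '1254033577',
    '2009-09-27 06:39:37',
    '"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_4_11; en) AppleWebKit/525.27.1 (KHTML, like Gecko) Version/3.2.1 Safari/525.27.1"',
    '44.12.96.2',
    'Duncan',
    'OK',
    'US',
    'Hot Buys',
    'http://www.esshopzilla.com/hotbuys/',
    'http://www.google.com/search?hl=en&client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&hs=Zk5&q=ipod&aq=f&oq=&aqi=g-p1g9',
]);

// Tab as delimiter, double quote as enclosure (so the quotes around the
// user agent are stripped), backslash as the escape character.
$fields = str_getcsv($line, "\t", '"', '\\');

print_r($fields);
```

If the real data contains two tabs in a row, the field between them comes back as an empty string, so the array can have more entries than the ten listed in the question.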

2 Answers


You can try and adapt something like this:

$pattern = '~(?<id>\d++)'                                        . '\s++'
         . '(?<datetime>\d{4}-\d{2}-\d{2}\s++\d{2}:\d{2}:\d{2})' . '\s++"'
         . '(?<useragent>[^"]++)'                                . '"\s++'
         . '(?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'           . '\s++'
         . '(?<name>\S++)'                                       . '\s++'
         . '(?<response>[A-Z]++)'                                . '\s++'
         . '(?<country>[A-Z]{2,3})'                              . '\s++'
         . '(?<title>(?>[^h\s]++|\s*+(?>h(?!ttp://))?|\s++)+)'   . '\s++'
         . '(?<url>\S++)'                                        . '\s++'
         . '(?<search>\S++)~';

preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

foreach($matches as $match) {
    echo '<br/>id: '         . $match['id']        . '<br/>datetime: ' . $match['datetime']
       . '<br/>user agent: ' . $match['useragent'] . '<br/>ip: '       . $match['ip']
       . '<br/>name: '       . $match['name']      . '<br/>response: ' . $match['response']
       . '<br/>country: '    . $match['country']   . '<br/>title: '    . $match['title']
       . '<br/>url: '        . $match['url']       . '<br/>search: '   . $match['search'] 
       . '<br/>';
}

Note: you can put the names of all the fields you expect in an array and loop over it to reduce the size of the code.
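
One way to read that note, as a minimal sketch: keep the field names in an array and use it to pick the named groups out of each match. The shortened pattern, the sample `$subject`, and the `$fields` / `$rows` names are made up for illustration; the same loop should work with the full pattern above.

```php
<?php
// Hypothetical shortened record and pattern, standing in for the full ones.
$subject = '1254033577 2009-09-27 06:39:37 44.12.96.2';
$pattern = '~(?<id>\d++)\s++'
         . '(?<datetime>\d{4}-\d{2}-\d{2}\s++\d{2}:\d{2}:\d{2})\s++'
         . '(?<ip>\d{1,3}(?:\.\d{1,3}){3})~';

// The field names you expect, matching the named groups in the pattern.
$fields = ['id', 'datetime', 'ip'];

preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $match) {
    // Keep only the named groups; the numeric indexes are dropped.
    $rows[] = array_intersect_key($match, array_flip($fields));
}

foreach ($rows as $row) {
    foreach ($row as $name => $value) {
        echo $name . ': ' . $value . PHP_EOL;
    }
}
```

Adding or removing a field then only means changing the pattern and the `$fields` array, not the output code.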

Casimir et Hippolyte
  • The data is not consistent. There is variation in the data, for example: 1254034963 2009-09-27 07:02:43 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_4_11; en) AppleWebKit/525.27.1 (KHTML, like Gecko) Version/3.2.1 Safari/525.27.1" 44.12.96.2 1 Duncan OK US Order Complete https://www.esshopzilla.com/checkout/?a=complete Electronics;Ipod - Nano - 8GB;1;190; https://www.esshopzilla.com/checkout/?a=confirm – Aakash Shah Jul 27 '13 at 23:35
  • @AakashShah: then post a question with good information. – Casimir et Hippolyte Jul 27 '13 at 23:40
  • @CasimiretHippolyte If I were in your place, I wouldn't even bother answering his question. He doesn't show any (research) effort and throws his scraping problems at us (maybe illegal, who knows o_O ?). I voted to close it. Damn, this regex is so nice and optimised, he really doesn't deserve it. – HamZa Jul 28 '13 at 02:04

Your problem isn't that you're trying to split a string into an array with various delimiters.

Your problem is that you're trying to do browser detection from the user agent string.

For every programming problem you have, ask yourself: "Is this a problem that others have already had, and can I take advantage of their solutions?"

If so, try Googling for the answer. In this case, I Googled for "php parse user agent". That search led me to this page on Stack Overflow, which led me to this function that is built into PHP itself.

Community
Andy Lester