
Possible Duplicate:
Splitting string array based upon digits in php?

I have a set of data that's all in one big chunk of text. It looks similar to the following:

01/02 10:45:01 test data 01/03 11:52:09 test data 01/04 18:63:05 test data 01/04 21:12:09 test data 01/04 13:10:07 test data 01/05 07:08:09 test data 01/05 10:07:08 test data 01/05 08:00:09 test data 01/06 11:01:09 test data

I'm trying to simply make this readable (see below for example), but the only thing on each of the lines that's remotely similar is that the start follows a 00/00 pattern.

01/02 10:45:01 test data 
01/03 11:52:09 test data 
01/04 18:63:05 test data 
01/04 21:12:09 test data 
01/04 13:10:07 test data 
01/05 07:08:09 test data 
01/05 10:07:08 test data 
01/05 08:00:09 test data 
01/06 11:01:09 test data 

I've gotten as far as splitting it by matching a regex pattern:

$split = preg_split("/\d+\\/\d+ /", $contents, -1, PREG_SPLIT_NO_EMPTY);

And this outputs:

Array ( [0] => 
        [1] => 10:45:01 test data 
        [2] => 11:52:09 test data 
        [3] => 18:63:05 test data 
        [4] => 18:63:05 test data 
        ...and so on

But as you can see, the problem is that preg_split isn't keeping the delimiter. I've tried changing the preg_split call to:

$split = preg_split("/\d+\\/\d+ /", $contents, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

However this returns the same as above, with no 00/00 at the start of the line.

Have I done something wrong, or is there a better way of achieving this?

user1942651
  • Are you sure there are no linefeeds in that string? It might just be your OS not handling them properly. If there really is no newline character, insert one with preg_replace before your match. – Gordon Jan 02 '13 at 11:55
  • Wrap the split pattern in a [lookbehind](http://www.regular-expressions.info/lookaround.html) and the full match will be included in the split. – DaveRandom Jan 02 '13 at 11:57
  • @DaveRandom a lookahead probably makes more sense here. – salathe Jan 02 '13 at 11:59
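
On why PREG_SPLIT_DELIM_CAPTURE made no difference: that flag only returns the parts of the delimiter that sit inside a parenthesized group, and the pattern in the question has no group. The following is a minimal sketch of that behaviour, assuming a shortened sample string and that the text always begins with a date; `$parts` and `$lines` are illustrative names, not from the question.

```php
<?php
// Shortened sample input, assumed to have the same shape as the question's data.
$contents = '01/02 10:45:01 test data 01/03 11:52:09 test data';

// The parentheses turn the date into a capture group, so
// PREG_SPLIT_DELIM_CAPTURE returns it as its own element,
// directly before the text that follows it.
$parts = preg_split(
    '/(\d+\/\d+ )/',
    $contents,
    -1,
    PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
);

// Re-join each captured date with the piece that follows it.
// This pairing assumes the string starts with a date.
$lines = [];
for ($i = 0; $i + 1 < count($parts); $i += 2) {
    $lines[] = $parts[$i] . $parts[$i + 1];
}

echo implode(PHP_EOL, $lines), PHP_EOL;
```

The extra re-joining loop is the cost of this approach, which is why a lookahead, as suggested in the comments, is usually the shorter route.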

2 Answers


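You can tell preg_split() to split at any point in the string which is followed by digits-slash-digits by using a lookahead assertion. The \b word boundary stops the lookahead from also succeeding one character into a date (at "1/02", for example), which would otherwise split off the leading digit.

$result = preg_split('#(?=\b\d+/\d+)#', $contents, -1, PREG_SPLIT_NO_EMPTY);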

The PREG_SPLIT_NO_EMPTY flag is used because the very start of the string is also a point which is followed by digits-slash-digits, so an empty split happens there. We could alter the regex to not split at the very start of the string, but that would make it a little more difficult to understand at a glance, whereas the flag is very clear.
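
To make the trade-off concrete, here is a rough sketch comparing the flag-based split with one possible "no split at the start" pattern. The `(?!\A)` guard, the `\b` word boundary, the shortened sample string and the variable names are all assumptions added for illustration.

```php
<?php
// Shortened sample input, assumed to have the same shape as the question's data.
$contents = '01/02 10:45:01 test data 01/03 11:52:09 test data 01/04 18:63:05 test data';

// Variant 1: split before every date and let the flag drop the empty
// piece produced at offset 0. \b keeps the lookahead from matching
// in the middle of a number.
$withFlag = preg_split('#(?=\b\d+/\d+)#', $contents, -1, PREG_SPLIT_NO_EMPTY);

// Variant 2: refuse to match at the very start of the string instead,
// so no empty piece is produced and no flag is needed.
$withoutFlag = preg_split('#(?!\A)(?=\b\d+/\d+)#', $contents);

// Each piece keeps the space that preceded the next date, so trim it
// before printing one entry per line.
$lines = array_map('rtrim', $withFlag);
echo implode(PHP_EOL, $lines), PHP_EOL;
```

Both variants should yield the same pieces; the second simply moves the special case from the flag into the pattern.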

salathe
  • I had to play with it and think about it for a second but yes, lookahead makes more sense than lookbehind. +1 – DaveRandom Jan 02 '13 at 12:03
  • Answer shamelessly copied and pasted from yesterday's http://stackoverflow.com/questions/14102235/splitting-string-array-based-upon-digits-in-php/14102347#14102347 – salathe Jan 02 '13 at 12:03
  • I think stealing your own answers is acceptable :-) – DaveRandom Jan 02 '13 at 12:03
  • Thank you very much, I need to learn more about regular expressions. – user1942651 Jan 02 '13 at 12:11
  • Actually, you needn't bother with those modifiers. Just match the spaces preceding the date as well as the date itself. Be sure and do that outside the lookahead, and use `+` instead of `*`. `'%[ ]+(?=\d+/\d+)%'`. (The square brackets aren't really needed; I just think it's more readable that way.) – Alan Moore Jan 02 '13 at 13:41
  • @AlanMoore what modifiers?.. Either way, the above does what the OP wanted. – salathe Jan 02 '13 at 14:29
  • I was referring to things like `PREG_SPLIT_DELIM_CAPTURE` and `PREG_SPLIT_NO_EMPTY` and other such-like. To me, that had always looked like you were using a known-bad regex and working around its deficiencies with program logic without at least *trying* to fix the regex. But yes, your solution is good enough; that's why I didn't post an answer of my own. – Alan Moore Jan 02 '13 at 16:30
  • I don't capture the "delimiters" so I don't know what you're referring to there. As for the NO_EMPTY flag, explained above, I used it to keep the regex simple(r). Of course the regex could be altered to negate the need for NO_EMPTY, but I'd rather not subject the OP to that when they are clearly struggling. I took what the OP tried and made it work. KISS. – salathe Jan 02 '13 at 16:41
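
For comparison, the pattern suggested in the comments above can be tried as follows. This is only a sketch on an assumed, shortened sample string: the spaces before each date are consumed as the delimiter, so no flags are passed and the pieces carry no trailing space.

```php
<?php
// Shortened sample input, assumed to have the same shape as the question's data.
$contents = '01/02 10:45:01 test data 01/03 11:52:09 test data';

// The spaces are matched (and therefore removed), while the date itself
// stays inside the lookahead and is kept at the start of the next piece.
$result = preg_split('%[ ]+(?=\d+/\d+)%', $contents);

echo implode(PHP_EOL, $result), PHP_EOL;
```

Because a match needs at least one space in front of the date, nothing matches at offset 0 and no empty first element appears.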

PHP:

<?php

$text = '01/02 10:45:01 test data 01/03 11:52:09 test data 01/04 18:63:05 test data 01/04 21:12:09 test data 01/04 13:10:07 test data 01/05 07:08:09 test data 01/05 10:07:08 test data 01/05 08:00:09 test data 01/06 11:01:09 test data';

$text = preg_replace('/(\d{2})\/(\d{2})(.*)/U', PHP_EOL . "$0", $text);

echo $text;

Output:

01/02 10:45:01 test data 
01/03 11:52:09 test data 
01/04 18:63:05 test data 
01/04 21:12:09 test data 
01/04 13:10:07 test data 
01/05 07:08:09 test data 
01/05 10:07:08 test data 
01/05 08:00:09 test data 
01/06 11:01:09 test data


Wojciech Zylinski
  • Thanks! I didn't know I could do that. I edited my answer. EDIT: Actually, I feel stupid now. Guess I need to get some sleep. – Wojciech Zylinski Jan 02 '13 at 12:13
  • if you replace `PHP_EOL . "$0"` with `"\n$3"`, he gets the output he wanted – P1nGu1n Jan 02 '13 at 12:14
  • Nope, he wanted output WITH 01/02, 01/03 etc. – Wojciech Zylinski Jan 02 '13 at 12:22
  • Just FYI, that third group is never going to capture anything. The `/U` modifier makes the `*` non-greedy, and it never makes sense to use a non-greedy quantifier as the last thing in a regex. It will always start out by consuming the minimum number of characters it's allowed to, and there's nothing after it to force it to take more. – Alan Moore Jan 02 '13 at 16:48
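
The point about the third group can be checked with a short sketch. The sample string is shortened and the names `$m`, `$original` and `$simple` are illustrative assumptions; the simplified pattern is one possible reduction, not part of the answer above.

```php
<?php
// Shortened sample input, assumed to have the same shape as the question's data.
$text = '01/02 10:45:01 test data 01/03 11:52:09 test data';

// With the /U modifier the trailing (.*) is lazy, and nothing after it
// forces it to grow, so group 3 should come back as an empty string.
preg_match('/(\d{2})\/(\d{2})(.*)/U', $text, $m);
var_dump($m[3]);

// The pattern from the answer, and a reduced form without the groups.
$original = preg_replace('/(\d{2})\/(\d{2})(.*)/U', PHP_EOL . '$0', $text);
$simple   = preg_replace('/\d{2}\/\d{2}/', PHP_EOL . '$0', $text);

// Both put a newline in front of every date, including the first one,
// so strip that leading newline before printing.
echo ltrim($simple), PHP_EOL;
```

Since the whole match `$0` is what gets re-inserted, dropping the capture groups and the `/U` modifier should not change the result.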