
I have a bunch of strings like this:

a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc

I need to split them at each hashtag, into something like this:

Array
(
    [0] => A
    [1] => AAX1AAY222
    [2] => B
    [3] => BBX4BBY555BBZ6
    [4] => C
    [5] => MMM1
    [6] => D
    [7] => ARA1
    [8] => E
    [9] => ABC
)

So, as you can see, the character right before each hashtag is captured, plus everything after the hashtag up to (but not including) the next char+hashtag.

I have the following RegEx, which works fine only when there is a numeric value at the end of each part.

Here is the RegEx set up:

preg_split('/([A-Z])+#/', $text, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

And it works fine with something like this:

C#mmm1D#ara1

But, if I change it to this (removing the numbers):

C#mmmD#ara

Then this is the result, which is not what I want:

Array
(
    [0] => C
    [1] => D
)

I've looked at this question and this one, which are similar, but neither of them worked for me.

So, my question is: why does it work only if each part is followed by a number, and how can I solve it?

Here are some of the sample strings I have:

a#123b#abcc#def456         // A:123, B:ABC, C:DEF456
a#abc1def2efg3b#abcdefc#8  // A:ABC1DEF2EFG3, B:ABCDEF, C:8
a#abcdef123b#5c#xyz789     // A:ABCDEF123, B:5, C:XYZ789

P.S. Strings are case-insensitive.

P.P.S. In case you're wondering what on earth these strings are: they are user-submitted answers to a questionnaire. I can't refactor them, as they are already stored and just need to be processed.

Why not use explode()?

If you look at my examples you will see that I need to capture the character right before the # as well. If you think it's possible with explode() please post the output as well, thanks!

Update

Should we focus on why /([A-Z])+#/ works only when numbers are included? Thanks.

Mahdi
  • @senk I need to capture the character right before the # as well. – Mahdi May 16 '13 at 07:11
  • You could `explode()` and copy the last char from the previous array item. – Voitcus May 16 '13 at 07:14
  • @Voitcus could you try it and post it as an answer? I still can't figure out how you want to capture that character with explode. Thanks. – Mahdi May 16 '13 at 07:14
  • This is very confusing, can you set different separator, can you make the string something like this: "a#aax1aay222,b#bbx4bby555bbz6,c#mmm1,d#ara1,e#abc" – nacholibre May 16 '13 at 07:16
  • @nacholibre I can do it with some tricks of course; find the #, put a `,` on `-2` position, but honestly I don't like to do that ... – Mahdi May 16 '13 at 07:18

4 Answers


Rather than using preg_split(), decide what you want to match:

  1. A run of "word" characters, if followed by either <any-char># or <end-of-string>.

  2. A character if immediately followed by #.

    $str = 'a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc';
    
    preg_match_all('/\w+(?=.#|$)|\w(?=#)/', $str, $matches);
    


This expression uses two look-ahead assertions. The results are in $matches[0].
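
As a minimal sketch of how the two alternatives play out on the sample string from the question: the strtoupper() step is an assumption added here only to reproduce the upper-case output the question shows, and the $parts variable is a name chosen for illustration.

```php
<?php
$str = 'a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc';

// First alternative: a run of word characters that is followed by
// "<any-char>#" or by the end of the string (these are the values).
// Second alternative: a single word character directly followed by "#"
// (these are the keys).
preg_match_all('/\w+(?=.#|$)|\w(?=#)/', $str, $matches);

// $matches[0] holds the full matches in order: key, value, key, value, ...
// Upper-casing is an assumption made to mirror the question's output.
$parts = array_map('strtoupper', $matches[0]);

print_r($parts);
```

The look-aheads only check what follows and consume nothing, so the key character that ends one value is still available to be matched as the next key.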

Update

Another way of looking at it would be this:

preg_match_all('/(\w)#(\w+)(?=\w#|$)/', $str, $matches);

print_r(array_combine($matches[1], $matches[2]));

Each entry starts with a single character, followed by a hash, followed by one or more word characters, up to either the end of the string or the start of the next entry.

The output is this:

Array
(
    [a] => aax1aay222
    [b] => bbx4bby555bbz6
    [c] => mmm1
    [d] => ara1
    [e] => abc
)
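
The same pattern can be tried against the other sample strings from the question. This is only a sketch: the $samples and $results names are chosen here for illustration.

```php
<?php
// Sample strings taken from the question.
$samples = [
    'a#123b#abcc#def456',
    'a#abc1def2efg3b#abcdefc#8',
    'a#abcdef123b#5c#xyz789',
];

$results = [];
foreach ($samples as $sample) {
    // Group 1 captures the key, group 2 the value; the look-ahead stops
    // the value just before the next "<char>#" or at the end of the string.
    preg_match_all('/(\w)#(\w+)(?=\w#|$)/', $sample, $m);
    $results[] = array_combine($m[1], $m[2]);
}

print_r($results);
```

One thing to keep in mind with array_combine(): if the same key character appeared twice in one string, the later value would overwrite the earlier one.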
Ja͢ck
  • that is perfect! ... thanks a lot for the help! Do you have any idea what was wrong with my regex? – Mahdi May 16 '13 at 07:27
  • `Jack`, by the way, I'm choosing `Marcus` answer as accepted because that's addressing the exact problem I was asking in the question, but thanks a lot again for your answer! – Mahdi May 16 '13 at 07:47
  • @Mahdi In most cases `preg_split()` can be turned into a `preg_match_all()` instead; in my own opinion, it makes it easier to follow the logic, even though a `preg_split()` is actually possible here. – Ja͢ck May 16 '13 at 07:47
  • you're right, but I meant that because `Marcus` fixed exactly my regex and it looks much simpler, I chose his answer as accepted. I think his solution is also more efficient. I still like the way you structure your regex, but I believe it's fairer if I go for his solution. – Mahdi May 16 '13 at 07:50
  • @Mahdi Efficiency should always be the last thing to look at, but Marcus's answer is perfectly fine :) – Ja͢ck May 16 '13 at 07:58
  • @Mahdi Considering a string that already starts with hashtag, Jack's answer and your own answer (with Marcus' help) give different results! In your own answer, the first item will then include the # – nl-x May 16 '13 at 08:10
  • Same caveat goes for two consecutive hash tags. – nl-x May 16 '13 at 08:16
  • @Mahdi I found another way that you may have overlooked; take a look at the update :) – Ja͢ck May 16 '13 at 08:24
  • @Jack Your update will break in both caveats I just mentioned. – nl-x May 16 '13 at 08:30
  • @nl-x My update returns an empty array for both of your "caveats" because those are not supposed to occur anyway, according to the question. – Ja͢ck May 16 '13 at 08:40
  • @nl-x thanks for spending time on that. Fortunately that's not the case here: I don't have any strings that start with # or contain ## (which is great in my mind!) ... however, I'd still say the regex approach is much more readable. Thanks again for your efforts! – Mahdi May 16 '13 at 09:22
  • @Jack I really like the approach in your update, I'm really learning something here ... I wish I could upvote more than once. – Mahdi May 16 '13 at 09:25
  • @Mahdi You're welcome ... tell your friends to vote for you ;-) just kidding hehe – Ja͢ck May 16 '13 at 09:26

If you still want to use preg_split(), you can remove the + and it should work as expected:

'/([A-Z])#/i'

That way you match only the hashtag and ONE alpha character before it, rather than all of the letters that precede it.
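
A minimal sketch of the difference, assuming the i flag and a limit of -1 (no limit); the variable names are chosen here for illustration. Digits are not in [A-Z], so a value that ends in a digit stops the repeated group from running back through the letters of the value, which is why the original pattern appeared to work only with numbers.

```php
<?php
$flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;

// With "+", the repeated group consumes every letter before "#" ("MMMD"),
// and only the last repetition ("D") is kept as the captured delimiter,
// so the value "MMM" is swallowed by the delimiter.
$greedy = preg_split('/([A-Z])+#/i', 'C#MMMD#ARA', -1, $flags);

// A digit at the end of each value breaks the run of letters, so the
// same pattern can only match the single letter right before "#".
$withDigits = preg_split('/([A-Z])+#/i', 'C#MMM1D#ARA1', -1, $flags);

// Without "+", exactly one letter before "#" is treated as the delimiter.
$single = preg_split('/([A-Z])#/i', 'C#MMMD#ARA', -1, $flags);

print_r($greedy);
print_r($withDigits);
print_r($single);
```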

Example: http://codepad.viper-7.com/z1kFDb

Edit: Added a case-insensitive flag i in the pattern.

Marcus
  • wow, that works pretty well ... thanks for the explanation also! – Mahdi May 16 '13 at 07:43
  • @Mahdi You might want to use A-Za-z0-9 to get the lower/upper case working together (as in your example), and if you want to use numbers (as in your update) – nl-x May 16 '13 at 07:52
  • @Marcus update your answer so the answer's code matches the link's code – meze May 16 '13 at 07:53

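Use explode() rather than a regexp:

$tmpArray = explode("#", "a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc");
$myArray = array();
$last = count($tmpArray) - 1;
for ($i = 0; $i < $last; $i++) {
    // Every chunk except the final one ends with the key of the next entry.
    $value = (string) substr($tmpArray[$i], 0, -1); // everything but the last character
    $key = (string) substr($tmpArray[$i], -1);      // the last character
    // Compare against '' so that a value or key of "0" is not dropped as falsy.
    if ($value !== '') $myArray[] = $value;
    if ($key !== '') $myArray[] = $key;
}
// The final chunk is a plain value with no trailing key.
if ($tmpArray[$last] !== '') $myArray[] = $tmpArray[$last];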

Edit: I updated my answer after reading the question more carefully.

nl-x

You can use the explode() function, which splits the string at the hash signs and drops them, as stated in the earlier answers.

$myArray = explode("#",$string);

For the string 'a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc' this returns something like

$myArray = array('a', 'aax1aay222b', 'bbx4bby555bbz6c', ...);

All you need now is to take the last character of each string in the array as a separate item.

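$copy = array();
$lastIndex = count($myArray) - 1;
foreach($myArray as $index => $item){
  if ($index === $lastIndex) { // the final chunk is a plain value with no key attached
    $copy[] = $item;
    break;
  }
  $beginning = substr($item,0,strlen($item)-1); // this takes all characters except the last one
  $ending = substr($item,-1); // this takes the last one, which is the key of the next entry
  if (strlen($item) > 1) $copy[] = $beginning; // the first chunk is only a key, so it has no value to add
  $copy[] = $ending;
} // end foreach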

This is an example, not tested.

EDIT

Instead of substr($item,0,strlen($item)-1); you might use substr($item,0,-1);.

Voitcus
  • @Jack Perhaps you need to concat the last part. This also inserts empty strings when `explode` returns a single character; maybe they should be removed. – Voitcus May 16 '13 at 07:28
  • Thanks for the effort ... but you know, recently I'm trying to avoid tricks in programming, I believe there is always [at least] a proper solution for each problem. Check out the `Jack` answer as well :) – Mahdi May 16 '13 at 07:30
  • @Voitcus Thanks anyways for the effort :) – Mahdi May 16 '13 at 07:34