I have a Unicode subtitle file formatted like this:

3
00:01:40,200 --> 00:01:43,326
english part

4
00:01:43,534 --> 00:01:44,851
خط فارسی

5
00:01:45,063 --> 00:01:48,485
complex part مخلوط

6
00:01:45,063 --> 00:01:48,485
complex part مخلوط
in 2 lines

How can I extract the numbers as keys and the text as values, like this?

[
   [3] => english part
   [4] => خط فارسی
   [5] => complex part مخلوط
   [6] => complex part مخلوط</br>in 2 lines
]
mitra razmara

1 Answer

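Don't use the numbers found in the file as array indices, since nothing guarantees they are unique or contiguous. It is better to use sequential indices and store key/value pairs instead.
That said, you could use the following pattern (enable the multiline and verbose modifiers, m and x):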

^(\d+)\R
[->\d: ,]+\R
((?:.+\R?)+)

See a demo on regex101.com.


In PHP this could be:
<?php

$text = <<<END
3
00:01:40,200 --> 00:01:43,326
english part

4
00:01:43,534 --> 00:01:44,851
خط فارسی

5
00:01:45,063 --> 00:01:48,485
complex part مخلوط

6
00:01:45,063 --> 00:01:48,485
complex part مخلوط
in 2 lines
END;

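// Nowdoc (quoted END) keeps the backslashes in the pattern untouched.
$regex = <<<'END'
~
    ^(?P<line>\d+)\R             # cue number on its own line
    [->\d: ,]+\R                 # timestamp line
    (?P<content>(?:.+\R?)+)      # one or more non-empty text lines
~mx
END;

// $matches['line'] holds the cue numbers, $matches['content'] the texts
// (each text still ends with its trailing line break).
preg_match_all($regex, $text, $matches);
print_r($matches);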
?>

See another demo on ideone.com.
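
To get from the raw matches to the array asked for in the question, something like the following sketch might work. It reuses the pattern above and adds the u modifier, which assumes the input is valid UTF-8. The helper name parseCues and the record keys number and text are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper: returns a sequentially indexed list of
// ['number' => int, 'text' => string] records, one per subtitle cue.
function parseCues(string $text): array
{
    // Same pattern as above, written on one line; "u" assumes UTF-8 input.
    $regex = '~^(?P<line>\d+)\R[->\d: ,]+\R(?P<content>(?:.+\R?)+)~mu';

    // PREG_SET_ORDER groups the captures per cue instead of per group.
    // Returns 0 (no match) or false (error, e.g. invalid UTF-8): bail out.
    if (!preg_match_all($regex, $text, $matches, PREG_SET_ORDER)) {
        return [];
    }

    $cues = [];
    foreach ($matches as $m) {
        $cues[] = [
            'number' => (int) $m['line'],
            // Trim the trailing line break, then join remaining lines with <br>.
            'text'   => preg_replace('~\R~u', '<br>', rtrim($m['content'])),
        ];
    }
    return $cues;
}

$sample = "3\n00:01:40,200 --> 00:01:43,326\nenglish part\n\n"
        . "6\n00:01:45,063 --> 00:01:48,485\ncomplex part مخلوط\nin 2 lines\n";

$cues = parseCues($sample);

// Number => text map as requested in the question.
// Note: cues that share a number would overwrite each other here.
$byNumber = array_column($cues, 'text', 'number');
print_r($byNumber);
```

Keeping the sequential list in $cues and deriving the number-keyed map only when needed follows the advice above: the list keeps every cue even if a number is repeated, whereas the map keeps only the last one.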

Jan