
I need to parse some .srt files and I am looking for a regex (for Java) that matches the time sections. I want to read the file line by line and skip any line that is a number or a time section.

Example, given:

1
00:00:01,357 --> 00:00:03,323
You took this case
without running it by me.

2
00:00:03,359 --> 00:00:04,825
- Jessica--
- That's enough. Dump it.

I want to match the lines

00:00:03,359 --> 00:00:04,825

and

2

Thanks in advance!

Santiago

3 Answers


Match number:

^\d+$

Match time:

^\d{2}:\d{2}:\d{2},\d{3}.*\d{2}:\d{2}:\d{2},\d{3}$

For both conditions:

(^\d+$)|(^\d{2}:\d{2}:\d{2},\d{3}.*\d{2}:\d{2}:\d{2},\d{3}$)
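A minimal Java sketch of the line-by-line loop the question describes, using the combined regex above. The class and method names (`SrtTextExtractor`, `isSkippable`) are hypothetical, and the sample input is the question's example held in a string rather than read from a real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;

class SrtTextExtractor {
    // The combined regex from this answer: a bare number OR a time line.
    // Backslashes are doubled because this is a Java string literal.
    private static final Pattern SKIP = Pattern.compile(
        "(^\\d+$)|(^\\d{2}:\\d{2}:\\d{2},\\d{3}.*\\d{2}:\\d{2}:\\d{2},\\d{3}$)");

    // True for subtitle numbers and time lines, the lines the question wants to skip.
    static boolean isSkippable(String line) {
        return SKIP.matcher(line).matches();
    }

    public static void main(String[] args) throws IOException {
        String srt = "1\n00:00:01,357 --> 00:00:03,323\nYou took this case\n"
                + "without running it by me.\n\n2\n00:00:03,359 --> 00:00:04,825\n"
                + "- Jessica--\n- That's enough. Dump it.\n";
        // For a real file, Files.newBufferedReader(path) would replace the StringReader.
        try (BufferedReader reader = new BufferedReader(new StringReader(srt))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!isSkippable(line)) {
                    System.out.println(line); // dialogue and blank separator lines
                }
            }
        }
    }
}
```

One caveat, assuming UTF-8 input: Java's UTF-8 decoder does not strip a byte-order mark, so if the file starts with one, the very first `1` will not match `^\d+$`.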

In your format the number always comes right before the time line, so you only need the time regex: find the index of each time line, then remove the lines at index-1 and index.
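The index-based approach can be sketched in Java as follows, assuming the whole file has already been read into a `List<String>` (for example with `Files.readAllLines`) and that a number line always sits directly above each time line, as in the question's sample. `SrtIndexRemoval` and `stripNumbersAndTimes` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SrtIndexRemoval {
    private static final Pattern TIME = Pattern.compile(
        "^\\d{2}:\\d{2}:\\d{2},\\d{3}.*\\d{2}:\\d{2}:\\d{2},\\d{3}$");

    // Removes each time line (index) and the line just above it (index - 1),
    // which is assumed to be the subtitle number.
    static List<String> stripNumbersAndTimes(List<String> lines) {
        List<String> result = new ArrayList<>(lines);
        // Walk backwards so removals never shift indices we still have to visit.
        for (int i = result.size() - 1; i >= 0; i--) {
            if (TIME.matcher(result.get(i)).matches()) {
                result.remove(i);          // the time line
                if (i > 0) {
                    result.remove(i - 1);  // the number line above it
                    i--;                   // skip the index we just removed
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "1", "00:00:01,357 --> 00:00:03,323", "You took this case", "",
            "2", "00:00:03,359 --> 00:00:04,825", "- Jessica--");
        stripNumbersAndTimes(lines).forEach(System.out::println);
    }
}
```

Note that the line above a time line is removed without checking that it really is a number; if your files might deviate from the usual layout, testing it against `^\d+$` first would be safer.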


A closer look at the time regex:

^\d{2}:\d{2}:\d{2},\d{3}.*\d{2}:\d{2}:\d{2},\d{3}$

Start

^

From the beginning of the text

\d{2} or [0-9]{2}

Two digits only

: or :{1} or [:]{1}

One colon (:) only

, or ,{1} or [,]{1}

One comma (,) only

\d{3} or [0-9]{3}

Three digits only

.*

Anything: any characters, or none at all

The rest: the same time format is checked again

$

End of the text

Together, ^ and $ mean that the whole text, from start to end, has to match the pattern.
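To see what the `^` and `$` anchors contribute, here is a small Java comparison of the time regex with and without them, using `Matcher.find()`, which searches anywhere in the string. The dialogue line is a contrived example and `TimeAnchors` is a hypothetical class name.

```java
import java.util.regex.Pattern;

class TimeAnchors {
    // The answer's time regex, with ^ and $ anchors.
    static final Pattern ANCHORED = Pattern.compile(
        "^\\d{2}:\\d{2}:\\d{2},\\d{3}.*\\d{2}:\\d{2}:\\d{2},\\d{3}$");
    // The same pattern without anchors, for comparison.
    static final Pattern UNANCHORED = Pattern.compile(
        "\\d{2}:\\d{2}:\\d{2},\\d{3}.*\\d{2}:\\d{2}:\\d{2},\\d{3}");

    public static void main(String[] args) {
        String timeLine = "00:00:03,359 --> 00:00:04,825";
        String dialogue = "Meet me at 00:00:01,000 or 00:00:02,000, okay?";

        // find() looks for the pattern anywhere in the string.
        System.out.println(ANCHORED.matcher(timeLine).find());   // true
        System.out.println(ANCHORED.matcher(dialogue).find());   // false: ^ and $ force the whole line
        System.out.println(UNANCHORED.matcher(dialogue).find()); // true: matches in the middle
    }
}
```

With `String.matches()` or `Matcher.matches()` the whole string must match anyway, so the anchors only make a difference when the pattern is used with `find()`.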

HungPV
  • Drop the `{1}` quantifiers; they are redundant. Since `\d` in Java matches `[0-9]`, you can shorten the time regex to `^\d{2}:\d{2}:\d{2},\d{3}.*\d{2}:\d{2}:\d{2},\d{3}$` – nhahtdh Jul 16 '15 at 12:34
  • As for `^([0-9]+)|([0-9]{2}[:]{1}[0-9]{2}[:]{1}[0-9]{2}[,]{1}[0-9]{3}.*[0-9]{2}[:]{1}[0-9]{2}[:]{1}[0-9]{2}[,]{1}[0-9]{3})$`, the regex is wrong, since it will try the first branch and grab only the hour in the timestamp. – nhahtdh Jul 16 '15 at 12:41
  • Thank you for the first comment (+1). I don't understand the second comment because my English isn't good, but I tried it and it ran. – HungPV Jul 16 '15 at 14:38
  • It only works because you just fixed it following my comment. In the previous revision, you didn't specify $ in the first alternation. – nhahtdh Jul 16 '15 at 14:56
  • Oh yes, sorry for my mistake; I tested it and saw the problem. Thank you very much – HungPV Jul 16 '15 at 15:01
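The pitfall nhahtdh describes comes from alternation precedence: in `^A|B$` the `^` belongs only to `A` and the `$` only to `B`. Below is a Java sketch that reproduces it, with the redundant `{1}` quantifiers dropped from the earlier revision; `AlternationPitfall` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationPitfall {
    // Earlier revision: ^ binds only to the first branch, $ only to the second.
    static final Pattern BROKEN = Pattern.compile(
        "^([0-9]+)|([0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}.*[0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3})$");
    // Fixed revision: each branch carries its own ^ and $.
    static final Pattern FIXED = Pattern.compile(
        "(^\\d+$)|(^\\d{2}:\\d{2}:\\d{2},\\d{3}.*\\d{2}:\\d{2}:\\d{2},\\d{3}$)");

    public static void main(String[] args) {
        Matcher m = BROKEN.matcher("00:00:03,359 --> 00:00:04,825");
        if (m.find()) {
            // The first branch wins at position 0 and grabs only the hour.
            System.out.println(m.group()); // 00
        }
        // A dialogue line that merely starts with digits:
        System.out.println(BROKEN.matcher("10 minutes later").find()); // true (would be wrongly skipped)
        System.out.println(FIXED.matcher("10 minutes later").find());  // false
    }
}
```

The bug stays hidden with `matches()`, because that method requires the whole line to match one of the branches; it shows up once the pattern is used with `find()`.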

For time lines such as 00:00:03,359 --> 00:00:04,825 or 00:00:01,357 --> 00:00:03,323, the code below may be useful.

String strLine = "00:00:01,357 --> 00:00:03,323";
// String.matches() must match the whole string, so no ^ or $ is needed; prints true
System.out.println(strLine.matches("\\d\\d:\\d\\d:\\d\\d,\\d\\d\\d --> \\d\\d:\\d\\d:\\d\\d,\\d\\d\\d"));
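If the check runs once per line of a file, a possible refinement (a sketch, not part of the original answer) is to compile the same pattern once, since `String.matches()` compiles the regex again on every call. `TimeLineCheck` and `isTimeLine` are hypothetical names. Note that this pattern is stricter than the first answer's: it requires exactly ` --> ` between the two timestamps.

```java
import java.util.regex.Pattern;

class TimeLineCheck {
    // Same pattern as the String.matches() call above, compiled once for reuse.
    private static final Pattern TIME_LINE = Pattern.compile(
        "\\d\\d:\\d\\d:\\d\\d,\\d\\d\\d --> \\d\\d:\\d\\d:\\d\\d,\\d\\d\\d");

    static boolean isTimeLine(String line) {
        // matches() requires the entire line to match, so no ^ or $ is needed.
        return TIME_LINE.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] lines = {"2", "00:00:03,359 --> 00:00:04,825", "- Jessica--"};
        for (String line : lines) {
            System.out.println(line + " -> " + isTimeLine(line));
        }
    }
}
```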
Santanu Sahoo

You could use this to match the ending time of each subtitle:

\d{2}:\d{2}:\d{2},\d{3}$


Explanation:

\d{2}:      # a two-digit number followed by a ":" character
\d{2}:      # same as above
\d{2},      # a two-digit number followed by a "," character
\d{3}       # a three-digit number
$           # anchors the match to the end of the line
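In Java this pattern has to be used with `Matcher.find()` rather than `String.matches()`, because it only describes the tail of the line. A minimal sketch that extracts the ending time; `EndTime` and `endTime` are hypothetical names, and on its own the pattern does not detect the number lines, so `^\d+$` would still be needed for those.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EndTime {
    // The answer's pattern: a timestamp anchored to the end of the line.
    private static final Pattern END_TIME = Pattern.compile("\\d{2}:\\d{2}:\\d{2},\\d{3}$");

    // Returns the ending timestamp of a line, or null if the line has none.
    // find() is used because matches() would require the whole line
    // to be a single timestamp.
    static String endTime(String line) {
        Matcher m = END_TIME.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(endTime("00:00:01,357 --> 00:00:03,323")); // 00:00:03,323
        System.out.println(endTime("You took this case"));            // null
    }
}
```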