My .srt file content looks like this:

1
00:00:00,000 --> 00:00:01,000
This is the first line:
and it has a secondary line,
it may have more lines

2
00:00:01,000 --> 00:00:02,000
This is the second line
it may have more lines

3
00:00:02,000 --> 00:00:03,000
This is the last line
and it has a secondary line too,
it may have more lines

I am using Scanner, but the file is not being parsed properly. This is my code:

// Cue index, e.g. "1"
var indexString: NSString?
scanner.scanUpToCharacters(from: CharacterSet.newlines, into: &indexString)

// Start time: everything before the " --> " separator
var startTimeString: NSString?
scanner.scanUpTo(" --> ", into: &startTimeString)
scanner.scanString("-->", into: nil)

// End time: the rest of the timing line
var endTimeString: NSString?
scanner.scanUpToCharacters(from: CharacterSet.newlines, into: &endTimeString)

// Cue text: scanned only up to the next "\n"
var textString: NSString?
scanner.scanUpTo("\n", into: &textString)
if let text = textString {
    textString = text.replacingOccurrences(of: "\r\n", with: " ")
        .trimmingCharacters(in: CharacterSet.whitespaces) as NSString
}
Iulian Onofrei
Rama Mahapatra

1 Answer

Consider using a simple regex:

import Foundation

// `srt` is the content of the .srt file as a String.
let pattern = "(?<index>^\\d+$)\\n^(?<startTime>\\d\\d:[0-5]\\d:[0-5]\\d,\\d{1,3}) --> (?<endTime>\\d\\d:[0-5]\\d:[0-5]\\d,\\d{1,3})$\\n(?<text>(?:^.+$\\n?)+)"

// .anchorsMatchLines lets ^ and $ match at the start and end of every line.
let regex = try NSRegularExpression(pattern: pattern, options: .anchorsMatchLines)
let matches = regex.matches(in: srt, range: NSRange(..<srt.endIndex, in: srt))
// Text of the first cue; matches[0] assumes at least one cue was found.
let firstTextRange = matches[0].range(withName: "text")
let firstText = Range(firstTextRange, in: srt).flatMap { range in String(srt[range]) }

I recommend caching the compiled regular expression instead of rebuilding it for every parse.
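One way to do the caching, as a minimal sketch: keep the compiled expression in a static constant, which Swift initializes lazily and only once. The names `SRTPattern` and `cueTexts(in:)` are made up for illustration, and trimming the trailing newline from each text block is my own choice rather than part of the snippet above.

```swift
import Foundation

// Hypothetical namespace for the cached expression (name is illustrative).
enum SRTPattern {
    // Static constants are initialized lazily on first access and then reused.
    static let cue: NSRegularExpression = {
        let pattern = "(?<index>^\\d+$)\\n^(?<startTime>\\d\\d:[0-5]\\d:[0-5]\\d,\\d{1,3}) --> (?<endTime>\\d\\d:[0-5]\\d:[0-5]\\d,\\d{1,3})$\\n(?<text>(?:^.+$\\n?)+)"
        // try! is used only because the pattern is a fixed literal.
        return try! NSRegularExpression(pattern: pattern, options: .anchorsMatchLines)
    }()
}

/// Returns the text block of every cue found in `srt`, without the trailing newline.
func cueTexts(in srt: String) -> [String] {
    let fullRange = NSRange(srt.startIndex..<srt.endIndex, in: srt)
    return SRTPattern.cue.matches(in: srt, range: fullRange).compactMap { match in
        Range(match.range(withName: "text"), in: srt)
            .map { String(srt[$0]).trimmingCharacters(in: .newlines) }
    }
}
```

Every call to `cueTexts(in:)` then reuses the same `NSRegularExpression` instance, so the pattern is compiled once per process rather than once per file.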

Stimorol
  • Thank you Stimorol. If I am not wrong, entryPattern is the pattern (the regular expression specified) and srt is the srt file content? – Rama Mahapatra May 31 '18 at 10:03
  • @RamaChandraMahapatra Yes, exactly. Fixed – Stimorol May 31 '18 at 10:45
  • Yes, fixed, but 'range(withName:)' is only available on iOS 11.0 or newer. What can be substituted on lower versions? I am also having a problem when the srt file content contains special characters. :) – Rama Mahapatra May 31 '18 at 11:26
  • @RamaChandraMahapatra Use range(at:) with 1, 2, 3 and 4 respectively. What problem do you have with special characters? – Stimorol May 31 '18 at 11:46
  • Okay, let me try. The pattern fails to match when special characters (like ") are in the srt content. – Rama Mahapatra May 31 '18 at 11:48
  • Thank you so much again, it helped me a lot, finally I parsed all types of srt and vtt files successfully :) – Rama Mahapatra May 31 '18 at 18:01
  • @RamaChandraMahapatra No problem. But accepting or upvoting an answer is the designated form of thanks here ;) Comments are for clarifying details – Stimorol May 31 '18 at 19:00
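
For anyone who needs the range(at:) variant suggested in the comments, a rough sketch follows. The `Cue` struct and `parseCues(from:)` are hypothetical names, not from the answer. The pattern is the same expression written with unnamed groups, on the assumption that numbered groups are the safer choice for older deployment targets. Group 1 is the index, 2 the start time, 3 the end time and 4 the text.

```swift
import Foundation

// Hypothetical value type for one parsed cue (not part of the original answer).
struct Cue {
    let index: Int
    let startTime: String
    let endTime: String
    let text: String
}

func parseCues(from srt: String) throws -> [Cue] {
    // Same expression as in the answer, with unnamed capture groups 1...4.
    let pattern = "^(\\d+)$\\n^(\\d\\d:[0-5]\\d:[0-5]\\d,\\d{1,3}) --> (\\d\\d:[0-5]\\d:[0-5]\\d,\\d{1,3})$\\n((?:^.+$\\n?)+)"
    let regex = try NSRegularExpression(pattern: pattern, options: .anchorsMatchLines)
    let fullRange = NSRange(srt.startIndex..<srt.endIndex, in: srt)

    // Converts numbered capture group `n` of a match back into a Swift String.
    func group(_ n: Int, of match: NSTextCheckingResult) -> String? {
        return Range(match.range(at: n), in: srt).map { String(srt[$0]) }
    }

    return regex.matches(in: srt, range: fullRange).compactMap { match -> Cue? in
        guard let indexText = group(1, of: match), let index = Int(indexText),
              let start = group(2, of: match),
              let end = group(3, of: match),
              let text = group(4, of: match) else { return nil }
        // The text group keeps its final newline, so trim it here.
        return Cue(index: index, startTime: start, endTime: end,
                   text: text.trimmingCharacters(in: .newlines))
    }
}
```

Note that this sketch expects `\n` line endings and no trailing spaces on the timing line. A file with `\r\n` endings would probably need normalizing first, for example with `replacingOccurrences(of: "\r\n", with: "\n")`.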