0

I have multi-line text in which each line holds two times in MM:SS format followed by a subtitle line from a video. I want to convert each MM:SS time to ASS format, i.e. 00:MM:SS,000, and output the fields separated by tabs. I wrote this code:

text = """02:42 02:47   And so that Wayne Gretzky method for sort of going into the future and
02:47   02:51   imagining what that future might look like, again, is a good idea for research."""
for line in text.splitlines():
    words_in_line = line.split('\t')
    for word in words_in_line:
        if ":" in word:
                ass= "00:"+word +",000"
                final_line = line.replace(word,ass)
                print(final_line)

It converts the format, but only one time per printed line: each input line is printed twice, once with the first time converted and once with the second, giving output like this:

00:02:42,000    02:47   And so that Wayne Gretzky method for sort of going into the future and
02:42   00:02:47,000    And so that Wayne Gretzky method for sort of going into the future and
00:02:47,000    02:51   imagining what that future might look like, again, is a good idea for research.
02:47   00:02:51,000    imagining what that future might look like, again, is a good idea for research.

How can I change the code to get an output like this?

00:02:42,000    00:02:47,000    And so that Wayne Gretzky method for sort of going into the future and
00:02:47,000    00:02:51,000    imagining what that future might look like, again, is a good idea for research.
John Aiton

2 Answers

1

Use re.sub for search and replace; \\1 refers to the text matched by the first group in parentheses.

import re
text = """02:42 02:47   And so that Wayne Gretzky method for sort of going into the future and
02:47   02:51   imagining what that future might look like, again, is a good idea for research."""
print(re.sub(r'(\d\d:\d\d)', '00:\\1,000', text))

You can make the regex more specific, e.g. with

# re.MULTILINE makes ^ match at the start of every line, not only the first
print(re.sub(r'^(\d\d:\d\d)\t(\d\d:\d\d)', '00:\\1,000\t00:\\2,000', text, flags=re.MULTILINE))

to avoid wrong substitutions. Try regex101.com to find a pattern that matches your data.
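
As a minimal sketch of that stricter approach, the version below converts only the two leading fields of each line and leaves any time inside the subtitle text alone. The names LEADING_TIMES and convert_leading_times are made up for illustration, and it assumes the fields are separated by tabs or spaces.

```python
import re

# Hypothetical pattern: two MM:SS fields at the start of a line, each
# followed by tabs or spaces (not \s, which would also match newlines).
LEADING_TIMES = re.compile(r'^(\d\d:\d\d)[ \t]+(\d\d:\d\d)[ \t]+', flags=re.MULTILINE)


def convert_leading_times(text):
    """Rewrite only the first two MM:SS fields of each line as 00:MM:SS,000."""
    def repl(match):
        start, end = match.group(1), match.group(2)
        return f"00:{start},000\t00:{end},000\t"
    return LEADING_TIMES.sub(repl, text)


sample = "02:42\t02:47\tSee you at 10:30 tomorrow\n02:47   02:51   again"
print(convert_leading_times(sample))
```

Because the pattern requires whitespace right after the first MM:SS, a line that already starts with 00:02:42,000 does not match and is left unchanged, so running it twice should be harmless.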

araisch
  • Here's hoping the subtitle text doesn't contain a time :-) Also, this will happily convert any existing `00:02:51,000` to something like `00:00:02,000:51,000`... – AKX Jan 12 '22 at 08:39
  • It's the nucleus for solving the scenario posted. Feel free to add 200 lines of code covering anything that might happen somewhere someday. – araisch Jan 12 '22 at 09:02
  • It doesn't take 200 lines of code to only handle the two leading fields of a line. :) – AKX Jan 12 '22 at 09:06
  • There you are. Now it's your responsibility if you destroy his chance to optimize and learn for himself. :-) – araisch Jan 12 '22 at 09:22
1

Something like this seems to do the trick:

text = """
02:42 02:47   And so that Wayne Gretzky method for sort of going into the future and
02:47   02:51   imagining what that future might look like, again, is a good idea for research.
"""


def convert_time(t):
    return f"00:{t},000"


for line in text.splitlines():
    try:
        # use a separate name so the loop doesn't rebind `text`
        start, end, subtitle = line.split(None, 2)
    except ValueError:  # if the line is out of spec, just print it
        print(line)
        continue
    start = convert_time(start)
    end = convert_time(end)
    print(start, end, subtitle, sep="\t")

The output is

00:02:42,000    00:02:47,000    And so that Wayne Gretzky method for sort of going into the future and
00:02:47,000    00:02:51,000    imagining what that future might look like, again, is a good idea for research.
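
If you want the same idea as a reusable function, here is a rough sketch that also checks that the first two fields really look like MM:SS before touching them, so already-converted or unrelated lines pass through unchanged. TIME_RE and convert_line are names invented for this example, not part of the code above.

```python
import re

# Hypothetical check: a field counts as a time only if it is exactly MM:SS.
TIME_RE = re.compile(r'\d\d:\d\d')


def convert_line(line):
    """Convert the two leading MM:SS fields of one line to 00:MM:SS,000."""
    parts = line.split(None, 2)  # start, end, rest of the line
    if len(parts) < 2 or not all(TIME_RE.fullmatch(p) for p in parts[:2]):
        return line  # out of spec: return it untouched
    times = [f"00:{p},000" for p in parts[:2]]
    return "\t".join(times + parts[2:])


for line in ["02:42 02:47   And so that Wayne Gretzky method", "", "just some text here"]:
    print(convert_line(line))
```

Returning the line instead of printing it makes the function easy to use when writing to a file, e.g. with "\n".join(convert_line(l) for l in text.splitlines()).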
AKX