Matching and merging text files programmatically

Question

I would like to merge an EDL (edit decision list) text file with another text file that contains subtitles. The EDL is generated from a video editing program, Final Cut Pro, while the text file is just regular text. While this particular request is for a specific end-use, I would like to learn the general method one can follow for doing this sort of processing. I am familiar with Python but am quite ok with examples in other languages so long as they are clear and easy to utilize on a UNIX/Mac workstation.

Here is an example of the first several lines of the EDL file:

TITLE: SAMPLE EDL
FCM: NON-DROP FRAME

001  GEN      V     C        00:01:03:16 00:01:04:29 01:00:03:06 01:00:04:19  
* FROM CLIP NAME:  TITLE 3D
* COMMENT: 
* FROM CLIP IS A GENERATOR

002  GEN      V     C        00:01:04:15 00:01:08:03 01:00:04:29 01:00:08:17  
* FROM CLIP NAME:  TITLE 3D
* COMMENT: 
* FROM CLIP IS A GENERATOR

003  GEN      V     C        00:01:04:15 00:01:09:05 01:00:10:19 01:00:15:09  
* FROM CLIP NAME:  TITLE 3D
* COMMENT: 
* FROM CLIP IS A GENERATOR

004  GEN      V     C        00:01:04:15 00:01:07:03 01:00:17:17 01:00:20:05  
* FROM CLIP NAME:  TITLE 3D
* COMMENT: 
* FROM CLIP IS A GENERATOR

Here is an example of the four "companion" lines from the subtitle text file:

001

If we think about climate change,

002

most of society's focused on fossil fuel combustion.

003

But what humans release on an annual basis is just one part of the carbon cycle.

004

Carbon dioxide concentrations also go up and down

Last, here is an example of the desired end result:

[00:00:03.06]
If we think about climate change,
[00:00:04.19]

[00:00:04.29]
most of society's focused on fossil fuel combustion.
[00:00:08.17]

[00:00:10.19]
But what humans release on an annual basis is just one part of the carbon cycle.
[00:00:15.09]

[00:00:17.17]
Carbon dioxide concentrations also go up and down
[00:00:20.05]

Looking at the example EDL file, the important bits of text are:

The line number i.e. 001 002 003 ...

The third and fourth columns of timecode numbers i.e.

01:00:03:06 01:00:04:19
01:00:04:29 01:00:08:17
01:00:10:19 01:00:15:09

From the subtitle text file, the line number corresponds with the line number in the EDL file. This is a one-to-one match with no offsets or gaps in the sequence. Each line of text should go into the desired end result as an entire line without line breaks.

The end result essentially sandwiches each numbered line of subtitle text between the first and second timecode numbers. The timecode numbers also need to be reformatted slightly by:

Surrounding each set in square brackets i.e. []
Making sure that the first set of numbers (the hours) are zeroed out i.e. 01:00:03:06 becomes 00:00:03:06 and 07:06:15:22 becomes 00:06:15:22
The last colon ':' (prior to the frame number) gets converted into a period '.' i.e. 00:00:03:06 becomes 00:00:03.06

And that is pretty much it. There can be about 100 to 120 lines of text in the subtitle text file and correspondingly 100 to 120 'decisions' in the EDL text file. If any further explanation is needed, please just ask. The main problem I am having is finding out how to start off on this even. While I can wrap my head around manipulating a single line of text within a single file programmatically, I'm a bit flummoxed as to how to manage many lines between multiple files.

Thanks in advance all.

consider https://pypi.python.org/pypi/edl – lofidevops Jan 12 '16 at 15:44

score 2 · Answer 1 · answered Jan 23 '11 at 21:02

Roughly this should be the plan.

Read the files
Write a parser for each type of file
Store the data in useful data structures/objects
Output in appropriate format

Break each step down until it is just a matter of writing it out in code. Test at each step.
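As a rough illustration of the "parser for each type of file" step, here is a minimal sketch in Python. The function names `parse_edl` and `parse_subtitles` are made up for this example, and the code assumes that the files look like the samples in the question: each EDL event line starts with a number followed by three fields and four timecodes, and each subtitle entry starts with a line containing only a three-digit number.

```python
import re

TC = r"(\d{2}:\d{2}:\d{2}:\d{2})"

# Assumed EDL event layout: number, reel, track, transition, four timecodes.
EVENT_RE = re.compile(
    r"^(\d+)\s+\S+\s+\S+\s+\S+\s+" + TC + r"\s+" + TC + r"\s+" + TC + r"\s+" + TC + r"\s*$"
)


def parse_edl(lines):
    """Return {event_number: (third_timecode, fourth_timecode)}."""
    events = {}
    for line in lines:
        match = EVENT_RE.match(line)
        if match:  # title, FCM and "* ..." comment lines do not match
            events[match.group(1)] = (match.group(4), match.group(5))
    return events


def parse_subtitles(lines):
    """Return {number: text}. Assumes a line of exactly three digits starts an entry."""
    subtitles = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if re.fullmatch(r"\d{3}", line):
            current = line
            subtitles[current] = ""
        elif current is not None:
            # Join wrapped text so each subtitle ends up on a single line.
            subtitles[current] = (subtitles[current] + " " + line).strip()
    return subtitles


# Small in-memory samples taken from the question.
edl_sample = """TITLE: SAMPLE EDL
FCM: NON-DROP FRAME

001  GEN      V     C        00:01:03:16 00:01:04:29 01:00:03:06 01:00:04:19
* FROM CLIP NAME:  TITLE 3D
"""
subtitle_sample = """001

If we think about climate change,
"""

print(parse_edl(edl_sample.splitlines()))
print(parse_subtitles(subtitle_sample.splitlines()))
```

With real files, the same functions could be fed an open file object instead of `splitlines()`, since both only iterate over lines. Keying both dictionaries by the event number makes the later matching step a simple lookup.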

Navi


Thank you Navi for your feedback. I think the very act of forming my question for here helped a bit. That said, some examples would be useful in helping to get a sense of a good method of attacking this. While the question itself is quite specific, I think that the general concepts that would be utilized to solve this could be of use to many. – durandal Jan 23 '11 at 23:32

score 0 · Answer 2 · answered Dec 14 '11 at 17:55

This is a 1 to 1 matching from 1 file to the other. Parse each file into a list of useful tokens.

One list will have the start and end time, the other will have the subtitles.

(start, end time):

01:00:03:06 01:00:04:19 -> 01:00:04:29 01:00:08:17 -> 01:00:10:19 01:00:15:09 -> ...

the other file will have:

"If we think about climate change," -> "most of .. fuel combustion" -> "But what .. carbon cycle" -> ..

Now loop through both lists in parallel and merge them one to one, building a new list as you go. At the end, write the new list to a file.
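A minimal sketch of that merge in Python, assuming the two lists have already been extracted and are in the same order. The names `format_timecode` and `merge` are illustrative only, and the timecode rules (zero the hours, put a period before the frame number, wrap in square brackets) are taken from the question.

```python
def format_timecode(timecode):
    """Turn '01:00:03:06' into '[00:00:03.06]'."""
    hours, minutes, seconds, frames = timecode.split(":")
    # The hours field is discarded and replaced with 00, as the question asks.
    return "[00:{}:{}.{}]".format(minutes, seconds, frames)


def merge(times, subtitles):
    """Sandwich each subtitle between its start and end timecodes."""
    if len(times) != len(subtitles):
        raise ValueError("expected the same number of timecode pairs and subtitles")
    blocks = []
    for (start, end), text in zip(times, subtitles):
        blocks.append("\n".join([format_timecode(start), text, format_timecode(end)]))
    # One blank line between blocks, as in the desired output.
    return "\n\n".join(blocks) + "\n"


times = [
    ("01:00:03:06", "01:00:04:19"),
    ("01:00:04:29", "01:00:08:17"),
]
subtitles = [
    "If we think about climate change,",
    "most of society's focused on fossil fuel combustion.",
]

print(merge(times, subtitles), end="")
```

The length check is there because `zip` silently stops at the shorter list, which would hide a missing subtitle or EDL event. Writing the result to disk is then a single `write` call on an open output file.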
