
I'm trying to write a program that repairs corrupted subtitle .srt files. After some research, I found that websites which repair such files, like subtitletools.com, do so by converting them to UTF-8.

I used the method from the question "Storing UTF-8 string in a UnicodeString" to convert the text I loaded from my .srt file. Many pages suggest this approach for the conversion, but it did not work for me. What am I doing wrong?

This is my code for converting:

procedure Tfrm_main.btn_convertClick(Sender: TObject);
var
  UnicodeStr: UnicodeString;
  UTF8Str: RawByteString;
begin    
  UTF8Str := UTF8Encode(memo_source.Text);
  SetCodePage(UTF8Str, 0, False);
  UnicodeStr := UTF8Str;    
  memo_result.Text := UnicodeStr;    
end;

The program runs, but the result is still garbled, with only slight differences from the input:

Here is a screenshot of my program trying to convert a .srt file to UTF-8:


Armin Taghavizad
  • This doesn't look right at all. Can you provide an example of the input file. – David Heffernan Jan 28 '20 at 16:20
  • @DavidHeffernan I uploaded a sample file to my [One Drive](https://1drv.ms/u/s!AhlEAOF8Jw2JjkgdXgw_RsmIVKzR?e=A8AKwr), you can download it – Armin Taghavizad Jan 28 '20 at 16:32
  • While this isn't desired here on SO, in this case I will point you to a third-party tool. Why? Because there exists an open-source program called [Subtitle Workshop](http://subworkshop.sourceforge.net/index.php), which is written in Delphi 7 and can do what you want among tons of other features. So what is Subtitle Workshop? It is a fully featured subtitle editor with features like video preview, which is useful for synchronizing subtitles with a specific video. It allows easy subtitle translation from one language to another by showing two subtitles simultaneously. ... – SilverWarior Jan 28 '20 at 16:38
  • ... It even allows you to write a Pascal script to do some automated processing, so something like converting a bunch of subtitles from one format to another is a breeze. Oh, and it supports pretty much any subtitle format out there. – SilverWarior Jan 28 '20 at 16:38
  • I have used Subtitle Workshop many times and it never failed me. And since it is open source, you could download its code and check how certain things are done. – SilverWarior Jan 28 '20 at 16:40
  • I don't think you have correctly diagnosed what is wrong with this file. I think you have guessed that it is something to do with UTF8 but don't know what is wrong for sure. I'd want to understand where the file came from, what wrote it, etc. – David Heffernan Jan 28 '20 at 16:46
  • @SilverWarior I'm familiar with Subtitle Workshop, but that's not the point. There are many reasons programmers write new programs every day. I'm trying to learn and build my program my way. This is what programming and **developing** mean and how they work. Btw, I think *third party tools* are not allowed in SO. – Armin Taghavizad Jan 28 '20 at 16:47
  • If it weren't an open-source program written in Delphi, I wouldn't point you to it. But since it is, you can check its source code and learn from it. That is why I mentioned it. Besides, it would be nice if someone could port Subtitle Workshop to a modern Delphi and then make it compatible across all platforms that Delphi supports. – SilverWarior Jan 28 '20 at 16:57
  • @DavidHeffernan I provided another [file](https://1drv.ms/u/s!AhlEAOF8Jw2Jjkn8MAqpvVw_k0f5?e=WksoQk) with some screenshots. The first screenshot shows the subtitle file opened in **Notepad++** before fixing by [subtitletools.com](https://subtitletools.com/convert-text-files-to-utf8-online), the second shows it after fixing so you can compare, and the third shows the same `.srt` file (the subtitle for the Fight Club movie, included in the .zip file I linked) opened and converted in my program. – Armin Taghavizad Jan 28 '20 at 17:14
  • @DavidHeffernan The .srt files are made by subtitle creator and editor programs, but a text file in a UTF format often gets corrupted when it moves from one machine to another, so such websites and programs are used to fix it. The language of the file is Persian. I diagnosed the problem as a UTF conversion because all the sites that fix these files use **Convert to UTF-8** as their title; maybe I'm wrong. – Armin Taghavizad Jan 28 '20 at 17:20
  • It sounds like you actually need to work out precisely what is going on. This is really vague. You seem to be saying "I do some stuff, and then something happens". Precision is needed. – David Heffernan Jan 28 '20 at 18:00
  • Never mind links to off-site files on OneDrive - this needs a [mcve]. Take the subtitle file and reduce it down to a short section that can be posted as text here in a question [edit]. Make sure it reproduces the problem. – J... Jan 28 '20 at 18:25
  • Are you sure that the source is corrupt (whatever that means) and not just that it's some ANSI text displayed using CP1252 instead of CP1256? – Sertac Akyuz Jan 28 '20 at 18:37
  • @SertacAkyuz I'm still searching the net, and it now seems it could be what you are pointing at (if I understand you correctly, you are referring to the code page, right?). What can I do in that case? – Armin Taghavizad Jan 28 '20 at 18:57
  • Read the file into a byte array and convert it to a string using TEncoding. – David Heffernan Jan 28 '20 at 19:42
  • `memo_source.Lines.LoadFromFile('....', TEncoding.GetEncoding(1256));` I guess that's it. – Sertac Akyuz Jan 28 '20 at 19:50
  • @SertacAkyuz That's it! I really searched a lot, and I can't believe it is achieved with one line of code. Many thanks, you really helped. Would you please post it as an answer so I can accept it? Thanks – Armin Taghavizad Jan 28 '20 at 20:15
  • @Armin - You're welcome! Hard to search on the correct thing when the problem hasn't been identified correctly, as David has mentioned a few times already. Please go on for an answer yourself if you like. – Sertac Akyuz Jan 28 '20 at 20:54
  • @Sertac's quick example leaks the encoding instance. It's fine for an example in a comment but you will need to take a little time to get to know the encoding class. – David Heffernan Jan 28 '20 at 22:02
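
Pulling the last few comments together, here is a minimal sketch of how the file could be read as Windows-1256 and written back as UTF-8 without leaking the encoding instance. It assumes the source really is CP1256 text, as the comments suggest. The procedure name `ConvertSrtToUtf8` and the file names `input.srt` and `output.srt` are hypothetical, chosen only for illustration.

```pascal
program ConvertSrt;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical helper: reads SourceFile as Windows-1256 and
// writes the same text to TargetFile as UTF-8.
procedure ConvertSrtToUtf8(const SourceFile, TargetFile: string);
var
  Lines: TStringList;
  SourceEncoding: TEncoding;
begin
  // TEncoding.GetEncoding creates a new instance that the caller owns,
  // so it is released in the finally block below.
  SourceEncoding := TEncoding.GetEncoding(1256);
  try
    Lines := TStringList.Create;
    try
      // Decode the file's bytes as CP1256 into UnicodeString lines.
      Lines.LoadFromFile(SourceFile, SourceEncoding);
      // TEncoding.UTF8 is a shared standard instance; it must not be freed.
      Lines.SaveToFile(TargetFile, TEncoding.UTF8);
    finally
      Lines.Free;
    end;
  finally
    SourceEncoding.Free;
  end;
end;

begin
  // Assumed file names, for illustration only.
  ConvertSrtToUtf8('input.srt', 'output.srt');
end.
```

The same ownership pattern applies to the memo in the question: keep the result of `TEncoding.GetEncoding(1256)` in a local variable, pass it to `memo_source.Lines.LoadFromFile`, and free it in a `finally` block.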

0 Answers