
I'm trying to find and replace some special chars in a file encoded in ISO-8859-1, then write the result to a new file encoded in UTF-8:

package inv

class MigrationScript {

    static main(args) {
        new MigrationScript().doStuff();
    }

    void doStuff() {
        def dumpfile = "path to input file";
        def newfileP = "path to output file"

        def file = new File(dumpfile)
        def newfile = new File(newfileP)

        def x = [
            "þ":"ş",
            "ý":"ı",
            "Þ":"Ş",
            "ð":"ğ",
            "Ý":"İ",
            "Ð":"Ğ"
        ]

        def r = file.newReader("ISO-8859-1")
        def w = newfile.newWriter("UTF-8")

        r.eachLine{
            line ->

                x.each {
                    key, value ->
                    if(line.find(key)) println "found a special char!" 
                    line = line.replaceAll(key, value);
                }

                w << line + System.lineSeparator();
        }

        w.close()
    }
}

My input file content is:

"þ": "ý": "Þ":" "ð":" "Ý":" "Ð":"

The problem is that my code never finds the specified characters. The Groovy script file itself is encoded in UTF-8. I'm guessing that may be the cause, but I can't encode the script in ISO-8859-1, because then I couldn't write "Ş", "Ğ", etc. in it.

Szymon Stepniak
uylmz

1 Answer


I took your code sample, ran it against an input file encoded in ISO-8859-1, and it worked as expected. Can you double-check that your input file is actually encoded in ISO-8859-1? Here is what I did:

  1. I took the file content from your question and saved it to /tmp/test.txt in Sublime Text, using Save -> Save with Encoding -> Western (ISO 8859-1).

  2. I checked the file encoding with the following Linux command:

    file -i /tmp/test.txt
    /tmp/test.txt: text/plain; charset=iso-8859-1
    
  3. I set the dumpfile variable to /tmp/test.txt and the newfileP variable to /tmp/test_2.txt.

  4. I ran your code and saw this in the console:

    found a special char!
    found a special char!
    found a special char!
    found a special char!
    found a special char!
    found a special char!
    
  5. I checked the encoding of the Groovy file in IntelliJ IDEA: it was UTF-8.

  6. I checked the encoding of the output file:

    file -i /tmp/test_2.txt
    /tmp/test_2.txt: text/plain; charset=utf-8
    
  7. I checked the content of the output file:

    cat /tmp/test_2.txt 
    "ş": "ı": "Ş":" "ğ":" "İ":" "Ğ":"
    

I don't think it matters, but I used the most recent Groovy version, 2.4.13.
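The steps above can also be reproduced without an editor. The following is only a minimal sketch, not the exact script from the question: the temp files, the names `src`, `dst`, `replacements` and `hits`, and the shortened six-character input are assumptions made for illustration. It writes the source characters as ISO-8859-1, runs the same replacement loop, and reads the result back as UTF-8. The map uses Unicode escapes so that the encoding of the script file itself cannot influence the keys.

```groovy
import java.nio.file.Files

// Hypothetical temp files standing in for /tmp/test.txt and /tmp/test_2.txt
def src = Files.createTempFile("test", ".txt").toFile()
def dst = Files.createTempFile("test_2", ".txt").toFile()

// Unicode escapes, so the script's own encoding cannot affect the keys
def replacements = [
    "\u00FE": "\u015F", // þ -> ş
    "\u00FD": "\u0131", // ý -> ı
    "\u00DE": "\u015E", // Þ -> Ş
    "\u00F0": "\u011F", // ð -> ğ
    "\u00DD": "\u0130", // Ý -> İ
    "\u00D0": "\u011E"  // Ð -> Ğ
]

// Write the six source characters as ISO-8859-1 (one byte per character)
src.setText("\u00FE\u00FD\u00DE\u00F0\u00DD\u00D0", "ISO-8859-1")

int hits = 0
dst.withWriter("UTF-8") { w ->
    // Decode the input explicitly as ISO-8859-1, as in the question
    src.eachLine("ISO-8859-1") { line ->
        replacements.each { key, value ->
            if (line.contains(key)) hits++
            line = line.replace(key, value)
        }
        w << line + System.lineSeparator()
    }
}

// Read the converted file back as UTF-8
def result = dst.getText("UTF-8").trim()
println "replaced ${hits} characters"
```

If the input really is ISO-8859-1, every key should be found. Writing the same text with `"UTF-8"` in the `setText` call instead should leave `hits` at 0, which matches the behaviour described in the question.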

My guess is that your input file is not actually encoded in ISO-8859-1. Please double-check its encoding: when I save the same content as UTF-8, your program does not work as expected and no "found a special char!" line appears in the console. When I display the contents of the ISO-8859-1 file, I see something like this:

cat /tmp/test.txt 
"�": "�": "�":" "�":" "�":" "�":"% 

If I save the same content with UTF-8, I see the readable content of the file:

cat /tmp/test.txt
"þ": "ý": "Þ":" "ð":" "Ý":" "Ð":"%  

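The two `cat` outputs come down to how the same character is stored in each encoding. As a small illustration (the variable names are mine, and the Unicode escape `\u00FE` stands for þ), this sketch shows the bytes in both encodings and what happens when they are decoded with the wrong charset:

```groovy
def thorn = "\u00FE" // þ

// ISO-8859-1 stores þ as one byte, UTF-8 as two
def latin1Bytes = thorn.getBytes("ISO-8859-1")
def utf8Bytes = thorn.getBytes("UTF-8")
assert latin1Bytes.encodeHex().toString() == "fe"
assert utf8Bytes.encodeHex().toString() == "c3be"

// A UTF-8 file read as ISO-8859-1: each byte becomes its own character,
// so þ turns into the two characters Ã¾ and the lookup for þ fails
def misread = new String(utf8Bytes, "ISO-8859-1")
assert misread == "\u00C3\u00BE"
assert !misread.contains(thorn)

// An ISO-8859-1 file shown as UTF-8: the lone byte 0xFE is invalid,
// so it is rendered as the replacement character U+FFFD
def garbled = new String(latin1Bytes, "UTF-8")
assert garbled == "\uFFFD"
```

The first mismatch is what the script sees when the input file is really UTF-8. The second is what the terminal shows when it prints the ISO-8859-1 file.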
I hope this helps you find the source of the problem.
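If `file -i` is not available, a rough check can be done from Groovy itself. The helper below is a hypothetical sketch (`isValidUtf8` is not part of Groovy or the JDK). It uses a strict `CharsetDecoder` that reports malformed input instead of silently replacing it:

```groovy
import java.nio.ByteBuffer
import java.nio.charset.CharacterCodingException
import java.nio.charset.CodingErrorAction
import java.nio.charset.StandardCharsets

// Hypothetical helper: true if the bytes decode cleanly as UTF-8
boolean isValidUtf8(byte[] bytes) {
    def decoder = StandardCharsets.UTF_8.newDecoder()
    // Fail on bad input instead of substituting U+FFFD
    decoder.onMalformedInput(CodingErrorAction.REPORT)
    decoder.onUnmappableCharacter(CodingErrorAction.REPORT)
    try {
        decoder.decode(ByteBuffer.wrap(bytes))
        return true
    } catch (CharacterCodingException e) {
        return false
    }
}

// þ encoded as UTF-8 is valid; the single ISO-8859-1 byte 0xFE is not
assert isValidUtf8("\u00FE".getBytes("UTF-8"))
assert !isValidUtf8("\u00FE".getBytes("ISO-8859-1"))
```

For a real file, something like `isValidUtf8(new File(dumpfile).bytes)` should work. Treat the result as a hint only: a pure ASCII file is valid in both encodings, and a `true` result for a file containing non-ASCII characters suggests, but does not prove, that the file is UTF-8 rather than ISO-8859-1.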

Szymon Stepniak