6

I have a .po file in which I need to copy the msgid value into the msgstr value whenever msgstr is empty.

For example

msgid "Hello"
msgstr ""

msgid "Dog"
msgstr "Cane"

Should become

msgid "Hello"
msgstr "Hello"

msgid "Dog"
msgstr "Cane"

Currently, for testing purposes, I'm writing the output to a second file, but the final script will edit the file in place.

#!/bin/bash
rm it2.po
sed $'s/^msgid.*/&\\\n---&/' it.po > it2.po
sed -i '/^msgstr/d' it2.po
sed -i 's/^---msgid/msgstr/' it2.po

This script has (at least) two problems:

  1. it copies msgid into msgstr even when msgstr is not empty;
  2. I'm pretty sure a one-liner or a more elegant solution exists.

Any help would be appreciated. Thanks in advance.

assistbss
  • `sed -i -E '/^msgid ".*"$/{N;s/^(msgid "(.*)"\nmsgstr )""$/\1"\2"/}' file` – Wiktor Stribiżew Jun 01 '21 at 07:46
  • The "doesn't work for lines longer than 70 characters" problem is probably not reproducible. `sed` out of the box doesn't do anything like that, though some very old `sed` implementations might have a maximum line length (though probably significantly longer). – tripleee Jun 01 '21 at 08:13
  • @tripleee you are right, the problem wasn't related to sed. It was due to xgettext, msginit, and msgmerge commands called before sed. The question has been updated – assistbss Jun 01 '21 at 10:59
  • Don't worry about `final script will works inline` - that `-i` in any command (sed, perl, ruby, gawk, whatever) is just syntactic sugar that doesn't really do inline editing, it uses a temp file behind the scenes. You can just as easily do `tmp=$(mktemp) && sed 's/old/new/' file > "$tmp" && mv "$tmp" file` if your sed or any other command doesn't have a `-i` for pseudo-inplace editing. – Ed Morton Jun 01 '21 at 11:38

7 Answers

5

You may consider a better tool for this, GNU awk, instead of sed:

awk -i inplace -v FPAT='"[^"]*"|\\S+' 'id != "" && $1 == "msgstr" && (NF==1 || $2 == "\"\"") {$2=id} $1 == "msgid" {id=$2} 1' file

msgid "Hello"
msgstr "Hello"

msgid "Dog"
msgstr "Cane"

-v FPAT='"[^"]*"|\\S+' makes each double-quoted string, or any other run of non-whitespace characters, a separate field.
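To see how FPAT splits a line, here is a small sketch. The `fields` variable and the sample line are only for illustration, and since FPAT is a gawk extension the snippet falls back to a message when gawk is not installed.

```shell
if command -v gawk >/dev/null 2>&1; then
  # Print every field that FPAT produces, one per line with its index.
  fields=$(printf 'msgid "Hello world"\n' |
    gawk -v FPAT='"[^"]*"|\\S+' '{ for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i }')
else
  fields='gawk not found'   # FPAT is a gawk extension
fi
printf '%s\n' "$fields"
```

With gawk available, the quoted string stays together as field 2 even though it contains a space.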

A more readable form:

awk -i inplace -v FPAT='"[^"]*"|\\S+' '
id != "" && $1 == "msgstr" && (NF==1 || $2 == "\"\"") {$2=id}
$1 == "msgid" {id=$2}
1' file
anubhava
4

This might work for you (GNU sed):

sed -E 'N;s/(msgid "(.*)".*msgstr )""/\1"\2"/;P;D' file

Open a two line window and if the first line contains msgid and the second msgstr "", replace the msgstr value by the msgid value. Print/delete the first line and repeat.
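The two-line window can be checked against the sample from the question. This is just a sketch: the `sample` and `result` variables are names I made up, and GNU sed is assumed, since it prints the pattern space when `N` runs out of input on the last line.

```shell
# Sample input taken from the question.
sample='msgid "Hello"
msgstr ""

msgid "Dog"
msgstr "Cane"'

# N appends the next line to the window; the substitution fires only when the
# window holds a msgid line followed by an empty msgstr; P prints the first
# line of the window and D drops it, restarting with the remainder.
result=$(printf '%s\n' "$sample" |
  sed -E 'N;s/(msgid "(.*)".*msgstr )""/\1"\2"/;P;D')

printf '%s\n' "$result"
```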

potong
4

Since the structure of the input file is so simple and consistent, I think the following should be enough (it works with the 3 examples you've provided):

sed -zE 's/(msgid "([^"]+)"\nmsgstr ")"/\1\2"/g' your_file
  • -z makes the file be a long string of input with embedded \ns, so we don't need commands like N, D, or others, because the whole file is already in the pattern space;
  • -E lets us use (, ), and + instead of \(, \), and \+ (and also other similar things)
  • the outermost () captures msgid "Hello"\nmsgstr " (the closing " is matched but not captured);
  • the innermost () captures the first double-quoted string;
  • \1\2" concatenates the captured text (everything except the final ", as I noted above) with the text between the first two "s and a closing ";
  • the flag g will apply the substitution across the whole file.
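The points above can be tried on the question's sample. A minimal sketch, assuming GNU sed (for -z) and using made-up variable names `sample` and `result`:

```shell
# Sample input taken from the question.
sample='msgid "Hello"
msgstr ""

msgid "Dog"
msgstr "Cane"'

# -z puts the whole input in the pattern space, so \n can be matched directly;
# only a msgid immediately followed by an empty msgstr is rewritten.
result=$(printf '%s\n' "$sample" |
  sed -zE 's/(msgid "([^"]+)"\nmsgstr ")"/\1\2"/g')

printf '%s\n' "$result"
```

The "Dog" entry is left alone because the character after `msgstr "` is `C`, not the closing `"` that the pattern requires.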

If the leading strings are not that important (e.g. they are always the same, and the lines always appear as msgid followed by msgstr), you can squeeze the command above a bit more:

sed -zE 's/(([^"]+)"\n[^\n]*")"/\1\2"/g' your_file
Enlico
3

You can use the hold space:

sed '
    /^msgid[\t ]*/ {
        p
        s///
        x
        d
    }
    /^msgstr[\t ]*""/ {
        x
        s/^/msgstr /
    }
' <in.po >out.po
  • if line starts with msgid
    • print it
    • delete the keyword
    • save string to hold
    • go to next line
  • else if the line starts with msgstr and has an empty value
    • retrieve string from hold
    • prepend the keyword
  • implicit print
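The steps above can be checked on the question's sample without creating any files. This is a sketch that feeds the input through a pipe instead of `in.po`/`out.po`; the `sample` and `result` names are hypothetical, and GNU sed is assumed for `\t` inside the bracket expression.

```shell
# Sample input taken from the question.
sample='msgid "Hello"
msgstr ""

msgid "Dog"
msgstr "Cane"'

result=$(printf '%s\n' "$sample" | sed '
    # msgid line: print it, strip the keyword, stash the string, skip autoprint
    /^msgid[\t ]*/ {
        p
        s///
        x
        d
    }
    # empty msgstr: fetch the stashed string and put the keyword in front
    /^msgstr[\t ]*""/ {
        x
        s/^/msgstr /
    }
')

printf '%s\n' "$result"
```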
jhnc
3

Here's a simple sed script which keeps the latest msgid in the hold space (h) then brings it back (x) and changes it to msgstr if it sees an empty msgstr.

sed -e '/^msgid "/h' -e '/^msgstr ""/!b' \
    -e x -e 's/^msgid/msgstr/' it.po >it2.po

Notice also how you would typically combine multiple sed statements with -e rather than create a new file and then repeatedly run sed -i on it. sed is a scripting language; learn it if you want to use it.

(Some sed variants don't tolerate this arrangement; maybe combine the script into a single string with semicolons between the statements if you have trouble with this one.)
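One way to write the single-string form is to put each statement on its own line of one script, since newlines are the most portable separator. A sketch under that assumption, run on the question's sample (the `sample` and `result` names are mine):

```shell
# Sample input taken from the question.
sample='msgid "Hello"
msgstr ""

msgid "Dog"
msgstr "Cane"'

# Same four statements as the -e version, one per line:
# remember each msgid, leave every line that is not an empty msgstr untouched,
# otherwise swap in the remembered msgid and rename its keyword.
result=$(printf '%s\n' "$sample" | sed '
/^msgid "/h
/^msgstr ""/!b
x
s/^msgid/msgstr/
')

printf '%s\n' "$result"
```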

Having said that, sed is very much a write-only language. Perhaps you'd be better off with a simple Awk (or Python, or etc) solution.

awk '/^msgid "/ { s=$0; sub(/^msgid/, "", s) }
    /^msgstr ""/ { $0 = $1 s } 1' it.po >it2.po
tripleee
  • or even `sed '/^msgid/h; /^msgstr ""/{x;s/id/str/}'` – jhnc Jun 01 '21 at 08:03
  • Yeah, I tend to avoid braces because different dialects have slightly different rules for how to combine them with other statements. Your formulation gives me "bad flag in substitute command" on macOS. Adding a semicolon before `}` fixes that but ... what I said. – tripleee Jun 01 '21 at 08:04
3

With GNU awk, and tested with the shown samples only, you could try the following.

awk -v RS='"[^"]*"|\n+' '
RT=="\n"{ next }
$0~/^msgstr/{
  if(RT=="\"\""){ $0=$0 val }
  else          { $0=$0 RT  }
}
$0~/^msgid/     { val=RT
                  $0=$0 RT  }
RT
'  Input_file


2nd solution: a slight variation on the above. The first solution matches a single double-quoted string (one pair of " characters), whereas this one matches from the first " on a line through to the end of that line. Again, written and tested with the shown samples only.

awk  -v RS='"[^\n]*|\n+' '
RT=="\n"{ next }
$0~/^msgstr/{
  if(RT=="\"\""){ $0=$0 val }
  else          { $0=$0 RT  }
}
$0~/^msgid/     { val=RT
                  $0=$0 RT  }
RT
'  Input_file

Explanation: Adding detailed explanation for above.

awk -v RS='"[^"]*"|\n+' '     ##Start the awk program; the record separator is either a double-quoted string or a run of newlines.
RT=="\n"{ next }              ##If RT is a single newline, skip this record.
$0~/^msgstr/{                 ##If the record starts with msgstr then:
  if(RT=="\"\""){ $0=$0 val } ##If RT is "" (an empty msgstr), append val, the saved msgid string.
  else          { $0=$0 RT  } ##Otherwise append RT, the existing translation.
}
$0~/^msgid/     { val=RT      ##If the record starts with msgid, save RT (the quoted string) in val
                  $0=$0 RT  } ##and append RT back to $0.
RT                            ##Print the record if RT is not empty.
' Input_file                  ##Input file name.
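To make the role of RT more concrete, this sketch prints each record next to the text that terminated it. The `pairs` variable and the two-line input are only for illustration; RT is a gawk extension, so the snippet falls back to a message when gawk is missing.

```shell
if command -v gawk >/dev/null 2>&1; then
  pairs=$(printf 'msgid "Hello"\nmsgstr ""\n' |
    gawk -v RS='"[^"]*"|\n+' '
      RT ~ /^\n+$/ { next }                       # skip newline-only terminators
      { printf "record=[%s] RT=[%s]\n", $0, RT }  # show each record with its terminator
    ')
else
  pairs='gawk not found'   # RT is a gawk extension
fi
printf '%s\n' "$pairs"
```

The keyword ends up in the record and the quoted string in RT, which is why the answer stores RT in `val` on msgid lines.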
RavinderSingh13
1

Keep it simple and use awk, e.g. using any awk in any shell on every Unix box:

$ awk '$2~/""/{$2=p} {p=$2} 1' it.po
msgid "Hello"
msgstr "Hello"

msgid "Dog"
msgstr "Cane"

If that isn't all you need then edit your question to provide more comprehensive sample input/output including cases that that doesn't work for.

Since you have GNU sed for -i you also have or can install GNU awk for -i inplace if you want "inplace" editing, or just do tmp=$(mktemp) && awk 'script' file > "$tmp" && mv "$tmp" file like you would for any other command.
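For example, the temp-file approach could look like this. It is only a sketch: the scratch directory and the `workdir`, `po`, and `result` names are mine, so that it doesn't touch a real it.po.

```shell
# Work in a scratch directory so no real files are modified.
workdir=$(mktemp -d)
po="$workdir/it.po"
printf 'msgid "Hello"\nmsgstr ""\n\nmsgid "Dog"\nmsgstr "Cane"\n' > "$po"

# Write to a temp file, then replace the original only if awk succeeded.
tmp=$(mktemp) && awk '$2~/""/{$2=p} {p=$2} 1' "$po" > "$tmp" && mv "$tmp" "$po"

result=$(cat "$po")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Chaining with `&&` means the original file is left untouched if either `mktemp` or `awk` fails.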

Ed Morton