
While rewriting a legacy application, I did a mass replacement of foo with bar, intermixed with many manual changes. Some replacements had to be undone manually, and there were already many other bars in the original code.

Now I see that each foo that was replaced by bar should actually have been replaced by baz.

An example:

  • old file:
    a staying "foo" and a replaced "foo" and a kept "bar"
  • new file:
    a staying "foo" and a replaced "bar" and a kept "bar"
  • wanted:
    a staying "foo" and a replaced "baz" and a kept "bar"

The wanted action is simple: Fix every replacement of foo by bar to baz. I wonder if there's a simple way using git or any Linux tools.

Reformulation

Maybe this single-sentence formulation is clearer:

Given two versions of a file, put baz in every place where the old version contains foo and the new version contains bar.

More details

There were actually three whole-word replacements, by words of differing lengths, like

perl -pe 's/\babc\b/pqrs/gi; s/\bdefg\b/uvw/gi; s/\bhi\b/xyz/g'
maaartinus
  • You just want to change the commit message, and not the actual files? – Obsidian Age May 08 '18 at 00:29
  • @ObsidianAge No, I want to change the files. And there's no single commit and I actually don't care about the history... it was a big chaos till I got a compiling version. Let's say, I made a wrong mass replacement and I need to fix it, while keeping all manual changes. – maaartinus May 08 '18 at 00:40
  • To downvoters: Is there something unclear about the question? – maaartinus May 08 '18 at 01:44
  • check this out https://stackoverflow.com/questions/471183/linux-command-line-global-search-and-replace – dommmm May 08 '18 at 01:54
  • @dommmm That's what I did: `perl -pi -e 's!foo!bar!g'`. That's what I should've done: `perl -pi -e 's!foo!baz!g'`. In between, I made many manual changes. I can't simply do `perl -pi -e 's!bar!baz!g'` now, as not every occurrence should be replaced. – maaartinus May 08 '18 at 02:10
  • @user770 "Find & Replace" some 100000 times? No thanks. I've tried some merging, but it's futile, leaving too many conflicts to resolve manually. – maaartinus May 09 '18 at 23:02
  • Is there a single commit in which the mass replace happened, or was the mass replace over multiple commits? – swalladge May 10 '18 at 01:12
  • Is the replaced string the exact same length of the original string? I can't think of a good way to do this with *nix commands but I can write simple code to do this... – runwuf May 10 '18 at 01:29
  • If the length is the same, this may be much easier. Can you confirm whether `foo`, `bar` and `baz` all have the same length? – Tarun Lalwani May 10 '18 at 07:48
  • @swalladge There were many commits, but I don't need to keep the history, so I can squash them into one. – maaartinus May 10 '18 at 14:10
  • @TarunLalwani No, the lengths differ. And there were actually three whole-word replacements like `perl -pe 's/\babc\b/pqrs/gi; s/\bdefg\b/uvw/gi; s/\bhi\b/xyz/g'`. – maaartinus May 10 '18 at 14:13

2 Answers


You can make use of the --word-diff=porcelain mode of git diff (along with a sufficiently large value passed to the -U option, in order to preserve all the context between changes) and process its output with a simple enough script that will correct the wrong replacement.

--word-diff[=<mode>] Show a word diff, using the <mode> to delimit changed words. By default, words are delimited by whitespace; see --word-diff-regex below. The <mode> defaults to plain, and must be one of:

  • ...
  • porcelain: Use a special line-based format intended for script consumption. Added/removed/unchanged runs are printed in the usual unified diff format, starting with a +/-/` ` character at the beginning of the line and extending to the end of the line. Newlines in the input are represented by a tilde ~ on a line of its own.
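To get a feel for this format, it can be tried on the example from the question. The following is only a sketch: it assumes git is installed, works in a scratch directory created by mktemp, and uses --no-index so that no repository is needed. The file names old and new are made up for the illustration.

```shell
# Scratch directory holding the two versions from the question.
tmp=$(mktemp -d)
printf '%s\n' 'a staying "foo" and a replaced "foo" and a kept "bar"' > "$tmp/old"
printf '%s\n' 'a staying "foo" and a replaced "bar" and a kept "bar"' > "$tmp/new"

# --no-index compares two arbitrary files outside of any repository.
# git diff exits with status 1 when the files differ, hence the "|| true".
word_diff=$(cd "$tmp" && git diff --no-index -U100000 \
    --word-diff=porcelain \
    --word-diff-regex='[[:alpha:]][[:alnum:]]*' \
    old new || true)

printf '%s\n' "$word_diff"
rm -rf "$tmp"
```

After the diff header, the replaced word should show up as a `-foo` line immediately followed by a `+bar` line, and the end of the source line as a lone `~`. The kept "bar" stays inside a context line, which is what lets a script tell the two kinds of bar apart.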

Below you will find a prototype sed-based implementation of the above approach.

Usage:

fix_wrong_replacements path revision replacement_fix

where

  • path is the (relative) path of the file in the working tree
  • revision is the revision after which the wrong replacements were made
  • replacement_fix is a string of the form

    /orig_pattern/incorrect_replacement_str/correct_replacement_str/
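One way to split such an argument into its three parts is to let read do the field splitting on "/". This is a sketch, not necessarily how the script itself does it; the function name parse_fix and the throwaway variables _lead and _rest are made up for the illustration.

```shell
# Split "/orig/incorrect/correct/" into three variables.
# The leading "/" yields an empty first field, which goes into _lead;
# whatever follows the third part goes into _rest.
parse_fix() {
    IFS=/ read -r _lead orig_pattern incorrect_replacement correct_replacement _rest <<EOF
$1
EOF
}

parse_fix '/[aA][bB][cC]/pqrs/xyz/'
printf 'orig=%s incorrect=%s correct=%s\n' \
    "$orig_pattern" "$incorrect_replacement" "$correct_replacement"
```

Because the argument is never expanded unquoted, a bracket expression such as [aA][bB][cC] cannot be mistaken for a file name glob. The obvious restriction is that none of the three parts may contain a "/".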

Effects:

Assuming that the working copy of the file at path, when compared to its committed revision revision, contains the results of replacing certain instances of orig_pattern with incorrect_replacement_str, the script identifies those replacements and changes them to correct_replacement_str.

Examples:

# In the last two commits (and, maybe, in the working copy) some "int"s
# were incorrectly changed to "unsigned", now change those to "long"
fix_wrong_replacements main.c HEAD~2 /int/unsigned/long/

# In the working copy of somefile.txt all "abc" case-insensitive words
# were changed to "pqrs", now change them to "xyz"
fix_wrong_replacements somefile.txt HEAD '/[aA][bB][cC]/pqrs/xyz/'

Known limitations/issues:

  • It works on a single file. To fix all wrong replacements in a commit, commit range, or local changes, you must identify the list of changed files and call this script in a loop for each of them.

  • If the original (wrong) replacement was case-insensitive, the orig_pattern part of the replacement_fix argument must use a bracket expression ([aA], [bB], etc.) for each letter.

  • Replacements immediately adjacent to other changes aren't handled.

  • Sometimes a superfluous blank line may be added (because of a slight inconsistency in the output of git diff --word-diff).
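The first two limitations can be worked around with two small helpers. Both are hypothetical sketches rather than part of the script: ci_pattern turns a word into the per-letter bracket form, and fix_all loops over the files changed since a revision. fix_all assumes that fix_wrong_replacements is on the PATH, that it is run from the top of the working tree (git diff --name-only prints paths relative to it), and that no path contains newlines or characters that git would quote.

```shell
# Build a case-insensitive pattern: "abc" becomes "[aA][bB][cC]".
# Characters without distinct upper/lower case are copied unchanged.
ci_pattern() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            lo = tolower(c)
            up = toupper(c)
            if (lo != up)
                out = out "[" lo up "]"
            else
                out = out c
        }
        print out
    }'
}

# Run the fixer on every file that differs from the given revision.
# $1: revision, $2: replacement_fix string
fix_all() {
    git diff --name-only "$1" | while IFS= read -r path; do
        # Skip files that no longer exist in the working tree.
        if [ -f "$path" ]; then
            fix_wrong_replacements "$path" "$1" "$2" < /dev/null
        fi
    done
}
```

A call for the second example above might then look like `fix_all HEAD "/$(ci_pattern abc)/pqrs/xyz/"`.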

fix_wrong_replacements:

#!/usr/bin/env bash

myname="$(basename "$0")"

if [ $# -ne 3 ]
then
    cat<<END
Usage:

    $myname <path> <revision> <replacement_fix>

where
    - <path> is the (relative) path of the file in the working tree
    - <revision> is the revision since which the wrong replacements that
      must be fixed were made
    - <replacement_fix> is a string of the form

        /orig_pattern/incorrect_replacement_str/correct_replacement_str/

Effects:

    Assuming that the working copy of the file at <path> when compared
    to its committed revision <revision> contains results of replacing
    certain instances of <orig_pattern> with <incorrect_replacement_str>,
    identifies those replacements and changes them to <correct_replacement_str>.


Examples:

    # In last two commits (and, maybe, in the working copy) some "int"s
    # were incorrectly changed to "unsigned", now change those to "long"
    $myname main.c HEAD~2 /int/unsigned/long/

    # In the working copy of somefile.txt all "abc" case-insensitive words
    # were changed to "pqrs", now change them to "xyz"
    $myname somefile.txt HEAD '/[aA][bB][cC]/pqrs/xyz/'
END
    exit 1
fi

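file="$1"
revision="$2"
# Split /orig/incorrect/correct/ on "/" without word splitting or globbing
# (an unquoted pattern such as [aA][bB][cC] could otherwise match file names).
IFS=/ read -r _ orig_pattern incorrect_replacement correct_replacement _ <<< "$3"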

pat="-$orig_pattern\n+$incorrect_replacement"

git_word_diff()
{
    git diff -U100000                                       \
             --word-diff=porcelain                          \
             --word-diff-regex='[[:alpha:]][[:alnum:]]*'    \
             "$@"
}


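word_diff_file="$(mktemp)"
trap 'rm -f -- "$word_diff_file"' EXIT

git_word_diff "$revision" -- "$file" > "$word_diff_file"
# No differences means nothing to fix; bail out rather than truncate the file.
[ -s "$word_diff_file" ] || exit 0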
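sed -n -e '
    # Skip the five header lines (diff --git, index, ---, +++, @@).
    1,5 d;

    # A removed word: pull in the following line to see what replaced it.
    /^-/ N;
    # Removal right before a line-break marker: discard both lines.
    /\n~$/ d;
    # Removal followed by another removal or context: discard it, reprocess the rest.
    /\n[- ]/ D;

    # The wrong replacement: add the corrected string to the line being
    # rebuilt in the hold space.
    /^'"$pat"'$/ {x;G;s/\n'"$pat"'$/'"$correct_replacement"'/;x;d;};
    # Any other replacement: keep the new text as is.
    /^-.*\n+/ {s/^-.*\n+//;H;x;s/\n//;x;d;};

    # End of a source line: print the accumulated hold space and reset it.
    /^~$/ {s/.*//;x;p;d;};

    # Context or added text: strip the marker character and accumulate it.
    {s/^.//;H;x;s/\n//;x;};
' "$word_diff_file" > "$file"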
Leon

To replace foo: `grep -rl 'foo' . | xargs sed -i 's/foo/bar/g'`

To replace bar: `grep -rl 'bar' . | xargs sed -i 's/bar/baz/g'`

kavita