3

I have a file containing a lot of SQL statements, such as:

CREATE TABLE "USER" (
    "ID" INTEGER PRIMARY KEY,
    "NAME" CHARACTER VARYING(50) NOT NULL,
    "AGE" INTEGER NOT NULL
);

COPY "USER" (id, name, age) FROM stdin;
1   Skywalker   19
2   Kenobi      57

I want the column names in the COPY statements to be uppercased and quoted:

COPY "USER" ("ID", "NAME", "AGE") FROM stdin;

Using sed, I found the following regexp:

sed -r 's/([( ])(\w+)([,)])/\1"\U\2\E"\3/g'

It does replace the column names, but it is not selective enough, and replaces other words in the file:

~/test]$sed -r 's/([( ])(\w+)([,)])/\1"\U\2\E"\3/g' star_wars_example
CREATE TABLE "USER" (
  "ID" INTEGER PRIMARY "KEY",
  "NAME" CHARACTER VARYING("50")NOT "NULL",
  "AGE" INTEGER NOT NULL
);

COPY "USER" ("ID", "NAME", "AGE") FROM stdin;
1   Skywalker   19
2   Kenobi      57

To avoid this problem, I want sed to only apply my regexp to the lines starting with COPY and ending with FROM stdin;.

I have looked into lookahead / lookbehind, but they are not supported in sed. They seem to be supported in super-sed, but I am currently using Cygwin (Windows is mandatory here...) and it does not seem available in the package list.

Is there a way to force sed to only consider specific line?

I've considered piping my file through grep before applying sed, but other lines will then disappear from the output.

Am I missing something obvious?

It would be great if the answer was easily applicable on a default Cygwin install. I guess I could try installing super-sed on cygwin, but I'd like to know if there are more obvious ideas

Etienne Neveu
  • 12,604
  • 9
  • 36
  • 59
  • 1
    Given your example, replacing the \w+ with [a-z]+ would work but it probably still will be not selective enough for everything. – Lieven Keersmaekers May 27 '10 at 11:22
  • You are right on both case! I gave a simplified example, but my real fear is to accidentally replace some of the data read by the COPY statements, since there is a lot of data in there... – Etienne Neveu May 27 '10 at 11:40

1 Answers1

2

Since I have no sed available to me at the moment, and have never actually used grouping, this command may or may not work (at all, or as intended) =)

Try

sed -r '/^COPY /{ s/([( ])(\w+)([,)])/\1"\U\2\E"\3/g }'

If I understand the manual correctly, this will execute the substitution on any line starting with COPY.

Another approach would be to use branching. This would look a lot more complicated, but is more flexible.

Jens
  • 25,229
  • 9
  • 75
  • 117
  • Wow, thanks a lot! This worked flawlessly. While this solves the problem I had, I'm curious about the branching method: does it involve grepping the file, looking for "^COPY", and then "branching" the result either to sed or to the standard output? How would one do that? (maybe post it in another answer so people may vote on it too) – Etienne Neveu May 27 '10 at 12:50
  • 1
    @eneveu: Here you are. I edited this into my original answer. Untested, too, though. sed seems to be turing-complete. Scary. – Jens May 27 '10 at 12:56
  • @Jens: I get an error from your second example: "sed: -e expression #1, char 9: unknown command: 'C'" – Dennis Williamson May 28 '10 at 02:23
  • @Dennis: Yeah, I found it. Correcting it. – Jens May 28 '10 at 06:04
  • @Jens: Now it gives: "sed: -e expression #1, char 18: unknown option to 's'". Clue: most of the places you have spaces their should be semicolons instead. Second clue: your "branching" version is not doing anything that the other one is not except needlessly doing a null-net-effect replacement. The selector in your first example is much simpler and more direct. Third clue: Since your first replacement is anchored to the beginning of the line, the `g` option is useless (and unwanted). And fourth clue: you can do a `b` without a target label and it will branch to the end. – Dennis Williamson May 28 '10 at 10:04
  • @Dennis: Thanks. I guess writing these without being able to test them does not work out. I'll remove the branching part to avoid confusing any more people. =) – Jens May 28 '10 at 10:43