13

When a PostgreSQL pg_dump is done it inserts some comments for each element, as follows.

--
-- Name: my_table; Type: TABLE; Schema: account; Owner: user; Tablespace:
--

CREATE TABLE my_table(
    id integer
);

--
-- Name: my_seq; Type: SEQUENCE; Schema: account; Owner: user
--

CREATE SEQUENCE my_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;

Is it possible to force pg_dump to remove (exclude) them? I would like to receive just:

CREATE TABLE my_table(
    id integer
);

CREATE SEQUENCE my_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;
Marcio Mazzucato
  • 8,841
  • 9
  • 64
  • 79

4 Answers4

19

On a UNIX type operating system, I would do it like this:

pg_dump [options] mydatabase | sed -e '/^--/d' >mydatabase.dmp

This could accidentally swallow data lines that start with --. To work around that problem, use the --inserts option of pg_dump.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • It works fine, many thanks! It replaced the comments to a blank line, is it possible to remove it? Not very important, but the file would be more clean. – Marcio Mazzucato Feb 01 '17 at 13:41
  • 2
    This will remove the line and *not* leave an empty line behind. The empty lines you see are empty in the original dump file. Use the `sed` command `/^$/d` to remove those. – Laurenz Albe Feb 01 '17 at 13:52
  • 2
    Note that with the default `pg_dump` output format, if any of your tables contains a row whose contents start with `--`, you will delete that data! (For example: `CREATE TABLE strs(s text); INSERT INTO strs(s) VALUES('-- hi');`.) – wchargin Jan 01 '22 at 08:35
  • 1
    @wchargin That is not true. Try it. – Laurenz Albe Jan 07 '22 at 07:38
  • @LaurenzAlbe: I did try it, with PostgreSQL version 12.8 installed from the Ubuntu repositories, and it does, in fact, delete that data from the dump. Demo: https://gist.github.com/wchargin/6fd1ded5c683bc2420eb59ad53aeb5b7 – wchargin Jan 08 '22 at 07:08
  • @wchargin In your code, you explicitly delete the line with `sed`, so it is no surprise if it doesn't show up at the destination. `psql` will have no problem restoring a value that srarts with `--`. – Laurenz Albe Jan 10 '22 at 07:39
  • 2
    @LaurenzAlbe: Yes, of course `psql` can restore a value that starts with `--`. The code that explicitly deletes the line with `sed` is from your answer to this question. My comment is pointing out that the `sed -e '/^--/d'` provided by your answer, which is *intended* to only delete SQL comments generated by `pg_dump`, can also delete actual data from the dump. – wchargin Jan 11 '22 at 04:35
  • 1
    @wchargin Ah, now I see. You are right, that is a flaw in my answer. You could work around it with the `--inserts` option of `pg_dump`. – Laurenz Albe Jan 11 '22 at 06:15
8

I've just submitted this patch for Postgres 11+ (still under consideration) that should allow one to dump without COMMENTS (until an ideal solution is in place), which should be a slightly better kludge than the ones we resort to using.

If there are enough voices, it may even get back-patched to Postgres 10!


[UPDATE]

This is now a feature in Postgres v11+

pg_dump --no-comments
Peter DeWeese
  • 18,141
  • 8
  • 79
  • 101
Robins Tharakan
  • 2,209
  • 19
  • 17
2

The are only 2 good reasons for removing comments from SQL:

  1. The SQL file contains formatting variables (placeholders) that need to be replaced dynamically. In this case removing comments prevents false variable detection when those are referenced in comments.

  2. The SQL file is to be minimized, to reduce the size of what needs to go through IO and into the database server.

In either cases, it implies the SQL file is now meant only for execution, and not for reading.

And specifically for PostgreSQL, there is package pg-minify which does exactly that:

  • It removes all the comments and minimizes the resulting SQL
  • It can optionally compress the SQL to its bare minimum (option compress)

complete example

const minify = require('pg-minify');
const fs = require('fs');

fs.readFile('./sqlTest.sql', 'utf8', (err, data) => {
    if (err) {
        console.log(err);
    } else {
        console.log(minify(data));
    }
});
vitaly-t
  • 24,279
  • 15
  • 116
  • 138
  • 2
    Good suggestion! But as it is NodeJS dependent, i can't use it today. @Laurenz Albe's solution fits well in my case because is more flexible, i can use native PostgreSQL and Linux commands. – Marcio Mazzucato Jun 23 '17 at 13:19
  • 2
    There's a third good reason for removing comments from the output of `pg_dump` specifically: they make it harder to read, by taking up a bunch of vertical space (six lines each, only one of which has text) and convey no information. The line comment "Name: mytable; Type: TABLE; Schema: public" is useless when it's immediately followed by `CREATE TABLE public.mytable`. I would like to remove these comments specifically to make it _more_ human-readable, not to minify it. – wchargin Jan 01 '22 at 08:31
  • Removing comments makes the schema dump consistent between different platforms, which is essential if it is checked into the version control. – Ilya Semenov Aug 07 '23 at 08:04
1

Marcio, piping is the process of taking the output of one process and feeding it directly into another to achieve a specific purpose. Let's say you were using Julia to achieve the result you need. Create a test database and play with it to ensure that you get the desired result. This Julia command would produce a backup with comments:

run(pipeline(`pg_dump -d test`,"testdump.sql"))

Here Julia is asked to dump the backup into testdump.sql so that we can check the result. Note the backticks. Then comes another command which uses the filter suggested by @LaurenzAlbe:

run(pipeline(`cat testdump.sql`,`sed -e '/^--/d'`,"testdump2.sql"))

Here we have a three part pipeline which scans the backup with the comments, strips the comments out and dumps what remains into testdump2. You can now check that the first and second files are what is required.

Once you have confidence that the solution provided by @LaurenzAlbe is correct, you can make the required substitutions to run the entire thing in one pipeline command. Of course you can do the same thing directly in a bash terminal or Python or the scripting engine of your choice.

Colin Beckingham
  • 527
  • 3
  • 11