
Dumping a Postgres table out in sections is yielding files that are 30 GB+ in size. The files are landing on a Windows 2008 server. I'm trying to count the rows in each CSV to confirm I have the row count I expect (22,725,303, to be exact). I can count the rows in the section I expect to dump - but I'm not sure whether they all made it into the file.

It's a 190M-row table, so dumping it in sections is the way to go.

So how can I count the rows in the output file to know I've got the full section?

  • Copy the .csv file(s) to a unix machine and run `wc -l thefile.csv` on them? – wildplasser Nov 04 '17 at 15:09
  • Yeah, I'm not keen to move 300GB around the network. That's onerous. – Matt Coblentz Nov 04 '17 at 18:07
  • Maybe Cygwin contains the file utilities? [Alternatively, you could boot from an Ubuntu USB stick, attempt to mount your (NTFS?) disk, and run `wc -l`.] ... Or you could write a small program that just counts the `'\n'`s. – wildplasser Nov 04 '17 at 19:36

1 Answer


In a PL/pgSQL function, you can get the count of rows processed by the last command - which, since Postgres 9.3, includes COPY - with:

GET DIAGNOSTICS x = ROW_COUNT;
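
For example, a minimal sketch of how that could look in an anonymous DO block (the table name `big_table`, the `id` filter, and the target path are placeholders; assumes Postgres 9.3+, where COPY sets ROW_COUNT):

DO
$$
DECLARE
   _rows bigint;
BEGIN
   -- export one section of the table; table, filter and path are placeholders
   COPY (SELECT * FROM big_table WHERE id BETWEEN 1 AND 22725303)
   TO 'C:\dump\section1.csv' WITH (FORMAT csv);

   -- number of rows written by the COPY above
   GET DIAGNOSTICS _rows = ROW_COUNT;
   RAISE NOTICE 'rows exported: %', _rows;
END
$$;

Comparing the reported number to the expected 22,725,303 would tell you whether the section was written completely, without counting lines in the 30 GB file afterwards.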
Erwin Brandstetter
  • Good call - sadly, Greenplum, which is based on Postgres 8.3, might not support this. I'll give it a go. – Matt Coblentz Nov 04 '17 at 18:08
    @MatthewCoblentz: You might have mentioned in the *question* that it's about Greenplum, which is *not* Postgres by a long shot. I doubt my answer works for Greenplum. – Erwin Brandstetter Nov 04 '17 at 19:18