
Dumping a Postgres table out in sections is yielding files that are 30 GB+ in size. The files are landing on a Windows 2008 server. I'm trying to count the rows in each CSV to confirm I have the row count I expect (22,725,303, to be exact). I can count the rows in the section I expect to dump - but I'm not sure whether they all made it into the file.

It's a 190M-row table, so dumping it in sections is the way to go.

So how can I count the rows in the output file to know I've got the full section?

  • Copy the .csv file(s) to a unix machine and run `wc -l thefile.csv` on them? – wildplasser Nov 04 '17 at 15:09
  • Yeah, I'm not keen to move 300GB around the network. That's onerous. – Matt Coblentz Nov 04 '17 at 18:07
  • Maybe Cygwin contains the file utilities? [Alternatively, you could boot from an Ubuntu USB stick, attempt to mount your (NTFS?) disk, and run `wc -l`.] ... Or you could write a small program that just counts the `'\n'`s. – wildplasser Nov 04 '17 at 19:36

1 Answer


In a PL/pgSQL function, you can get the count of rows processed by the last command - which, since Postgres 9.3, includes COPY - with:

GET DIAGNOSTICS x = ROW_COUNT;
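
For example, a minimal sketch of how that could look in an anonymous DO block (the table name `big_table`, the `id` filter, and the target path are placeholders; assumes Postgres 9.3+, where COPY sets ROW_COUNT):

DO
$$
DECLARE
   _rows bigint;
BEGIN
   -- export one section of the table; table, filter and path are placeholders
   COPY (SELECT * FROM big_table WHERE id BETWEEN 1 AND 22725303)
   TO 'C:\dump\section1.csv' WITH (FORMAT csv);

   -- number of rows written by the COPY above
   GET DIAGNOSTICS _rows = ROW_COUNT;
   RAISE NOTICE 'rows exported: %', _rows;
END
$$;

Comparing the reported number to the expected 22,725,303 would tell you whether the section was written completely, without counting lines in the 30 GB file afterwards.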
Erwin Brandstetter
  • Good call - sadly, Greenplum, which is based on Postgres 8.3, might not support this. I'll give it a go. – Matt Coblentz Nov 04 '17 at 18:08
    @MatthewCoblentz: You might have mentioned in the *question* that it's about Greenplum, which is *not* Postgres by a long shot. I doubt my answer works for Greenplum. – Erwin Brandstetter Nov 04 '17 at 19:18