
I am trying to use the PostgreSQL CopyManager's copyIn functionality with COPY FROM STDIN, as suggested in the docs, for very fast copying from an InputStream into a database table. I am thinking of using this to continuously stream rows to a table as I receive/process them. However, the quick-and-dirty sample code below seems to be stuck on copyIn and does not write to the table.

Does anyone know what I am missing here, or is my understanding wrong?

import java.sql.*;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ExecutorService;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.postgresql.core.BaseConnection;
import org.postgresql.copy.CopyManager;

public class PGConnectTest {

    public static void main(String[] args) {

        try {
                try (Connection connection = DriverManager.getConnection("jdbc:postgresql://XX.XX.XX.XX:9432/somedb", "someadmin", "somepassword");
                    BaseConnection pgcon = (BaseConnection)connection;
                    PipedInputStream is = new PipedInputStream();
                    BufferedReader br = new BufferedReader(new InputStreamReader(is));
                    PipedOutputStream os = new PipedOutputStream(is);
                    BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(os));) {
                        ExecutorService executorService = Executors.newSingleThreadExecutor();
                        Callable callable = () -> {
                            Thread.sleep(3000);
                            String frmtStr = "%s\t{\"id\":%s, \"somefield\":\"%s\"}\n";
                            String row = null;
                            for(int i=1; i<10; i++) {
                                row = String.format(frmtStr, i, i, ("row"+i));
                                System.out.print(row);
                                bw.write(row);
                            }
                            bw.write("\n");
                            bw.flush();
                            System.out.println("WRITTEN!");
                            return true;
                        };
                        executorService.submit(callable);
                        System.out.println(connection);
                        CopyManager copyManager = new CopyManager(pgcon);
                        String copySql = "COPY dcm.testtbl FROM STDIN";
                        executorService.submit(() -> copyManager.copyIn(copySql, br));
                        Thread.sleep(10000);
                        System.out.println("QUITTING");
                } catch (Exception e) {
                    throw e;
                }
        } catch(Exception ex) {
            System.out.println(ex);
        }

    }

}

The schema of the table testtbl is below,

create table testtbl (
id  integer primary key,
jsnclm  jsonb
)

The console output is below (the program does NOT return and has to be killed with CTRL+C):

C:\Users\ml410408\Documents\Useful Lookups\POSTGRESQL>java -cp ".;postgresql-42.2.18.jar" PGConnectTest
org.postgresql.jdbc.PgConnection@41975e01
1       {"id":1, "somefield":"row1"}
2       {"id":2, "somefield":"row2"}
3       {"id":3, "somefield":"row3"}
4       {"id":4, "somefield":"row4"}
5       {"id":5, "somefield":"row5"}
6       {"id":6, "somefield":"row6"}
7       {"id":7, "somefield":"row7"}
8       {"id":8, "somefield":"row8"}
9       {"id":9, "somefield":"row9"}
WRITTEN!
QUITTING

UPDATE:

Once I changed the format of the COPY command from the default TEXT to CSV and passed in CSV records, it is no longer stuck, but it does nothing (no records end up in the table) even though it now returns, unlike before.

import java.sql.*;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ExecutorService;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.postgresql.core.BaseConnection;
import org.postgresql.copy.CopyManager;

public class PGConnectTest {

    public static void main(String[] args) {

        try {
                try (Connection connection = DriverManager.getConnection("jdbc:postgresql://XX.XX.XX.XX:9432/somedb", "someadmin", "somepassword");
                    BaseConnection pgcon = (BaseConnection)connection;
                    PipedInputStream is = new PipedInputStream();
                    BufferedReader br = new BufferedReader(new InputStreamReader(is));
                    PipedOutputStream os = new PipedOutputStream(is);
                    BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(os));) {
                        ExecutorService executorService = Executors.newSingleThreadExecutor();
                        Callable callable = () -> {
                            Thread.sleep(3000);
                            String frmtStr = "%s,'{\"id\":%s,\"somefield\":\"%s\"}'\n";
                            String row = null;
                            for(int i=1; i<10; i++) {
                                row = String.format(frmtStr, i, i, ("row"+i));
                                System.out.print(row);
                                bw.write(row);
                            }
                            bw.write("\n");
                            bw.write("'\\.'\n");
                            System.out.println("'\\.'\n");
                            bw.flush();
                            os.flush();
                            System.out.println("WRITTEN!");
                            return true;
                        };
                        executorService.submit(callable);
                        System.out.println(connection);
                        CopyManager copyManager = new CopyManager(pgcon);
                        String copySql = "COPY dcm.testtbl FROM STDIN FORMAT CSV DELIMITER ','";
                        executorService.submit(() -> copyManager.copyIn(copySql, br));
                        Thread.sleep(5000);
                        System.out.println(br.ready());
                        while (br.ready()) {
                            System.out.println("LINE : " + br.readLine());
                        }
                        executorService.shutdown();
                        System.out.println("QUITTING");
                } catch (Exception e) {
                    throw e;
                }
                System.out.println("QUITTING FINALLY");
        } catch(Exception ex) {
            System.out.println(ex);
        }

    }

}

Thanks

lmk
  • I might be missing something, but I think you just need to close the `OutputStream` once you're done with it – Nick Barnes Jan 08 '21 at 07:07
  • @NickBarnes it's declared in a [try-with-resources statement](https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html), which auto-closes it. Also, even if it's not closed (which is not the case here), why would the records coming in on the InputStream not be read and written to the table? – lmk Jan 08 '21 at 07:36
  • Hah, so I did miss something... Didn't realise that all those assignments were in the `try()`. The `copyIn()` will block if the `OutputStream` isn't closed, which would have explained it hanging, but it looks like this isn't the cause. – Nick Barnes Jan 09 '21 at 02:11

1 Answer

There seem to be a couple of different issues in there.

  • The program is hanging because the thread in the ExecutorService is keeping it alive; calling shutdown() after submitting the tasks causes it to terminate as expected.
  • The main reason nothing's being written is that copyIn() is throwing an exception: the trailing newline in the stream (bw.write("\n")) triggers an ERROR: invalid input syntax for integer: "" as it fails to find the id column.

Even then, it looks like this is still subject to some race conditions due to the timing of the resource cleanup. The copyIn() call will block until it reaches the end of its InputStream, and in the case of a PipedInputStream, the "end" is the point where the PipedOutputStream is closed. But after the stream is closed and the copyIn() call is unblocked, the input stream and the database connection are closed in quick succession, potentially before the copy has a chance to finalise. At best, it seems to successfully commit to the table, but then error out with a "Database connection failed when canceling copy operation".
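This "close means EOF" behaviour of piped streams can be seen with plain java.io, independent of JDBC. A minimal sketch (the class name is mine): the writer thread closes its PipedOutputStream, which is what lets the reader's loop terminate.

```java
import java.io.*;

public class PipeEofDemo {
    public static void main(String[] args) throws Exception {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);

        Thread writer = new Thread(() -> {
            try (OutputStream os = out) { // closing the pipe is what signals EOF to the reader
                os.write("1\trow1\n".getBytes());
            } catch (IOException ignored) {
            }
        });
        writer.start();

        // Read until -1; without out.close() above, this loop would block forever,
        // exactly like copyIn() blocking on an unclosed pipe.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            buf.write(b);
        }
        writer.join();
        System.out.println("read " + buf.size() + " bytes, EOF reached");
    }
}
```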

To make sure that these resources aren't released while they're still in use:

  • Wait for the writer to complete
  • Close the OutputStream
  • Wait for the copier to complete
  • Close the InputStream / Connection

Waiting for the tasks to complete has the added benefit of propagating any exceptions to the main thread.
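A small sketch of that propagation (class name is mine): a task submitted with `submit()` fails silently unless something calls `get()` on its Future, which rethrows the failure wrapped in an `ExecutionException`.

```java
import java.util.concurrent.*;

public class FuturePropagationDemo {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // A failing task: on its own the exception is swallowed by the pool;
        // Future.get() surfaces it as an ExecutionException.
        Future<Void> task = pool.submit((Callable<Void>) () -> {
            throw new IllegalStateException("writer failed");
        });
        try {
            task.get();
        } catch (ExecutionException e) {
            System.out.println("propagated: " + e.getCause().getMessage());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
    }
}
```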

There's also a potential deadlock due to the newSingleThreadExecutor(): if the writer thread fills the pipe's buffer, it will block until the reader starts consuming the data, which will never happen if they're being executed sequentially. Using a newFixedThreadPool(2) should fix this.
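The deadlock can be reproduced in miniature without a database (a sketch; the class and the `runWithPool` helper are mine): with a single-thread pool the writer fills the pipe buffer and blocks forever because the reader task never gets a thread, while a two-thread pool lets both sides run and complete.

```java
import java.io.*;
import java.util.concurrent.*;

public class PipeDeadlockDemo {
    // Helper: submit a writer and a reader for the same pipe to the given pool,
    // and report whether both finish within 2 seconds.
    static boolean runWithPool(ExecutorService pool, int size) throws Exception {
        PipedInputStream in = new PipedInputStream(1024); // 1 KiB pipe buffer
        PipedOutputStream out = new PipedOutputStream(in);
        Future<?> writer = pool.submit(() -> {
            try (OutputStream os = out) {
                os.write(new byte[size]); // blocks once the pipe buffer is full
            }
            return null;
        });
        Future<?> reader = pool.submit(() -> {
            int n = 0;
            while (in.read() != -1) n++;
            return n;
        });
        try {
            writer.get(2, TimeUnit.SECONDS);
            reader.get(2, TimeUnit.SECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // writer still blocked: the reader never ran
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService single = Executors.newSingleThreadExecutor();
        ExecutorService two = Executors.newFixedThreadPool(2);
        System.out.println("single-thread pool finished: " + runWithPool(single, 16 * 1024));
        System.out.println("two-thread pool finished: " + runWithPool(two, 16 * 1024));
        single.shutdownNow(); // interrupts the still-blocked writer
        two.shutdownNow();
    }
}
```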

With all that in mind:

  public static void main(String[] args) {
    ExecutorService executorService = Executors.newFixedThreadPool(2);
    try {
      try (Connection connection = DriverManager.getConnection("jdbc:postgresql://XX.XX.XX.XX:9432/somedb", "someadmin", "somepassword");
          BaseConnection pgcon = (BaseConnection) connection;
          PipedInputStream is = new PipedInputStream();
          BufferedReader br = new BufferedReader(new InputStreamReader(is));
      ) {
        Future<?> write;
        Future<?> copy;
        try (
            PipedOutputStream os = new PipedOutputStream(is);
            BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(os))) {
          write = executorService.submit(() -> {
            String frmtStr = "%s\t{\"id\":%s, \"somefield\":\"%s\"}\n";
            String row = null;
            for (int i = 1; i < 1000; i++) {
              row = String.format(frmtStr, i, i, ("row" + i));
              System.out.print(row);
              bw.write(row);
            }
            bw.flush();
            System.out.println("WRITTEN!");
            return true;
          });
          System.out.println(connection);
          CopyManager copyManager = new CopyManager(pgcon);
          String copySql = "COPY dcm.testtbl FROM STDIN";
          copy = executorService.submit(() -> copyManager.copyIn(copySql, br));
          System.out.println("QUITTING");
          write.get(); // wait for the writer to finish before closing the output stream
        } // bw/os are closed here, signalling EOF to the copier
        copy.get(); // wait for the copy to finish before is/connection are closed
      }
    } catch (Exception ex) {
      System.out.println(ex);
    } finally {
      executorService.shutdown();
    }
  }
Nick Barnes
  • Thank you so much @Nick Barnes! The first point is something I knew (I had updated my question earlier to add that; it was just hacky code at the time). What I meant by the copy operation being stuck on copyIn is that it never seemed to see an end, even when I added `\.` as mentioned in the postgres [COPY](https://www.postgresql.org/docs/9.1/sql-copy.html) documentation. I missed the closing issues that you pointed out, and that the "copyIn() call will block until it reaches the end of its InputStream, and in the case of a PipedInputStream, the "end" is the point where the PipedOutputStream is closed." – lmk Jan 09 '21 at 06:23
  • So it seems to me I really cannot use PostgreSQL's CopyManager copyIn functionality to keep writing in a streaming fashion. I even tried using [CopyIn](https://jdbc.postgresql.org/documentation/publicapi/org/postgresql/copy/CopyIn.html) by doing `CopyManager#copyIn(String sql)` to get a reference and using its `writeToCopy(..)` and `flushCopy()` methods; even there, nothing gets written until I call `endCopy()`, after which that reference can't be used. So it seems there is no way to keep it open and continuously stream rows to be written, which was the requirement I noted in the question. – lmk Jan 09 '21 at 06:29
    @lmk: Depends on what you mean by "written" - Postgres writes to the table immediately, but due to [MVCC](https://www.postgresql.org/docs/current/mvcc-intro.html), the new rows aren't visible to other sessions until the `COPY` statement completes and commits, so it's really only suitable for batch processing. If you need to see the rows as they arrive, they need to be written by separately-committed statements. But at that point, you're probably better off reusing a `PreparedStatement` for a simple `INSERT`. – Nick Barnes Jan 09 '21 at 17:41
  • Thanks again @Nick Barnes, the latter is what I meant. You have summed it up pretty well. I guess the only other way I can do this is to keep writing to a new text/csv file locally on a set schedule (every x minutes), use it to write that batched text/csv to the table (after those x minutes), and remove the file. What do you think? – lmk Jan 09 '21 at 18:10
  • Depends on the details of your use case. Queueing the rows locally and committing periodically is fine if you can tolerate the delay, and the potential for data loss in the event of a crash. Storing the queue in a file could have some benefits, though it might be simpler to just keep it in memory, or even to continue writing the rows as they arrive and just closing and rebuilding the copy stream every x minutes. – Nick Barnes Jan 11 '21 at 01:20
  • But the simplest and safest approach is just to commit each write independently, and re-binding the parameters on a `PreparedStatement` should still be pretty efficient, so I'd definitely stick with this unless it becomes a clear performance bottleneck. – Nick Barnes Jan 11 '21 at 01:21
  • Thanks @Nick Barnes for your thoughts. A commit per row is not going to be efficient, since the use case here is potentially thousands of entries per minute, and I also don't need any transactional semantics for this table. The csv file writing/appending was meant mainly to avoid data loss; otherwise we could take the last option that you note. I wish `copyIn()` had provided an option to keep the stream open and signal end of stream via `\.` or some other reserved character. – lmk Jan 11 '21 at 05:34