-1

I am trying to load data from Oracle to Greenplum using Java. I store the result set as comma separated values in to byte array input stream and then load it using copy in.

import java.sql.*; 
import au.com.bytecode.opencsv.CSVWriter;
import java.io.*;
import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

public class ORtoGP {   
        public static void main(String[] args) throws SQLException {
            try {
                String dbURL = "jdbc:oracle:thin:@(DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = xxxxxx)(PORT = 1521))) (CONNECT_DATA = (SERVICE_NAME = xxxxxx) (SRVR = DEDICATED)))";
                String strUserID = "xxxxxx";
                String strPassword = "xxxxxx";
                Connection myConnection=DriverManager.getConnection(dbURL,strUserID,strPassword);
                Statement sqlStatement = myConnection.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_READ_ONLY);
                String readRecordSQL = "select id,name from table where rownum <= 10 ";
                ResultSet rs = sqlStatement.executeQuery(readRecordSQL); 

                StringWriter stringWriter = new StringWriter();
                CSVWriter csvWriter = new CSVWriter(stringWriter);

                rs.first(); 
                csvWriter.writeAll(rs, true);
                String orresult = stringWriter.toString();
                System.out.println(orresult);

                byte[] bytes = orresult.getBytes();
                ByteArrayInputStream orinput = new ByteArrayInputStream(bytes); 


                String dbURL1 = "jdbc:postgresql://xxxxx:5432/xxxxx";
                String user = "xxxx";
                String pass = "xxxx";
                Connection conn2 = DriverManager.getConnection(dbURL1, user, pass);

                CopyManager copyManager = new CopyManager((BaseConnection) conn2);
                copyManager.copyIn("copy java_test from stdin with DELIMITER ','",orinput);

                rs.close();
                myConnection.close();
                csvWriter.close();

            } catch (Exception e) {
                System.out.println(e);
            }       
        }
    }

However, I run in to two issues:

  1. While bulk loading the data, the process is unable to identify end of line. So it gives this error. "ERROR: extra data after last expected column "
  2. Also it tries to load the data including double quotes around values.
Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197
CuriP
  • 83
  • 10
  • There is an easier way. There are two open source projects that make it a lot easier to load data from Oracle into Greenplum. One is Outsourcer and the other is gplink. Both can be found here: http://www.pivotalguru.com/ – Jon Roberts Aug 13 '18 at 14:18
  • Thanks.. I have already checked Outsourcer.. it has a dependency that you need to run it from a server where gpfdist is installed or something like that. I don't remember the specifics as it has been a while since I tried, but it just cannot be run from 'any machine'. I still tried to get it to work in a specific server, but was getting some Java errors, so I just left it. – CuriP Aug 13 '18 at 15:10
  • Outsourcer has to run on a machine that every host in the cluster can access. So the Master or Standby are the best options since the segment hosts are in a private network. – Jon Roberts Aug 13 '18 at 16:43
  • Thanks.. we have some challenges in getting it run from Master or Standby. DBA has pushed back master option.. we have operational issues between business groups in using stand by. So that option is also a ruled out. So I gave up on it. – CuriP Aug 13 '18 at 18:25

1 Answers1

0

According to the documentation the default format is text, which does not handle quoting.

You need to specify FORMAT csv in your command.

Jim Garrison
  • 85,615
  • 20
  • 155
  • 190
  • Thanks.. I added this - copyManager.copyIn("copy java_test from stdin with csv header",orinput);, and it resolved those issues. However, I get a different error, org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x92 Hint: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding". I checked the database, and noticed that it does have UTF8 encoding. So does this error imply that incoming data is in a different form of encoding? How can this be fixed? Thanks. – CuriP Aug 12 '18 at 05:43
  • Did you look at the documentation link I provided? Did you see the `ENCODING` option? – Jim Garrison Aug 12 '18 at 05:50
  • Yes, I did try encoding as first thing. Unfortunately encoding not supported in my Postgres version - 8.2 (https://www.postgresql.org/docs/8.2/static/sql-copy.html). I also tried this - orresult.replaceAll("\u0000", "");.. but doesn't seem to make much difference. – CuriP Aug 12 '18 at 05:59
  • You will have to re-encode the data as you read it. The issue is not `\u0000` but `0x92`. – Jim Garrison Aug 12 '18 at 06:02
  • I add this to the code, and it resolved the encoding issue too.. - byte[] bytes = orresult.getBytes("UTF8"); . Thank you for taking time. – CuriP Aug 12 '18 at 20:39