1

I am trying to execute an sql statement (ExecuteSql proc) for creating a table, table's name is based on a filename attribute, and when ExecuteSQl gets executed, it creates the table but does not route the file to any relationship.

I think this is in part due to the fact that in order to create a table in CrateIO you need to put ";" at the end. If I do that I also get an error.

Any ideas ?

bsd
  • 1,207
  • 4
  • 15
  • 28

1 Answers1

2

ExecuteSQL is intended for queries that return a ResultSet. Try PutSQL for things like table creation (and other DDL statements). You shouldn't need a semicolon there either, but you can try it both ways to see what CrateIO likes.

EDIT (reply to comments): If you are specifying the query in ExecuteSQL, it probably looks something like:

CREATE TABLE ${filename} (myColumn1 STRING, myColumn2 INT, ...)

You can do a similar thing with a ReplaceText processor after the GetFile and before PutSQL. ReplaceText will set the content of the flow file to your SQL query, which is what PutSQL wants.

EDIT 2: If you want custom logic, you could consider a scripting processor vs writing a full custom processor. Then you could do the generation of a CREATE TABLE followed by a translation of CSV rows to INSERT statements. If you find yourself wanting to connect to a DB, check my blog post on how to use DBCP Connection Pools in a scripting processor.

mattyb
  • 11,693
  • 15
  • 20
  • 1
    I need to create a table based on the file's name. I don't see exactly how I could do that using PutSQL. So I have something like: [ GetFile ] ----> [ ExecuteSQL ] ----> rest of the flow. I am trying to achieve the insertion of a number of files from a directory in a separate table for each file in CrateIO, and then move the read files to another dir. – bsd Oct 31 '16 at 14:36
  • 1
    In order to use PutSQL you should have something like ConvertJSONToSQL in front of, and this processor's properties, you only have INSERT and UPDATE. In the PutSQL processor's manual page: _Executes a SQL UPDATE or INSERT command. The content of an incoming FlowFile is expected to be the SQL command to execute._ – bsd Oct 31 '16 at 16:49
  • 1
    Don't you already have the SQL you want to execute? What are you putting into ExecuteSQL for the query? – mattyb Oct 31 '16 at 16:51
  • 1
    The doc is a little misleading, it executes any SQL command (except callable statements) that doesn't return a ResultSet, not just INSERT and UPDATE. – mattyb Oct 31 '16 at 16:59
  • 1
    The thing is that I don't want to change the contents of the file. I have a CSV file which gets translated into INSERT sql queries; While using ExecuteSQL I could specify in the sql query text box the query which I want to run, and the name of the table would be the name of the flowfile. I can see a way of making this work, but it is not efficient. I could read the files, do the replacement part, create the tables, and the re-read the files and insert their contents into the newly created tables, but it is ... any ideas ? – bsd Nov 01 '16 at 10:54
  • 1
    I think it is best to write a processor that does this specific thing that I need and then routes the files further. – bsd Nov 01 '16 at 11:01
  • 1
    What does the whole flow look like? How are the rows in the CSV file being translated to INSERT statements? – mattyb Nov 01 '16 at 14:55
  • 1
    CSV to avro schema and then avro to json and then json to sql statements. – bsd Nov 01 '16 at 20:29