Let's say I have a table:

db.table

I load the table, apply some transforms to it, and finally attempt to store it back into the same table:

mytable = LOAD 'db.table' USING HCatLoader();

-- ... my transforms, which produce mytable_final ...

STORE mytable_final INTO 'db.table' USING HCatStorer();

But Pig complains that I'm writing into a table that already contains data.

I've looked at this JIRA ticket, which seems to be inactive (I have tried using FORCE and OVERWRITE in several places in the STORE command).

I've also looked at this SO post, but its author loads from one location and stores to a different one. If I apply the approach from that post, the transformation produces no data. Deleting the files isn't an option. I'm thinking of storing the result in a temporary location first, but I don't know whether that is the best option.

I am trying to get the same behavior you get in Hive with INSERT OVERWRITE.

Brian Tompsett - 汤莱恩
Marko Galesic
  • Why do you say "deleting the files isn't an option"? – reo katoa Oct 07 '13 at 19:44
  • Should have clarified - deleting the files before doing a Store, which forces Pig to execute on the DAG (i.e. the transformation), is not an option. We get nothing as the result if we do that. – Marko Galesic Oct 07 '13 at 19:54
  • I am facing a similar issue and getting the following exception: org.apache.hcatalog.common.HCatException : 2003 : Non-partitioned table already contains data : tablename. When I changed my table to an external table, the exception went away, but the script still fails without any exception stack trace. – Chetan Shirke Mar 03 '14 at 07:15

1 Answer

I am not familiar with HCatLoader and HCatStorer, but if you LOAD from and STORE to HDFS paths, Pig provides shell commands that let you delete and move files from within your script:

STORE A INTO '/this/path/is/temporary';
RMF '/this/path/is/permanent';
MV '/this/path/is/temporary' '/this/path/is/permanent';
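
For illustration, here is a minimal end-to-end sketch of the same store-then-swap pattern on plain HDFS paths. Everything specific in it is an assumption made up for the example: the paths `/data/mytable` and `/data/mytable_tmp`, the tab-delimited layout, the two-column schema, and the placeholder `FILTER` standing in for the real transforms. It does not use HCatalog, so it only approximates an overwrite on files, not on a Hive-managed table. The file commands are written lowercase and unquoted, as in the examples in the Pig documentation.

```pig
-- Hypothetical input path and schema; adjust to your own data.
raw = LOAD '/data/mytable' USING PigStorage('\t') AS (id:int, name:chararray);

-- Placeholder transform standing in for the real ones.
cleaned = FILTER raw BY id IS NOT NULL;

-- 1. Write the result to a scratch location, so the original input
--    is still in place while the job reads it.
STORE cleaned INTO '/data/mytable_tmp' USING PigStorage('\t');

-- 2. Remove the old data, then move the new data into its place.
--    rmf does not complain if the path is already missing.
rmf /data/mytable
mv /data/mytable_tmp /data/mytable
```

The order matters: the STORE has to come before the `rmf`, otherwise the input is gone before it has been read, which would match the "no data" result described in the question. As far as I understand, Pig runs any pending STORE before it executes a file command in a script, but that is worth verifying on your version. This sketch also does not check that the STORE succeeded before deleting the old data.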
reo katoa