I am running Shark/Spark (0.9.1) on Amazon EC2, launched with the supplied setup scripts. I read data out of S3 and then try to write a table back into S3. The data can be read from S3 fine (so my credentials are correct), but when I try to write data to S3 I get the following error:
14/07/31 16:42:30 INFO scheduler.TaskSetManager: Loss was due to java.lang.IllegalArgumentException: Wrong FS: s3n://id:key@shadoop/tmp/hive-root/hive_2014-07-31_16-39-29_825_6436105804053790400/_tmp.-ext-10000, expected: hdfs://ecmachine.compute-1.amazonaws.com:9000 [duplicate 3]
I've tried several different methods of writing out data/tables, but they all produce the same error. This particular error is generated by an HQL query like:
INSERT OVERWRITE DIRECTORY 's3n://id:key@shadoop/bucket' SELECT * FROM table;
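To give an idea of what I mean by "several different methods", here is a sketch of one such alternative: writing through an external table whose LOCATION points at S3 (the table name, schema, and source table below are placeholders rather than my actual ones):

-- sketch only: write via an external table backed by S3 (placeholder names)
CREATE EXTERNAL TABLE s3_out (col1 STRING, col2 INT)
LOCATION 's3n://id:key@shadoop/bucket/';

INSERT OVERWRITE TABLE s3_out SELECT col1, col2 FROM src_table;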
Any ideas on why S3 is seen as a "wrong FS"?