2

I'm trying to load HDFS data as an external table but get the following error.

The folder ml-100k contains multiple files holding different datasets, so I need to load only one particular file.

hive> create external table movie_ratings (movie_id int, user_id int, ratings int, field_4 int) location 'hdfs://hadoop-master:8020/user/hduser/gutenberg/ml-100k/u.data'
    > ;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:hdfs://hadoop-master:8020/user/hduser/gutenberg/ml-100k/u.data is not a directory or unable to create one)
David דודו Markovitz
user1050619

2 Answers

4

You cannot create a table that points to a file, only to a directory, but there is a feature/bug that allows you to alter the location to a specific file.

create external table movie_ratings (movie_id int, user_id int, ratings int, field_4 int) location 'hdfs://hadoop-master:8020/user/hduser/gutenberg/ml-100k';

alter table movie_ratings set location 'hdfs://hadoop-master:8020/user/hduser/gutenberg/ml-100k/u.data';
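To check where the table points after the alter, something like the following may help. This is only a sketch: it assumes the two statements above ran without error, and the exact output layout depends on the Hive version.

```sql
-- Print the table metadata. The Location field should show the u.data path.
describe formatted movie_ratings;

-- INPUT__FILE__NAME is a Hive virtual column holding the source file of each row.
-- If the alter took effect, only the u.data path should be listed.
select distinct INPUT__FILE__NAME from movie_ratings;
```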
David דודו Markovitz
0

You cannot create a Hive table over a specific file; you need to give a directory. So you can create a subdirectory under ml-100k/, put u.data in it, and use it like this:

create external table movie_ratings (movie_id int, user_id int, ratings int, field_4 int) location 'hdfs://hadoop-master:8020/user/hduser/gutenberg/ml-100k/new_subfilder/';
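For the table to return rows, the subdirectory has to contain u.data and nothing else. A minimal sketch of that step using the `dfs` command from the Hive CLI, reusing the paths above. It assumes nothing else depends on u.data staying at its old path (swap `-mv` for `-cp` to copy instead).

```sql
-- Create the subdirectory. With -p this is a no-op if it already exists.
dfs -mkdir -p hdfs://hadoop-master:8020/user/hduser/gutenberg/ml-100k/new_subfilder;

-- Move u.data into it so the directory holds only this file.
dfs -mv hdfs://hadoop-master:8020/user/hduser/gutenberg/ml-100k/u.data hdfs://hadoop-master:8020/user/hduser/gutenberg/ml-100k/new_subfilder/;

-- List the directory to confirm its contents.
dfs -ls hdfs://hadoop-master:8020/user/hduser/gutenberg/ml-100k/new_subfilder;
```

The same commands can be run from a regular shell as `hdfs dfs -mkdir -p ...`, `hdfs dfs -mv ...` and `hdfs dfs -ls ...`.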

The bug mentioned by @Dudu may solve a specific case, but it's not safe for general use, because inserting into such a table will create new files and will never append to the specified one!

54l3d