I'm running a query in Hive on a partitioned table.

select count(*) from activity where datestamp=2016-08-16

However, the query throws the following exception:

java.lang.IllegalStateException: Ambiguous input path hdfs://ip-172-29-1-53.us-west-2.compute.internal:8020/hive/dcm/activity/datestamp=2016-10-01/part-r-00000-41b9fc2f-101c-423a-901e-0f617c8fbd62.gz.parquet
at org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:454)
at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:501)
at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1072)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:545)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)

Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: java.lang.IllegalStateException: Ambiguous input path hdfs://ip-172-29-1-53.us-west-2.compute.internal:8020/hive/dcm/activity/datestamp=2016-08-16/part-r-00000-1fd9aa5b-6e66-4bf9-b015-a940cbd6cc5a.gz.parquet
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I have checked that the partition directories actually exist under that path. I also opened the file with the parquet-tools jar, and it looks like the file has data in the right format. Any leads on what is ambiguous about the path?

Himateja Madala

1 Answer

We ran into the same problem after an insert statement with dynamic partitioning had been executed earlier, which may have inserted into partitions that already existed.

To restore service and prevent the more severe problems that possibly corrupted partition metadata could cause, we applied a quick fix in two steps:

First, we manually cleared the partition metadata. That is, we executed an alter table xxxx drop partition (tag >= 'yyyyyyyy'); DDL statement to drop all the partitions. (For an external table, this does not invoke any HDFS operation, so the data stays intact.)

Then we executed an msck repair table command, which re-registers the partitions found on the file system.

After this fix, queries against that table worked normally again.

So my guess is that the partition metadata contained more than one partition pointing to the same path, which is why the path was reported as ambiguous.

To execute a query, Hive's execution engine first fetches the metadata before it dives into the underlying file system.
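To make the guess above concrete, here is a minimal Python sketch of the idea. It is not Hive's actual code. It assumes that an input file counts as ambiguous when more than one registered partition location is an ancestor of it. The function names, the namenode host, and the file name are made up for illustration.

```python
from posixpath import normpath
from typing import Dict, List
from urllib.parse import urlparse


def _path_only(location: str) -> str:
    """Drop scheme and authority, then normalise: hdfs://nn:8020/a/b/ -> /a/b."""
    return normpath(urlparse(location).path)


def matching_partitions(input_file: str, partition_locations: Dict[str, str]) -> List[str]:
    """Return the partition specs whose location is an ancestor of input_file.

    More than one match is the situation this sketch treats as "ambiguous".
    """
    file_path = _path_only(input_file)
    matches = []
    for spec, location in partition_locations.items():
        # The trailing slash stops /a/b from matching a sibling such as /a/bc.
        prefix = _path_only(location).rstrip("/") + "/"
        if file_path.startswith(prefix):
            matches.append(spec)
    return matches


if __name__ == "__main__":
    # Hypothetical metadata: the second entry is stale and points at the table root.
    locations = {
        "datestamp=2016-08-16": "hdfs://namenode:8020/hive/dcm/activity/datestamp=2016-08-16",
        "datestamp=2016-08-17": "hdfs://namenode:8020/hive/dcm/activity",
    }
    data_file = (
        "hdfs://namenode:8020/hive/dcm/activity/"
        "datestamp=2016-08-16/part-r-00000.gz.parquet"
    )
    hits = matching_partitions(data_file, locations)
    if len(hits) > 1:
        print("Ambiguous input path", data_file, "claimed by", hits)
```

If a check like this reports more than one partition for a single file, that would be consistent with the metadata problem described above, and dropping and re-registering the partitions is one way to clear it.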

Eric