4

I'm trying run this a query, with a JOIN without ON property.

I'm running the query like:

hive -v -f  my_file.hql

I got this message:

In strict mode, cartesian product is not allowed. If you really want to perform the operation, set hive.mapred.mode=nonstrict

I update the hql file with:
set hive.mapred.mode=nonstrict on top of it.

But then I got this message:

SET hive.mapred.mode=nonstrict Query returned non-zero code: 1, cause: Cannot modify hive.mapred.mode at runtime. It is in the listof parameters that can't be modified at runtime

How I can solve this issue?

ps: I want to make this cartesian product.

How I make it happen? Where I can set this variable hive.mapred.mode works ?

David דודו Markovitz
  • 42,900
  • 6
  • 64
  • 88
Alvaro Silvino
  • 9,441
  • 12
  • 52
  • 80
  • have you tried setting it in hive-site.xml? – spijs Feb 16 '17 at 09:10
  • I don't have access to the hive-site.xml (using a shared cluster) @spijs – Alvaro Silvino Feb 16 '17 at 09:17
  • Do you have a good reason why you don't want to use `ON`? Apparently Hive cannot optimize converting a where clause to a `ON`. – spijs Feb 16 '17 at 09:25
  • I'm just using another table that does not necessary match perfectly with each other that's why does not make sense for me in this case use the ON on it. @spijs – Alvaro Silvino Feb 16 '17 at 09:28
  • Have you tried to set it in Hive interactive shell? If that also is not possible, it probably means that this has to be set before turning on hive-server meaning you probably can't figure this out without contacting your cluster admin. – spijs Feb 16 '17 at 10:12
  • What you could try though is to add a column to both tables which always has for example the value 1. If you then do a join on these values you would get the cartesian product if I'm not mistaken. – spijs Feb 16 '17 at 10:22

1 Answers1

13

As you already know a cartesian product is not allowed in strict mode (and for good reasons). In your use case it seems like you don't have permissions to make changes to these type of hive settings.

To hack around this problem what you could do is the following. First create two new tables

create table new_1 as SELECT *,1 as join_key from table1;
create table new_2 as SELECT *,1 as join_key2 from table2;

Then join these tables on this join_key. The result will be the cartesian product since it will match each row of table1 with each row of table2.

select * from new_1 join new_2 on join_key=join_key2

Just found out that using --hiveconf solves the problem:

hive -v -f  my_file.hql --hiveconf hive.mapred.mode=nonstrict

will allow the nonstrict mode specifically for this query.

Alvaro Silvino
  • 9,441
  • 12
  • 52
  • 80
spijs
  • 1,489
  • 18
  • 36