NOLOCK in Hive and Impala

Question

I want to understand why we don't use the NOLOCK keyword at the end SQL queries that run in Impala, Hive or MySQL. For example, I use both Hive and Impala through the Cloudera workbench at work. Cloudera workbench does not recognise the NOLOCK keyword. Similarly, I have used MySQL before as well. Even there, NOLOCK wouldn't work. Why don't we need to use NOLOCK with Hive, Impala, MySQL or Big Query?

Disclaimer: I want to clarify that I have a decent understanding of the NOLOCK feature and its use cases. This question is related to SQL flavours that do not require NOLOCK.

leftjoin · Accepted Answer · 2022-02-03T19:43:20.623

2

First of all, let's clarify what is NOLOCK. NOLOCK table hint allows to override default transaction isolation level(READ COMMITTED). NOLOCK = READ UNCOMMITTED transaction isolation level, that allows the query to see the data changes before committing the transaction which is changing the data.

Hive does not support such isolation level (READ UNCOMMITTED) even in transaction mode. Oracle also does not support READ UNCOMMITTED, so no need in NOLOCK hint in these databases.

In Hive you can turn off concurrency.

edited Feb 03 '22 at 19:43

answered Feb 03 '22 at 14:42

leftjoin

36,950
8
57
116

Thank you for your reply. Does that mean Hive and Oracle only read committed data? If true, where does the uncommitted data go? – Excel-lit Feb 03 '22 at 18:21
@Excel-lit the fact that they do not support it does not mean that they have similar architecture. Hive does not support rollback and commit ( all are autocommitted). It can fail or succeed, snapshot isolation is supported. In Oracle the uncommitted changes are applied to the data and redo log is used to support isolation and recovery https://docs.oracle.com/cd/E18283_01/server.112/e17120/onlineredo001.htm – leftjoin Feb 03 '22 at 18:31
See also this: https://stackoverflow.com/a/63378038/2700344 – leftjoin Feb 03 '22 at 18:48

score 0 · Answer 2 · answered Feb 03 '22 at 20:48

Further to @leftjoin:

At this time only snapshot level isolation is supported. When a given query starts it will be provided with a consistent snapshot of the data. There is no support for dirty read, read committed, repeatable read, or serializable. With the introduction of BEGIN the intention is to support snapshot isolation for the duration of transaction rather than just a single query. Other isolation levels may be added depending on user requests.

https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration

NOLOCK in Hive and Impala

2 Answers2