why there is a data mismatch in hive and bigSQL by 1 record?

Question

I have created a hive table and integrated it to bigSQL. in the hive my count is proper, but in bigSQL, the record count is extra by 1. Below is the table properties that I have used to create the hive table.

create table test(name string,age int,sal float,city string,country string,emp_id int,increment int)
ROW FORMAT DELIMITED FIELDS TERMINATED  BY '|'
STORED AS TEXTFILE
LOCATION '/user/test'
tblproperties ("skip.header.line.count"="1");

The textfile that I am loading has column names in the very first row. So I have to use the

tblproperties ("skip.header.line.count"="1");

When I do a count query in hive, I get below output

Total MapReduce CPU Time Spent: 7 seconds 440 msec
OK
48203

However, when I synced the table in bigSQL, I am getting below count

+-------+
|     1 |
+-------+
| 48204 |

Any idea, where am I committing the mistake?

thanks

BigSQL probably does not honor `skip.header.line.count` -- Spark does not, for example (and Impala did not in earlier versions) — Samson Scharfrichter, Oct 01 '18 at 14:21
So, what would be the workaround in this case? Since the text file that I am loading on the HIVE table has column names in it. So forcibly, I will have to use the "skip.header.line.count". Please suggest. — learner, Oct 02 '18 at 03:58

score 0 · Answer 1 · answered Oct 03 '18 at 08:03

I found the workaround for this problem.

1) We need to create a temp hive table with tblproperties ("skip.header.line.count"="1");.
2) Load the file on this temp table.
3) create another table without tblproperties ("skip.header.line.count"="1");.
4) insert into tbl select * from temo_tbl;.

why there is a data mismatch in hive and bigSQL by 1 record?

1 Answers1