0

I have created a hive table and integrated it to bigSQL. in the hive my count is proper, but in bigSQL, the record count is extra by 1. Below is the table properties that I have used to create the hive table.

create table test(name string,age int,sal float,city string,country string,emp_id int,increment int)
ROW FORMAT DELIMITED FIELDS TERMINATED  BY '|'
STORED AS TEXTFILE
LOCATION '/user/test'
tblproperties ("skip.header.line.count"="1");

The textfile that I am loading has column names in the very first row. So I have to use the

tblproperties ("skip.header.line.count"="1");

When I do a count query in hive, I get below output

Total MapReduce CPU Time Spent: 7 seconds 440 msec
OK
48203

However, when I synced the table in bigSQL, I am getting below count

+-------+
|     1 |
+-------+
| 48204 |

Any idea, where am I committing the mistake?

thanks

learner
  • 73
  • 2
  • 9
  • BigSQL probably does not honor `skip.header.line.count` -- Spark does not, for example (and Impala did not in earlier versions) – Samson Scharfrichter Oct 01 '18 at 14:21
  • So, what would be the workaround in this case? Since the text file that I am loading on the HIVE table has column names in it. So forcibly, I will have to use the "skip.header.line.count". Please suggest. – learner Oct 02 '18 at 03:58

1 Answers1

0

I found the workaround for this problem.

1) We need to create a temp hive table with tblproperties ("skip.header.line.count"="1");.
2) Load the file on this temp table.
3) create another table without tblproperties ("skip.header.line.count"="1");.
4) insert into tbl select * from temo_tbl;.
learner
  • 73
  • 2
  • 9