Skip first row from Hive external table in sparklyr

Question

I have the following data:

 "ElemUID   ElemName    Kind    Number  DaySecFrom(UTC) DaySecTo(UTC)"
"399126817  A648/13FKO-66   DEZ     2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"483492732  A661/18FRS-97   DEZ   120.00    2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"399126819  A648/12FKO-2    DEZ    60.00    2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"399126818  A648/12FKO-1    DEZ   180.00    2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"399126816  A648/13FKO-65   DEZ     2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"398331142  A661/31OFN-1    DEZ   120.00    2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"398331143  A661/31OFN-2    DEZ     2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"483492739  A5/28FKN-65 DEZ     2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"483492735  A661/23FRS-97   DEZ    60.00    2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"483492740  B44/104FSN-33   DEZ   180.00    2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"

I loaded it into HDFS. Then I defined an external table in Hive:

CREATE EXTERNAL TABLE IF NOT EXISTS deg
(
ElemUID int,
ElemName string,
Kind string,
Number float,
timefromdeg string,
timetodeg string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
TBLPROPERTIES ("skip.header.line.count"="1");

Then I loaded the dataset into the defined schema with LOAD DATA INPATH. Now I want to load it into sparklyr with tbl(). Every time I do this, I get the header as data in the first row. Output of glimpse():

Variables: 6
$ elemuid  <int> NA, 399126817, 483492732, 399126819, 399126818, 399126816, 39...
$ elemname <chr> "ElemName", "A648/13FKO-66", "A661/18FRS-97", "A648/12FKO-2",...
$ kind     <chr> "Kind", "DEZ", "DEZ", "DEZ", "DEZ", "DEZ", "DEZ", "DEZ", "DEZ...
$ number   <dbl> NaN, NaN, 120, 60, 180, NaN, 120, NaN, NaN, 60, 180, NaN, NaN...
$ timefrom <dttm> NA, 2017-07-01 23:58:00, 2017-07-01 23:58:00, 2017-07-01 23:...
$ timeto   <dttm> NA, 2017-07-01 23:59:00, 2017-07-01 23:59:00, 2017-07-01 23:...

I am concerned that this header row will interfere with my later analysis. I already set TBLPROPERTIES ("skip.header.line.count"="1") in the CREATE EXTERNAL TABLE statement.

Is there a way to skip the first row in sparklyr?

Thank you!

The first row is mostly empty except for elemname and kind. Please show a small reproducible example and expected output — akrun, Apr 18 '18 at 06:30
I think elemname and kind are not NA because these columns are strings (chr), so the header text can be stored in them — user60856839, Apr 18 '18 at 06:52
Don't think this is a sparklyr issue, see https://stackoverflow.com/questions/15751999/hive-external-table-skip-first-row — kevinykuo, Apr 26 '18 at 20:47
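
One possible workaround, sketched here under assumptions rather than as a confirmed fix: if the header line still comes through despite skip.header.line.count, it can be filtered out after tbl(). The helper drop_header_row is hypothetical, and the local tibble only mimics the first rows of the glimpse() output so that the example runs without a cluster; sc is assumed to be an existing sparklyr connection.

```r
library(dplyr)

# Hypothetical helper: drops the row whose elemname holds the header text.
# Assumes no real record has the literal name "ElemName".
drop_header_row <- function(x) {
  x %>% filter(elemname != "ElemName")
}

# Local stand-in for the Hive table, mirroring the glimpse() output above.
local_deg <- tibble(
  elemuid  = c(NA, 399126817L, 483492732L),
  elemname = c("ElemName", "A648/13FKO-66", "A661/18FRS-97"),
  kind     = c("Kind", "DEZ", "DEZ"),
  number   = c(NaN, NaN, 120)
)

cleaned <- drop_header_row(local_deg)
print(cleaned)

# With sparklyr, assuming an open connection `sc`:
# deg <- tbl(sc, "deg") %>% drop_header_row()
```

When applied to a Spark table, the filter is translated to SQL and runs in Spark, so nothing is collected into R. Note that rows with a missing elemname would also be dropped by this comparison.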
