I have the following data:

 "ElemUID   ElemName    Kind    Number  DaySecFrom(UTC) DaySecTo(UTC)"
"399126817  A648/13FKO-66   DEZ     2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"483492732  A661/18FRS-97   DEZ   120.00    2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"399126819  A648/12FKO-2    DEZ    60.00    2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"399126818  A648/12FKO-1    DEZ   180.00    2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"399126816  A648/13FKO-65   DEZ     2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"398331142  A661/31OFN-1    DEZ   120.00    2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"398331143  A661/31OFN-2    DEZ     2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"483492739  A5/28FKN-65 DEZ     2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"483492735  A661/23FRS-97   DEZ    60.00    2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"
"483492740  B44/104FSN-33   DEZ   180.00    2017-07-01 23:58:00.000 2017-07-01 23:59:00.000"

I loaded it into HDFS. Then I defined an external table in Hive:

CREATE EXTERNAL TABLE IF NOT EXISTS deg
(
ElemUID int,
ElemName string,
Kind string,
Number float,
timefromdeg string,
timetodeg string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
TBLPROPERTIES ("skip.header.line.count"="1");

Then I loaded the dataset into this table with LOAD DATA INPATH. Now I want to read it into sparklyr with tbl(). Every time I do this, the header line shows up as data in the first row. Output of glimpse():

Variables: 6
$ elemuid  <int> NA, 399126817, 483492732, 399126819, 399126818, 399126816, 39...
$ elemname <chr> "ElemName", "A648/13FKO-66", "A661/18FRS-97", "A648/12FKO-2",...
$ kind     <chr> "Kind", "DEZ", "DEZ", "DEZ", "DEZ", "DEZ", "DEZ", "DEZ", "DEZ...
$ number   <dbl> NaN, NaN, 120, 60, 180, NaN, 120, NaN, NaN, 60, 180, NaN, NaN...
$ timefrom <dttm> NA, 2017-07-01 23:58:00, 2017-07-01 23:58:00, 2017-07-01 23:...
$ timeto   <dttm> NA, 2017-07-01 23:59:00, 2017-07-01 23:59:00, 2017-07-01 23:...

This header row interferes with my later analysis. I already set TBLPROPERTIES ("skip.header.line.count"="1") in the CREATE EXTERNAL TABLE statement, but it seems to have no effect when the table is read through sparklyr.

Is there a way to skip the first row in sparklyr?

Thank you!
