I'm using PySpark to read and process data from local .plt
files. Here is what a file looks like:
Geolife trajectory
WGS 84
Altitude is in Feet
Reserved 3
0,2,255,My Track,0,0,2,8421376
0
39.984094,116.319236,0,492,39744.2451967593,2008-10-23,05:53:05
39.984198,116.319322,0,492,39744.2452083333,2008-10-23,05:53:06
39.984224,116.319402,0,492,39744.2452662037,2008-10-23,05:53:11
39.984211,116.319389,0,492,39744.2453240741,2008-10-23,05:53:16
......
As shown above, I'm not interested in the first 6 rows; what I want are the rows starting from the 7th. So I want to use a SparkSession to read this file from the 7th row onward. Here is the code I've tried, which failed:
from pyspark.sql import SparkSession
session = SparkSession.builder.appName('file reader').master('local[*]').getOrCreate()
df = session.read \
    .option('delimiter', ',') \
    .option('header', 'false') \
    .csv('test.plt')
df.show()
Could somebody give me some advice? Thank you for your attention.
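For reference, the skip-the-first-N-lines logic I'm after is easy to express in plain Python; this is what I'd like to reproduce in Spark (I suspect something like `session.sparkContext.textFile('test.plt').zipWithIndex().filter(lambda x: x[1] >= 6)` would work, but I haven't verified it). The `parse_plt` helper name and the header count are mine, not part of the Geolife spec:

```python
# Minimal sketch of the intended logic: drop the 6 preamble lines,
# then split each remaining line on commas.
# In Spark, the analogous step would filter an RDD by line index
# (e.g. via zipWithIndex) before converting to a DataFrame.

SAMPLE = """Geolife trajectory
WGS 84
Altitude is in Feet
Reserved 3
0,2,255,My Track,0,0,2,8421376
0
39.984094,116.319236,0,492,39744.2451967593,2008-10-23,05:53:05
39.984198,116.319322,0,492,39744.2452083333,2008-10-23,05:53:06
"""

HEADER_LINES = 6  # the .plt preamble shown above

def parse_plt(text, skip=HEADER_LINES):
    """Drop the first `skip` lines, then split the rest on commas."""
    lines = text.splitlines()
    return [line.split(',') for line in lines[skip:]]

rows = parse_plt(SAMPLE)
print(rows[0])  # first data row: latitude, longitude, 0, altitude, days, date, time
```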