I have a dataframe with 320 rows. I converted it to ndjson with pandas:
df.to_json('file.json', orient='records', lines=True)
However, upon reloading the file, I only obtain 200 lines.
with open('file.json') as f:
    print(len(f.readlines()))

gives 200.
spark.read.json('file.json').count()

also gives 200.
Only reloading it with pandas gives the correct row count:
pd.read_json('file.json', orient='records', lines=True)
My dataset contains \n characters in its fields, so I expected at least as many lines as rows when loading the records with Python or Spark.
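For reference, my understanding is that valid NDJSON escapes embedded newlines inside string values (as \n escape sequences), so the number of physical lines in the file should exactly match the number of records. A minimal standard-library sketch, using made-up records rather than my actual data:

```python
import io
import json

# Hypothetical records; one value contains an embedded newline.
records = [{"text": "line one\nline two"}, {"text": "plain"}]

buf = io.StringIO()
for rec in records:
    # json.dumps escapes the embedded "\n" as the two characters "\" + "n",
    # so each record still occupies exactly one physical line.
    buf.write(json.dumps(rec) + "\n")

data = buf.getvalue()
print(len(data.splitlines()))  # one line per record
```

If pandas wrote the file this way, readlines() and Spark should both report 320 lines, which is why the count of 200 surprises me.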
What is the issue here with the pandas.to_json method?