Following is my sample CSV file.
id,name,gender
1,isuru,male
2,perera,male
3,kasun,male
4,ann,female
I converted the above CSV file into Apache Parquet using the pandas library. Following is my code.
import pandas as pd
df = pd.read_csv('./data/students.csv')
df.to_parquet('students.parquet')
After that, I uploaded the Parquet file to S3 and created an external table as below.
create external table imp.s1 (
id integer,
name varchar(255),
gender varchar(255)
)
stored as PARQUET
location 's3://sample/students/';
After that, I ran a select query, but I got the following error.
select * from imp.s1
Spectrum Scan Error. File 'https://s3.ap-southeast-2.amazonaws.com/sample/students/students.parquet'
has an incompatible Parquet schema for column 's3://sample/students.id'.
Column type: INT, Parquet schema:\noptional int64 id [i:0 d:1 r:0]
(s3://sample/students.parquet)
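A minimal sketch for checking which type pandas infers for the id column, in case it helps narrow things down. It inlines the same sample rows through io.StringIO instead of reading ./data/students.csv, and the df32 variant is only a hypothetical comparison, not something from my original code.

```python
import io
import pandas as pd

# Same sample rows as students.csv, inlined so the snippet runs on its own.
csv_text = "id,name,gender\n1,isuru,male\n2,perera,male\n3,kasun,male\n4,ann,female\n"
df = pd.read_csv(io.StringIO(csv_text))

# Show the dtype pandas inferred for each column.
print(df.dtypes)

# Hypothetical variant: the same data with a 32-bit id column, for comparison
# against the "integer" type declared in the external table.
df32 = df.astype({"id": "int32"})
print(df32.dtypes)
```

The first printout reports id as int64, which seems to line up with the "optional int64 id" part of the error, while the table declares id as integer.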
Could you please help me figure out what the problem is here?