I am trying to read a CSV file into a pandas DataFrame and generate a CREATE EXTERNAL TABLE query from the DataFrame's column names and dtypes. How can I achieve this?
Example:
Suppose I have a df like this:
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [True, False, False], 'C': ['a', 'b', 'c']})
print(df.dtypes)
A int64
B bool
C object
dtype: object
I have to create an external table based on the column names and dtypes of the dataframe:
CREATE EXTERNAL TABLE schema_name.table_name
(
A INT,
B VARCHAR(10),
C VARCHAR(100)
) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES
(
'separatorChar' = ','
)
LOCATION 'location'
TBLPROPERTIES ('skip.header.line.count'='1');
The dtype-to-column-type conversion should be:
int64 - INT,
float64 - FLOAT,
object - VARCHAR(100),
bool - VARCHAR(10),
date (datetime64[ns] in pandas) - TIMESTAMP
How can I generate this CREATE EXTERNAL TABLE statement from the dataframe?
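For reference, here is a minimal sketch of the kind of thing I am after, not a definitive implementation. The function names hive_type and build_create_external_table are hypothetical, made up for illustration. It assumes the conversion listed above, treats any dtype not in that list as VARCHAR(100), uses Hive's TBLPROPERTIES keyword for the header-skip setting, and does not escape or validate column names, the table name, or the location.

```python
import pandas as pd
from pandas.api import types as ptypes


def hive_type(dtype):
    """Map a pandas dtype to a column type using the conversion listed above."""
    if ptypes.is_bool_dtype(dtype):
        return "VARCHAR(10)"
    if ptypes.is_integer_dtype(dtype):
        return "INT"
    if ptypes.is_float_dtype(dtype):
        return "FLOAT"
    if ptypes.is_datetime64_any_dtype(dtype):
        return "TIMESTAMP"
    # Assumption: object/string and any other dtype fall back to VARCHAR(100).
    return "VARCHAR(100)"


def build_create_external_table(df, table_name, location):
    """Build the CREATE EXTERNAL TABLE statement as a string (hypothetical helper)."""
    # df.dtypes is a Series indexed by column name, so items() yields (name, dtype).
    columns = ",\n".join(
        f"{name} {hive_type(dtype)}" for name, dtype in df.dtypes.items()
    )
    return (
        f"CREATE EXTERNAL TABLE {table_name}\n"
        "(\n"
        f"{columns}\n"
        ") ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        "WITH SERDEPROPERTIES\n"
        "(\n"
        "'separatorChar' = ','\n"
        ")\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1');"
    )


df = pd.DataFrame({'A': [1, 2, 3], 'B': [True, False, False], 'C': ['a', 'b', 'c']})
ddl = build_create_external_table(df, "schema_name.table_name", "location")
print(ddl)
```

With a real CSV file, df would come from pd.read_csv(...) instead, and date columns would only map to TIMESTAMP if they were parsed as datetimes (for example with the parse_dates argument); otherwise they are read as object and end up as VARCHAR(100).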