
While loading a Parquet file from S3, I am getting the error below.

The Parquet file contains 5,128,680 rows, but only 5,100,000 of them are loaded; the last 28,680 records are not. Since 5,128,680 = 51 × 100,000 + 28,680, the error occurs on the final, short chunk. My code is below:

Error:

Response:
{
  "errorMessage": "Length mismatch: Expected axis has 28680 elements, new values have 100000 elements",
  "errorType": "ValueError"
}

import json
import pg8000
import awswrangler as wr

def lambda_handler(event, context):

    # Connect to RDS-ODS. `credential` is assumed to be populated
    # elsewhere (e.g., retrieved from AWS Secrets Manager).
    con = pg8000.connect(user=credential['username'],
                         password=credential['password'],
                         host=credential['host'],
                         database=credential['dbname'])

    # Read the Parquet file from S3 in chunks of 100,000 rows.
    dfs = wr.s3.read_parquet(path='s3://demobucket/sample_data_output_50lacks.parquet',
                             chunked=100000)

    # The ValueError is raised while iterating, when the final chunk
    # (28,680 rows rather than 100,000) is produced.
    for wrdfs in dfs:
        wr.postgresql.to_sql(df=wrdfs, table="demo_test", schema="public", con=con)
    con.close()

    return {
        'statusCode': 200,
        'body': json.dumps('Parquet file reading')
    }
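For what it's worth, here is a possible workaround sketch, not a confirmed fix: bypass awswrangler's integer chunked reader and let pyarrow do the batching, since pyarrow.parquet.ParquetFile.iter_batches sizes the final batch to however many rows remain. This assumes s3fs is installed; the path, table, and connection are the ones from the question, and copy_in_batches is a hypothetical helper name.

import awswrangler as wr
import pyarrow.parquet as pq
import s3fs

def copy_in_batches(con, batch_size=100_000):
    # Open the S3 object as a file and let pyarrow batch it; the final
    # batch is simply shorter (28,680 rows), so no length mismatch.
    fs = s3fs.S3FileSystem()
    with fs.open('s3://demobucket/sample_data_output_50lacks.parquet', 'rb') as f:
        parquet_file = pq.ParquetFile(f)
        for batch in parquet_file.iter_batches(batch_size=batch_size):
            wr.postgresql.to_sql(df=batch.to_pandas(), table="demo_test",
                                 schema="public", con=con)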
  • Which line is generating the error? Do you get the same error if you run it _outside_ of an AWS Lambda function? The error has nothing to do with Lambda. It seems to be related to the library you are using to read/write the data. – John Rotenstein Jan 29 '22 at 00:25
  • Thanks for the quick response. I get the same error outside Lambda as well. If it is related to the library, how do I resolve it? – murali Jan 30 '22 at 16:06
  • The issue only occurs with the chunked option. – murali Jan 30 '22 at 16:14
  • I have updated the title, since this seems related to AWS Data Wrangler. – John Rotenstein Jan 30 '22 at 22:51
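Following up on the comments above: since the failure seems tied to passing an integer to chunked, one alternative worth trying (an assumption, not a verified fix) is chunked=True, which makes awswrangler yield one DataFrame per Parquet row group or file instead of slicing into fixed-size chunks:

dfs = wr.s3.read_parquet(path='s3://demobucket/sample_data_output_50lacks.parquet',
                         chunked=True)  # chunk boundaries chosen by the library
for chunk in dfs:
    wr.postgresql.to_sql(df=chunk, table="demo_test", schema="public", con=con)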

0 Answers