I've been working on a project that uses a fairly simple data pipeline to clean and transform raw csv files into processed data using Python3.8 and Lambda to create various subsets which are sent to respective S3 buckets. The Lambda function is triggered by uploading a raw csv file to an intake S3 bucket, which initiates the process.
However, I would like to also send some of that processed data directly to Quicksight for ingestion from that same Lambda function for visual inspection as well, and that's where I'm currently stuck.
A portion of the function (leaving out the imports) I have with just the csv processing and uploading to S3, and this is the portion I like direclty ingested to Quicksight:
def featureengineering(event, context):
bucket_name = event['Records'][0]['s3']['bucket']['name']
s3_file_name = event['Records'][0]['s3']['object']['key']
read_file = s3_client.get_object(Bucket=bucket_name,Key=s3_file_name)
#turning the CSV into a dataframe in AWS Lambda
s3_data = io.BytesIO(read_file.get('Body').read())
df = pd.read_csv(s3_data, encoding="ISO-8859-1")
#replacing erroneous zero values to nan (missing) which is more accurate and a general table,
#and creating a new column with just three stages instead for simplification
df[['Column_A','Column_B']] = df[['Column_A','Column_B']].replace(0,np.nan)
#applying function for feature engineering of 'newstage' function
df['NewColumn'] = df.Stage.apply(newstage)
df1 = df
df1.to_csv(csv_buffer1)
s3_resource.Object(bucket1, csv_file_1).put(Body=csv_buffer1.getvalue()) #downloading df1 to S3
So at that point where the df1 is sent to its S3 bucket (which works fine), but I'd like it directly ingested into Quicksight as an automated spice refresh as well.
In digging around I did found a similar question with an answer
import boto3
import time
import sys
client = boto3.client('quicksight')
response = client.create_ingestion(DataSetId='<dataset-id>',IngestionId='<ingetion-id>',AwsAccountId='<aws-account-id>')
but the hang up I'm having is in the DataSetId or more generally, how do I turn the pandas DataFrame df1 in the Lambda Function into something the CreateIngestion API can accept and automatically send to QuickSight as an automated spice refresh of the most recent processed data?