
I want to monitor an S3 bucket, and when two files are added to it, start the EC2 instance that hosts the DRAGEN tool and run a DRAGEN command on that instance using those two files. The specific command I run takes around an hour to process the two files and writes the results to the output bucket named in the command. I tried AWS Lambda, but its timeout is 15 minutes, and the function times out while waiting for the command I sent to the instance via SSM to finish. Are there any alternative AWS services that provide the same functionality as Lambda but allow long-running processing?

My code:

import boto3
import time


def check_instance_status(instance_id):
    ec2_client = boto3.client('ec2')
    response = ec2_client.describe_instances(InstanceIds=[instance_id])
    if len(response['Reservations']) > 0:
        instance_state = response['Reservations'][0]['Instances'][0]['State']['Name']
        print(instance_state)
        if instance_state == 'running':
            print("Instance is running.")
        else:
            print("Instance is not running.")
    else:
        print("Instance not found.")


def wait_for_instance_start(instance_id):
    ec2_client = boto3.client('ec2')
    waiter = ec2_client.get_waiter('instance_running')
    try:
        waiter.wait(InstanceIds=[instance_id])
        print("Instance has started successfully.")
    except Exception as e:
        print("Error occurred while waiting for instance start:", str(e))


def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    file_key = event['Records'][0]['s3']['object']['key']
    print(bucket)
    print(file_key)
    print("lambda function started")
    try:
        ec2_client = boto3.client('ec2')
        ssm_client = boto3.client('ssm')
        print("Event triggered")
        instance_id = 'i-0ca693219ebbd0a07'
        print("InstanceId = ", instance_id)
        ec2_client.start_instances(InstanceIds=[instance_id])
        wait_for_instance_start(instance_id)
        check_instance_status(instance_id)
        bash_command = f'/opt/edico/bin/dragen -f -1 s3://{bucket}/input-samples/{file_key} -2 s3://{bucket}/input-samples/test_03.fastq.gz --ref-dir=/home/centos/hg19 --output-directory s3://priyanshu-test-exome/output --enable-map-align=true --RGID="A.A.1" --RGSM=CNTRL-0000131 --output-file-prefix="samples" --intermediate-results-dir=/ephemeral/intermediate-output'
        print("bash_command = ", bash_command)
        print("ssm client = ", ssm_client)
        response = ssm_client.send_command(
            InstanceIds=[instance_id],
            DocumentName='AWS-RunShellScript',
            Parameters={'commands': [bash_command]},
            CloudWatchOutputConfig={
                'CloudWatchLogGroupName': '/aws/lambda/solutions-team-dragen-lambda',
                'CloudWatchOutputEnabled': True
            }
        )
        command_id = response['Command']['CommandId']
        print('Command sent successfully. Command ID:', command_id)
        # This waiter checks the EC2 instance's status, not the SSM command;
        # the Lambda times out here while the hour-long command is still running.
        ec2_waiter = ec2_client.get_waiter('instance_status_ok')
        ec2_waiter.wait(InstanceIds=[instance_id])
        output = ssm_client.get_command_invocation(CommandId=command_id, InstanceId=instance_id)
        print("output = ", output)
        ec2_client.stop_instances(InstanceIds=[instance_id])
        print("stopped instance successfully")
    except Exception as e:
        print(e)
        print("Error in execution")
        raise e
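Note that the `instance_status_ok` waiter at the end only checks EC2 instance health, not the SSM command, so `get_command_invocation` typically runs while the command is still in progress. If the command were to be polled properly, it would look something like the sketch below (the function names are illustrative, and the SSM client is passed in as a parameter; in the Lambda it would be `boto3.client('ssm')`). This still cannot outlive the 15-minute Lambda limit, which is exactly the problem in the question:

```python
import time

# Terminal states for an SSM Run Command invocation.
_TERMINAL = {"Success", "Failed", "Cancelled", "TimedOut"}


def wait_for_command(ssm_client, command_id, instance_id,
                     poll_seconds=15, max_polls=240):
    """Poll get_command_invocation until the command reaches a
    terminal state, then return the final invocation response."""
    for _ in range(max_polls):
        inv = ssm_client.get_command_invocation(
            CommandId=command_id, InstanceId=instance_id)
        if inv["Status"] in _TERMINAL:
            return inv
        time.sleep(poll_seconds)
    raise TimeoutError(f"command {command_id} did not finish in time")
```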
  • How often do these files arrive (eg once per day, or every 30 minutes)? Do you need to process more than one set of files at a time? If it is only once per day, you could use an AWS Lambda function to start an EC2 instance. The instance could then perform the work and turn itself off. See: [Auto-Stop EC2 instances when they finish a task - DEV Community](https://dev.to/aws/auto-stop-ec2-instances-when-they-finish-a-task-2f0i) – John Rotenstein Jul 09 '23 at 22:43
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Jul 10 '23 at 06:23
  • These files arrive frequently, not every 30 minutes, but definitely more than once per day. Each time, two files are uploaded and I need to process those two files with a tool. The task is a single tool command that can run for more than 15 minutes depending on file size, so I cannot find a way to split this single command into smaller tasks to make use of Step Functions. – saideepak Gorla Jul 15 '23 at 01:18

1 Answer


Have you looked at Step Functions with EC2? Instead of doing the orchestration in Python, you write a Step Functions state machine that passes units of work to EC2 workers, even starting and stopping instances when they are not needed. Depending on the workload, you could also use Spot Instances.

If more processing power would help, increasing the Lambda function's memory (which also scales CPU) might make Lambda workable. Alternatively, look at splitting the work into smaller segments, combined with more memory and CPU, to keep each piece within the 15-minute timeout.
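Another pattern, mentioned in the comments above, is to make the Lambda fire-and-forget: it starts the instance, sends the command, and returns immediately, while the instance stops itself when the job finishes. A minimal sketch, with the clients passed in as parameters so the logic can be exercised without AWS (in a real Lambda they would be `boto3.client('ec2')` and `boto3.client('ssm')`; `shutdown -h now` assumes the instance's shutdown behavior is set to "stop", not "terminate"):

```python
def build_auto_stop_commands(job_command):
    """Append a shutdown so the instance stops itself once the
    long-running job exits, whether it succeeded or failed."""
    return [job_command, "shutdown -h now"]


def dispatch(ec2, ssm, instance_id, job_command):
    """Start the worker, send the job via SSM, and return the
    command ID without waiting for the job to complete."""
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_status_ok").wait(InstanceIds=[instance_id])
    response = ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": build_auto_stop_commands(job_command)},
    )
    return response["Command"]["CommandId"]
```

Because nothing waits on the hour-long DRAGEN run, the Lambda finishes in seconds regardless of how long the job takes.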

Aaron D
  • 31
  • 1
  • 3