
I am new to Nextflow, and here is a practice script that I wanted to test before using it for a real job.

#!/usr/bin/env nextflow

params.cns = '/data1/deliver/phase2/CNVkit/*.cns'
cns_ch = Channel.fromPath(params.cns)
cns_ch.view()

The output of this script is:

N E X T F L O W  ~  version 21.04.0
Launching `cnvkit_call.nf` [festering_wescoff] - revision: 886ab3cf13
/data1/deliver/phase2/CNVkit/002-002_L4_sorted_dedup.cns
/data1/deliver/phase2/CNVkit/015-002_L4.SSHT89_sorted_dedup.cns
/data1/deliver/phase2/CNVkit/004-005_L1_sorted_dedup.cns
/data1/deliver/phase2/CNVkit/018-008_L1.SSHT31_sorted_dedup.cns
/data1/deliver/phase2/CNVkit/003-002_L3_sorted_dedup.cns
/data1/deliver/phase2/CNVkit/002-004_L6_sorted_dedup.cns

Here 002-002, 015-002, 004-005, etc. are sample IDs. I am trying to write a simple process that outputs a file named ${sample.id}_sorted_dedup.calls.cns, but I am not sure how to extract these IDs and use them in the output file name.

process cnvcalls {
    input:
    file(cns_file) from cns_ch

    output:
    file("${sample.id}_sorted_dedup.calls.cns") into cnscalls_ch

    script:
    """
    cnvkit.py call ${cns_file} -o ${sample.id}_sorted_dedup.calls.cns
    """
}

How can I revise the process cnvcalls to make it work with sample.id?

David Z

1 Answer


There are many ways to extract the sample names/IDs from filenames. One way is to split the file name on the underscore and take the first element:

params.cns = '/data1/deliver/phase2/CNVkit/*.cns'
cns_ch = Channel.fromPath(params.cns)


process cnvcalls {

    input:
    path(cns_file) from cns_ch

    output:
    path("${sample_id}_sorted_dedup.calls.cns") into cnscalls_ch

    script:
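    // Everything before the first underscore is the sample ID, e.g. '002-002'.
    // No 'def' here, so that sample_id is also visible to the output block.
    sample_id = cns_file.name.split('_')[0]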

    """
    cnvkit.py call "${cns_file}" -o "${sample_id}_sorted_dedup.calls.cns"
    """
}
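
To see what the split produces, here is a minimal sketch in plain Groovy (it runs outside Nextflow), using three of the file names from the question's output. It assumes the sample ID never contains an underscore itself; if yours can, a different delimiter or a regular expression would be needed.

```groovy
// File names copied from the question's output (basename only, as returned by .name).
def names = [
    '002-002_L4_sorted_dedup.cns',
    '015-002_L4.SSHT89_sorted_dedup.cns',
    '004-005_L1_sorted_dedup.cns',
]

// Same expression as in the process: keep everything before the first underscore.
def ids = names.collect { it.split('_')[0] }

assert ids == ['002-002', '015-002', '004-005']
ids.each { println it }
```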

My preference, though, would be to pass the sample name/ID alongside the input file using a tuple:

params.cns = '/data1/deliver/phase2/CNVkit/*.cns'
cns_ch = Channel.fromPath(params.cns).map {
    tuple( it.name.split('_')[0], it )
}


process cnvcalls {

    input:
    tuple val(sample_id), path(cns_file) from cns_ch

    output:
    path "${sample_id}_sorted_dedup.calls.cns" into cnscalls_ch

    """
    cnvkit.py call "${cns_file}" -o "${sample_id}_sorted_dedup.calls.cns"
    """
}
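
The examples above use the DSL1 syntax (from/into), which matches the Nextflow 21.04.0 shown in the question. As far as I know, newer Nextflow releases only accept DSL2, so here is a rough sketch of the same tuple approach in DSL2. Emitting the sample ID alongside the output file is my own addition, on the assumption that downstream processes will want it too.

```groovy
#!/usr/bin/env nextflow
nextflow.enable.dsl = 2

params.cns = '/data1/deliver/phase2/CNVkit/*.cns'

process cnvcalls {

    input:
    tuple val(sample_id), path(cns_file)

    output:
    // Assumption: keep the sample ID paired with the result for later steps.
    tuple val(sample_id), path("${sample_id}_sorted_dedup.calls.cns")

    script:
    """
    cnvkit.py call "${cns_file}" -o "${sample_id}_sorted_dedup.calls.cns"
    """
}

workflow {
    // Pair each file with the ID taken from the text before the first underscore.
    cns_ch = Channel
        .fromPath(params.cns)
        .map { cns -> tuple(cns.name.split('_')[0], cns) }

    cnvcalls(cns_ch)
    cnvcalls.out.view()
}
```

In DSL2 the channel wiring moves out of the process and into the workflow block, so the process itself no longer names the channels it reads from or writes to.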
Steve