I am redesigning a workflow that starts with one process, which then spawns multiple other processes. Initially I had the variables available before the workflow started, so I made a tuple of them and passed it as input to a process. That process received each value and spawned one process per value in the tuple.

However, in my new architecture the 'tuple' is only produced inside processA. processB then needs to take each value as input and spawn one process per value.

My tuple looks like: {"002--002": some_params, "004--004": some_params, etc.}

I currently have these values as a list in Python: ['052--052', '054--054', '055--055', '059--059', '060--060', '066--066']

How can I parse this Python list so that I keep passing one argument but still spawn multiple processes?

ProcessA also creates files such as somefile_052--052.someextension - and I basically want to pass the correct variable with the correct file.

Any help would be greatly appreciated.

Here is some code:

These are the files that I need to manipulate. I need to send all files that share the same code, together with the matching variable.

> ls
out.barcoded.subreads.bam             out.subreads.060--060.bam.pbi         out.subreads.090--090.subreadset.xml  out.subreads.149--149.bam             out.subreads.192--192.bam.pbi         out.subreads.249--249.subreadset.xml  out.subreads.285--285.bam             out.subreads.321--321.bam.pbi         out.subreads.479--479.subreadset.xml
out.barcoded.subreads.bam.pbi         out.subreads.060--060.subreadset.xml  out.subreads.091--091.bam             out.subreads.149--149.bam.pbi         out.subreads.192--192.subreadset.xml  out.subreads.252--252.bam             out.subreads.285--285.bam.pbi         out.subreads.321--321.subreadset.xml  out.subreads.482--482.bam
out.barcoded.subreads.lima.counts     out.subreads.066--066.bam             out.subreads.091--091.bam.pbi         out.subreads.149--149.subreadset.xml  out.subreads.227--227.bam             out.subreads.252--252.bam.pbi         out.subreads.285--285.subreadset.xml  out.subreads.454--454.bam             out.subreads.482--482.bam.pbi
out.barcoded.subreads.lima.guess      out.subreads.066--066.bam.pbi         out.subreads.091--091.subreadset.xml  out.subreads.172--172.bam             out.subreads.227--227.bam.pbi         out.subreads.252--252.subreadset.xml  out.subreads.303--303.bam             out.subreads.454--454.bam.pbi         out.subreads.482--482.subreadset.xml
out.barcoded.subreads.lima.report     out.subreads.066--066.subreadset.xml  out.subreads.107--107.bam             out.subreads.172--172.bam.pbi         out.subreads.227--227.subreadset.xml  out.subreads.259--259.bam             out.subreads.303--303.bam.pbi         out.subreads.454--454.subreadset.xml  out.subreads.489--489.bam
out.barcoded.subreads.lima.summary    out.subreads.071--071.bam             out.subreads.107--107.bam.pbi         out.subreads.172--172.subreadset.xml  out.subreads.233--233.bam             out.subreads.259--259.bam.pbi         out.subreads.303--303.subreadset.xml  out.subreads.464--464.bam             out.subreads.489--489.bam.pbi
out.barcoded.subreads.subreadset.xml  out.subreads.071--071.bam.pbi         out.subreads.107--107.subreadset.xml  out.subreads.175--175.bam             out.subreads.233--233.bam.pbi         out.subreads.259--259.subreadset.xml  out.subreads.307--307.bam             out.subreads.464--464.bam.pbi         out.subreads.489--489.subreadset.xml
out.subreads.052--052.bam             out.subreads.071--071.subreadset.xml  out.subreads.112--112.bam             out.subreads.175--175.bam.pbi         out.subreads.233--233.subreadset.xml  out.subreads.261--261.bam             out.subreads.307--307.bam.pbi         out.subreads.464--464.subreadset.xml  out.subreads.494--494.bam
out.subreads.052--052.bam.pbi         out.subreads.082--082.bam             out.subreads.112--112.bam.pbi         out.subreads.175--175.subreadset.xml  out.subreads.235--235.bam             out.subreads.261--261.bam.pbi         out.subreads.307--307.subreadset.xml  out.subreads.468--468.bam             out.subreads.494--494.bam.pbi
out.subreads.052--052.subreadset.xml  out.subreads.082--082.bam.pbi         out.subreads.112--112.subreadset.xml  out.subreads.185--185.bam             out.subreads.235--235.bam.pbi         out.subreads.261--261.subreadset.xml  out.subreads.313--313.bam             out.subreads.468--468.bam.pbi         out.subreads.494--494.subreadset.xml
out.subreads.054--054.bam.pbi         out.subreads.082--082.subreadset.xml  out.subreads.113--113.bam             out.subreads.185--185.bam.pbi         out.subreads.235--235.subreadset.xml  out.subreads.264--264.bam             out.subreads.313--313.bam.pbi         out.subreads.468--468.subreadset.xml  out.subreads.bam
out.subreads.054--054.subreadset.xml  out.subreads.085--085.bam             out.subreads.113--113.bam.pbi         out.subreads.185--185.subreadset.xml  out.subreads.241--241.bam             out.subreads.264--264.bam.pbi         out.subreads.313--313.subreadset.xml  out.subreads.471--471.bam             out.subreads.bam.pbi
out.subreads.055--055.bam             out.subreads.085--085.bam.pbi         out.subreads.113--113.subreadset.xml  out.subreads.187--187.bam             out.subreads.241--241.bam.pbi         out.subreads.264--264.subreadset.xml  out.subreads.316--316.bam             out.subreads.471--471.bam.pbi         out.subreads.json
out.subreads.055--055.bam.pbi         out.subreads.085--085.subreadset.xml  out.subreads.125--125.bam             out.subreads.187--187.bam.pbi         out.subreads.241--241.subreadset.xml  out.subreads.265--265.bam             out.subreads.316--316.bam.pbi         out.subreads.471--471.subreadset.xml  out.subreads.lima.counts
out.subreads.055--055.subreadset.xml  out.subreads.088--088.bam             out.subreads.125--125.bam.pbi         out.subreads.187--187.subreadset.xml  out.subreads.245--245.bam             out.subreads.265--265.bam.pbi         out.subreads.316--316.subreadset.xml  out.subreads.473--473.bam             out.subreads.lima.guess
out.subreads.059--059.bam             out.subreads.088--088.bam.pbi         out.subreads.125--125.subreadset.xml  out.subreads.188--188.bam             out.subreads.245--245.bam.pbi         out.subreads.265--265.subreadset.xml  out.subreads.317--317.bam             out.subreads.473--473.bam.pbi         out.subreads.lima.report
out.subreads.059--059.bam.pbi         out.subreads.088--088.subreadset.xml  out.subreads.143--143.bam             out.subreads.188--188.bam.pbi         out.subreads.245--245.subreadset.xml  out.subreads.273--273.bam             out.subreads.317--317.bam.pbi         out.subreads.473--473.subreadset.xml  out.subreads.lima.summary
out.subreads.059--059.subreadset.xml  out.subreads.090--090.bam             out.subreads.143--143.bam.pbi         out.subreads.188--188.subreadset.xml  out.subreads.249--249.bam             out.subreads.273--273.bam.pbi         out.subreads.317--317.subreadset.xml  out.subreads.479--479.bam             out.subreads.subreadset.xml
out.subreads.060--060.bam             out.subreads.090--090.bam.pbi         out.subreads.143--143.subreadset.xml  out.subreads.192--192.bam             out.subreads.249--249.bam.pbi         out.subreads.273--273.subreadset.xml  out.subreads.321--321.bam             out.subreads.479--479.bam.pbi

So I would like to send these files, and this variable: 059--059

out.subreads.059--059.bam
out.subreads.059--059.bam.pbi
out.subreads.059--059.subreadset.xml

Currently my code in the workflow is:

process procA{
    input:
    file bc_fasta from bc_fasta_chan

    output:
    set file("$analysis_config.cell/bam/out.subreads.*"), val("$analysis_config.cell/bam/out.subreads.*") into lima_out

    script:
    """
    # run script to generate the files listed above
    """
}

process procB{
    input:
    set file(bc_bam_file), val(bc_name) from lima_out.flatten()

    script:
    """
    ls
    echo ${bc_bam_file}
    """
}
DUDANF
  • This would benefit greatly from some example code. Does processA create an output file (with 'someextension') for each value in your list? If so you could just use [map](https://www.nextflow.io/docs/latest/operator.html#map) to get back the variable from the filenames. Not sure if I have understood what you're trying to do exactly. – Steve Jan 05 '21 at 11:59
  • I have edited my answer, have a look. I think I'm close but no breakthrough just yet! – DUDANF Jan 06 '21 at 13:04

1 Answer

The trick is to extract the grouping variable from the filenames and then call groupTuple. I've just used a simple regex to get this variable, but you could implement something more sophisticated if necessary:

lima_out = Channel.fromPath( './files/out.subreads.*', relative: true )

subreads_pattern = ~/^out\.subreads\.(\d{3}--\d{3})\..*/

lima_out
    .flatten()
    .filter { it.name =~ subreads_pattern }
    .map { tuple( (it.name =~ subreads_pattern)[0][1], it ) }
    .groupTuple(size: 3, sort: true)
    .view()

Results:

[489--489, [out.subreads.489--489.bam, out.subreads.489--489.bam.pbi, out.subreads.489--489.subreadset.xml]]
[316--316, [out.subreads.316--316.bam, out.subreads.316--316.bam.pbi, out.subreads.316--316.subreadset.xml]]
...
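
Since the barcodes in the question live in a Python list, here is a minimal Python sketch of the same idea, in case it helps to see the grouping logic outside Nextflow. It reuses the regex from above; the function name `group_by_barcode` and the `example_files` list are made up for illustration and are not part of the pipeline. Unlike `groupTuple(size: 3)`, this sketch does not enforce a group size, so incomplete groups are kept.

```python
import re
from collections import defaultdict

# Same pattern as the Nextflow example: capture the 'NNN--NNN' barcode pair.
SUBREADS_PATTERN = re.compile(r"^out\.subreads\.(\d{3}--\d{3})\..*")


def group_by_barcode(filenames):
    """Group file names by barcode (hypothetical helper, mirrors filter + map + group)."""
    groups = defaultdict(list)
    for name in filenames:
        match = SUBREADS_PATTERN.match(name)
        if match is None:
            continue  # e.g. 'out.subreads.bam' carries no barcode and is skipped
        groups[match.group(1)].append(name)
    # Sort each group's files so the order within a group is deterministic.
    return {barcode: sorted(files) for barcode, files in groups.items()}


# Illustrative input only; names follow the 'ls' listing in the question.
example_files = [
    "out.subreads.059--059.subreadset.xml",
    "out.subreads.bam",
    "out.subreads.059--059.bam.pbi",
    "out.subreads.060--060.bam",
    "out.subreads.059--059.bam",
]
grouped = group_by_barcode(example_files)
print(grouped["059--059"])
```

Each key of `grouped` is the variable you want to pass along, and the value is the list of files that belong to it.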

Here's an example of how I would input these values into a process. My preference for handling companion files (here, the files with the '.bam.pbi' extension) is to keep them alongside the BAM files, so I put both in a single tuple element. Calling first() on that element returns the BAM. This is just my preference, though: you could have a separate file/path variable in your input tuple for the pbi companion file, but you probably won't need to reference it in your script block.

lima_out = Channel.fromPath( './files/out.subreads.*', relative: true )

subreads_pattern = ~/^out\.subreads\.(\d{3}--\d{3})\..*/

lima_out
    .flatten()
    .filter { it.name =~ subreads_pattern }
    .map { tuple( (it.name =~ subreads_pattern)[0][1], it ) }
    .groupTuple(size: 3, sort: true)
    .map { group_name, files -> tuple( group_name, files[2], files[0..1] ) }
    .set { subreads_ch }

process next_process {

    input:
    tuple val(group), path(subreadset), path(indexed_subreads) from subreads_ch

    """
    echo "subreadset XML: ${subreadset}"
    echo "subreads BAM: ${indexed_subreads.first()}"
    """
}
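
One thing worth spelling out: the `files[2]` / `files[0..1]` indexing above relies on the sorted order of the three file names within a group ('.bam', then '.bam.pbi', then '.subreadset.xml'). The following Python sketch shows that assumption explicitly. The helper name `split_group` is hypothetical, and it assumes exactly three files per barcode.

```python
def split_group(barcode, files):
    """Return (barcode, subreadset XML, [BAM, BAM index]) from one barcode's files.

    Hypothetical helper mirroring the map step above: after sorting, index 2
    is the '.subreadset.xml' file and indices 0..1 are the BAM and its '.pbi'.
    """
    ordered = sorted(files)
    if len(ordered) != 3:
        raise ValueError(f"expected 3 files for {barcode}, got {len(ordered)}")
    bam, pbi, subreadset = ordered
    return barcode, subreadset, [bam, pbi]


# Illustrative input only, deliberately given out of order.
result = split_group(
    "059--059",
    [
        "out.subreads.059--059.subreadset.xml",
        "out.subreads.059--059.bam",
        "out.subreads.059--059.bam.pbi",
    ],
)
print(result)
```

If your file extensions ever change so that they sort differently, matching on the extension would be safer than relying on position.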
Steve
  • This is spot on sir! `tuple val(thisval), file("*") from lima_out` Would this work as input process syntax? – DUDANF Jan 11 '21 at 15:34
  • Just one question. This is meant to be output to a process, and then I input it into another. Is this just the names of the files or is it the actual files within the tuple? Cause currently I don't see the files. – DUDANF Jan 11 '21 at 15:54
  • @DuDoff: I think that it would work, but it might be better to rearrange some of the file inputs. Please see my edit above. Hopefully I am understanding correctly. To your second question, it's the actual files in the tuple. In my testing of the above example, the files get localized correctly in the workDir when each process is run. But I was a bit confused about your procA process outputs. I think this should have just been `file("$analysis_config.cell/bam/out.subreads.*") into lima_out`. – Steve Jan 12 '21 at 00:08
  • Thanks Steve. You are the real MVP! – DUDANF Jan 12 '21 at 12:44
  • I want to give you more than an upvote and a correct answer. Your answer has not only solved my issue but it has taught me how to manipulate variables between processes. – DUDANF Jan 12 '21 at 14:10
  • Glad I could help :-) – Steve Jan 12 '21 at 22:55
  • Hey Steve, I have a question to ask. When I was testing the pipeline I was running it all locally. Now that I'm trying to push it into production, I need to run the workflow in a hybrid manner: the first process is local and the next is on Google Cloud. I seem to have trouble actually uploading the files. Any idea? – DUDANF Feb 16 '21 at 14:32
  • @DUDANF, does your bucket-dir already have a subdirectory created? If not, you'll need to first create one, then you'll need to use something like: `-bucket-dir gs://your-bucket/yoursubdirectory`. What behaviour are you seeing? – Steve Feb 16 '21 at 22:53
  • I got it working. I just had to wait; it doesn't show anything, so I wondered if it had just crashed. But it works. Thank you sir! – DUDANF Feb 17 '21 at 20:12