I'm new to Nextflow, and I'm struggling to get familiar with its input/output "jacks". I know that Nextflow has DAG visualisation, a useful feature for drawing a directed graph of the workflow.

I have a small, simple chart like this:

[image: flowchart]

I want to write a Nextflow script for the pipeline above. In particular, I want specific outputs of process A to be plugged into processes B and C, and I want the output names to be shown in the generated flowchart (when run with the -with-dag option).

I'd really appreciate any help. Thanks.

Complement

This is my script. At my level, I can only use path for my outputs. This makes the script more verbose because of all the file paths. Above all, when I draw the chart, the result isn't as clear as the initial flowchart.

#!/usr/bin/env nextflow

params.input_text = "abc"

process A{
    input:
    val text
    
    output:
    path A_folder
    
    """
    mkdir A_folder
    string=$text 
    for element in \$(seq 0 \$((\${#string}-1)))
    do
        echo \${string:\$element:1} > A_folder/\$element.txt        
    done
    """
}

process B{
    input:
    path A_folder
    
    output:
    path B_folder
    
    """
    mkdir B_folder
    echo \$(cat $A_folder/0.txt)\$(cat $A_folder/1.txt) > B_folder/glue1.txt  
    """
}

process C{
    input:
    path A_folder
    
    output:
    path C_folder
    
    """
    mkdir C_folder
    echo \$(cat $A_folder/2.txt | sed 's/c/3/g') > C_folder/tras.txt
    """
}

process D{
    input:
    path B_folder
    path C_folder
    
    output:
    path D_folder
    
    """
    mkdir D_folder
    echo \$(cat $C_folder/tras.txt)\$(cat $B_folder/glue1.txt) > D_folder/glue2.txt
    """
}

workflow{
    process_A = A(params.input_text)
    process_B = B(process_A)
    process_C = C(process_A)
    process_D = D(process_B, process_C)
}

[image: generated flowchart]

In summary, my question is: after writing the code and running the script (nextflow run script.nf -with-dag flow.png), how can I get a flowchart that is as similar to the first chart as possible?

Rossy Clair
  • By 'jacks' I think you mean [`channels`](https://www.nextflow.io/docs/latest/channel.html#channels)? The [`input`](https://www.nextflow.io/docs/latest/process.html#inputs) and [`output`](https://www.nextflow.io/docs/latest/process.html#outputs) blocks let you define the input and output channels of a process, respectively. Could you provide a more concrete example of what you're trying to do? What are your inputs? – Steve May 04 '23 at 04:16
  • @Steve Thank you for your comment. I added the complement code to the post. – Rossy Clair May 04 '23 at 07:38

1 Answer

As of version 22.04.0, Nextflow can do DAG visualisation using the Mermaid renderer. All you need to do is change the output file extension to mmd, for example:

nextflow run main.nf -with-dag flow.mmd

We can also simplify the workflow a bit by using native execution (exec blocks) to get close to the desired result:

params.input_text = "abc"
process process_A {

    input:
    val text

    output:
    val a, emit: foo
    val b, emit: bar
    val c, emit: baz

    exec:
    (a, b, c) = text.collect()
}
process process_B {

    input:
    val x
    val y

    output:
    val z

    exec:
    z = x + y
}
process process_C {

    input:
    val a

    output:
    val b

    exec:
    b = a.replaceAll('c', '3')
}
process process_D {

    input:
    val one
    val two

    output:
    val three

    exec:
    three = two + one
}
workflow {

    entry_input = Channel.of( params.input_text )

    (output_1, output_2, output_3) = process_A(entry_input)

    (output_4) = process_B( output_1, output_2 )
    (output_5) = process_C( output_3 )

    (final_output) = process_D( output_4, output_5 )

    final_output.view()
}

Results:

$ nextflow run main.nf -with-dag flow.mmd
N E X T F L O W  ~  version 23.04.1
Launching `main.nf` [distraught_lorenz] DSL2 - revision: a1f4411ded
executor >  local (4)
[a7/e97d7c] process > process_A (1) [100%] 1 of 1 ✔
[53/317d41] process > process_B (1) [100%] 1 of 1 ✔
[b2/88be6d] process > process_C (1) [100%] 1 of 1 ✔
[39/38f318] process > process_D (1) [100%] 1 of 1 ✔
3ab

$ cat flow.mmd 
flowchart TD
    p0((Channel.of))
    p1[process_A]
    p2[process_B]
    p3[process_C]
    p4[process_D]
    p5([view])
    p6(( ))
    p0 -->|entry_input| p1
    p1 -->|output_1| p2
    p1 -->|output_2| p2
    p1 -->|output_3| p3
    p2 -->|output_4| p4
    p3 -->|output_5| p4
    p4 -->|final_output| p5
    p5 --> p6

We can then produce an image with the Mermaid Live Editor and the 'default' theme:

[image: mermaid diagram]

Additional thoughts:

Using parentheses around the channel declarations in the workflow block seems to prevent Nextflow from using the output channel names defined in the process blocks. Under the old DSL, the output of process_D (val three) was just shorthand for val three into three. Under DSL2, it appears the output channels still get named the same way, but of course we no longer need the into keyword.
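
Outputs declared with emit, as in process_A above, can also be referenced through the process's out property instead of being unpacked into workflow variables. The following is only a minimal sketch, assuming DSL2; the process names SPLIT and JOIN and the emit names head and tail are hypothetical and chosen purely for illustration. How the edges end up labelled in the rendered DAG may depend on the Nextflow version and has not been checked here.

```groovy
// Hypothetical two-process example (names are illustrative, not from the question).
params.input_text = "abc"

process SPLIT {

    input:
    val text

    output:
    val head, emit: head   // first character, exposed as SPLIT.out.head
    val tail, emit: tail   // remaining characters, exposed as SPLIT.out.tail

    exec:
    head = text[0]
    tail = text.substring(1)
}

process JOIN {

    input:
    val x
    val y

    output:
    val z

    exec:
    z = y + x
}

workflow {

    SPLIT( Channel.of( params.input_text ) )

    // Named outputs are addressed by their emit names
    JOIN( SPLIT.out.head, SPLIT.out.tail )

    // A process with a single output exposes it directly as .out
    JOIN.out.view()
}
```

Referring to SPLIT.out.head rather than a positional variable keeps the wiring readable when a process has several outputs, since the name travels with the process definition.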

Steve