I would greatly appreciate a little pointer for the following. I have a TSV samples table:
Sample Unit Tumor_or_Normal Fastq1 Fastq2
A 1 T reads/a.t.1.fastq reads/a.t.2.fastq
A 2 N reads/a.n.1.fastq reads/a.n.2.fastq
B 1 T reads/b.t1.1.fastq reads/b.t1.2.fastq
...
and is read in
samples = pd.read_table(config["samples"], dtype=str).set_index(["Sample", "Unit", "Tumor_or_Normal"], drop=False)
samples.index = samples.index.set_levels([i.astype(str) for i in samples.index.levels])
I would like to merge all bam files that have the same Sample and Tumor_or_Normal. For example, C-1-T.bam and C-2-T.bam and C-3-T.bam should be merged into C-T.bam. I have a rule
rule merge_recal_by_unit:
input:
expand("recal/{{Sample}}-{Unit}-{{Tumor_or_Normal}}.bam",
Unit=samples.loc[samples.Sample].Unit)
output:
bam=protected("merged/{Sample}-{Tumor_or_Normal}.bam")
params:
""
threads:
8
wrapper:
"0.39.0/bio/samtools/merge"
but this gave an InputFunctionException. I've also tried replacing the expand with
lamblda wildcards: expand("recal/{{Sample}}-{Unit}-{{Tumor_or_Normal}}.bam",
Unit=samples.loc[wildcards.Sample].Unit)
but this gave me a syntax error, and
expand("recal/{{Sample}}-{Unit}-{{Tumor_or_Normal}}.bam",
Unit=samples.index.get_level_values('Unit').unique().values())
resulted in the message that numpy.ndarray object is not callable. This seems similar to this and this question, but I wasn't able to make it work.
Any help here would be greatly appreciated. Many thanks!