I'm trying to use expand with zip to pair up two lists when defining the targets of my top rule, but this leads to wildcard values that I didn't expect to be defined.

BASES = ['wt_base', 'wt', 'wt_base_ars', 'F210i_base', 'no_4su']
CONTRASTS = ['f210i_base', 'f210i', 'wt_during_ars', 'F210i_during_ars', '24hr_4su']

rule top:
    input:
        expand(config['majiq_top_level'] + "delta_psi/" + "{base}_{contrast}" + ".tsv",zip, base = BASES,contrast = CONTRASTS)

Fails with

Building DAG of jobs...
InputFunctionException in line 81 of /SAN/vyplab/alb_projects/pipelines/splicing/rules/majiq.smk:
UnboundLocalError: local variable 'grps' referenced before assignment
Wildcards:
base=wt_base_f210i
contrast=base

I have already tried removing the "_" both from the rule and from the names, with the same error:

BASES=['wtbase', 'wt', 'wtbasears', 'F210ibase', 'no4su']
CONTRASTS=['f210ibase', 'f210i', 'wtduringars', 'F210iduringars', '24hr4su']

rule top:
    input:
        expand(config['majiq_top_level'] + "delta_psi/" + "{base}{contrast}" + ".tsv",zip, base = BASES,contrast = CONTRASTS)

InputFunctionException in line 82 of /SAN/vyplab/alb_projects/pipelines/splicing/rules/majiq.smk:
UnboundLocalError: local variable 'grps' referenced before assignment
Wildcards:
base=wtbasef210ibas
contrast=e

The error is due to a function later on, but that function would not fail if the input wildcards were in the provided lists of BASES or CONTRASTS.

Instead, the 'base' wildcard ends up being a combination of one value from each list, and I don't even know where contrast=base is coming from.

I'm thinking the use of "_" in my list values might be the confusing part, but I'm not sure.
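
For reference, the wildcard values in both error messages can be reproduced outside Snakemake. The sketch below assumes that an unconstrained wildcard behaves like the greedy regex `.+` (the default described in the Snakemake documentation) and uses only Python's `re` module. The helper name `match_wildcards` and the bare `delta_psi/` prefix are made up for illustration, and this is not Snakemake's actual matching code.

```python
import re

def match_wildcards(pattern, filename):
    """Match filename against a regex and return its named groups as a dict."""
    m = re.fullmatch(pattern, filename)
    return m.groupdict() if m else None

# Assumption: each unconstrained wildcard is treated like the greedy regex ".+".
with_sep = r"delta_psi/(?P<base>.+)_(?P<contrast>.+)\.tsv"
no_sep = r"delta_psi/(?P<base>.+)(?P<contrast>.+)\.tsv"

# Target built from base 'wt_base' and contrast 'f210i_base'.
print(match_wildcards(with_sep, "delta_psi/wt_base_f210i_base.tsv"))
# {'base': 'wt_base_f210i', 'contrast': 'base'}

# Target built from base 'wtbase' and contrast 'f210ibase' (no separator).
print(match_wildcards(no_sep, "delta_psi/wtbasef210ibase.tsv"))
# {'base': 'wtbasef210ibas', 'contrast': 'e'}
```

Under that assumption, the first wildcard takes as much of the file name as it can, so with a separator the split lands on the last "_", and without one `contrast` is left with a single character.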

Al Bro
  • Yes, the use of `_` in your wildcards and in the output file name makes it ambiguous for snakemake. You can see that with the wildcard `base` in your output, which is `wt_base_f210i` instead of `wt_base`. You could use a hyphen to resolve it easily, i.e. change `{base}_{contrast}` to `{base}-{contrast}` – pd321 Sep 03 '19 at 07:48
  • Hi, I have tried this and updated the post to reflect it – Al Bro Sep 03 '19 at 08:02
  • You still need something between the two wildcards `base` and `contrast`. Can you try with a hyphen or an underscore (now that you don't have it in your list items)? – pd321 Sep 03 '19 at 08:20
  • Using wildcard constraints might help with this kind of issue. – bli Sep 04 '19 at 07:50
  • Hi @bli, what do you mean by "using wildcard constraints"? This is ambiguous to me. – Al Bro Sep 05 '19 at 15:45
  • In the snakemake documentation about rules, the section about wildcards explains that wildcards can be constrained to match certain patterns: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#wildcards If you know in advance the possible values for a given wildcard, as is your case for instance with `base` (that must be taken among `BASES`), you can build the corresponding constraint pattern. In the `wildcard_constraints` section of a rule (or the global one) this would appear as follows: `base="|".join(BASES)`. – bli Sep 06 '19 at 08:01
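
The constraint pattern from the last comment can be tried out with plain Python `re`. This is only a sketch of the idea, not Snakemake's own matching code: the compiled pattern `constrained` and the helper `split_name` are illustrative names, and `re.escape` is added as a precaution in case a list value ever contains a regex metacharacter.

```python
import re

BASES = ['wt_base', 'wt', 'wt_base_ars', 'F210i_base', 'no_4su']
CONTRASTS = ['f210i_base', 'f210i', 'wt_during_ars', 'F210i_during_ars', '24hr_4su']

# Alternation of the allowed values, as in base="|".join(BASES).
base_re = "|".join(re.escape(b) for b in BASES)
contrast_re = "|".join(re.escape(c) for c in CONTRASTS)

# Each wildcard may now only match one of the listed values.
constrained = re.compile(rf"(?P<base>{base_re})_(?P<contrast>{contrast_re})\.tsv")

def split_name(filename):
    """Return (base, contrast) if the file name matches the constrained pattern."""
    m = constrained.fullmatch(filename)
    return (m["base"], m["contrast"]) if m else None

# Every zipped pair is recovered unchanged, underscores included.
recovered = [split_name(f"{b}_{c}.tsv") for b, c in zip(BASES, CONTRASTS)]
print(recovered == list(zip(BASES, CONTRASTS)))  # True
```

With the alternatives restricted to the known values, the underscores inside the names no longer make the split ambiguous for these five pairs.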

1 Answer

I don't fully understand why this was the case, but the solution seemed to be fixing the input function, in addition to removing the ambiguous underscores from the list values:

Previously I was using a kind of lazy hack where the variable `grps` would only get defined inside this loop, if the group was present in my original lists:

    # go through the values of the dictionary and break when we find the right groups in that contrast
    for v in compare_dict.values():
        for k in v.keys():
            if k == grp:
                # grps is only ever assigned here, so it stays undefined if no key matches
                grps = v[k]
                break
    # take the sample names corresponding to those groups
    grp_samples = list(samples2[samples2['group'].isin(grps)].sample_name)

Simply adding an else condition and defining grps to be an empty list otherwise seemed to have resolved the problem.

    # go through the values of the dictionary and break when we find the right groups in that contrast
    for v in compare_dict.values():
        for k in v.keys():
            if k == grp:
                grps = v[k]
                break
            else:
                # no match for this key: fall back to an empty list
                grps = []
    # take the sample names corresponding to those groups
    grp_samples = list(samples2[samples2['group'].isin(grps)].sample_name)

I have no idea why that should be the case, but okay.

Al Bro
  • The `UnboundLocalError: local variable 'grps' referenced before assignment` error message suggested that `k == grp` never happened. – bli Sep 04 '19 at 07:54
  • It looks to me that your inner loop could be simplified as follows: `grps = v.get(grp, [])`. The `get` method of a dict takes a key, and an optional default value, for when the key is not found. – bli Sep 04 '19 at 07:58
  • Thank you for the suggestion about `v.get`. I didn't know the `get` method, so that's helpful. Yeah, I could see that `k == grp` never happened, but the way snakemake works (in my head) should ensure that nothing fed to that function would cause that failure. Apparently that's not the case after all – Al Bro Sep 05 '19 at 15:44
  • Well, this is not really a snakemake issue. This happens at Python level while the code is run: in your first version, when `k == grp` never happens, then `grps` is not defined after the loops, at the point where it is used (`samples2['group'].isin(grps)`), so Python complains. – bli Sep 06 '19 at 08:09
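
A minimal sketch of the lookup with a guaranteed default, along the lines of the `v.get(grp, [])` suggestion above. The function name `find_groups` and the example dictionary are hypothetical; only the nested shape of `compare_dict` (a dict whose values are dicts keyed by group name) is taken from the code in the answer. One thing to watch in the version with the `else` branch is that `break` only leaves the inner loop, so a later dictionary with no matching key could reset `grps` to an empty list after a match was found. Returning from a function on the first match avoids that.

```python
def find_groups(compare_dict, grp):
    """Return the groups stored under key `grp` in the first contrast that has it, else []."""
    for v in compare_dict.values():
        if grp in v:
            return v[grp]
    # No contrast defines this key: return an empty list instead of leaving a name unbound.
    return []

# Hypothetical data, only for illustration.
compare_dict = {
    "contrast_a": {"wt_base": ["ctrl_1", "ctrl_2"], "f210i_base": ["mut_1"]},
    "contrast_b": {"wt": ["ctrl_3"]},
}

print(find_groups(compare_dict, "wt_base"))        # ['ctrl_1', 'ctrl_2']
print(find_groups(compare_dict, "wt_base_f210i"))  # []
```

An empty result here means the wildcard value was not one of the expected group names, which is arguably worth raising as an error rather than silently selecting no samples.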