Questions tagged [samtools]

Samtools is a suite of programs for interacting with high-throughput sequencing data. It consists of three separate repositories:

  1. Samtools: reading/writing/editing/indexing/viewing SAM/BAM/CRAM format
  2. BCFtools: reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants
  3. HTSlib: a C library for reading/writing high-throughput sequencing data

Samtools and BCFtools both use HTSlib internally, but these source packages contain their own copies of htslib so they can be built independently.

115 questions
2
votes
1 answer

Extracting unmapped reads where both mates are unmapped using samtools?

I'm trying to determine the best way to extract unmapped reads in which both mates in a pair did not map. Currently, it seems that my code is simply extracting all unmapped reads, regardless of their mate. I'm not sure how to go about this, as I'm…
Haley
  • 119
  • 1
  • 3
  • 16
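
For the question above, a minimal sketch of the flag logic, assuming the standard SAM flag bits: 0x4 marks the read itself as unmapped and 0x8 marks its mate as unmapped, so a pair in which neither mate mapped has both bits set on each read. On the command line this corresponds to requiring both bits at once with `samtools view -f 12`. The function names below are illustrative and not part of samtools.

```python
READ_UNMAPPED = 0x4   # SAM flag bit: this read is unmapped
MATE_UNMAPPED = 0x8   # SAM flag bit: the mate is unmapped
BOTH_UNMAPPED = READ_UNMAPPED | MATE_UNMAPPED  # 12


def both_mates_unmapped(flag: int) -> bool:
    """Return True only if the read and its mate are both flagged unmapped."""
    return (flag & BOTH_UNMAPPED) == BOTH_UNMAPPED


def filter_sam_lines(lines):
    """Yield header lines plus records whose FLAG (column 2) has both bits set."""
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.split("\t")
        if both_mates_unmapped(int(fields[1])):
            yield line
```

Filtering on 0x4 alone keeps every unmapped read regardless of its mate, which matches the behaviour described in the question.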
2
votes
1 answer

Limit forked samtools processes with GNU parallel within for and while loops, including awk

I'm trying to limit a parallelized script. The aim of the script is to get a list within 10 samples/folders and use the records of the list to perform a samtools command, which is the most demanding part. This is the simple version: for (10 items) do …
crazysantaclaus
  • 613
  • 5
  • 19
2
votes
1 answer

Print for loop input filename on piped awk output

I wish to print $fname on each line of the awk output for the following: for fname in $(ls *.bam | cut -c -9 | uniq) do echo $fname samtools depth -a "$fname".bam | awk '{sum+=$3} END {print "Average = ", sum/NR}' >> mean.depth …
GMK
  • 23
  • 4
2
votes
1 answer

Makefile - samtools installation failed

I'm trying to install samtools on openSUSE, I did this: cd htslib-1.2.1 ./configure make install Worked fine. bcftools-1.2 ./configure make install Worked fine. And for samtools: cd samtools-1.2 make install Produces this output: …
2
votes
2 answers

using htsjdk defined classes from jython or groovy

I am trying to access methods provided by htsjdk.jar from here: https://samtools.github.io/htsjdk/ and documented here: https://samtools.github.io/htsjdk/javadoc/htsjdk/index.html using jython. I need methods for accessing / querying BAM file index…
darked89
  • 332
  • 1
  • 2
  • 17
2
votes
2 answers

regex to match words of length specified within string

I am trying to parse the text output from samtools mpileup. I start with a string s = '.$......+2AG.+2AG.+2AGGG' Whenever I have a + followed by an integer n, I would like to select n characters following that integer and replace the whole thing…
vk673
  • 23
  • 3
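
For the question above, a single regular expression cannot use a captured number as a repeat count, so one option is to let the regex find each `+n` marker and slice the next n characters by hand. This is a sketch under assumptions: what the question wants to substitute is cut off in the excerpt, so the hypothetical helper below simply removes each insertion and returns the removed sequences separately. It also does not handle a `+` that appears as the mapping-quality character after `^`.

```python
import re

# Matches the start of an insertion: '+' followed by its length.
INSERTION_START = re.compile(r"\+(\d+)")


def split_insertions(pileup: str):
    """Return (bases_without_insertions, list_of_inserted_sequences)."""
    kept = []
    insertions = []
    pos = 0
    while True:
        match = INSERTION_START.search(pileup, pos)
        if match is None:
            kept.append(pileup[pos:])  # no more markers: keep the tail
            break
        kept.append(pileup[pos:match.start()])
        length = int(match.group(1))
        seq_start = match.end()
        insertions.append(pileup[seq_start:seq_start + length])
        pos = seq_start + length  # resume right after the inserted bases
    return "".join(kept), insertions


if __name__ == "__main__":
    print(split_insertions(".$......+2AG.+2AG.+2AGGG"))
```

Because the length is read from the string, the trailing `GG` in `+2AGGG` is correctly treated as ordinary read bases rather than as part of the insertion.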
2
votes
1 answer

pysam module import results in error

I'm struggling to work out what is causing this error. I have Anaconda installed and used conda install pysam, which worked for a bit, but I am suddenly getting this error: >>> import pysam Traceback (most recent call last): File "", line 1, in
user3234810
  • 482
  • 2
  • 5
  • 18
1
vote
0 answers

Samtools mpileup : error reading from input file

I am writing code to analyse yeast sequencing data using R. I was able to do the alignment using BWA and thus obtain a .bam file and its .bai index using samtools. My next step is to perform variant calling using bcftools. Here is my command line…
1
vote
1 answer

MissingOutputException snakemake

I am getting a MissingOutputException from my snakemake workflow: snakemake creates the required output in the desired directory but keeps looking for it and exits. This is my snakefile. rule all: input: …
1
vote
1 answer

Is there any way to pass a bed file or similar with target regions to samtools coverage?

I would like to get the coverage and meandepth of different regions from a bam file. I guess samtools coverage is a good way to do that but I wasn't able to find a way to pass a file with my target regions. Is there any way to do that?
PaulaO
  • 67
  • 3
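
For the question above, `samtools coverage` takes a single region through its `-r` option, so one possible workaround is to convert each BED interval into a region string and run the command once per region. The sketch below only does the conversion and assumes a plain BED file with at least three columns; the function name is illustrative. BED starts are 0-based and half-open, while samtools regions are 1-based and inclusive, hence the `+ 1` on the start.

```python
def bed_to_regions(bed_lines):
    """Convert BED intervals to samtools-style 'chrom:start-end' strings."""
    regions = []
    for line in bed_lines:
        line = line.strip()
        # Skip blanks, comments, and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append(f"{chrom}:{int(start) + 1}-{int(end)}")
    return regions


if __name__ == "__main__":
    for region in bed_to_regions(["chr1\t0\t100", "chr2\t999\t2000\tgeneA"]):
        print(region)
```

Each resulting string could then be passed as `samtools coverage -r <region> in.bam`, where `in.bam` stands for an indexed BAM file.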
1
vote
1 answer

How to locate the position of non-missing sequence in the fasta

I have a FASTA file of about 1,000 sequences. Each sequence has a head and tail in which missing bases are replaced with 'N', like this: >CHR1 NNNNNAAAGAGAGAGNNTTTAGAGAGGGACNNNNNN I want to get the start and end position of the target sequence (if there are N…
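
For the question above, a minimal sketch that reports the 1-based start and end of the stretch left after trimming leading and trailing `N` bases. How runs of `N` inside the sequence should be treated is cut off in the excerpt, so this version assumes they stay part of the target. Both function names are illustrative.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def non_n_span(seq):
    """Return 1-based inclusive (start, end) of the N-trimmed core, or None."""
    without_head = seq.lstrip("Nn")
    if not without_head:
        return None  # sequence is empty or consists only of N
    start = len(seq) - len(without_head) + 1
    end = len(seq.rstrip("Nn"))
    return start, end


if __name__ == "__main__":
    fasta = [">CHR1", "NNNNNAAAGAGAGAGNNTTTAGAGAGGGACNNNNNN"]
    for name, seq in read_fasta(fasta):
        print(name, non_n_span(seq))
```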
1
vote
1 answer

Using SAMtools and storing outputs for a large number of files

I've got 500+ files that I need to change from .bam to .sam so am trying to use samtools. I've done some looking on here and found this answer (Changing file paths outputs within a loop, in a shell script) and modified it to fit my…
Megan
  • 15
  • 4
1
vote
1 answer

Find sequencing reads with insertions longer than a given number

I'm trying to isolate, from a bam file, those sequencing reads that have insertions longer than a given number (let's say 50bp). I guess I can do that using the cigar but I don't know any easy way to parse it and keep only the reads that I want. This is…
PaulaO
  • 67
  • 3
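
For the question above, a minimal sketch of CIGAR parsing, assuming the input is SAM text (for example the output of `samtools view -h`) with the CIGAR string in column 6. The function names and the default threshold of 50 are illustrative, taken from the example in the question.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def max_insertion(cigar: str) -> int:
    """Return the longest insertion (I) in a CIGAR string, or 0 if none."""
    return max(
        (int(length) for length, op in CIGAR_OP.findall(cigar) if op == "I"),
        default=0,
    )


def has_long_insertion(cigar: str, min_len: int = 50) -> bool:
    """True if any single insertion is strictly longer than min_len."""
    return max_insertion(cigar) > min_len


def filter_sam_lines(lines, min_len: int = 50):
    """Yield header lines plus records with an insertion longer than min_len."""
    for line in lines:
        if line.startswith("@") or has_long_insertion(line.split("\t")[5], min_len):
            yield line
```

An unavailable CIGAR (`*`) contains no operations, so such reads are dropped rather than raising an error.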
1
vote
1 answer

How to retain reads of only a certain size in a bam file?

I have certain bam files which have reads of different sizes. I wish to create another bam file from them that contains only reads shorter than a certain length N. I should be able to run commands like samtools stats and idxstats on it. Is there a way to do it?
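
For the question above, a minimal sketch of the length filter, assuming the reads are available as SAM text (for example from `samtools view -h`) with the read sequence in column 10. The function name is illustrative. Header lines are passed through so that the filtered text can be converted back to BAM; `idxstats` would additionally need that BAM to be sorted and indexed.

```python
def keep_short_reads(sam_lines, max_len):
    """Yield header lines plus records whose SEQ is shorter than max_len."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        seq = line.rstrip("\n").split("\t")[9]
        # '*' means the sequence is not stored, so its length is unknown.
        if seq != "*" and len(seq) < max_len:
            yield line
```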
1
vote
2 answers

How to name hundreds of files in increasing order in bash?

I need to download 300 files on the cloud, and name them one by one in increasing order. I can do this for a single file by running the following code. The pathname before '>' is the location of the initial files, the pathname after '>' is where I want to…
Fawkes Liu
  • 11
  • 2