I have a problem that I don't know how to start on. We ran a scRNA-seq experiment and I now have an AnnData dataset. I have already learned a lot about this dataset, mainly using the scanpy library, and I would like to "finalise" the analysis by extracting the genes that have a specific RNA sequence in their 3'UTR. Unfortunately, I have no idea how to approach this, since I am not a bioinformatician and couldn't find a tutorial for it. Can someone please help me with this problem?

Here is what it looks like (three screenshots were attached to the original post).

Thanks in advance!

Greenline
  • Hi Greenline, please include at least a minimal reproducible example so we know where to start. Right now we don't even know what your dataset looks like, and the information you provided is not enough to help you. – Bijay Regmi Jul 07 '22 at 16:10
  • I added some pictures to show what it looks like. Does this help? – Greenline Jul 08 '22 at 13:17
  • Hey @Greenline, scanpy doesn't hold any information about the genes' sequences. What you need is another tool that, given the transcriptome and the sequences, returns the gene names or gene IDs. You could try BLASTing for these genes or searching the FASTA file. – YotamW Constantini Aug 24 '22 at 16:12
  • Hey @YotamW Constantini, yes, I am currently trying to get the binding sequences from the datasets available on BioMart using the `fiveUTRsByTranscript` function from the GenomicFeatures package. For this I have switched to R, and I hope to end up with a list of the names of the genes that match the binding sequence. – Greenline Aug 30 '22 at 13:17
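
The FASTA search suggested in the comments can be sketched in plain Python. This is only a minimal illustration, assuming the 3'UTR sequences have already been exported to a FASTA file whose headers start with the gene name followed by `|` or whitespace. That header layout, the toy sequences, and the motif `UUUAUUUA` are all hypothetical placeholders, not taken from the original dataset.

```python
import re
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def genes_with_motif(handle, motif):
    """Return the sorted gene names whose sequence contains the motif."""
    # Treat U and T as equivalent so an RNA motif matches DNA-style FASTA.
    pattern = re.compile(re.escape(motif.upper().replace("U", "T")))
    hits = set()
    for header, seq in read_fasta(handle):
        # Assumed header layout: ">GeneName|transcript_id ..."
        gene = header.split("|")[0].split()[0]
        if pattern.search(seq.upper().replace("U", "T")):
            hits.add(gene)
    return sorted(hits)


# Toy stand-in for a real 3'UTR FASTA file (hypothetical sequences).
example = StringIO(
    ">GeneA|tx1\nACGTTTATTTAGG\nCC\n"
    ">GeneB|tx1\nGGGCCCGGG\n"
    ">GeneC|tx2\nAAAUUUAUUUA\n"
)
motif_genes = genes_with_motif(example, "UUUAUUUA")
print(motif_genes)
```

For a real file, `open("utrs.fa")` (a hypothetical file name) could be passed in place of the `StringIO` object. An exact substring match is used here; a degenerate binding motif would need a regular expression instead of `re.escape`.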

1 Answer

I found a workaround in R. It is based on BSgenome.Mmusculus.UCSC.mm10. You can find the answer here. Then read the resulting gene names into Python as a list and use the function `sc.tl.score_genes` to see where those genes are enriched.
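
The Python side of this could look roughly like the sketch below. It assumes the R step wrote one gene symbol per line to a text file and that `adata.var_names` holds gene symbols. The helper names, the file name, and the score name `motif_score` are hypothetical choices for illustration.

```python
def load_gene_list(path):
    """Read one gene name per line, skipping blank lines."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def keep_known_genes(genes, var_names):
    """Drop duplicates and genes that are absent from the dataset."""
    known = set(var_names)
    # dict.fromkeys removes duplicates while keeping the original order.
    return [g for g in dict.fromkeys(genes) if g in known]


def score_motif_genes(adata, genes, score_name="motif_score"):
    """Score each cell for the gene set and store it in adata.obs."""
    import scanpy as sc  # imported here so the helpers above work without scanpy

    present = keep_known_genes(genes, adata.var_names)
    sc.tl.score_genes(adata, present, score_name=score_name)
    return present


# Toy check of the filtering step (no scanpy or AnnData needed).
demo = keep_known_genes(
    ["Actb", "Gapdh", "Actb", "NotAGene"],
    ["Gapdh", "Actb", "Myc"],
)
print(demo)
```

With a real dataset, something like `score_motif_genes(adata, load_gene_list("utr_motif_genes.txt"))` followed by `sc.pl.umap(adata, color="motif_score")` would show which cells or clusters score highly, assuming a UMAP embedding has already been computed.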

Greenline