multiFASTA file processing

I was curious to know whether there is a bioinformatics tool able to process a multiFASTA file and report information such as the number of sequences, their lengths, nucleotide/amino acid content, etc. and maybe...

How can I display an image from a file in Jupyter Notebook?

I would like to use an IPython notebook as a way to interactively analyze some genome charts I am making with Biopython's GenomeDiagram module. While there is extensive documentation on how to use...

How to find an open reading frame in Python

I am using Python and a regular expression to find an ORF (open reading frame). Find a substring of a string that is composed ONLY of the letters ATGC (no spaces or new lines) that: Starts with ATG,...
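
A minimal sketch of the regular-expression approach the question describes. The excerpt is cut off after "Starts with ATG", so the rest of the ORF definition used here is an assumption: whole codons follow the start codon, and the match ends at the first in-frame stop codon (TAA, TAG or TGA). The names `ORF_PATTERN` and `find_orfs` are hypothetical.

```python
import re

# Assumed ORF definition (the excerpt only states "Starts with ATG"):
# ATG, then zero or more whole codons, then the first in-frame stop codon.
# The lazy quantifier *? stops at the earliest in-frame TAA, TAG or TGA.
ORF_PATTERN = re.compile(r"ATG(?:[ATGC]{3})*?(?:TAA|TAG|TGA)")


def find_orfs(dna):
    """Return all non-overlapping matches of the assumed ORF pattern."""
    return [match.group(0) for match in ORF_PATTERN.finditer(dna.upper())]


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGGG"))
```

Because `finditer` returns non-overlapping matches on a single strand, nested ORFs and ORFs on the reverse strand would need extra handling.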

command 'gcc' failed with exit status 1

I tried all the answers but I can't solve the problem of installing the Biopython package. I installed MinGW, but when I try to install the package with python setup.py install I get the following error: ...

Reverse complement of DNA strand using Python

I have a DNA sequence and would like to get the reverse complement of it using Python. It is in one of the columns of a CSV file and I'd like to write the reverse complement to another column in the...
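
One possible way to do this with only the standard library, shown as a sketch rather than a definitive answer. The column names `sequence` and `revcomp` and the helper `add_revcomp_column` are hypothetical, since the excerpt does not show the CSV layout; the translation table covers only unambiguous bases.

```python
import csv
import io

# Translation table for unambiguous bases only (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def add_revcomp_column(in_handle, out_handle, column="sequence", new_column="revcomp"):
    """Copy a CSV, appending a column with the reverse complement.

    The column names are assumptions made for illustration.
    """
    reader = csv.DictReader(in_handle)
    fieldnames = list(reader.fieldnames) + [new_column]
    writer = csv.DictWriter(out_handle, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[new_column] = reverse_complement(row[column])
        writer.writerow(row)


if __name__ == "__main__":
    source = io.StringIO("id,sequence\n1,ATGC\n")
    target = io.StringIO()
    add_revcomp_column(source, target)
    print(target.getvalue())
```

With real files, the two `StringIO` objects would be replaced by handles opened with `open(path, newline="")`.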

How to extend an ambiguous DNA sequence

Let's say you have a DNA sequence like this: AATCRVTAA, where R and V are ambiguous values of DNA nucleotides, where R represents either A or G and V represents A, C or G. Is there a Biopython...
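
The expansion described here can be sketched in plain Python with `itertools.product`, without relying on any particular Biopython helper. The table below uses the standard IUPAC nucleotide codes, which agree with the question's description of R and V; the name `expand_ambiguous` is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_ambiguous(seq):
    """Return every unambiguous sequence that the input can represent."""
    choices = (IUPAC[base] for base in seq.upper())
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for candidate in expand_ambiguous("AATCRVTAA"):
        print(candidate)
```

The number of results is the product of the option counts at each position (2 x 3 = 6 for the example), so sequences with many ambiguous bases grow quickly.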

Python: remove special column from multiple sequence alignment

I have an AlignIO object: a,b,c,d,e are ids for each record ------------mfelaeySGLL---TLFL-IASFPIFT-SPIG--- a ------------mfelsgyAVLLFFMVIFL-VASFPLLS-SPIG---...

How can multiple PDBs be written to a single PDB file using Biopython libraries?

I wonder how multiple PDBs can be written to a single PDB file using Biopython libraries. For reading multiple PDBs, such as an NMR structure, there is content in the documentation, but for writing, I do...

In python, how can I change the font size of leaf nodes when generating phylogenetic trees using Bio.Phylo.draw()?

I am using the Phylo package from Biopython to create phylogenetic trees. For big trees, I need to decrease the fontsize of the leaf nodes. It has been suggested to change...

How to call a module written with argparse in an IPython notebook

I am trying to pass Biopython sequences to Ilya Stepanov's implementation of Ukkonen's suffix tree algorithm in IPython's notebook environment. I am stumbling on the argparse component. I have...

Reporting the best alignment with pairwise2

I have a FASTQ file of reads, say "reads.fastq". I want to align the sequences to a string saved as a FASTA file, ref.faa. I am using the following code for this: reads_array = [] for x in...

Convert Bio.Entrez class to string

I am currently using Biopython to query PubMed records from PMIDs. I then store the desired information in a variable called abstract with a data type of: class...

How to convert a set of DNA sequences into protein sequences using Python?

I am using Python to create a program that converts a set of DNA sequences into amino acid (protein) sequences. I then need to find a specific subsequence, and count the number of sequences in...

Intramolecular protein residue contact map using biopython, KeyError: 'CA'

I am trying to identify amino acid residues in contact in the 3D protein structure. I am new to BioPython but found this helpful website...

pandas read csv ignore newline

I have a dataset (for compbio people out there, it's a FASTA) that is littered with newlines that don't act as delimiters of the data. Is there a way for pandas to ignore newlines when...

Where does pip install packages from?

I need to download the Biopython package using pip. I ran pip install biopython but got the following error: I verified that the problem is because the network blocks most sites by trying to...

ModuleNotFoundError: No module named 'docopt'

I have installed docopt by typing pip3 install docopt and now it is well installed: You can see it on the list umr5558-c02gl0y6drjm:Concatenate etudiant$ pip3 list Package ...

ModuleNotFoundError in Spyder

I tried to import the biopython package in Spyder and got the error message: ModuleNotFoundError: No module named 'biopython' although biopython is installed. I also checked the PYTHONPATH:...

NameError: name 'PROTOCOL_TLS' is not defined

I am trying to import Biopython modules on my Mac terminal but it's throwing the following error. It would be very helpful if someone could help me fix this issue. >>> from Bio import SeqIO Traceback...

Consensus dendrogram using scipy

I construct five different dendrograms using the scipy.cluster.hierarchy library (the dendrogram and linkage specifically) and now I need to do a consensus dendrogram based on these five...

Trim sequences based on alignment

I'm trying to edit an MSA (Multiple Sequence Alignment) file generated by ClustalW, to trim sequences before the consensus one, using Biopython. xxx refers to other bases not relevant here. Here's...

Rosalind doesn't accept "Variables and Some Arithmetic" task

Link to the problem: http://rosalind.info/problems/ini2/ Given: Two positive integers a and b, each less than 1000. Return: The integer corresponding to the square of the hypotenuse of the...

Python: Program for sequence logo generation using pyseqlogo

I'm trying to use this program to plot sequence logos in Python using pyseqlogo. The program opens the file input from the user, creates a 4 by seqLength matrix and inputs whatever number...

Biopython: export the protein fragment from PDB to a FASTA file

I am writing the PDB protein sequence fragment to FASTA format as below: from Bio.SeqIO import PdbIO, FastaIO def get_fasta(pdb_file, fasta_file, transfer_ids=None): fasta_writer =...

How to interpret conda package conflicts?

I am attempting to create a conda environment with 3 packages and a specific python version and get the following output: $ conda create -n testing_junk -y instrain awscli samtools...

Remove duplicated sequences in FASTA with Python

I apologize if the question has been asked before, but I have been searching for days and could not find a solution in Python. I have a large FASTA file, containing headers and...
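
A hedged sketch of one way to remove duplicates without extra libraries. The excerpt is truncated, so the duplicate criterion is an assumption: two records count as duplicates when their sequences are identical, and the first header seen is kept. The names `read_fasta` and `unique_records` are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unique_records(records):
    """Keep the first record for each distinct sequence (assumed criterion)."""
    seen = set()
    for header, seq in records:
        if seq not in seen:
            seen.add(seq)
            yield header, seq


if __name__ == "__main__":
    example = [">a", "ACGT", "AC", ">b", "ACGTAC", ">c", "GG"]
    for header, seq in unique_records(read_fasta(example)):
        print(f">{header}\n{seq}")
```

Both functions are generators, so a large file can be streamed by passing an open file handle to `read_fasta`; only the set of distinct sequences is held in memory.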

Parsing a genbank file format with biopython's SeqIO

I'm trying to parse a protein GenBank file. Here's an example file (example.protein.gpff): LOCUS NP_001346895 208 aa linear PRI 20-JAN-2018 DEFINITION ...

Split a multiFASTA file into files with the same number of accession numbers

I have a file that has thousands of accession numbers and looks like this: >NC_033829.1 Kallithea virus isolate DrosEU46_Kharkiv_2014, complete...

How do I find all sequence lengths in a FASTA dataset without using Biopython?

Let's say we have a FASTA file like this: >header1 ASDTWQEREWQDDSFADFASDASDQWEFQW >header2 ASFECAERVA >header3 ACTGQSDFWGRWTFSH and this is my desired output: header1 30 header2 10 header3...
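
A short sketch of how the lengths could be computed in plain Python, assuming each header line starts with `>` and a sequence may span several lines. The function name `sequence_lengths` is hypothetical.

```python
def sequence_lengths(lines):
    """Map each FASTA header to the total length of its sequence lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            current = line[1:]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


if __name__ == "__main__":
    example = [
        ">header1", "ASDTWQEREWQDDSFADFASDASDQWEFQW",
        ">header2", "ASFECAERVA",
        ">header3", "ACTGQSDFWGRWTFSH",
    ]
    for header, length in sequence_lengths(example).items():
        print(header, length)
```

An open file handle can be passed in place of the list, since the function only iterates over lines.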

Dictionary comprehension with multiple values for each key

I'm doing a course in bioinformatics. We were supposed to create a function that takes a list of strings like this: Motifs = [ "AACGTA", "CCCGTT", "CACCTT", "GGATTA", ...