This post is part of a series that summarises a workshop we ran recently. We discussed programming and common pitfalls, and then looked at some fun Python tricks.
There are several sections:
List comprehension in Python is a concise and expressive way to create lists by applying an expression to each item in an iterable (such as a list, tuple, or range) and optionally filtering the items based on a condition. It's a powerful feature that can simplify code and make it more readable, reducing the need for explicit loops.
The general syntax of a list comprehension is as follows:
new_list = [expression for item in iterable if condition]
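Here's what each part means: expression is the value computed for each element, item is the variable bound to each element in turn, iterable is the collection being looped over, and condition is an optional filter that decides whether an element is included.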
fasta_headers = [
    ">gene1|speciesA",
    ">gene2|speciesB",
    ">gene3|speciesA",
]
# -------------------------------------------#
gene_names = [header.split("|")[0][1:] for header in fasta_headers]
print(f'My gene list is : {", ".join(gene_names)}')
# This syntax is not particularly faster or better, it might be clearer in SOME situations.
# You can get the same result with a familiar loop.
gene_names_again = []
for header in fasta_headers:
    gene_names_again.append(header.split("|")[0][1:])
print(f'My gene list is : {", ".join(gene_names_again)}')
gene_names_filter = [header.split("|")[0][1:] for header in fasta_headers if header.endswith('speciesA')]
print(f'My gene list is : {", ".join(gene_names_filter)}')
The output:
My gene list is : gene1, gene2, gene3
My gene list is : gene1, gene2, gene3
My gene list is : gene1, gene3
Dictionaries can be created in a similar way.
fasta_headers = [
    ">gene1|speciesA",
    ">gene2|speciesB",
    ">gene3|speciesA",
]
gene_species = {header.split("|")[0][1:]: header.split("|")[1] for header in fasta_headers}
print(gene_species)
The output:
{'gene1': 'speciesA', 'gene2': 'speciesB', 'gene3': 'speciesA'}
It can get pretty intense. In this one line, we are merging two lists into a single dictionary.
codons = ["ATG", "GCT", "TAA", "CAG"]
amino_acids = ["Methionine", "Alanine", "STOP", "Glutamine"]
codon_to_aa = {codon: aa for codon, aa in zip(codons, amino_acids)}
print(codon_to_aa)
The output:
{'ATG': 'Methionine', 'GCT': 'Alanine', 'TAA': 'STOP', 'CAG': 'Glutamine'}
There's a lot of old code that keeps a manual counter instead of using enumerate, which is the preferred and cleaner syntax. Using enumerate is usually safer, as you don't have a count variable floating around; it is far too easy to forget to increment it.
dna_sequences = ["ATCGAAGCT", "GTTAGTCC", "AGCGTAAGGT", "GATC"]
# ------------
for index, sequence in enumerate(dna_sequences, start=1):
    gc_content = (sequence.count("G") + sequence.count("C")) / len(sequence) * 100
    print(f"Sequence {index}: GC content = {round(gc_content,2)}%")
print('\nAgain with a counter')
count = 1
for sequence in dna_sequences:
    gc_content = (sequence.count("G") + sequence.count("C")) / len(sequence) * 100
    print(f"Sequence {count}: GC content = {round(gc_content,2)}%")
    count += 1
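If it helps to see what enumerate is actually handing you, here is a small sketch using the same sequences. It just shows that each step yields an (index, item) pair, and that leaving out start gives you zero-based counting that matches list indexing.

```python
dna_sequences = ["ATCGAAGCT", "GTTAGTCC", "AGCGTAAGGT", "GATC"]

# enumerate() yields (index, item) tuples; start=1 makes counting human-friendly.
pairs = list(enumerate(dna_sequences, start=1))
print(pairs[0])   # (1, 'ATCGAAGCT')
print(pairs[-1])  # (4, 'GATC')

# Without start, counting begins at 0, which matches list indexing.
for index, sequence in enumerate(dna_sequences):
    assert dna_sequences[index] == sequence
```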
You use a lot of dictionaries in Python, and you often want to fetch both the key and the value at the same time. Let's take our fasta header example from before.
fasta_headers = [
    ">gene1|speciesA",
    ">gene2|speciesB",
    ">gene3|speciesA",
]
gene_species = {header.split("|")[0][1:]: header.split("|")[1] for header in fasta_headers}
for gene in gene_species:  # This is OK
    print(f'My gene: {gene}; My species: {gene_species[gene]}')
print('\nAgain with items')
for gene, species in gene_species.items():  # This is BETTER
    print(f'My gene: {gene}; My species: {species}')
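The items() approach also combines nicely with the comprehensions from earlier. As a rough sketch, assuming the same gene_species dictionary (written out by hand here so the snippet runs on its own), you can unpack each key and value and filter in one line:

```python
gene_species = {"gene1": "speciesA", "gene2": "speciesB", "gene3": "speciesA"}

# Unpack each (key, value) pair and keep only the genes from speciesA.
species_a_genes = [gene for gene, species in gene_species.items() if species == "speciesA"]
print(species_a_genes)  # ['gene1', 'gene3']

# The same idea builds a filtered dictionary instead of a list.
species_a_only = {gene: species for gene, species in gene_species.items() if species == "speciesA"}
print(species_a_only)  # {'gene1': 'speciesA', 'gene3': 'speciesA'}
```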
The logging module in Python provides a flexible framework for emitting log messages from applications. It supports various log levels and destinations.
import logging
# Configure the logging settings
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S"
)
def find_motif(sequence, motif):
    positions = []
    for i in range(len(sequence) - len(motif) + 1):
        if sequence[i:i+len(motif)] == motif:
            positions.append(i)
            logging.info(f"Motif found at position {i}")
    return positions
# Example sequence and motif
sequence = "ATCGAAGCTGTTAGTCCAGCGTAAGGTGATC"
motif = "GTTA"
# Find the motif in the sequence
motif_positions = find_motif(sequence, motif)
if motif_positions:
    logging.info(f"Motif '{motif}' found at positions: {', '.join(map(str, motif_positions))}")
else:
    logging.warning(f"Motif '{motif}' not found in the sequence")
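To make the point about levels and destinations a bit more concrete, here is a minimal sketch that sends the same messages to two places with different thresholds. The logger name motif_demo and the temporary log file path are made up for this example; in a real script you would pick your own.

```python
import logging
import os
import tempfile

# Hypothetical log file path, chosen only for this sketch.
log_path = os.path.join(tempfile.mkdtemp(), "motif_search.log")

logger = logging.getLogger("motif_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)  # let every level through to the handlers
logger.propagate = False        # avoid duplicate output via the root logger

formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")

# Destination 1: the console, showing warnings and above only.
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.WARNING)
console_handler.setFormatter(formatter)
logger.addHandler(console_handler)

# Destination 2: a file, keeping everything including debug detail.
file_handler = logging.FileHandler(log_path)
file_handler.setLevel(logging.DEBUG)
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)

logger.debug("Scanning sequence for motif")    # file only
logger.info("Motif found at position 9")       # file only
logger.warning("Motif not found in sequence")  # console and file

# Close the file handler so everything is flushed before reading it back.
file_handler.close()
with open(log_path) as handle:
    log_lines = handle.read().splitlines()
print(len(log_lines))  # 3
```

The nice part of this setup is that the terminal stays quiet unless something needs attention, while the file keeps the full detail for later debugging.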
Questions or comments? @ me on Mastodon @happykhan@mstdn.science or Twitter @happy_khan
The banner image is an AI generated picture (Midjourney) with prompt; 'computer programming in the style of Girl with a Pearl Earring by Johannes Vermeer'. You can share and adapt this image following a CC BY-SA 4.0 licence.