Unix tricks for bioinformagicians

I use these Unix one-liners and such all the time in my bioinformatics work. I hope you’ll find them useful as well!

Reverse complement all sequences in a fasta file.

paste -d "\n" \
    <(grep ">" sequences.fasta) \
    <(grep -v ">" sequences.fasta | tr ATGC TACG | rev) \
    > reverse_complemented_sequences.fasta

Prepare a stability file from paired end reads.

paste \
    <(ls *R1*.fastq | awk -F"_" '{print $1}') \
    <(ls *R1*. [Read More]

Introducing Awk

In this entry I’m going to explain the very basics of a little programming language called Awk. The primary reason why I’m advocating the use of Awk (and not for instance Perl or Python) is that Awk is absolutely great for writing very short programs, or one-liners. In fact, in my opinion Awk should never be used for anything longer than one-liners - if your task appears to demand a longer piece of code, it will probably be wiser to choose Perl or Python instead. [Read More]
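To give a flavour of what such a one-liner can look like, here is a minimal sketch that prints the length of every sequence in a FASTA file. The example data, the temporary file and the variable names are made up for illustration, and the sketch assumes each sequence sits on a single line.

```shell
# Build a tiny two-record FASTA file (hypothetical example data).
example_fasta=$(mktemp)
printf '>seq1\nATGCATGC\n>seq2\nGGCC\n' > "$example_fasta"

# Header lines: remember the name without the leading ">" and move on.
# Sequence lines: print the remembered name and the line length, tab-separated.
seq_lengths=$(awk '/^>/ {name = substr($0, 2); next} {print name "\t" length($0)}' "$example_fasta")
echo "$seq_lengths"

rm "$example_fasta"
```

The whole program is the single quoted string handed to awk: one pattern-action pair for headers and one for everything else.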

Pipes, process substitution and why a biologist should ever care

This is the first entry in my series about how to keep DNA sequence processing as simple as possible. Each entry attempts to teach a useful Unix trick or two, focusing on relevance for biologists who might not have much prior Unix experience. Because of the very clever design choices made by the folks behind Unix, every trick can be combined with everything else in ways that lead to much greater flexibility and expressiveness than any single trick offers alone. I attempt to demonstrate that in the following posts. [Read More]
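As a rough illustration of how the two tricks in the title combine, the sketch below lists the sequence names shared by two FASTA files. The file names and contents are invented for the example, and process substitution assumes a shell such as bash or zsh rather than plain sh.

```shell
#!/usr/bin/env bash
# Two small FASTA files with partly overlapping headers (hypothetical data).
workdir=$(mktemp -d)
printf '>seqA\nATGC\n>seqB\nGGCC\n>seqC\nTTAA\n' > "$workdir/first.fasta"
printf '>seqB\nGGCC\n>seqC\nTTAA\n>seqD\nCCGG\n' > "$workdir/second.fasta"

# Pipes: grep keeps the header lines, tr strips the ">", sort orders the names.
# Process substitution: each <( ... ) hands a whole pipeline to comm as if it
# were a file. comm -12 prints only the lines present in both sorted inputs.
shared_ids=$(comm -12 \
    <(grep '>' "$workdir/first.fasta" | tr -d '>' | sort) \
    <(grep '>' "$workdir/second.fasta" | tr -d '>' | sort))
echo "$shared_ids"

rm -r "$workdir"
```

No intermediate files are written for the sorted name lists; each small tool does one job and the shell wires them together.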