Software Projects

During my years of research and study, I’ve had the good fortune of contributing to the following projects:

Current projects

Mauve – multiple genome alignment
Mauve is a software tool for computing whole-genome multiple alignments of bacterial and small eukaryotic genomes (usually no bigger than Drosophila). The software includes a Java-based visualization module and a set of alignment programs written in C++, available for Linux, Mac, and Windows.

libhmsbeagle – a library for phylogenetic likelihood calculation
libhmsbeagle, also known as the beagle library, is a software development library for calculating phylogenetic likelihoods using a variety of compute hardware types. Currently supported hardware types and features include graphics processing units (GPUs) via CUDA and OpenCL, standard CPUs using SSE for fine-scale parallelism and OpenMP for coarse parallelism, and others. The library works on Linux, Mac, and Windows.

Repeatoire – alignment of interspersed genomic repeats
Software to construct multiple sequence alignments of interspersed repeats directly from raw genomic sequence. This project is led by Dr. Todd Treangen at the University of Paris.

Past projects

mpiBLAST – open-source parallel BLAST
mpiBLAST is a parallelization of the popular NCBI BLAST for MPI-based compute clusters. When searching large databases, it can yield super-linear speedups. It is extremely flexible, accommodating cluster architectures with and without shared storage and parallel filesystems. It integrates well with most job scheduling systems and has also been extended to grid architectures.
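
To make "super-linear" concrete: speedup is the serial run time divided by the parallel run time, and it counts as super-linear when it exceeds the number of workers. The sketch below only illustrates that arithmetic. The function names are hypothetical and nothing here is part of mpiBLAST or based on mpiBLAST measurements.

```python
def speedup(serial_seconds: float, parallel_seconds: float) -> float:
    """Ratio of serial run time to parallel run time."""
    if serial_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("run times must be positive")
    return serial_seconds / parallel_seconds


def is_superlinear(serial_seconds: float, parallel_seconds: float, workers: int) -> bool:
    """True when the speedup exceeds the number of workers used."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return speedup(serial_seconds, parallel_seconds) > workers
```

For example, a search that takes 100 seconds on one node and 10 seconds on 8 nodes has a speedup of 10, which is more than 8 and therefore super-linear.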

GenoPlast – Bayesian inference of genomic plasticity
Given a Mauve genome alignment, GenoPlast uses a statistical model to infer the baseline rates of gene gain and loss among a group of organisms, along with lineage-specific changes to those rates. Gain and loss are modeled independently. Thus, it is possible to detect, for example, a lineage-specific accelerated loss with a constant rate of acquisition, which may be characteristic of a recent lifestyle change in bacteria. This project is led by Dr. Xavier Didelot at the University of Warwick.

barphlye – Bayesian rearrangement phylogeny in Yersinia
barphlye supplements the BADGER software to analyze patterns present in ancestral genome arrangements. A modified version of BADGER samples reconstructions of inversion phylogeny among a set of related organisms, and barphlye can then detect bias in the reconstructed ancestral genome arrangements. Using barphlye, I was able to discover rearrangement hotspots near the origin of replication in bacterial chromosomes. I was also able to confirm the bias towards “symmetric” inversion in circular bacterial chromosomes. Although Yersinia is in the title, the software can be applied to any bacterium with a circular chromosome, and can even operate on individual linear chromosomes.
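
As an illustration of what “symmetric” means here: an inversion on a circular chromosome is symmetric when its two endpoints are mirror images of each other across the axis through the origin of replication. The sketch below measures how far an inversion is from that ideal. It is not barphlye's statistic; the function names, the 0-based coordinate convention, and the choice of measuring asymmetry in base pairs are all assumptions made for this example.

```python
def circular_distance(a: int, b: int, length: int) -> int:
    """Shortest distance between two positions on a circle of the given length."""
    d = abs(a - b) % length
    return min(d, length - d)


def inversion_asymmetry(end1: int, end2: int, origin: int, length: int) -> int:
    """Distance (in bp) between end2 and the mirror image of end1 across the origin.

    Returns 0 for a perfectly symmetric inversion. Coordinates are assumed
    to be 0-based positions on a circular chromosome of `length` bp.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    # Reflect end1 across the origin, wrapping around the circle.
    mirrored = (2 * origin - end1) % length
    return circular_distance(end2, mirrored, length)
```

On a 1000 bp circle with the origin at position 0, endpoints 100 and 900 give an asymmetry of 0, while endpoints 100 and 850 give 50.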

Seevolution – a time machine for evolution
Seevolution is an interactive viewer for mutations occurring during genome evolution. The program is written in Java and uses Java3D. This project was developed by Mr. Andres Esteban-Marcos, who was a student I supervised at The University of Queensland.

ZORRO – probabilistic masking for phylogenetics
Despite over 30 years of research, accurate multiple sequence alignment remains a challenge. The number of possible alignments is astronomical and, for any given optimality criterion, there are often numerous optimal or nearly optimal alignments. Moreover, alignments are merely a nuisance parameter in the analysis of sequence evolution and its constraints. ZORRO attempts to quantify the uncertainty inherent in a given multiple sequence alignment and use knowledge of that uncertainty to improve downstream tasks such as phylogenetic inference. This project is led by Dr. Sourav Chatterji at the University of California, Davis.
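
The general idea of masking can be sketched as follows: given a confidence score for each alignment column, keep only the columns whose score meets a threshold before passing the alignment to a downstream tool. This is a minimal illustration, not ZORRO's scoring model or file format. The function name, the list-of-strings alignment representation, and the threshold value are assumptions made for the example.

```python
from typing import List, Sequence


def mask_alignment(rows: Sequence[str], scores: Sequence[float], threshold: float) -> List[str]:
    """Drop alignment columns whose confidence score is below the threshold.

    `rows` holds one aligned sequence per string, all the same length;
    `scores` holds one (hypothetical) confidence value per column.
    """
    if any(len(row) != len(scores) for row in rows):
        raise ValueError("every row must have exactly one score per column")
    # Indices of the columns considered reliable enough to keep.
    keep = [i for i, score in enumerate(scores) if score >= threshold]
    return ["".join(row[i] for i in keep) for row in rows]


masked = mask_alignment(
    ["ACGT-A", "AC-TTA"],
    [9.5, 8.0, 1.2, 7.0, 0.5, 9.9],
    threshold=5.0,
)
```

Here the two low-scoring, gap-containing columns are removed, leaving four columns in each row.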

GRIL – genome inversion and rearrangement locator
A simple tool to detect genome rearrangements in single-copy genomic regions among two or more organisms.

ASAP and the ERIC BRC
ASAP is A Systematic Annotation Package for microbial genomes, which is part of the larger Enteropathogen Resource Integration Center (ERIC) project. Together, ERIC and ASAP provide a centralized, web-based means to annotate the genomes of enteropathogens and serve as a clearinghouse for all types of annotation and experimental data. ASAP supports a wealth of automated annotation strategies, and its evolution has been described in a series of Nucleic Acids Research papers. ASAP is led by Associate Professor Nicole T. Perna and has continued to grow at the Genome Evolution Laboratory since my departure.

libClustalW – a C library for Clustal-W 1.83
The 1.83 release of Clustal-W has been refactored as a C library. It builds in Visual Studio on Windows and on Linux, BSD, and other unices with gcc and automake. Note that the Clustal-W authors have since made a 2.0 release, which is a complete rewrite of the aligner, so this library may soon be obsolete.

Extensions to DualBrothers 1.1
Although I was not involved in the original DualBrothers project, I extended the software to handle whole-genome alignments and cases of arbitrary recombination among multiple species. Because DualBrothers is closed-source and the authors did not give me permission to publicly distribute my modifications, you will have to e-mail me for details (and that’s the last time I will contribute to a closed-source project!). Fortunately, cBrother by Karin Dorman’s group implements similar extensions and has publicly available source code.

libGenome – a C++ development library
libGenome is an open-source C++ library for reading and writing genome sequence data in common file formats; it also provides functions for basic manipulation of genome sequences. It was designed from the ground up for speed and efficiency. It is available on SourceForge and as part of the Debian Linux distribution.