Project Overview

Cruncher Pete · May 22, 2009, 12:37:59 PM

Project Summary
Over the past few years, we have invited faculty, postdocs, graduate students, and others at the University of Maryland to use The Lattice Project for their research projects.

Working with various researchers, helping them organize and submit their jobs, and listening to their feedback has continually helped us improve the system and has shown us where more work is needed. Taken together, the projects we have supported are extremely diverse. The Lattice Project is cited in a number of publications that have come out of these studies. Here we provide general information about the types of analyses that have been run on Lattice, the applications used in those analyses, and specific projects. For more information, visit our main research page.

Phylogenetic Analysis - GARLI
Protein Sequence Comparison - HMMPfam
Conservation Reserve Network Design - MARXAN

Phylogenetic Analysis - GARLI
The Cummings Laboratory and others are using GARLI to infer phylogenetic trees from nucleotide or amino acid data. Various nucleotide, codon and amino acid models are implemented for maximum likelihood (ML) estimates. Multiple searches for the ML tree, as well as the calculation of bootstrap support values, are parallelized by The Lattice Project at the level of individual heuristic searches, i.e., every computing node has to carry out at least one complete heuristic search. This parallelization is particularly useful for large quantities of relatively short calculations, as is typical for nucleotide model bootstrap analyses with large numbers of replicates.
The LepTree project (www.leptree.net) investigates evolutionary relationships within the insect order Lepidoptera (moths and butterflies), in particular of higher taxa, such as families, superfamilies and infra-orders. This molecular "backbone phylogeny" is based on the analysis of up to 26 protein-coding nuclear genes (~19 kb) for currently 123 taxa, but work on a matrix for 550 to 600 taxa is well underway. The chief method of analysis used in this study is a nucleotide model ML search in GARLI. The most commonly applied model is the general time reversible model with a gamma distribution of rates and a proportion of invariant sites (GTR+G+I). The LepTree project relies heavily on the computational resources provided by The Lattice Project, as the sheer number of heuristic searches is not feasible to run on an individual desktop machine. The bulk of these heuristic searches consists of bootstrap replicates (up to 2,000 per analysis), but in addition, due to the heuristic nature of the search, multiple searches (up to 500) are required for confidence in having found the ML tree. For the LepTree project, many analyses of these types are carried out, e.g., for individual and combined genes, synonymous and non-synonymous data partitions, and with and without topological constraints for subsequent hypothesis testing.
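
The parallelization described above, where every node carries out at least one complete heuristic search, can be pictured as cutting a large batch of independent searches into work units. The following is only a rough sketch: the `WorkUnit` class, the `make_work_units` function and the batch sizes are hypothetical names and values chosen for illustration, not how The Lattice Project or GARLI actually package their jobs. The totals of 2,000 bootstrap replicates and 500 ML searches are the upper figures quoted for LepTree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkUnit:
    """One job handed to a single computing node (hypothetical structure)."""
    kind: str             # "bootstrap" or "ml_search"
    first_search: int     # index of the first search in this unit
    n_searches: int       # number of complete heuristic searches in this unit


def make_work_units(kind, total_searches, searches_per_unit):
    """Split independent searches into units of at least one complete search each."""
    if total_searches < 0 or searches_per_unit < 1:
        raise ValueError("need total_searches >= 0 and searches_per_unit >= 1")
    units = []
    start = 0
    while start < total_searches:
        # The last unit may be smaller if the total does not divide evenly.
        n = min(searches_per_unit, total_searches - start)
        units.append(WorkUnit(kind, start, n))
        start += n
    return units


# Assumed batch sizes: short bootstrap searches are grouped five per unit,
# longer ML searches are sent out one per unit.
bootstrap_units = make_work_units("bootstrap", 2000, 5)
ml_units = make_work_units("ml_search", 500, 1)
print(len(bootstrap_units), len(ml_units))
```

Because the searches do not depend on one another, the units can be run in any order on any available node and the results combined afterwards.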

Protein Sequence Comparison - HMMPfam
hmmpfam is part of the HMMER package. The HMMER package uses profile hidden Markov models (HMMs) to characterize regions of similar amino-acid sequence in protein families, that is, groups of proteins with similar function found in related organisms. The hmmpfam program searches the sequences of proteins with unknown function against a carefully curated set of HMMs, called Pfam, built from well-understood protein families. Protein sequences are assigned to one or more protein families on the basis of a statistically significant match to a Pfam HMM.
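
The assignment step at the end of that description can be illustrated with a small sketch. It assumes the search results have already been parsed into (sequence, family, E-value) tuples; the `assign_families` function, the cutoff of 1e-3 and the sequence and family labels are all made up for illustration and are not hmmpfam output or its defaults.

```python
from collections import defaultdict


def assign_families(hits, max_evalue=1e-3):
    """Keep only hits at or below the E-value cutoff, grouped by sequence.

    hits: iterable of (sequence_id, family_id, evalue) tuples (assumed format).
    max_evalue: hypothetical significance cutoff; smaller means stricter.
    """
    assignments = defaultdict(list)
    for seq_id, family, evalue in hits:
        if evalue <= max_evalue:
            assignments[seq_id].append(family)
    return dict(assignments)


# Made-up hits: seq1 matches two families significantly, seq2 matches none.
example_hits = [
    ("seq1", "FAM_A", 1e-30),
    ("seq1", "FAM_B", 2e-8),
    ("seq1", "FAM_C", 0.5),
    ("seq2", "FAM_D", 5.0),
]
example_assignments = assign_families(example_hits)
print(example_assignments)
```

A sequence with no significant hit simply does not appear in the result, which corresponds to a protein that stays unassigned.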

HMMPfam and RMIDb:

The Edwards lab provides the Rapid Microorganism Identification Database (RMIDb - www.RMIDb.org), a freely available web resource and database for the identification of bacteria and viruses using mass spectrometry. The RMIDb searches protein sequences from all of the major protein sequence repositories, plus computational protein sequence predictions from sequenced bacterial genomes, for mass matches with experimental masses from mass spectra. Protein sequences are carefully categorized according to strain, species, and other taxonomic groupings, and according to protein function, cellular location, and biological process using the Pfam assignments computed by hmmpfam and their associated gene ontology (GO) classifications. The functional classification of protein sequences must be recomputed using hmmpfam because each of the sources of protein sequence uses different, sometimes conflicting, criteria for Pfam assignment, or provides no assignment at all. Functional classification of protein sequences makes it possible to analyze only the proteins most likely to be observed for mass matches, which decreases search time and increases the statistical significance of species identifications.

HMMPfam for RMIDb on BOINC:

The Edwards laboratory is using the HMMPfam service to compute Pfam assignments for all bacterial, plasmid, and virus protein sequences from Swiss-Prot, TrEMBL, GenBank, RefSeq, and TIGR's CMR, plus an inclusive set of all plausible Glimmer predictions from RefSeq bacterial genomes. These protein sequences, and their Pfam assignments, are used in RMIDb. The HMMPfam service is also being used as a model for 'data-heavy' bioinformatics applications on the Lattice Grid infrastructure, a collaboration between the Cummings and Edwards laboratories.

Conservation Reserve Network Design - MARXAN
MARXAN is a decision support system for the design of conservation reserve networks. It is useful for selecting a reserve system from a large number of potential sites that satisfies a number of ecological, social and economic criteria. For example, certain species or conservation features must be well protected within the reserve system, or the reserve system must not include more than a specified number of sites. The user translates their criteria into representation targets for the conservation features to be protected (i.e. number of populations of each species or percentage of each habitat type to be included in the reserve system), and optionally a cost threshold or desired level of site compactness. MARXAN will produce reserve network solutions that meet these design constraints while simultaneously minimizing the cost of the design (i.e. number of sites required to meet all representation targets).
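
To make the idea of meeting representation targets at low cost concrete, here is a toy sketch. It uses a simple greedy heuristic swapped in purely for illustration; it is not MARXAN's optimization method, and the `greedy_reserve` function, the site names, the features and the costs are all invented.

```python
def greedy_reserve(sites, targets):
    """Pick sites until every representation target is met.

    sites: dict of site name -> (cost, {feature: amount held at that site}),
           with cost > 0 and non-negative amounts.
    targets: dict of feature -> amount that must be inside the reserve system.
    Repeatedly chooses the site with the most still-needed amount per unit cost.
    """
    remaining = dict(targets)
    available = dict(sites)
    chosen = []
    while any(need > 0 for need in remaining.values()):
        best, best_ratio = None, 0.0
        for name, (cost, features) in available.items():
            # Only count what still helps towards an unmet target.
            gain = sum(min(amount, remaining.get(f, 0))
                       for f, amount in features.items())
            ratio = gain / cost
            if ratio > best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            raise ValueError("targets cannot be met with the available sites")
        _, features = available.pop(best)
        chosen.append(best)
        for f, amount in features.items():
            if f in remaining:
                remaining[f] = max(0, remaining[f] - amount)
    return chosen


# Invented example: two owl populations and one frog population are required.
sites = {
    "A": (1.0, {"owl": 1}),
    "B": (1.0, {"owl": 1, "frog": 1}),
    "C": (1.0, {"frog": 1}),
    "D": (1.0, {"owl": 1}),
}
targets = {"owl": 2, "frog": 1}
chosen = greedy_reserve(sites, targets)
print(chosen)
```

A greedy pass like this gives one feasible reserve system quickly but does not guarantee the cheapest one, which is why dedicated tools search the space of solutions much more thoroughly.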

Applications
The following platforms are supported:

Microsoft Windows x86 (32bit)
Linux x86 (32bit)
Mac OS X

Connecting to The Lattice Project
The project's Home Page is located at: http://boinc.umiacs.umd.edu/
The project is also listed in the various BOINC Account Managers and you can join directly through them.
Don't forget to join the BOINC@Australia Team after you register.

Statistics
View our Team Members List and their current score here
View detailed BOINCstats of our Team in The Lattice Project here
