

biotorrents is born

Thursday, November 12th, 2009

I just posted my first biotorrent to biotorrents.net the other day.  What is biotorrents, you ask?  As the name suggests, it's a BitTorrent tracker site for tracking biological datasets.  Cool, huh?

Last year, when I was working at a research institute in Australia, I found that whenever I needed to download a new version of the NCBI databases I had to get them from NCBI's FTP site in the USA.  That may work well enough for people located stateside, but the transpacific pipes do not seem to treat BLAST databases as priority electrons (photons, whatev).  Usually after a few days and several dropped TCP connections I would finally have the whole enchilada.  Not pleasant, but with a little persistence it was possible.

Enter biotorrents.  Now I can fire up my favorite BitTorrent client and download the NCBI databases from any of a number of globally distributed seeders, which should be much faster.  That's the dream, anyway.  In reality biotorrents is in its early days, and we need people to contribute bandwidth to the effort by seeding things like the NCBI databases.  Ideally, our great leaders at NCBI, EBI, and elsewhere would take the initiative and contribute by seeding their own databases.  David Lipman and Ewan Birney, if you're listening, consider this a public challenge!!

New Mauve release

Thursday, November 12th, 2009

I'm happy to say that a new Mauve release with many bugfixes has been made official today.  Developing and maintaining Mauve has been a challenge for me over the years, and each release requires a seemingly tremendous amount of effort.  Mauve has somewhere between hundreds and thousands of active users, each of whom seems to be running a different version of their favorite operating system and Java virtual machine.  Every time we do a new Mauve release, we have to struggle to ensure that we haven't broken functionality on myriad software configurations.  Currently this is done using a slew of virtual machines running different operating systems, but the process is hardly automated.  Clearly one goal for the future of Mauve is more automated quality testing of the software.  Automated testing might detect some types of problems really well, but others, such as the recent problem with OpenJDK drawing the display incorrectly, seem nearly impossible to detect programmatically.  Nonetheless, if automated software testing can find a problem before an unsuspecting user trips over it, surely it's worth the effort.  No?

The PhyloCoding has begun!

Saturday, June 13th, 2009

Every year for the past several years, Google has operated a charitable program called Google Summer of Code to support development of open-source software.  Organizations that develop medium to large open-source projects apply to Google for support.  Accepted organizations create an "ideas list" of projects that would enhance their open-source software.  Students from around the globe then apply for projects with the accepted organizations.  Successful students are paid by Google to work on the open-source project for the summer.  Competition among organizations and students is stiff, with only 1000 of 5000+ students being accepted.

This year I'm co-mentoring a project with Marc Suchard that aims to develop a small, reusable open-source software library to calculate phylogenetic likelihoods using CPUs and GPUs.  The project is part of this year's Phyloinformatics Summer of Code, which is being operated by the National Evolutionary Synthesis Center (NESCent) through Google's program.  Lots of students applied for the project, perhaps because GPU computing is trendy and computer geeks tend to be some of the trendiest people around (even if not always socially graceful!).  There were many strong applicants, and in the end the successful student was Daniel Ayres, a Ph.D. student at UMD.

The project is now well underway, with all sorts of development activity by mentors, mentee, and other folks interested in the notion of a reusable library for phylogenetic likelihood models.

Thanks to Google’s charitable arm for supporting so many students and projects!

Hypocrisy inside open access journals

Sunday, February 1st, 2009

Update 2: Peter Binfield writes in the comments below that PLoS One has begun accepting LaTeX.  Hooray!

Update: someone pointed me to the Topaz project, which looks promising!

I am currently preparing an article for submission to an open access journal (PLoS One, to be specific).  I have just learned that PLoS One, like many other journals, requires all articles to be submitted in either .doc or .rtf format. But why do I care?  My article was originally written in the open-source LaTeX system and intended as a conference contribution.  The article deals heavily in math and statistics and makes use of LaTeX’s excellent equation typesetting abilities.  As far as I can tell, it’s no simple matter to convert a LaTeX document with equations to M$ Word format.

How can it be that the leaders of the open-access journal movement require submissions in a closed and proprietary format?  Didn’t the open-access journal movement draw at least some of its inspiration from the free software movement that predated it by at least 10 years?  I presume the answer to this question lies at least partially with the proprietary nature of publishing and typesetting systems in common use at publishing houses.  The good people at PLoS probably made a decision to purchase existing proprietary publishing software for their operation rather than investing in an alternative that supports open standards.  And sadly, they now probably view change as too expensive.

To their credit, the topical PLoS journals do accept papers written with open-source software such as LaTeX, but that policy was introduced only recently.  The editorial office converts LaTeX submissions on a case-by-case basis.  Last year I published a paper authored in LaTeX in PLoS Genetics.  While I was very happy that I didn't have to do the conversion myself, I think that the PLoS approach (and that of other journals) essentially amounts to applying band-aids to a broken publishing system.  It is not a good long-term solution.

We need a scientific publishing system that is founded on open document standards and open-source software.  Viable alternatives such as OpenOffice exist, yet I cannot rely on OpenOffice to save complex equations in Microsoft Word documents (it works fine in the native OpenOffice format).  PLoS should lead the way in revolutionizing scientific publishing, and they should start on the inside by developing a publication process based on open standards.  After five years of PLoS, why are we still without a viable open-source platform for scientific publishing?

In the meantime, I have to carefully consider whether it’s a more effective use of my time to painstakingly convert my document to Word and support the status quo, or whether I should instead spend that time adding content that would make my article appropriate for a journal that will accept LaTeX.  Reformatting documents is mind-numbing, while submitting elsewhere might actually involve some interesting work.

MSI Wind U100: First impressions

Friday, November 7th, 2008

For the past three years I've been carrying around a Dell X1 laptop (originally designed by Samsung as the Q30).  In the past few months it's begun to show signs of old age: the batteries no longer hold much charge, I've maxed out the HDD and don't want to offload more data, the screen is fading, and there's a large divot in the trackpad's left mouse button where my thumbnail hits. hehehe.

It was time to get a new laptop, and I decided to find out why people are making so much noise about netbooks.  After poking around a few reviews, I narrowed my candidates down to the 10″ Asus eeePC, the MSI Wind U100, and the Acer Aspire One.  I finally settled on the MSI Wind based on its 160 GB hard drive, the option to buy a 9-cell battery that will last 6+ hours, claims of solid build and a respectable keyboard, and supposedly little heat and little noise.  The major downside of the Wind is that it only ships with Windoze, so yet again I was stuck paying the MS tax.

The laptop arrived in the mail this week, and my first impression is that the reviews were generally spot on, except when it comes to noise.  The MSI Wind has a fan and a 2.5″ spinning-hunk-o-metal hard drive inside, both of which can make a ruckus if you're sitting in a seminar room.  To put this in perspective, for the past three years I've been using a Dell X1, which has NO fan and uses a nearly silent 1.8″ hard drive.  Of course, the problem with the X1's lack of a fan is that it can get quite toasty even when doing basic computing like web browsing.  Why oh why did Transmeta have to die?

A new life, a new laptop

Friday, November 7th, 2008

I’ve just moved to Davis, California, where I’m working in the laboratory of Jonathan Eisen.  It’s a continuation of the very generous 3-year postdoctoral fellowship awarded to me by the National Science Foundation.  I spent the first two years at an unspecified research institute in Brisbane, Australia.

I was hoping my first post would be a rant about the scientific tradition of manuscript review by secret committee, complete with a personal example, but that will have to wait.  In the meantime I've just got a new laptop, and I'm having so much fun (err, problems?) configuring it that I feel I should share with the world.