About bpteague

Associate professor of Biology at the University of Wisconsin -- Stout. Teaching cellular and molecular biology, biotechnology, bioinformatics and bioengineering. Research interests include autonomous pattern formation and metabolic engineering.

Announcing: CytoFlow 0.1!

Python tools for quantitative, reproducible flow cytometry analysis

Welcome to a different style of flow cytometry analysis. For a quick demo, check out an example IPython notebook.

What’s wrong with other packages?

Packages such as FACSDiva and FlowJo are focused primarily on identifying and counting subpopulations of cells in a multi-channel flow cytometry experiment. While this is important for many different applications, it reflects flow cytometry’s origins in separating mixtures of cells based on differential staining of their cell surface markers.

Cytometers can also be used to measure internal cell state, frequently as reported by fluorescent proteins such as GFP. In this context, they function in a manner similar to a high-powered plate-reader: instead of reporting the sum fluorescence of a population of cells, the cytometer shows you the distribution of the cells’ fluorescence. Thinking in terms of distributions, and how those distributions change as you vary an experimental variable, is something existing packages don’t handle gracefully.

What’s different about CytoFlow?

A few things.

An emphasis on metadata. CytoFlow assumes that you are measuring fluorescence on several samples that were treated differently: either they were collected at different times, treated with varying levels of inducers, etc. You specify the conditions for each sample up front, then use those conditions to facet the analysis.

Cytometry analysis conceptualized as a workflow. Raw cytometry data is usually not terribly useful: you may gate out cellular debris and aggregates (using FSC and SSC channels), then compensate for channel bleed-through, and finally select only transfected cells before actually looking at the parameters you’re interested in experimentally. CytoFlow implements a workflow paradigm, where operations are applied sequentially; a workflow can be saved and re-used, or shared with your coworkers.

Easy to use. Sane defaults; good documentation; focused on doing one thing and doing it well.

Good visualization. I don’t know about you, but I’m getting really tired of FACSDiva plots.

Versatile. Built on Python, with a well-defined library of operations and visualizations that are well separated from the user interface. Need an analysis that CytoFlow doesn’t have? Export your workflow to an IPython notebook and use any Python module you want to complete your analysis. Data is stored in a pandas.DataFrame, which is rapidly becoming the standard for Python data management (and will make R users feel right at home.)

Extensible. Adding a new analysis module is simple; the interface to implement is only four functions.

Statistically sound. Ready access to useful data-driven tools for analysis, such as fitting 2-dimensional Gaussians for automated gating and mixture modeling.
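To make the metadata and workflow ideas a bit more concrete, here is a minimal plain-Python sketch. To be clear, this is not CytoFlow's API: the event fields, the `gate_debris` and `keep_transfected` steps, the thresholds and all of the numbers are made up for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical events: each one carries its channel readings plus the
# experimental condition ("inducer") of the sample it came from.
events = [
    {"inducer": 0.0, "FSC": 900, "SSC": 400, "GFP": 12.0},
    {"inducer": 0.0, "FSC": 50,  "SSC": 20,  "GFP": 3.0},    # debris
    {"inducer": 0.0, "FSC": 950, "SSC": 420, "GFP": 15.0},
    {"inducer": 1.0, "FSC": 880, "SSC": 390, "GFP": 140.0},
    {"inducer": 1.0, "FSC": 910, "SSC": 410, "GFP": 0.5},    # untransfected
    {"inducer": 1.0, "FSC": 930, "SSC": 405, "GFP": 160.0},
]

def gate_debris(evts, min_fsc=200, min_ssc=100):
    """Drop events whose scatter is too low to be an intact cell (assumed cutoffs)."""
    return [e for e in evts if e["FSC"] >= min_fsc and e["SSC"] >= min_ssc]

def keep_transfected(evts, min_gfp=1.0):
    """Keep only events with GFP above an assumed background level."""
    return [e for e in evts if e["GFP"] >= min_gfp]

# The workflow is just an ordered list of operations, so it can be
# saved, re-used on another data set, or handed to a coworker.
workflow = [gate_debris, keep_transfected]

def run_workflow(evts, steps):
    """Apply each operation in order, feeding its output to the next one."""
    for step in steps:
        evts = step(evts)
    return evts

def facet(evts, condition, channel):
    """Group one channel's values by the value of an experimental condition."""
    groups = defaultdict(list)
    for e in evts:
        groups[e[condition]].append(e[channel])
    return dict(groups)

cleaned = run_workflow(events, workflow)
gfp_by_inducer = facet(cleaned, "inducer", "GFP")
medians = {level: median(values) for level, values in gfp_by_inducer.items()}
print(medians)
```

The point of the sketch is the shape, not the details: conditions travel with the data from the start, and the analysis is a sequence of small steps followed by a comparison of distributions across conditions.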

Sound like your kind of thing?  Join us.

Data viz: public vs. scientists

I deeply appreciate good design in data visualization, and this jumped out of my news queue today.

Conflicting views: Public versus scientists

I’m not going to comment on the content, except to say that for the most part I align myself with “AAAS scientists” — no surprise, right?  But imagine, for a moment, this data presented as a bar graph: “public” in red and “science” in blue.  Doesn’t this do a much better job conveying both “magnitude” and “difference”?

Genome Organization and Gene Activity

All living organisms face the same problem: their DNA is much longer than their cells.  If you took the DNA from a single human cell and stretched it all out end-to-end, it would be about 2 meters long!  Not only do the cells have to fit all that DNA in there, they have to be able to access it – to transcribe it, to copy it, etc.
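If you want to check the arithmetic yourself, here is a back-of-the-envelope sketch. The inputs are round textbook figures that I am assuming, not measurements: about 0.34 nm of length per base pair, about 3.2 billion base pairs per copy of the human genome, and a nucleus on the order of 10 micrometers across. With those numbers, one copy of the genome comes to roughly a meter and the two copies in a typical cell to roughly two.

```python
# Assumed round figures, for order-of-magnitude purposes only.
BP_RISE_NM = 0.34          # approximate length per base pair of B-form DNA
HAPLOID_BP = 3.2e9         # approximate base pairs in one copy of the human genome
NUCLEUS_DIAMETER_UM = 10   # assumed typical nucleus diameter

def dna_length_m(base_pairs, rise_nm=BP_RISE_NM):
    """End-to-end length in meters of a stretch of DNA with the given base pairs."""
    return base_pairs * rise_nm * 1e-9

haploid_m = dna_length_m(HAPLOID_BP)        # one copy of the genome
diploid_m = dna_length_m(2 * HAPLOID_BP)    # two copies, as in most body cells

# How many times longer is the DNA than the nucleus it has to fit into?
fold = diploid_m / (NUCLEUS_DIAMETER_UM * 1e-6)

print(f"one copy:   {haploid_m:.2f} m")
print(f"two copies: {diploid_m:.2f} m")
print(f"about {fold:,.0f} times the nucleus diameter")
```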

Prokaryotes and eukaryotes solve these problems in different ways (as you might expect: remember, one of the ways prokaryotes and eukaryotes are different is that prokaryotic cells don’t have a nucleus.)  Prokaryotes solve the problem by supercoiling their DNA: imagine taking a piece of rope, pinning down one end and then twisting the other.  Eventually the rope starts wrapping around itself; and as you continue to add twists, the wrapping gets tighter and the end-to-end length gets shorter.  Prokaryotes have a set of enzymes that supercoil DNA to pack it tightly, and another set that selectively uncoils it when it needs to be accessed or copied.  Many of these proteins are present only in prokaryotes and not eukaryotes, which makes them a good target for antibiotics.

Eukaryotes solve the problem differently, wrapping their DNA around octameric protein cores made of histones into a 10 nm-wide fiber that, close up, looks like “beads on a string.”

DNA beads on a string.  Image: Figure 31-19, Biochemistry, 6th ed, Stryer

These chromatin fibers are further squeezed together into higher-order structures, the sum of which is called chromatin: the gooey mass of DNA and proteins that together hold each cell’s genetic information intact.  Far from being random, these higher-order structures form something akin to a fractal globule, a self-organizing structure that achieves tight packing without becoming knotted.  Oh, and it’s quite visually striking too:

Fractal globule genome. Ashok Cutkosky, Najeeb Tarazi, Erez Lieberman-Aiden, via BioTechniques


Two things to note.  First, the fact that the DNA reproducibly self-organizes at this level explains the phenomenon of DNA transregulatory elements, where a spot on the genome regulates gene expression at loci many millions of bases away: just because they’re distant in linear “genome” space, doesn’t mean that they’re far away in actual space.

Second, genome architecture provides another layer of regulation for gene control.  Some parts of the DNA hairball are open, accessible for transcription (these genes are “on”), and some parts of the DNA hairball are closed, compacted, inaccessible (these genes are “off”).  What I find particularly wacky, and what got me thinking about this in the first place, is that these structural changes seem directly related to cell type.  That is, the DNA in a skin cell and a liver cell may have exactly the same sequence, the same genetic “program”, but because the DNA is arranged differently, different parts of the program are “running.”

And yes, this means that if I could take a skin cell and change the parts of the DNA that are on and off, I might be able to make it into a liver cell, or a brain cell, or a heart cell.  This is one of the hottest areas of regenerative medicine research right now.  Soon, if you get hepatitis and need a new liver, you won’t have to wait for someone to die and take theirs — you’ll donate some skin cells (or some fat cells) and three months later you’ll have a new liver (well, some liver-like tissue) waiting for you in a jar.

This is also (one of) the reason(s) why biomedical science didn’t end when the human genome was sequenced.  (Not that it’s finished, even a decade after it was declared finished.)  Not only do we still not know what all that DNA does; there are several layers of regulation that determine whether a piece of genome is active or not, and sorting out all those relationships will provide graduate projects for a long time yet.

On IPython and Reproducible Research


I’m attending the last day of the Keystone Symposium on Precision Genome Engineering and Synthetic Biology.  The afternoons are free, and the skiing is kind of weak, so when I need a break from TALENs and Cas9 (so much Cas9), I’m learning Python.

What’s particularly interesting is the community that’s trying to position Python as the next big thing in scientific computing; the successor to R, MATLAB, Mathematica, etc.  I used to think of Python as a “programming language” like C or Java or PERL, where you wrote a program to do what you want, then ran it on your data.  (And there are plenty of resources to support using it that way; PyDev comes to mind.)  I knew from my first brush with it 15 years ago (!!) that it had a REPL interface: you can bring up a Python “command line” and type expressions in, and the interpreter will evaluate them for you and give you the answer.  I didn’t really think much of it; I figured it was useful for noodling around, learning the language, debugging, etc.

Boy was I wrong.

IPython is a Python shell with proper support for interactive computing, like R or MATLAB.  It extends “traditional” Python with support for parallel and distributed computing, tight integration with several visualization toolkits, and a browser-based notebook that lets you record your data analysis workflow along with the results, and then share the whole thing trivially with coworkers and collaborators.  It makes literate programming absolutely effortless.

(I should note that IPython isn’t the only player in this space; Spyder and Enthought Canopy are two of the other efforts to make Python well-suited for interactive scientific programming.)

The other part of the equation is a set of libraries for data handling and analysis.  SciPy and SAGE are two “meta” libraries, bundling together a lot of mature software for importing, manipulating and analyzing data; building and running models; doing computational experiments, etc.  I was particularly happy to discover pandas, a library for handling structured data similar to data frames in R.  The toolkit isn’t quite as developed as R or MATLAB, but it’s growing as companies embrace the open source ethos of using Python tools for their own work, improving those tools and then contributing their improvements back to the community.  The adoption seems to be particularly strong in the academic community; it even saw a spot on Nature.com recently.

Which brings me to reproducible research.  Philip Bourne is one of my science idols; he was the founding editor-in-chief of PLoS Computational Biology and the originator of the “Ten Simple Rules” series (if you are a researcher in any field and you haven’t browsed these, you should!).  He has long been an advocate of reproducible research, but especially in computer science and computational biology it can be difficult to document exactly the steps you took to generate your data or do your analysis.  The last time I heard him speak on the subject, he was advocating standard directory layouts to organize data and using GNU Make to automate the running of tools, programs and scripts.  Clunky and time-consuming to say the least.

An IPython notebook completely obviates that.  It lets you record exactly what you did (the Python code) along with the rationale (in beautiful rich-text) and the output, all stored in one place.  It makes publishing your work so that others can reproduce it trivial, but the importance goes way beyond that.  I’ve learned the hard way that keeping a good notebook isn’t for some speculative person who picks up my work when I’m gone, it’s for me-in-six-months.  Keeping track of where I’ve been mentally, and what I’ve tried that didn’t work (or occasionally did), is astoundingly important … and anything that can make that easier is something that I’ll adopt enthusiastically.

So, now I’m a Python enthusiast.  Not looking forward to scaling the learning curve, but the underlying language makes a lot more sense to me than, say, R (which I’ve been using for a decade and still don’t feel particularly comfortable in.)  If only I could get easy integration between IPython and my Drupal-based online notebook…..

Postscript – I know that Mathematica has had a notebook interface since the late 1980s.  IPython’s strikes me as more flexible, better looking, based on open standards, and you can get it without paying a zillion dollars.  (-:

Sponges have 70% homologous genes with humans?

I liked Julia Galef’s article in Slate about keeping a “surprise journal” – and this seems as good a place as any.  Maybe I’ll call it “Surprise, Me” or something cute.  Also, the word “surprise” has a very low semantic satiation threshold.

Anyways.  At PG’s suggestion, I went digging into the genomics of sponges:

  • Srivastava, M., Simakov, O., Chapman, J., Fahey, B., Gauthier, M. E. A., Mitros, T., … Rokhsar, D. S. (2010). The Amphimedon queenslandica genome and the evolution of animal complexity. Nature, 466(7307), 720–6. doi:10.1038/nature09201

  • Mann, A. (2010). Sponge genome goes deep. Nature, 466(7307), 673. doi:10.1038/466673a

  • Riesgo, A., Farrar, N., Windsor, P. J., Giribet, G., & Leys, S. P. (2014). The analysis of eight transcriptomes from all poriferan classes reveals surprising genetic complexity in sponges. Molecular Biology and Evolution, 31(5), 1102–20. doi:10.1093/molbev/msu057

Sponges are cool because they’re commonly accepted to be the simplest animal.  (Remember, animals are multicellular (metazoan) eukaryotes.)  Actually, sponges are cool for lots of reasons – they live in incredibly diverse environments, have lots of different shapes (and ways of forming those shapes), and some are carnivorous.  (I’m imagining a fantastic sequel to Little Shop of Horrors here.)

But I’ve been reading a lot about morphogenesis recently (how animals form tissues and structures and organs and appendages from a single cell), and it makes you wonder: what does a “minimal” animal look like?  We can take bacteria and remove genes, and remove some more, and remove some more, and eventually you find the minimal set that lets the bacterium still eat and live and reproduce.  (Fred Blattner, who first sequenced the E. coli genome, made a business out of it.)  It’s hard to do that with an animal, of course: there’s so much complexity (in the technical sense) to animal development that if you perturb it a little bit, you don’t get an animal any more.

Genomics to the rescue!  You might think to ask “what genes are in the sponge?  It’s the simplest animal, right?”  And that would be a start — but sponges have genes that other animals don’t.  They live in places that other animals don’t (the sea floor) and have evolved genes particular to that environment.  Better to ask “what genes do sponges and other animals share?”  Lots of other animals have had their genomes sequenced, so if we sequence the sponge genome there’s a lot to compare it to.  In fact, with so much data available the best question to ask is “what genes did the last common ancestor of sponges and more complex animals have?”  Right before the split into the animals that became sponges and the animals that became, well, not sponges – that’s the genome we’re interested in, because comparing it to unicellular eukaryotes (protists and fungi and Dictyostelium and such) will tell us “what genes are required for multicellularity?”
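The logic of that last question boils down to set arithmetic, which a toy sketch can show. Everything in it is hypothetical: the gene-family names are placeholders and the presence lists are invented, and real comparative genomics has to detect homologs and reason over a phylogenetic tree rather than compare tidy lists like these.

```python
# Hypothetical gene-family presence lists (placeholder names, invented data).
sponge = {
    "adhesion_family", "cell_death_family", "signaling_family",
    "housekeeping_1", "housekeeping_2", "sponge_only_family",
}
other_animals = [
    {"adhesion_family", "cell_death_family", "signaling_family",
     "housekeeping_1", "housekeeping_2", "body_plan_family"},
    {"adhesion_family", "cell_death_family", "signaling_family",
     "housekeeping_1", "housekeeping_2", "body_plan_family", "vision_family"},
]
unicellular_eukaryotes = [
    {"housekeeping_1", "housekeeping_2"},
    {"housekeeping_1", "housekeeping_2", "cell_wall_family"},
]

# Families found in the sponge AND in every other animal: a crude stand-in
# for what the last common animal ancestor probably carried.
ancestral_animal = sponge.intersection(*other_animals)

# Families seen in any unicellular relative predate animals altogether.
older_than_animals = set().union(*unicellular_eukaryotes)

# What is left over is the candidate "needed for multicellularity" set.
multicellularity_candidates = ancestral_animal - older_than_animals
print(sorted(multicellularity_candidates))
```

Note that the sponge-only family drops out at the first step and the housekeeping families drop out at the last one, which is exactly why asking about the shared ancestor is a better question than asking what is in the sponge.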

You know the awful misquoted factoid about how humans and chimps are 99% the same?  (It’s actually 96%, but who’s counting?)  It makes sense, though – chimps have the same body plan as us, the same organs, they’re intelligent.  How much of the genome do we share with the lowly sea-sponge?

About 70%.

Think about that for a moment.

All the wondrous complexity of human morphology, all the muscles and sensory organs and nerves and, you know, a two-ended digestive tract (little things), all are a relatively minor portion compared to what we do share.

The genes that we share, what are their functions?  Well, you find the things you’d expect: genes that relate to multicellularity, like genes responsible for cellular adhesion, programmed cell death, cell-cell communication and the like.  There are things that you might not immediately expect, but upon reflection make sense – much of the cellular differentiation machinery is shared, because sponges have different cell types like we do (just fewer, ~20 instead of ~200.)  Sponges share with us a relatively robust innate immune system (they have an associated microbiome, and they get sick too.)  And they share much of the machinery responsible for detecting and shutting down uncontrolled proliferation (ie cancer.)  Because apparently sponges get cancer too?

And then there are the real head-scratchers.  Sponges have genes that, in humans, code for neurons and muscles.  Neither of which sponges have.  Wut?  What do sponges do with them?  More interestingly, if the common ancestor had genes that, in humans, make nerves and muscles, what did the common ancestor use them for?

Why is this cool?  What it points to is not only how fascinating and complex sponges are (I wanna do synbio in sponges now), but how fascinating and complex our ancient common ancestor was.  Multicellular organisms have an evolutionary advantage over unicellular organisms because they can more efficiently utilize environmental resources (because of active transport between cells, your size isn’t constrained by the diffusion limit.)  What the lowly sponge teaches us is about the tradeoff: that a ridiculous amount of our genetic machinery goes into supporting our multicellularity.  And to keep all those pieces in working order, we need to copy our genes more faithfully, which means a lower mutation rate, which means slower evolution.  Or it would have, if transposable elements (Richard Dawkins’s “selfish genes”) hadn’t shown up.  But that’s another post.


Almost No-Knead Bread

(Cook’s Illustrated)
6 cups unbleached all-purpose flour
1/2 t instant yeast
1 T table salt
1 3/4 C water, room temp
3/4 C mild-flavored lager
2 T white vinegar
~2 linear ft. of parchment paper
1.  Whisk flour, yeast, salt in a large bowl.  Add beer, water, and vinegar.  Fold mixture together until a shaggy ball forms.  Cover with plastic wrap and let sit at room temp 8-18 hrs.
2.  Lay a 12×18″ sheet of parchment paper on a 10″ skillet and spray with non-stick cooking spray.  Transfer dough to lightly floured surface and knead 10-15 times.  Shape dough into a ball and transfer to the parchment-lined skillet (you want the parchment to drape over the side, so you can use it as a “sling” to transfer the dough to the dutch oven).  Spray dough with non-stick cooking spray.  Cover loosely with plastic wrap and let rise at room temp until it has doubled in size and does not spring back readily when poked with a finger, about 2 hours.
3.  About 30 minutes before baking, adjust oven rack to lowest position and place a 6-8 qt heavy-bottomed Dutch oven (with lid) on oven rack.  Heat oven to 500°.  Lightly flour dough and slash with a razor blade or sharp knife.  Carefully remove pot from oven and remove lid.  Pick up the dough by lifting the parchment paper sling and lower into the pot.  Cover pot and place in oven.  Reduce oven temperature to 425° and bake covered for 45 minutes.  Remove lid and continue to bake until loaf is a deep brown and instant-read thermometer inserted into the center reads 210° (20-30 min longer.)  Carefully remove bread from pot, transfer to wire rack and cool to room temp, ~2 hrs.

Review of “Effects of High Dementor Density on Health Outcomes, Including Soul Loss, in Graduate Students”

In response to Maria’s latest post:

In their study “Effects of High Dementor Density on Health Outcomes, Including Soul Loss, in Graduate Students”, Sundaram et al. propose the intriguing hypothesis that dementor colonization may be responsible for the apathy and despair commonly associated with graduate studies. They measure both dementor-related environmental factors and health outcomes among a population of public health graduate students; observing a strong correlation between the two, the authors conclude that evidence exists for a causal relationship.

Sundaram et al. have identified a timely, important problem that inexplicably has not been addressed by other researchers in the field. This reviewer laments his own shortsightedness in this regard; I read the books, what, ten years ago? Despite a limited sample size, questionable ethical standards and shoddy statistical analyses, the study’s results are highly suggestive and deserve further investigation. It is unfortunate that the authors stopped short of an interventional study, given that cleaning the fucking microwave takes like five minutes, I mean really. I also would have liked to see some consideration given to other possible causes for student soullessness, including professors that ask for five data slides for their talk and then don’t use any of them; coworkers that use the last of the molecular weight standard and then don’t order any more; and mice that escape their cages, then get killed in mousetraps because the animal facility has a rodent problem.

Recommendation: accept with revisions.