Oh the weather outside is frightful… Now that winter has arrived in Boston for real, the only civilized way to survive is with lots of hot beverages. Here’s my favorite.
Monthly Archives: January 2015
Genome Organization and Gene Activity
All living organisms face the same problem: their DNA is much longer than their cells. If you took the DNA from a single human cell and stretched it all out end-to-end, it would be about 2 meters long! Not only do the cells have to fit all that DNA in there, they have to be able to access it – to transcribe it, to copy it, etc.
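For a rough sense of where a number like that comes from, here is a back-of-the-envelope sketch in Python. The constants are textbook approximations that aren't stated in this post: about 3.2 billion base pairs per copy of the human genome and about 0.34 nm of length per base pair. The 10 µm nucleus diameter is just an assumed typical order of magnitude.

```python
# Back-of-the-envelope estimate of DNA length per human cell.
# All constants are textbook approximations (assumptions, not from the post).
BASE_PAIRS_PER_HAPLOID_GENOME = 3.2e9   # approximate size of one genome copy
RISE_PER_BASE_PAIR_M = 0.34e-9          # ~0.34 nm per base pair in B-form DNA
NUCLEUS_DIAMETER_M = 10e-6              # rough nucleus diameter (assumed)


def dna_length_m(copies: int) -> float:
    """Stretched-out length in meters of `copies` genome copies."""
    return copies * BASE_PAIRS_PER_HAPLOID_GENOME * RISE_PER_BASE_PAIR_M


haploid_m = dna_length_m(1)   # one copy of the genome
diploid_m = dna_length_m(2)   # a typical body cell carries two copies
fold = diploid_m / NUCLEUS_DIAMETER_M

print(f"one genome copy: {haploid_m:.2f} m")
print(f"diploid cell:    {diploid_m:.2f} m")
print(f"length / nucleus diameter: {fold:,.0f}x")
```

Whether you count one genome copy or two, the answer is on the order of a meter, packed into a compartment hundreds of thousands of times smaller.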
Prokaryotes and eukaryotes solve these problems in different ways (as you might expect: remember, one of the ways prokaryotes and eukaryotes are different is that prokaryotic cells don’t have a nucleus.) Prokaryotes solve the problem by supercoiling their DNA: imagine taking a piece of rope, pinning down one end and then twisting the other. Eventually the rope starts wrapping around itself; and as you continue to add twists, the wrapping gets tighter and the end-to-end length gets shorter. Prokaryotes have a set of enzymes that supercoil DNA to pack it tightly, and another set that selectively uncoils it when it needs to be accessed or copied. Many of these proteins are present only in prokaryotes and not eukaryotes, which makes them a good target for antibiotics.
Eukaryotes solve the problem differently, wrapping their DNA around octameric protein cores made of histones into a 10 nm-wide fiber that, close up, looks like “beads on a string.”
These chromatin fibers are further squeezed together into higher-order structures, the sum of which is called chromatin: the gooey mass of DNA and proteins that together hold each cell’s genetic information intact. Far from being random, these higher-order structures form something akin to a fractal globule, a self-organizing structure that achieves tight packing without becoming knotted. Oh, and it’s quite visually striking too:
Two things to note. First, the fact that the DNA reproducibly self-organizes at this level explains the phenomenon of DNA transregulatory elements, where a spot on the genome regulates gene expression at loci many millions of bases away: just because they’re distant in linear “genome” space, doesn’t mean that they’re far away in actual space.
Second, genome architecture provides another layer of regulation for gene control. Some parts of the DNA hairball are open, accessible for transcription (these genes are “on”), and some parts of the DNA hairball are closed, compacted, inaccessible (these genes are “off”). What I find particularly wacky, and what got me thinking about this in the first place, is that these structural changes seem directly related to cell type. That is, the DNA in a skin cell and a liver cell may have exactly the same sequence, the same genetic “program”, but because the DNA is arranged differently, different parts of the program are “running.”
And yes, this means that if I could take a skin cell and change the parts of the DNA that are on and off, I might be able to make it into a liver cell, or a brain cell, or a heart cell. This is one of the hottest areas of regenerative medicine research right now. Soon, if you get hepatitis and need a new liver, you won’t have to wait for someone to die and take theirs — you’ll donate some skin cells (or some fat cells) and three months later you’ll have a new liver (well, some liver-like tissue) waiting for you in a jar.
This is also (one of) the reason(s) why biomedical science didn’t end when the human genome was sequenced. (Not that it’s finished, even a decade after it was declared finished.) Not only do we still not know what all that DNA does; there are several layers of regulation that determine whether a piece of genome is active or not, and sorting out all those relationships will provide graduate projects for a long time yet.
On IPython and Reproducible Research
I’m attending the last day of the Keystone Symposium on Precision Genome Engineering and Synthetic Biology. The afternoons are free, and the skiing is kind of weak, so when I need a break from TALENs and Cas9 (so much Cas9), I’m learning Python.
What’s particularly interesting is the community that’s trying to position Python as the next big thing in scientific computing; the successor to R, MATLAB, Mathematica, etc. I used to think of Python as a “programming language” like C or Java or Perl, where you wrote a program to do what you want, then ran it on your data. (And there are plenty of resources to support using it that way; PyDev comes to mind.) I knew from my first brush with it 15 years ago (!!) that it had a REPL interface: you can bring up a Python “command line” and type expressions in, and the interpreter will evaluate them for you and give you the answer. I didn’t really think much of it; I figured it was useful for noodling around, learning the language, debugging, etc.
Boy was I wrong.
IPython is a Python shell with proper support for interactive computing, like R or MATLAB. It extends “traditional” Python with support for parallel and distributed computing, tight integration with several visualization toolkits, and a browser-based notebook that lets you record your data analysis workflow along with the results, and then share the whole thing trivially with coworkers and collaborators. It makes literate programming absolutely effortless.
(I should note that IPython isn’t the only player in this space; Spyder and Enthought Canopy are two of the other efforts to make Python well-suited for interactive scientific programming.)
The other part of the equation is a set of libraries for data handling and analysis. SciPy and SAGE are two “meta” libraries, bundling together a lot of mature software for importing, manipulating and analyzing data; building and running models; doing computational experiments, etc. I was particularly happy to discover pandas, a library for handling structured data similar to data frames in R. The toolkit isn’t quite as developed as R or MATLAB, but it’s growing as companies embrace the open source ethos of using Python tools for their own work, improving those tools and then contributing their improvements back to the community. The adoption seems to be particularly strong in the academic community; it even saw a spot on Nature.com recently.
Which brings me to reproducible research. Philip Bourne is one of my science idols; he was the founding editor-in-chief of PLoS Computational Biology and the originator of the “Ten Simple Rules” series (if you are a researcher in any field and you haven’t browsed these, you should!). He has long been an advocate of reproducible research, but especially in computer science and computational biology it can be difficult to document exactly the steps you took to generate your data or do your analysis. The last time I heard him speak on the subject, he was advocating standard directory layouts to organize data and using GNU Make to automate the running of tools, programs and scripts. Clunky and time-consuming to say the least.
An IPython notebook completely obviates that. It lets you record exactly what you did (the Python code) along with the rationale (in beautiful rich-text) and the output, all stored in one place. It makes it trivial to publish your work so that others can reproduce it, but the importance goes way beyond that. I’ve learned the hard way that keeping a good notebook isn’t for some speculative person who picks up my work when I’m gone; it’s for me-in-six-months. Keeping track of where I’ve been mentally, and what I’ve tried that didn’t work (or occasionally did), is astoundingly important … and anything that can make that easier is something that I’ll adopt enthusiastically.
So, now I’m a Python enthusiast. Not looking forward to scaling the learning curve, but the underlying language makes a lot more sense to me than, say, R (which I’ve been using for a decade and still don’t feel particularly comfortable in.) If only I could get easy integration between IPython and my Drupal-based online notebook…
Postscript – I know that Mathematica has had a notebook interface for something like 25 years. IPython’s strikes me as more flexible, better looking, based on open standards, and you can get it without paying a zillion dollars. (-:
Sponges have 70% homologous genes with humans?
I liked Julia Galef’s article in Slate about keeping a “surprise journal” – and this seems as good a place as any. Maybe I’ll call it “Surprise, Me” or something cute. Also, the word “surprise” has a very low semantic satiation threshold.
Anyways. At PG’s suggestion, I went digging into the genomics of sponges:
Srivastava, M., Simakov, O., Chapman, J., Fahey, B., Gauthier, M. E. A., Mitros, T., … Rokhsar, D. S. (2010). The Amphimedon queenslandica genome and the evolution of animal complexity. Nature, 466(7307), 720–6. doi:10.1038/nature09201
Mann, A. (2010). Sponge genome goes deep. Nature, 466(7307), 673. doi:10.1038/466673a
Riesgo, A., Farrar, N., Windsor, P. J., Giribet, G., & Leys, S. P. (2014). The analysis of eight transcriptomes from all poriferan classes reveals surprising genetic complexity in sponges. Molecular Biology and Evolution, 31(5), 1102–20. doi:10.1093/molbev/msu057
Sponges are cool because they’re commonly accepted to be the simplest animal. (Remember, animals are multicellular (metazoan) eukaryotes.) Actually, sponges are cool for lots of reasons – they live in incredibly diverse environments, have lots of different shapes (and ways of forming those shapes), and some are carnivorous. (I’m imagining a fantastic sequel to Little Shop of Horrors here.)
But I’ve been reading a lot about morphogenesis recently (how animals form tissues and structures and organs and appendages from a single cell), and it makes you wonder: what does a “minimal” animal look like? We can take bacteria and remove genes, and remove some more, and remove some more, and eventually you find the minimal set that lets the bacterium still eat and live and reproduce. (Fred Blattner, who first sequenced the E. coli genome, made a business out of it.) It’s hard to do that with an animal, of course: there’s so much complexity (in the technical sense) to animal development that if you perturb it a little bit, you don’t get an animal any more.
Genomics to the rescue! You might think to ask “what genes are in the sponge? It’s the simplest animal, right?” And that would be a start — but sponges have genes that other animals don’t. They live in places that other animals don’t (the sea floor) and have evolved genes particular to that environment. Better to ask “what genes do sponges and other animals share?” Lots of other animals have had their genomes sequenced, so if we sequence the sponge genome there’s a lot to compare it to. In fact, with so much data available the best question to ask is “what genes did the last common ancestor of sponges and more complex animals have?” Right before the split into the animals that became sponges and the animals that became, well, not sponges – that’s the genome we’re interested in, because comparing it to unicellular eukaryotes (protists and fungi and Dictyostelium and such) will tell us “what genes are required for multicellularity?”
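That chain of questions is really just set arithmetic. Here is a toy sketch in Python, with entirely made-up gene family names and gene lists. The real analysis works from sequenced genomes and inferred homology, not hand-written sets, so treat this only as an illustration of the logic.

```python
# Toy illustration of the comparative-genomics reasoning.
# Every gene family name below is hypothetical, chosen only for illustration.
sponge = {"adhesion_A", "apoptosis_B", "signaling_C", "ribosome_D", "seafloor_E"}
other_animals = {"adhesion_A", "apoptosis_B", "signaling_C", "ribosome_D", "muscle_F"}
unicellular = {"ribosome_D", "metabolism_G"}

# Families found in both sponges and other animals: a crude proxy for
# what their last common ancestor carried.
ancestor = sponge & other_animals

# Of those, the ones with no unicellular counterpart are candidates for
# "genes required for multicellularity".
multicellularity_candidates = ancestor - unicellular

# Families only the sponge has (e.g. adaptations to its own environment).
sponge_specific = sponge - other_animals

print(sorted(ancestor))
print(sorted(multicellularity_candidates))
print(sorted(sponge_specific))
```

A plain intersection is only a first approximation: it misses genes the ancestor had that one lineage later lost, which is part of why having many sequenced genomes to compare against helps.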
You know the awful misquoted factoid about how humans and monkeys are 99% the same? (It’s actually 96%, but who’s counting?) It makes sense, though – chimps have the same body plan as us, the same organs, they’re intelligent. How much of the genome do we share with the lowly sea-sponge? Roughly 70% of our genes have homologs in the sponge.
Think about that for a moment.
All the wondrous complexity of human morphology, all the muscles and sensory organs and nerves and, you know, a two-ended digestive tract (little things), all are a relatively minor portion compared to what we do share.
The genes that we share, what are their functions? Well, you find the things you’d expect: genes that relate to multicellularity, like genes responsible for cellular adhesion, programmed cell death, cell-cell communication and the like. There are things that you might not immediately expect, but upon reflection make sense – much of the cellular differentiation machinery is shared, because sponges have different cell types like we do (just fewer, ~20 instead of ~200.) Sponges share with us a relatively robust innate immune system (they have an associated microbiome, and they get sick too.) And they share much of the machinery responsible for detecting and shutting down uncontrolled proliferation (i.e., cancer.) Because apparently sponges get cancer too?
And then there are the real head-scratchers. Sponges have genes that, in humans, code for neurons and muscles. Neither of which sponges have. Wut? What do sponges do with them? More interestingly, if the common ancestor had genes that, in humans, make nerves and muscles, what did the common ancestor use them for?
Why is this cool? What it points to is not only how fascinating and complex sponges are (I wanna do synbio in sponges now), but how fascinating and complex our ancient common ancestor was. Multicellular organisms have an evolutionary advantage over unicellular organisms because they can more efficiently utilize environmental resources (because of active transport between cells, your size isn’t constrained by the diffusion limit.) What the lowly sponge teaches us is about the tradeoff: that a ridiculous amount of our genetic machinery goes into supporting our multicellularity. And to keep all those pieces in working order, we need to copy our genes more faithfully, which means a lower mutation rate, which means slower evolution. Or it would have, if transposable elements (Richard Dawkins’s “selfish genes”) hadn’t shown up. But that’s another post.
Almost No-Knead Bread
Review of “Effects of High Dementor Density on Health Outcomes, Including Soul Loss, in Graduate Students”
In response to Maria’s latest post:
In their study “Effects of High Dementor Density on Health Outcomes, Including Soul Loss, in Graduate Students”, Sundaram et al. propose the intriguing hypothesis that dementor colonization may be responsible for the apathy and despair commonly associated with graduate studies. They measure both dementor-related environmental factors and health outcomes among a population of public health graduate students; observing a strong correlation between the two, the authors conclude that evidence exists for a causal relationship.
Sundaram et al. have identified a timely, important problem that inexplicably has not been addressed by other researchers in the field. This reviewer laments his own shortsightedness in this regard; I read the books, what, ten years ago? Despite a limited sample size, questionable ethical standards and shoddy statistical analyses, the study’s results are highly suggestive and deserve further investigation. It is unfortunate that the authors stopped short of an interventional study, given that cleaning the fucking microwave takes like five minutes, I mean really. I also would have liked to see some consideration given to other possible causes for student soullessness, including professors that ask for five data slides for their talk and then don’t use any of them; coworkers that use the last of the molecular weight standard and then don’t order any more; and mice that escape their cages, then get killed in mousetraps because the animal facility has a rodent problem.
Recommendation: accept with revisions.
My first post. More coming soon…