Optical Microscopy Provides a Path to a $10M Mouse Brain Connectome (if it eliminates proofreading)
In a recent article, the Wellcome Trust estimated that it would cost roughly $10B over 17 years to reconstruct a whole mouse brain, due in large part to the cost of proofreading. Here, we analyze emerging optical connectomics technologies that have the potential to reduce this cost through three factors: the high throughput and low capital costs of optical microscopy; lower-resolution imaging data, which drastically reduces the cost of computational reconstruction; and cellular barcoding, which has the potential to eliminate the need for proofreading. We argue that in the limit where optical barcoding enables us to eliminate proofreading, this optical connectomics technology could reduce this cost by more than 1000x, to $7M per mouse brain with a $10M initial capital expenditure. Joergen Kornfeld and I began working on this approach in 2017 in Ed Boyden’s lab and Michale Fee’s lab, respectively, inspired by work done previously by Adam Marblestone, Ed Boyden, Dawen Cai, and others. My lab at the Francis Crick Institute is now actively collaborating with Joergen’s lab at the Max Planck Institute and with E11 Bio, a Focused Research Organization led by Andrew Payne, on realizing this vision, and it was recognized in the Wellcome report as a possible source of major cost reductions in the future. E11’s role is essential, as the development of a mature, high-performance connectomics technology likely goes beyond the engineering capabilities and project timelines of academic labs; and indeed, this project was the original inspiration for the Focused Research Organization model.
We and others have been investigating connectomics based on optical barcoding [3-6], which is intended to eliminate proofreading altogether by using molecular barcodes to uniquely label neurons. Protein-based optical barcoding schemes are inspired by Brainbow, in which neurons express a random mixture of fluorescent proteins, allowing neighbors with different mixtures of proteins to be distinguished from each other and reducing the difficulty of reconstruction and proofreading. In the approaches we have been developing, fluorescent proteins are replaced with epitope-labeled “barcode proteins” that can be visualized in successive rounds of antibody multiplexing. With ~30 barcode proteins admitting 2^30 ≈ 1 billion possible unique combinations, it is conceivable in the limit for barcoding approaches to uniquely label every neuron in the brain, thus eliminating the need for reconstruction and proofreading altogether. Simultaneously, we can identify synapses by imaging synaptic markers, and (compared to electron microscopy) we have a superior ability to resolve the specific functional molecules present at each synapse. The costs of mapping an entire mouse brain using this technology are derived from the costs of acquisition, storage, and reconstruction:
The cost of acquisition, segmentation, and reconstruction for this technology is likely significantly lower than that of electron microscopy. In our hands so far, it seems that 50 nm isotropic resolution is sufficient to resolve processes and synapses, but each voxel must be imaged many times to read out synaptic markers and barcode proteins, so the benefits of imaging at lower resolution are counterbalanced by the need to image the same volume many times. To scale this technology to an entire mouse brain, we would need to image roughly 0.5 cubic centimeters at roughly 50 nm isotropic resolution, and each voxel would need to be imaged roughly 100 times in order to acquire all the necessary barcode proteins and synaptic markers. This corresponds to roughly 4*10^17 voxels, or roughly 400 petabytes of uncompressed data at about one byte per voxel. In our hands, using commercially available laser beds, it is possible to image at approximately 1-2 gigapixels per second (1-2 GHz) per microscope, with 1-2 cameras, each with 4-10 megapixels, operating at 50-100 Hz. This throughput could in principle be increased to roughly 40 megapixels per camera at 50-100 Hz by saturating the etendue of the Nikon Ti2 microscope body, which would require significant but relatively straightforward optical engineering. As a middle estimate, we consider microscopes operating at 100 Hz, with two 40-megapixel cameras, for a total throughput of 8 gigapixels per microscope per second. At this throughput, imaging 4*10^17 voxels would take roughly 20 microscope-months, or 2 months on 10 microscopes.
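The arithmetic in this paragraph can be checked with a short Python sketch. It is only a back-of-envelope calculation: it uses the rough figures quoted above, and it additionally assumes a 30-day month and a 100% duty cycle, neither of which is stated in the text. All names are illustrative.

```python
# Back-of-envelope imaging estimate using the rough figures quoted in the text.
BRAIN_VOLUME_CM3 = 0.5        # volume to image
VOXEL_SIZE_NM = 50            # isotropic voxel edge length
IMAGING_ROUNDS = 100          # times each voxel is imaged (barcodes + synaptic markers)
PIXELS_PER_CAMERA = 40e6      # "middle estimate" camera size
CAMERAS_PER_MICROSCOPE = 2
FRAME_RATE_HZ = 100

NM_PER_CM = 1e7
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month, 100% duty cycle


def spatial_voxels(volume_cm3, voxel_nm):
    """Number of voxels needed to tile the volume once."""
    return volume_cm3 * NM_PER_CM**3 / voxel_nm**3


def microscope_months(voxels, pixels_per_second):
    """Imaging time on a single microscope, in months."""
    return voxels / pixels_per_second / SECONDS_PER_MONTH


single_pass_voxels = spatial_voxels(BRAIN_VOLUME_CM3, VOXEL_SIZE_NM)   # about 4e15
total_voxels = single_pass_voxels * IMAGING_ROUNDS                     # about 4e17
throughput_pixels_per_s = PIXELS_PER_CAMERA * CAMERAS_PER_MICROSCOPE * FRAME_RATE_HZ
months = microscope_months(total_voxels, throughput_pixels_per_s)      # about 19.3

print(f"total voxels: {total_voxels:.1e}")
print(f"throughput: {throughput_pixels_per_s:.1e} pixels/s")
print(f"imaging time: {months:.1f} microscope-months")
```

Under these assumptions the result is a little over 19 microscope-months, consistent with the "roughly 20" quoted above. Any realistic duty cycle below 100% would lengthen it proportionally.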
To estimate the cost of acquisition, we consider both capital costs and operating costs. An optical microscope such as that described above costs roughly $500k (about 10x less than an electron microscope), and likely requires roughly $1M per microscope per year to operate, corresponding to 2-3 fully loaded FTEs per microscope. Thus, a facility with 20 microscopes would require a capital expenditure of $10M and would be able to image roughly one brain per month, for operating expenses of roughly $2M per brain.
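As a rough check on the facility numbers, the following Python sketch recomputes the capital and per-brain operating costs from the figures in the text. The 20 microscope-months per brain is carried over from the imaging estimate; the names are illustrative, and the calculation assumes operating cost accrues evenly over the year.

```python
# Facility cost sketch using the figures quoted in the text.
MICROSCOPE_COST_USD = 500_000
OPERATING_COST_PER_MICROSCOPE_YEAR_USD = 1_000_000
N_MICROSCOPES = 20
MICROSCOPE_MONTHS_PER_BRAIN = 20  # rough imaging time from the text

# Up-front purchase cost of the whole facility.
capex_usd = N_MICROSCOPES * MICROSCOPE_COST_USD

# Wall-clock months to image one brain when all microscopes run in parallel.
months_per_brain = MICROSCOPE_MONTHS_PER_BRAIN / N_MICROSCOPES

# Operating cost attributed to one brain (assumes cost accrues evenly per month).
opex_per_brain_usd = (
    N_MICROSCOPES * OPERATING_COST_PER_MICROSCOPE_YEAR_USD / 12 * months_per_brain
)

print(f"capital expenditure: ${capex_usd:,.0f}")
print(f"months per brain: {months_per_brain:.1f}")
print(f"operating cost per brain: ${opex_per_brain_usd:,.0f}")
```

This gives about $1.7M of operating cost per brain, which the text rounds to roughly $2M.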
Figure: Optical connectomics methods are inspired by Brainbow, a method that enables neurons to be labeled with a random collection of fluorescent proteins. The unique combination of proteins expressed in each neuron helps each neuron to be distinguished from its neighbors. In optical connectomics technologies, the ~3-5 fluorescent proteins used in Brainbow are replaced with ~30 barcode proteins that can be detected in series, rather than simultaneously. Reproduced from https://braintour.harvard.edu/archives/portfolio-items/brainbow.
Data Storage + Reconstruction
Currently, rates for cloud storage run to approximately $10k per petabyte per month. Assuming a 90% compression rate (as in the Wellcome estimate), we can estimate that we would need storage for roughly 50 petabytes, or roughly $500k/month if it were stored in the cloud. With an acquisition time of ~2 months, storage costs of ~$3M (for 6 months) seem reasonable, after which the compressed raw data could be transferred to lower-cost, long-term storage.
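The storage estimate can be reproduced with a small Python sketch. It assumes that "90% compression" means 90% of the raw bytes are removed, and it takes the 400 petabyte raw size from the acquisition estimate. The function names are illustrative.

```python
# Storage cost sketch using the figures quoted in the text.
def compressed_size_pb(uncompressed_pb, fraction_removed):
    """Size after compression, assuming `fraction_removed` of the bytes go away."""
    return uncompressed_pb * (1.0 - fraction_removed)


def storage_cost_usd(stored_pb, usd_per_pb_month, months):
    """Total cost of keeping `stored_pb` petabytes for `months` months."""
    return stored_pb * usd_per_pb_month * months


computed_pb = compressed_size_pb(400, 0.90)                  # about 40 PB
cost_computed = storage_cost_usd(computed_pb, 10_000, 6)     # about $2.4M
cost_rounded = storage_cost_usd(50, 10_000, 6)               # $3M with the text's 50 PB

print(f"compressed size: {computed_pb:.0f} PB")
print(f"6-month cost at computed size: ${cost_computed:,.0f}")
print(f"6-month cost at 50 PB: ${cost_rounded:,.0f}")
```

A strict 90% reduction of 400 petabytes gives about 40 petabytes and about $2.4M over six months; the 50 petabyte and ~$3M figures in the text are the same estimate rounded up.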
For reconstruction and segmentation, a detailed analysis is not provided in the Wellcome piece. However, Schubert et al. estimated that reconstruction of EM data requires roughly one GPU-hour per 4.4 gigavoxels. It is currently uncertain how precisely reconstruction will be extended to optical data: in the limit of perfect optical barcoding, segmentation and even reconstruction per se may not be necessary, because most pixels may be uniquely assigned to cells based on their combination of barcode proteins. However, if we conservatively assume a similar computational requirement of one hour of an NVIDIA A100 per 4.4 gigavoxels, then reconstructing the entire brain at 50 nm isotropic resolution would require roughly $2M at current cloud computing costs of $1.10 per A100-hour. In reality, the final number would likely be significantly lower as costs of cloud computing come down.
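A minimal Python sketch of this compute estimate follows. It assumes that reconstruction runs once over the spatial voxel grid (0.5 cubic centimeters at 50 nm, about 4*10^15 voxels) rather than once per imaging round; that choice is an assumption of this sketch, not something the text states. Names are illustrative.

```python
# Reconstruction compute sketch using the figures quoted in the text.
SPATIAL_VOXELS = 4e15          # assumption: one pass over the 50 nm voxel grid
VOXELS_PER_GPU_HOUR = 4.4e9    # rate quoted from Schubert et al. for EM data
USD_PER_GPU_HOUR = 1.10        # quoted cloud price per A100-hour


def reconstruction_cost(voxels, voxels_per_gpu_hour, usd_per_gpu_hour):
    """Return (GPU-hours, cost in USD) for processing `voxels` voxels."""
    gpu_hours = voxels / voxels_per_gpu_hour
    return gpu_hours, gpu_hours * usd_per_gpu_hour


gpu_hours, cost_usd = reconstruction_cost(
    SPATIAL_VOXELS, VOXELS_PER_GPU_HOUR, USD_PER_GPU_HOUR
)

print(f"GPU-hours: {gpu_hours:,.0f}")
print(f"compute cost: ${cost_usd:,.0f}")
```

Under this single-pass assumption the result is on the order of $1M, the same order of magnitude as the roughly $2M quoted above. The estimate is sensitive to the assumption: if every imaging round had to be processed at the same per-voxel rate, the cost would scale with the number of rounds.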
In conclusion, with a combination of new barcode-based optical connectomics technologies that can eliminate the need for proofreading and improved optical microscopy techniques, it is reasonable to expect that we may eventually be able to acquire mouse brain connectomes with a marginal cost of <$10M and capital expenditures of ~$10M. The cost savings are achieved through the low capital costs of optical microscopy (10x cheaper than EM); the use of barcode methods to eliminate the need for proofreading; and the lower resolution of optical connectomics data, which greatly reduces data storage and computational reconstruction costs. The technologies required have not yet been realized but are straightforward and reasonable extensions of currently available technology. Most importantly, it is not yet clear whether optical connectomics will ultimately enable us to eliminate proofreading altogether, and significant further technology development is required before we will know for sure.
As always, reality may lead to substantial inefficiencies. For example, if we require 25 nm resolution in all channels, rather than 50 nm resolution, that would lead to a roughly 8x increase in total cost, because the number of voxels scales with the cube of the linear resolution. Moreover, achieving a throughput of 8 gigapixels per microscope per second would require significant dedicated hardware engineering; a 100 Hz acquisition rate may not be possible in all samples for all channels; and it is unclear at this point what duty cycle can be sustained over long periods of time on these systems. Nonetheless, it seems relatively safe, given the estimates here, to posit a 10x-100x cost improvement relative to EM in the “short” term (~5 years), and a 100x-1000x improvement in the long term (~10-15 years), assuming the eventual data is of sufficiently high quality.
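The 8x figure follows from cubic scaling of voxel count with linear resolution, as the short Python sketch below illustrates. It assumes that acquisition, storage, and reconstruction costs are all proportional to the number of voxels, and it uses the $7M per-brain estimate from the introduction as the baseline. Names are illustrative.

```python
# Sketch of how cost scales with voxel size, assuming cost is proportional to voxel count.
def voxel_count_scale(old_voxel_nm, new_voxel_nm):
    """Factor by which the voxel count grows when the voxel edge shrinks."""
    return (old_voxel_nm / new_voxel_nm) ** 3


BASELINE_COST_USD = 7_000_000   # per-brain estimate at 50 nm from the text

scale = voxel_count_scale(50, 25)            # 8.0
scaled_cost_usd = BASELINE_COST_USD * scale  # $56M under the proportionality assumption

print(f"voxel count scale: {scale:.0f}x")
print(f"per-brain cost at 25 nm: ${scaled_cost_usd:,.0f}")
```

If only some channels needed the finer resolution, the increase would be smaller than this worst case.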
On this point: it is still very unclear whether optical connectomics will ultimately compete with EM connectomics, let alone surpass it. Moreover, other emerging technologies, such as “connectomics-by-sequencing,” are being pursued by Tony Zador, Evan Macosko, and others, and may in principle provide even better scaling properties. Nonetheless, this analysis suggests that investment in technology development (including EM, optical, and others) now may yield connectomic approaches that are up to 1000x more cost-effective than current technologies. When it comes to reconstructing the entire mouse brain, we should be investing a significant amount in technology development, rather than placing all of our eggs in one basket.