Biological collections are irreplaceable as keepers of the hard core of biodiversity information and natural heritage. Yet only 10-20% of their vast holdings in Europe have been databased, and far fewer have been imaged. As a consequence, most of their content is not accessible online.
This project is a serious first step towards digitizing all of the one billion specimens in European museums. Modern mass-digitization technology makes it possible for one person to image 500 samples in a day, which amounts to 100,000 samples per year (assuming roughly 200 working days). A small digitization centre of 10 people can thus reach one million samples in a year. In this project we will establish 10 such centres around Europe and run them for 5 years, which means 50 million specimens digitized. (WP1)
The budget will be €15 million. We aim at €0.30 per specimen, which is about one tenth of today's cost. We will develop new imaging methods for the kinds of samples that cannot be imaged quickly with current methods. This will further accelerate the rate of digitization. (WP2)
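The throughput and cost targets above follow from simple arithmetic, which can be checked with a short sketch. The figure of 200 working days per person per year is an assumption, inferred from the stated 500 samples per day and 100,000 samples per year; all constant names are illustrative.

```python
# Figures stated in the proposal
SAMPLES_PER_PERSON_PER_DAY = 500
PEOPLE_PER_CENTRE = 10
CENTRES = 10
PROJECT_YEARS = 5
BUDGET_EUR = 15_000_000

# Assumption (not stated in the proposal): 500/day only yields
# 100,000/year if a person images on about 200 days per year.
WORKING_DAYS_PER_YEAR = 200

samples_per_person_per_year = SAMPLES_PER_PERSON_PER_DAY * WORKING_DAYS_PER_YEAR
samples_per_centre_per_year = samples_per_person_per_year * PEOPLE_PER_CENTRE
total_specimens = samples_per_centre_per_year * CENTRES * PROJECT_YEARS
cost_per_specimen_eur = BUDGET_EUR / total_specimens

print(f"Per person per year: {samples_per_person_per_year:,}")
print(f"Per centre per year: {samples_per_centre_per_year:,}")
print(f"Project total:       {total_specimens:,}")
print(f"Cost per specimen:   {cost_per_specimen_eur:.2f} EUR")
```

Under these assumptions the whole budget divided by the 50 million specimen target gives the stated unit cost; any share of the budget spent on other work packages would lower the amount available per imaged specimen.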
To transcribe data from those images, we will mobilize a workforce of 1000 people around the world. Transcription will be directed to the countries and experts best able to read the images correctly, and to where the cost is right. International collaboration will emerge because, for instance, scientists in an African country can transcribe their own specimens. Other transcription will be prioritized on demand, when the data are really needed. It will be supported by efficient, adaptive workflows which guarantee that no data need to be georeferenced twice and that scientific names are interpreted correctly. (WP3)
We will establish a data centre that can handle this load of images, as 3 TB will be generated each day. The data centre will use the European HPC infrastructure, where the data will form one virtual pool. Its functions will be packaged in a virtual research environment for biological collections. This will ensure seamless online collaboration, which will accelerate biodiversity science. (WP4)
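As a rough plausibility check on the stated data rate, the sketch below derives the average image size and the total volume implied by 3 TB per day. It assumes that the 3 TB figure covers all 10 centres together, that 1 TB means 10^12 bytes, and that imaging runs on about 200 days per year; none of these is stated in the proposal, and the names are illustrative.

```python
TB = 10**12  # assumption: decimal terabyte
MB = 10**6

DAILY_VOLUME_BYTES = 3 * TB          # stated: 3 TB generated each day
IMAGES_PER_DAY = 500 * 10 * 10       # 500 per person, 10 people, 10 centres
WORKING_DAYS_PER_YEAR = 200          # assumption, see lead-in
PROJECT_YEARS = 5

# Average size of one image implied by the daily volume
avg_image_mb = DAILY_VOLUME_BYTES / IMAGES_PER_DAY / MB

# Total volume accumulated over the project, in petabytes
total_pb = 3 * WORKING_DAYS_PER_YEAR * PROJECT_YEARS / 1000

print(f"Images per day:     {IMAGES_PER_DAY:,}")
print(f"Average image size: {avg_image_mb:.0f} MB")
print(f"Project total:      {total_pb:.0f} PB")
```

Under these assumptions each image averages about 60 MB and the data pool grows to roughly 3 PB over five years, which gives a sense of the storage the data centre would need to plan for.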
We will mobilize the scientific community to curate and annotate the data, increasing data quality and keeping the data up to date. We will also digitize “dark species”, which have not yet been described, in anticipation that the availability of images and data will accelerate the discovery of new species. Such nameless specimens will be DNA-barcoded. (WP5)
An outreach programme will communicate the uses and benefits of this big data pool and build support for the operation. We will cooperate closely with other major digitization programmes, such as the NSF iDigBio. During the project we will explore business models for the operation, with the aim that parts of it will continue as independent businesses. At the end of the project we will transfer the infrastructure to willing host institutions nearby, which will operate it to digitize further collections. (WP6)