Saturday, April 17, 2010

What would Linnaeus do?

Long-term persistent storage, a technology challenge of global proportion

One of the highlights of visiting Uppsala, Sweden, for the recent EGEE User Conference was learning about Carl Linnaeus. Born in Råshult in southern Sweden, Linnaeus studied and taught at Uppsala and became a cornerstone of the university. He died in Uppsala and was laid to rest in Uppsala Cathedral, the largest cathedral in all of Scandinavia. Linnaeus was famous for being the first botanist/physician/researcher to consistently apply Latin binomial nomenclature to plants and animals, thereby initiating what has become a universally implemented method of species classification. Uppsala University has lovingly preserved much of Linnaeus’ field research (specimens and documentation) for future generations to mine.

It is the year 2010 and you are a molecular biologist who is on the brink of finding a cure for a disease that has affected millions of people. You are leveraging technology in ways that make the best and brightest computational scientists take note. Everywhere you turn, funding agencies give money to support your research. Funding agencies like to give money to support advances in science that will transform the lives of many. You are awarded tenure at your institution. Your discoveries are published. You train hundreds of students who have learned your techniques. You retire well, and live in the south of France.

Flash forward 250 years. You are a nanomolecular biologist who is attempting to find a cure for a new disease. You would love to get your hands on the computational research of those who worked on a similar strain of the disease in 2010, but the data is no longer available. Although the computational power that is available to you is far greater than that which was used in 2010, much of the effort expended over the lifetimes of those scientists who committed their research to digital files is lost. You can, however, read Linnaeus’ field notes at Uppsala University.

The architecture of digital storage changes every 20 years or so. Storage costs money: power to run and cool the systems, and media on which to store the data. It therefore requires a commitment by an institution, scientific domain, or government to protect and preserve the data for future generations to mine. Most institutions struggle to meet even their administrative data management needs. Many commit to storing academic data only for the tenure of the professional, or for a period of two years (maximum) beyond the termination of a contract.

Perhaps Linnaeus was lucky in that he contributed to society before this situation rendered it impossible to preserve and protect academic research for generations to come. In practice, preserving paper in a museum is much cheaper. Linnaeus would have used hand-laid paper or animal vellum. Writing would have been done with indigo ink. By their nature alone, these materials would stand the test of time. Today, if we do commit something to paper, it is most likely printed on uber-ephemeral recycled paper, the fibers of which are short, put through a bleaching process, and bound together with an acidic medium. Toner and vegetable inks fade, and paper that is designed to self-destruct is not the best way to preserve anything.

This week, having had the opportunity to chat with those who operate many of the world’s technology infrastructures, I came away feeling that we all share the same problem. Most technology has a five-year life span at best. Many infrastructures are built and sustained by cycles of four- or five-year funding streams. When you think about it, how could a long-term, international data repository be funded? Where would you house it? Whom would you trust with the contents? Which set of security policies would you apply, and how would you sustain the natural evolution of progress? Who wants to be responsible for the world’s data anyway?

The year is 2030 and I have passed away. My grandchildren are rifling through a box which represents my life. Dozens of USB memory sticks, CDs, and floppy disks hold the writings, images, and evidence of my existence, but they don’t know that. They have never seen a floppy disk, let alone a device that would read one. Therefore, these trinkets will most likely be turned into primitive jewelry, made into Christmas ornaments, or sent to the trash for an archaeologist to find, or not.


KurtJB said...

Very interesting reading. I love the ending, the last paragraph, your take on it about the future, obsolete data storage devices, unreadable due to the advances in technology, and lost bits of a person's personal history.

Nice work, Elizabeth.

Catherine Gater said...

Excellent post, very thought provoking! I might start labelling my photo CDs now "Do not throw away!!" It's an interesting point: how do you best preserve electronic data from a present-day Linnaeus and the means to read it for hundreds of years to come?