Sunday, December 5, 2010

Can we do better than Google?

If we were to file our Parc Safari data, would it go under "A" for "African Animals"? Or "Animals (African)"? Or maybe "Q" for "Quebec" (subsection "Africa")? Could the excavation of a zebra in Hemmingford, Quebec ever make a lick of sense without extensive context?

In 2004, a group of archaeologists met at a National Science Foundation-funded workshop to develop a concept that seems paradoxically both fantastical and obvious: a centralized cyberinfrastructure, a working database of all archaeological data, ever, in the history of the world. Obvious because, as you, the reader, know, the most efficient organizational systems all use online technology; fantastical because Keith Kintigh’s 2006 report on the workshop barely scratches the surface of the logistical nightmare such a database would be.

There’s a misconception among the Indiana Jones-loving masses that archaeology is a finite field: that almost everything to be dug up has been dug up, and that the whole discipline will one day go the way of the CD (that is, into obsolescence). An understandable assumption, perhaps, when one’s only exposure has been a dusty black-and-white photo of the pitted, ravaged Valley of the Kings, but a mistaken one all the same. After all, as long as the world spins, things will get buried. As long as there is archaeology, there will be new data. And as long as there is research, there is a very real need for easy access to old data.

In the information age, it seems easy to say, “well, let’s throw it online!” Surely after the database is designed and the bugs are exorcised, it will be a self-sustaining system, with all new information sliding neatly into its categories like books on a library shelf. And therein lies the first roadblock.

What is unique about archaeology is that data – the hard facts and the numbers – does not lose its importance, no matter how much new data is discovered. You can only excavate a site once (unless something has gone very wrong), and thus whoever got there first will be the effective data master forever. Even after the archaeologist’s death, new archaeologists will be referring back to those numbers and those pictures for as long as there is interest in that civilization. And since archaeology has existed in some form since the European Renaissance, the collective archaeological record is not only vast, but almost entirely in print.

And it is by no means enough simply to transcribe the data into electronic form. The ideal database is one that facilitates cross-referencing, which raises the problem of standards. According to J.D. Richards’ “From Anarchy to Good Practice,” the documentation standards that exist now, adopted by archaeologists under pressure from the world’s libraries and museums, are more guideline than law: “Guides to Good Practice, or Best Practice, but not Required Practice”. Without standards, your decisions about how narrow your animal categories should be (e.g. ungulates vs. Artiodactyla vs. deer) or which side you take in the metric/imperial war will inevitably clash with the decisions of at least one of your colleagues, and researchers tend to panic when faced with a clash.

Richards goes on to question the real need for standards, using an example almost anyone can identify with: in the age of Google, we are all too used to the “type-and-hope” method of research, that is, typing something into a search engine and praying that something at least mildly relevant comes back. And usually, it works. A truly standardized database would likely operate on a “point-and-click” basis, in which the user narrows down categories to find what he or she is looking for. Any such system capable of comfortably accommodating all the vast and varied data accumulated over decades would have to be detailed to the point of inscrutability. Simplifying the system, on the other hand, risks shuffling aside inconvenient data that refuses to fall neatly into a category. The designers of this database would be walking a very, very thin line. The metadata (the information describing how the data are recorded and arranged) would have to be tightly controlled to avoid the comedy cliché of the filing system that, in its quest for maximum efficiency, has become too detailed to be usable at all.

During our discussion, the point was made that once such a database is in wide use, it would put pressure on academics to publish quickly. It would be difficult to take one’s sweet time writing a dissertation when eager colleagues could easily access your data and begin their own inquiries. It also raises new ethical questions about disclosure in an age of instant data-sharing.

The website “tDAR” (the Digital Archaeological Record) is the first attempt to realize this theoretical database. It is quite new, and displays this not-exactly-inspiring disclaimer above the search bar:

“As this is a beta release, we will appreciate your tolerance of any problems you encounter and encourage you to send comments, suggestions and bug reports to [address].”

Out of curiosity, this humble blogger entered “tiwanaku” into the search bar, reasoning that the name of such an important site would surely return a wealth of information. I must say, I was mildly shocked when only a single result appeared. Either the young tDAR is not looking to pressure archaeologists into contributing, or there is a resounding lack of interest from the wider community. Either way, at this point, this isn’t the database we’re looking for. (We don’t need to see its identification. We can go about our business. Move along.)

Despite all this, both the NSF-funded workshop and Richards are convinced of the need for such a database. To them, the benefits outweigh the risks. As a student of archaeology who is quite often fed up with the existing online resources, I do hope their vision becomes reality sooner rather than later.

Kintigh, Keith. "The Promise and Challenge of Archaeological Data Integration." American Antiquity 71.3 (2006): 567-578. Web. 11/05/10.

Richards, J.D. "From Anarchy to Good Practice: The Evolution of Standards in Archaeological Computing." Archeologia e Calcolatori 20 (2009): 27-35.