Current Practice in Linked Open Data for the Ancient World
ISAW Papers 7 (2014)
Editors: Thomas Elliott, Sebastian Heath, John Muccigrosso
http://doi.org/2333.1/gxd256w7
Abstract: Reports on current work relevant to the role of Linked Open Data (LOD) in the study of the ancient world. As a term, LOD encompasses approaches to the publication of digital resources that emphasize stability, relatively fine-grained access to intellectual content via public URIs, and re-usability as defined both by publication of machine-readable data and by publication under licenses that permit further copying of available materials. This collection presents a series of reports from participants in the 2012 and 2013 sessions of the NEH-funded Linked Ancient World Data Institute. The contributors come from a wide range of academic disciplines and professional backgrounds. The projects they represent reflect this range and also illustrate many stages of the process of moving from concept to implementation, with a focus on results achieved in the mid-2013 to early 2014 timeframe.
Subjects: Humanities--Study and teaching.
1. Prologue and Introduction (Thomas Elliott, Sebastian Heath and John Muccigrosso)
2. Linked Open Bibliographies in Ancient Studies (Phoebe Acheson)
3. Linked Data in the Perseus Digital Library (Bridget Almas, Alison Babeu, Anna Krohn)
4. Herculaneum Graffiti Project (Rebecca Benefiel and Sara Sprenkle)
5. The Homer Multitext and RDF-Based Integration (Christopher Blackwell and D. Neel Smith)
6. Moving the Ancient World Online Forward (Thomas Elliott and Charles Jones)
7. Linked Open Data and the Ur of the Chaldees Project (William B. Hafford)
8. ISAW Papers: Towards a Journal as Linked Open Data (Sebastian Heath)
9. Beyond Maps as Images at the Ancient World Mapping Center (Ryan Horne)
10. Open Context and Linked Data (Eric Kansa)
11. Geolat: Geography for Latin Literature (Maurizio Lana)
12. The Europeana Network of Ancient Greek and Latin Epigraphy (EAGLE) (Pietro Maria Liuzzo)
13. Bryn Mawr Classical Review (Camilla MacKay)
14. Byzantine Cappadocia: Small Data and the Dissertation (A. L. McMichael)
15. Coinage and Numismatic Methods. A Case Study of Linking a Discipline (Andrew Meadows and Ethan Gruber)
16. Exploring an Opportunity to Link the Dead in Ancient Rome (Katy Meyers)
17. RAM 3D Web Portal (William Murray)
18. Assessing the Suitability of Existing OWL Ontologies for the Representation of Narrative Structures in Sumerian Literature (Terhi Nurmikko-Fuller)
19. Berkeley Prosopography Services (Laurie Pearce and Patrick Schmitz)
20. Linking Portable Antiquities to a wider web (Daniel Pett)
21. Pompeii Bibliography and Mapping Resource (Eric Poehler)
22. It’s about time: Historical Periodization and Linked Ancient World Data (Adam Rabinowitz)
23. Publishing Archaeological Linked Open Data: From Steampunk to Sustainability (Andrew Reinhard)
24. Mining Citations, Linking Texts (Matteo Romanello)
25. Linked Data and Ancient Wisdom (Charlotte Roueché, Keith Lawrence, and K. Faith Lawrence)
26. Linked Open Data for the Uninitiated (Rebecca Seifried)
27. Pelagios (Rainer Simon, Elton Barker, Pau de Soto, and Leif Isaksen)
28. Linked data and the future of cuneiform research at the British Museum (Jon Taylor)
29. Integrating Historical-Geographic Web-Resources (Tsoni Tsonev)
30. Moving from Cross-Collection Integration to Explorations of Linked Data Practices in the Library of Antiquity at the Royal Museums of Art and History, Brussels (Ellen Van Keer)
ISAW Papers (ISSN 2164-1471) is a publication of the Institute for the Study of the Ancient World, New York University. The articles in ISAW Papers 7 were considered for publication by Tom Elliott and Sebastian Heath as members of the ISAW staff and faculty.
©2014 The authors. Distributed under the terms of the Creative Commons Attribution 4.0 license.
ISAW Papers 7.1 (2014)
Prologue and Introduction
Tom Elliott, Sebastian Heath and John Muccigrosso
Prologue
The articles published collectively as ISAW Papers 7 are the result of two meetings held at New York University's Institute for the Study of the Ancient World and at Drew University in early summer of 2012 and 2013 respectively. Both had the title "Linked Ancient World Data Institute" and came to be known by the acronym LAWDI. Organized by the present authors, the meetings were generously funded by the Office of Digital Humanities of the National Endowment for the Humanities. We are very grateful to the NEH and to our home institutions for their support of the events.
We would also like to thank all the LAWDI participants and all the contributors to this collection. At the LAWDI meetings, we strove for a combination of structured presentation and informal exchange, and here in the publication we intend to offer a forum for description of ongoing work. Different authors have made different use of the opportunity. A common thread is the willingness to make digital resources available on the public internet. But beyond that there is great diversity. Personal perspectives mix with brief reports and with technical discussions. Because all the contributions are informed, more or less explicitly, by the principles of Linked Open Data (LOD), it will be useful to review those principles here.
Introduction
Linked Open Data is a set of best practices for sharing digital resources. Among the foundational texts that helped define and promote LOD's principles is Tim Berners-Lee's "Linked Data" of 2006, which is online at http://www.w3.org/DesignIssues/LinkedData.html. Reaching further back into the history of the web, the same author's "Cool URIs Don't Change," online at http://www.w3.org/Provider/Style/URI, offers simple suggestions that are still useful today. Also influential is the W3C Interest Group Note by Sauermann and Cyganiak (2008) entitled "Cool URIs for the Semantic Web," for which see http://www.w3.org/TR/cooluris/. The ideas outlined in these three relatively brief discussions were described in much greater detail by Tom Heath and Christian Bizer in their 2011 book Linked Data: Evolving the Web into a Global Data Space, which is also available online at http://linkeddatabook.com/. Readers wanting to explore further approaches to LOD will be well served by "clicking through" on those links and also by investigating the resources at http://linkeddata.org.
Each session of LAWDI was only three days long, so it required considerable compression of technical details. We will do the same (and more) in this introduction. The one point that was stressed at LAWDI was stability, particularly of published URIs (for discussion of the difference between URIs and URLs, see Sauermann and Cyganiak [2008]). The fundamental practice of giving individual URIs to all discrete resources that a project publishes was energetically endorsed. Among the types of URIs often used as examples within the study of the ancient world were Pleiades URIs for ancient places (e.g. https://pleiades.stoa.org/places/59672), and the WorldCat catalog stood out as providing common identifiers with impact beyond any single discipline (e.g. http://www.worldcat.org/oclc/768969647). Beyond stability, the faculty and participants at LAWDI both stressed and searched for common vocabularies that can be used to encourage machine readability of the information available at the stable URIs that currently exist and that will become available. The Pelagios Project (Isaksen et al. 2014), which is aggregating disparate uses of Pleiades identifiers, has shown the utility of expressing links between resources in such a way that they can be automatically harvested. Nomisma.org has also made progress in this direction. Putting aside specific examples, the articles published here often express the desire for greater clarity on how to refer to people, places, and events, in addition to such abstract concepts as periods and typologies. That common practice is desirable was a clear outcome of the 2012 LAWDI session (Elliott, Heath and Muccigrosso 2012) and remained a topic of discussion in 2013.
One Technical Point...
Despite an avoidance of excessively technical presentations, LAWDI did not shy away from highlighting the “RDF Triple” as an important tool in ongoing efforts to encourage interoperability. We likewise think it probable that readers of the current essays will find it useful to be familiar with this fundamental concept in the publication of re-usable data on the public internet. Accordingly, we offer here a very brief answer to the question, "What is a Triple?".
A triple is a statement of information in three parts. Using colloquial language, those three parts can be defined in the following way:
The Subject: The thing being talked about.
The Predicate: The category of statement one is making about the subject.
The Object: The content of the statement that one is making about the subject.
For example, the English sentence "Athens is located in Attica" has the subject "Athens", the predicate "located in", and the object "Attica".
It is best practice in LOD to use URIs for all three elements of a triple so that the following three URIs are a machine readable version of the same sentence:
https://pleiades.stoa.org/places/579885
http://www.geonames.org/ontology#locatedIn
http://www.geonames.org/6692632
Using the triple terminology, clicking on the "subject," or first URI in the list, will bring up the Pleiades record for Athens. Clicking on the predicate will bring up a human-readable introduction to the Geonames vocabulary. And clicking on the "object" will bring up the geonames.org record for Attica.
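The Athens triple above can also be written down in one of the standard RDF serializations. The following sketch, which uses only the Python standard library, serializes it in N-Triples syntax (one of the simplest RDF formats: each triple is three angle-bracketed URIs followed by a period). The helper function is our own illustration, not part of any project's API.

```python
# The three URIs from the example above: Athens (Pleiades),
# "located in" (GeoNames ontology), and Attica (GeoNames).
SUBJECT = "https://pleiades.stoa.org/places/579885"
PREDICATE = "http://www.geonames.org/ontology#locatedIn"
OBJECT = "http://www.geonames.org/6692632"

def to_ntriples(s, p, o):
    """Serialize one triple of URIs in N-Triples syntax."""
    return f"<{s}> <{p}> <{o}> ."

print(to_ntriples(SUBJECT, PREDICATE, OBJECT))
```

Running this prints the single machine-readable statement "Athens is located in Attica"; a file of many such lines is itself a small Linked Data dataset that harvesters can parse without any knowledge of the projects that minted the URIs.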
Showing that one simple English sentence can be transformed into a group of URIs is a prelude to the much grander assertion that triples can express many forms of information. An individual project's database might record object dimensions or site locations in a row and column oriented format that suits its particular needs. If that information is published on the Internet as triples, it is more likely to be usable by other consumers, whether human or automated. More complex series of triples can be linked together. A type of pottery is discovered in a building. That pottery has an origin and the building has a terminus post quem. All of this can be represented as triples. If URIs are used, then those triples can refer to definitions distributed across the Internet. Data starts to be linked and analysis by third parties is enabled. This is the vision to which LAWDI optimistically alluded.
Specific examples and discussions of triples appear often in this collection of essays, most directly in Almas et al., Heath, Kansa, and Romanello. The range of uses illustrated therein suggests the role that triples can play in providing a common format for representing diverse concepts published on the public Internet. It is optimistic to think that all digital data relevant to the ancient world will be published in this form, but it is also true that growing use of triples is leading to greater compatibility between the resources that are generating them.
Full presentation of the role of triples in LOD is beyond the scope of this introduction. We do encourage interested readers to see the useful discussion in Heath and Bizer's online book. Section "2.4.1 The RDF Data Model" is the most relevant part.
On This Collection
In closing we note that much diversity was on display at the LAWDI meetings and that the same is apparent in the scope of articles now being published as ISAW Papers 7. Our goal as editors is that the wide range of styles and approaches present in the face-to-face events will come through in LAWDI's written record. These are timely statements of ongoing work and we stress that they mostly relate to the state of affairs in late 2013. We hope that the issues this collection raises will be useful for others thinking about implementing digital resources. And we believe we speak for all LAWDI participants when stressing that the ideas articulated here are inspired by the desire to share well-conceived digital resources with all audiences willing to make use of them.
Works Cited
Berners-Lee, T. (1998). “Cool URIs Don't Change.” <http://www.w3.org/Provider/Style/URI>
Berners-Lee, T. (2006). “Linked Data.” <http://www.w3.org/DesignIssues/LinkedData.html>
Elliott, T., S. Heath and J. Muccigrosso (2012). Report on the Linked Ancient World Data Institute. Information Standards Quarterly, 24(Spring/Summer, 2/3): 43-45. < http://www.niso.org/publications/isq/2012/v24no2-3/elliott/>
Heath, T. and C. Bizer (2011). Linked Data: Evolving the Web into a Global Data Space <http://linkeddatabook.com/>
Sauermann, L. and R. Cyganiak (2008). “Cool URIs for the Semantic Web.” <http://www.w3.org/TR/cooluris/>
©2014 Tom Elliott, Sebastian Heath and John Muccigrosso. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.2 (2014)
Linked Open Bibliographies in Ancient Studies
Phoebe Acheson
One of my main takeaways from the 2012 Linked Ancient World Data Institute was the need for a comprehensive bibliography of scholarly literature in the field of ancient studies that is both open and follows the principles of linked data (see also Elliott 2012). Such a resource could facilitate scholarly work across many related sub-disciplines and increase discoverability of academic studies that mention specific subjects. It would include a robust controlled vocabulary, with the ability to tag citations with stable identifiers for places and monuments, authors, texts and passages within texts, historical figures, artifacts and works of art, and themes or subjects discussed in the scholarly works included. It would allow more comprehensive identification of bibliographic items on a topic, and would facilitate the automatic feeding of scholarly citations into web sites discussing topics. Imagine, for example, a Pleiades place page (Foss et al. 2013) or an entry for an artifact at Arachne (German Archaeological Institute and University of Cologne 2013) including a feed of bibliographic citations discussing that place/artifact that automatically updated when new articles were added to the comprehensive bibliography. A similar initiative could systematically add access to scholarly literature in ancient studies to widely used information resources on the web such as Wikipedia.
There are a number of linked open data building blocks in place for bibliographic uses that could facilitate the construction of a comprehensive linked open bibliography for ancient studies. Libraries have a longstanding interest in unique identifiers and controlled vocabularies, many of which have translated to the linked open data environment fairly seamlessly. Library name authorities are now collected at the Virtual International Authority File (OCLC 2013), a collaborative resource which has shown itself open to the addition of new names and formats as scholarly disciplines present the need for them (for example, Syriac scholars: (Smith-Yoshimura and Michelson 2013)). There are several standard numbers for monographs, the most linked open data friendly of which is currently the OCLC number, since the open WorldCat database provides stable URIs for items by OCLC number as well as providing RDFa for many entries. (Wallis 2012) Control numbers for scholarly bibliographic items that are not monographs are less pervasive, but for those journals and other web publishers that have adopted it the DOI is an excellent control number for scholarly articles and other publications. (International DOI Foundation 2013) For controlled subject vocabularies, the Library of Congress Subject Headings are available as linked open data, (Library of Congress 2013) and the Getty Center has recently announced a plan to open access to its controlled vocabularies for art and archaeology. (Vocabulary Program, Getty Research Institute 2013)
Most of these building blocks are relevant to bibliographic citations as a whole, with no special focus on ancient world studies. Could they be incorporated into an existing bibliographic index in use by scholars of the classical world? L’Année Philologique is a long-standing and comprehensive resource with special strengths in classical philology, but it is not open. (Rebillard 2013) The Archäologische Bibliographie of Projekt Dyabola has an open version (Zenon DAI: (Deutsches Archäologisches Institut 2013)) and an existing, fairly robust, subject classification, but its subject coverage is limited to archaeology. Neither has yet announced an exploration of linked open data, although L’Année’s work with the Classical Works Knowledge Base is certainly in the ballpark. (Rebillard, Chandler, and Ruddy 2013)
TOCS-IN is another bibliographic resource for ancient Mediterranean studies (with disciplinary coverage closely modeled on L’Année Philologique) that is open-access and run through crowdsourcing: volunteers add bibliographic citations for specific journals they agree to cover. (Matheson and Poucet 2013) Following the LAWDI 2012 workshop I experimented with placing the TOCS-IN database content into the Zotero bibliographic tool (for a write-up, see Acheson 2012b). The experiment was not a success because of current technological limitations of Zotero for large citation sets: when I tried to add 80,000 citations to a Group Library, the sync server was overwhelmed. Another attempt in this vein could be made with more institutional support, using an open-access bibliographic repository housed on an institutional server; one currently available alternative is BibServer, which is linked open data compliant. (Open Knowledge Foundation 2012)
How can individual scholars contribute towards a linked open bibliography for ancient studies? My first recommendation is to expose bibliographic citations on the open web, ideally in a format that is friendly towards linked data, if you are developing a scholarly internet-based project or publication. Zotero, which allows the export of citations in BIBO format, is already widely used in the humanities in American academia. (Roy Rosenzweig Center for History and New Media 2013) The Ancient World Open Bibliographies project (Acheson 2013) has been collecting open-access bibliographies in Zotero since October 2010, and increasingly scholars are creating these bibliographies within Zotero in the first place. Zotero can also serve as a storage location for a bibliography that is presented as an inherent part of a digital project: the online bibliography on Evagrius Ponticus by Joel Kalvesmaki of Dumbarton Oaks is an excellent model to follow. (Kalvesmaki 2013) Scholars can also include links to the stable URIs for people, monographs, or journal articles if they are posting bibliographic citations in web environments; I wrote some recommendations for good linking practices for bibliographic citations in a blog post shortly after LAWDI 2012. (Acheson 2012a)
A linked open bibliographic citation database for ancient studies would benefit scholars (and their students) directly, especially those at smaller or less wealthy institutions which do not subscribe to indexes, but it would also serve as a form of outreach to the broader public. While the ancient world has much inherent interest, as an academic discipline classics and related fields are also vulnerable to the charge of being old-fashioned and thus irrelevant. A linked open bibliographic database as proposed here could provide a bridge between popular (and undergraduate level) discussions of topics on the web and the more formal publications of the academic world. Academics should embrace the semantic web, as linked open data is a natural extension of the traditional scholarly practices of citation and weaving webs of references.
Works Cited
Acheson, Phoebe. 2012a. “LAWDI 3: Good Linking Practices for Bibliographic Stuff.” Becoming A Classics Librarian. http://classicslibrarian.wordpress.com/2012/06/13/lawdi-3-good-linking-practices-for-bibliographic-stuff/.
Acheson, Phoebe. 2012b. “TOCS-IN at Zotero: A Project That Didn’t Work.” Becoming a Classics Librarian. September 20. http://classicslibrarian.wordpress.com/2012/09/20/tocs-in-at-zotero-a-project-that-didnt-work/.
Acheson, Phoebe. 2013. “Ancient World Open Bibliographies.” Ancient World Open Bibliographies. Accessed September 28. http://ancientbiblio.wordpress.com/.
Deutsches Archäologisches Institut. 2013. “DAI Zenon.” Accessed September 28. http://zenon.dainst.org/.
Elliott, Tom. 2012. “Horothesia: Ancient Studies Needs Open Bibliographic Data and Associated URIs.” Horothesia. http://horothesia.blogspot.com/2012/06/ancient-studies-needs-open.html.
Foss, C., S. Mitchell, R. Talbert, S. Gillies, and J. Becker. 2013. “Places: 638753 (Aphrodisias).” Pleiades. May 18. https://pleiades.stoa.org/places/638753.
German Archaeological Institute, and University of Cologne. 2013. “144918: Archaisierende Statuette.” Arachne. Accessed September 24. http://arachne.uni-koeln.de/item/objekt/144918.
International DOI Foundation. 2013. “The Digital Object Identifier System.” Doi.org. Accessed September 24. http://www.doi.org/.
Kalvesmaki, Joel. 2013. “Bibliography - Guide to Evagrius Ponticus.” Guide to Evagrius Ponticus. Fall. http://evagriusponticus.net/bibliography.htm.
Library of Congress. 2013. “Library of Congress Linked Data Service Authorities and Vocabularies.” Library of Congress. Accessed September 24. http://id.loc.gov/.
Matheson, Philippa M. W., and Jacques Poucet. 2013. “Tables of Contents of Journals of Interest to Classicists.” Accessed September 28. http://projects.chass.utoronto.ca/amphoras/tocs.html.
OCLC. 2013. “VIAF Virtual International Authority File.” VIAF. Accessed September 24. http://viaf.org/.
Open Knowledge Foundation. 2012. “Bibserver.” GitHub. https://github.com/okfn/bibserver.
Rebillard, Eric. 2013. “APh - L’Année Philologique.” Accessed September 28. http://www.annee-philologique.com/.
Rebillard, Eric, Adam Chandler, and David Ruddy. 2013. “Classical Works Knowledge Base.” September. http://cwkb.org/.
Roy Rosenzweig Center for History and New Media. 2013. “Zotero | Home.” Accessed September 28. http://www.zotero.org/.
Smith-Yoshimura, Karen, and David Michelson. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hangingtogether.org. http://hangingtogether.org/?m=201303.
Vocabulary Program, Getty Research Institute. 2013. “Getty Vocabularies as LOD.” The Getty Research Institute. August 27. http://www.getty.edu/research/tools/vocabularies/lod/.
Wallis, Richard. 2012. “Get Yourself a Linked Data Piece of WorldCat to Play With.” Data Liberate. http://dataliberate.com/2012/08/get-yourself-a-linked-data-piece-of-worldcat-to-play-with/.
©2013 Phoebe Acheson. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.3 (2014)
Linked Data in the Perseus Digital Library
Bridget Almas, Alison Babeu, and Anna Krohn
Overview
The Perseus Digital Library is currently working towards making all of its data available according to the best practices outlined by Heath and Bizer (2011).
We started by thinking carefully about the URIs that we are using to name and address the Perseus texts, catalog metadata, and other data objects from the Perseus Digital Library, so that we could feel reasonably confident in ensuring that these URIs will be stable and properly dereferenceable. We solicited and took into account feedback from members of the digital classics community on our approach to the definition of our URIs.
Having completed the step of defining our URI schemes, our next priority has been to publish stable URIs for the various pre-existing resources in the library. Once we have completed this step for all of the major resource types in the library, we will begin to alter the way in which the resource content is represented to advertise its linkable features via RDFa.
We decided upon this incremental approach because, given the large volume of resources in the Perseus Library, and the limited amount of manpower to get the work done, we felt it would be most beneficial to our own work, and our community of users, to publish our URIs as we go, and not wait until delivery of all the underlying resources could also be made compliant.
URIs in the Perseus Digital Library
As of this writing, we have released URIs for texts, citations and bibliographic catalog records. Work is in progress on authors, names and place entities, Greek and Latin lexical entities, and artifacts and images. Future efforts will include a variety of annotation types.
All Perseus data URIs are published under the http://data.perseus.org URI prefix, followed by one or more path components indicating the resource type, then a unique identifier for the resources, and an optional path component identifying a specific output format for the resource.
Citations and Texts
The Perseus stable URIs for citations and texts leverage Canonical Text Services (CTS) URNs (Blackwell and Smith), enabling us to take advantage of the CTS data model while still supporting Linked Data standards.
Individual passage citations can be retrieved at URIs which adhere to the following syntax:
http://data.perseus.org/citations/<CTS PASSAGE URN>[/format]
Currently supported data formats for citations are HTML and XML. In the future RDF/XML and JSON-LD may also be supported.
The URI syntax for an entire text, without a passage citation is:
http://data.perseus.org/texts/<CTS TEXT URN>[/format]
HTML is the only currently supported format for full text URIs, but the XML format will be available soon.
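The URI patterns above reduce to a simple rule: the data.perseus.org prefix, a path component for the resource type, the CTS URN, and an optional format suffix. A minimal sketch (the helper function is our own illustration, not part of the Perseus codebase):

```python
# Sketch of the Perseus stable-URI pattern:
#   http://data.perseus.org/<type>/<CTS URN>[/format]
BASE = "http://data.perseus.org"

def perseus_uri(resource_type, urn, fmt=None):
    """Build a Perseus stable URI for a CTS URN, with an
    optional format suffix (e.g. 'xml' for citations)."""
    uri = f"{BASE}/{resource_type}/{urn}"
    return f"{uri}/{fmt}" if fmt else uri

print(perseus_uri("citations", "urn:cts:greekLit:tlg0012.tlg001:1.100"))
print(perseus_uri("citations",
                  "urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.100",
                  "xml"))
```

The same rule covers texts and, as described below, catalog records, which differ only in the resource-type path component.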
Combining CTS and URI standards
By combining the CTS and URI standards we produce semantically meaningful URIs for texts and citations. This is perhaps best illustrated by example.
The following URN identifies the notional work, Homer's Iliad, without reference to a specific edition or translation of that work:
urn:cts:greekLit:tlg0012.tlg001
We can append an edition identifier to the above URN to specify the unique resource which is Perseus' TEI XML version of Homer's Iliad that is identified in the Perseus CTS inventory as 'perseus-grc1'.
urn:cts:greekLit:tlg0012.tlg001.perseus-grc1
The next URN identifies Book 1 Line 100 of the notional work the Iliad:
urn:cts:greekLit:tlg0012.tlg001:1.100
And this one identifies that line in the specific Perseus edition:
urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.100
Any of these URNs can be resolved at a stable URI by prefixing them with the http://data.perseus.org URI prefix and path which identifies the resource type (i.e. in this case 'text' or 'citations'):
http://data.perseus.org/texts/urn:cts:greekLit:tlg0012.tlg001
http://data.perseus.org/texts/urn:cts:greekLit:tlg0012.tlg001.perseus-grc1
http://data.perseus.org/citations/urn:cts:greekLit:tlg0012.tlg001:1.100
http://data.perseus.org/citations/urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.100
Note that, per the CTS standard, if you request a URN for a notional work without including an edition or translation specifier, then we return the default edition for that work in our repository, which in this case happens to be the perseus-grc1 edition.
Although not yet implemented, in the future we will take advantage of the subreference feature of CTS URNs to support URIs for every word or contiguous sequence of words in a text, for example:
http://data.perseus.org/citations/urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.1@μῆνιν[1]
You can explicitly link to the TEI XML format for the citation, rather than the default HTML display, by appending the optional format path to the URI:
http://data.perseus.org/citations/urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.100/xml
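The URN examples above follow a regular structure that software can exploit: colon-separated top-level fields (urn, cts, namespace, work hierarchy, optional passage), with the work hierarchy itself dot-separated into textgroup, work, and optional edition/translation. The following is a simplified parsing sketch of our own; the full CTS URN specification (Blackwell and Smith) covers additional features such as ranges and subreferences that this sketch does not validate.

```python
def parse_cts_urn(urn):
    """Split a CTS URN into (namespace, work hierarchy, passage).

    Simplified reading of the CTS URN structure:
      urn:cts:<namespace>:<textgroup>[.<work>[.<version>]][:<passage>]
    """
    parts = urn.split(":")
    if parts[0] != "urn" or parts[1] != "cts":
        raise ValueError("not a CTS URN: " + urn)
    namespace = parts[2]
    hierarchy = parts[3].split(".")          # textgroup, work, version...
    passage = parts[4] if len(parts) > 4 else None
    return namespace, hierarchy, passage

ns, hierarchy, passage = parse_cts_urn(
    "urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.100")
print(ns, hierarchy, passage)
```

A consumer of Perseus URIs can use exactly this kind of decomposition to decide, for instance, whether a URN names a notional work or a specific edition, and whether it carries a passage citation.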
Catalog Records
We currently publish URIs for work and edition/translation level records, as well as the CTS textgroup. URIs for authors, editors and translators are forthcoming. We leverage the CTS URNs for the texts in the catalog record URIs. The URI for a catalog record can be distinguished from that of the text itself by the 'catalog' path element:
http://data.perseus.org/catalog/<textgroup urn>[/format]
http://data.perseus.org/catalog/<work urn>[/format]
http://data.perseus.org/catalog/<edition urn>[/format]
http://data.perseus.org/catalog/<translation urn>[/format]
So for example, the following are the canonical URIs for objects in the CTS hierarchy for Homer's Iliad:
http://data.perseus.org/catalog/urn:cts:greekLit:tlg0012
http://data.perseus.org/catalog/urn:cts:greekLit:tlg0012.tlg001
http://data.perseus.org/catalog/urn:cts:greekLit:tlg0012.tlg001.perseus-grc1
http://data.perseus.org/catalog/urn:cts:greekLit:tlg0012.tlg001.perseus-eng1
You can explicitly link to an Atom feed of the catalog metadata for a textgroup, work or edition/translation, rather than the default HTML interface, by appending the optional format path to the URI:
http://data.perseus.org/catalog/urn:cts:greekLit:tlg0012.tlg001/atom
Support for alternate formats of RDF and JSON for catalog records is forthcoming.
Publication
The URIs themselves are currently published in the user interfaces of the Perseus Digital Library. For texts and citations, they are included in the "Data Identifiers" widget which appears on any text display:
Figure 1: Data Identifiers Widget.
For the catalog records, they appear on the top of each catalog display:
Figure 2: Catalog Record Display.
VoID files for all URIs are forthcoming.
HTTP Responses
The default response for any URI request whose HTTP headers indicate that the calling client accepts text/html is a redirect, with the HTTP 302 response code, to the HTML display page for the requested resource in the corresponding Perseus user interface. As discussed above, requests for specific supported formats (currently XML for citations, Atom for catalog records) can be made by appending a path element for the format to the resource URI. The response for these requests will typically be the resource contents in the requested format, with an HTTP 200 response code.
In order to enable people to cite textual resources which have CTS URNs assigned, but which are not currently digitized in the Perseus Digital Library, we redirect URIs referencing these resources to the Perseus Catalog. The response code for these redirects is HTTP 303 (See Other). If we have a catalog record for the requested resource, the target of the redirect is that catalog record, which may contain links to other locations at which you can find the actual text (such as the Internet Archive or Google Books, etc.). Although not yet implemented, in the future, if a resource not found in the Catalog is requested, we plan to redirect to an interface through which data for the resource can be submitted for inclusion in the catalog.
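The response behavior described in the two paragraphs above can be summarized as a small decision function. This is a hedged sketch of the logic as we read it, not Perseus server code, and it omits cases the text leaves unspecified:

```python
def response_code(accepts_html, digitized, fmt=None):
    """Sketch of the Perseus URI response logic described above.

    - URNs not digitized in the library redirect to the Perseus
      Catalog with 303 (See Other).
    - Requests for an explicit supported format (e.g. 'xml', 'atom')
      are answered directly with 200 and the resource contents.
    - Otherwise, HTML-accepting clients get a 302 redirect to the
      HTML display in the Perseus user interface.
    """
    if not digitized:
        return 303
    if fmt:
        return 200
    if accepts_html:
        return 302
    return 200  # assumption: non-HTML clients receive a default representation

print(response_code(accepts_html=True, digitized=True))        # 302
print(response_code(accepts_html=True, digitized=False))       # 303
print(response_code(accepts_html=True, digitized=True, fmt="xml"))  # 200
```

The distinction between 302 and 303 is deliberate: 303 signals that the catalog record is a description of the requested resource rather than the resource itself, a convention discussed in Sauermann and Cyganiak's "Cool URIs for the Semantic Web."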
Resource Contents
As mentioned previously, the resources served at the Perseus URIs do not currently advertise any linked/linkable data contained within them via RDF. Doing so is essential for full compliance with Linked Data best practices and is on our roadmap for future releases of the Perseus interfaces to the data.
Data Sharing Initiatives
In our efforts to connect with other groups in the scholarly and library communities, the Perseus Digital Library has made our authority record data for classical authors available to the Virtual International Authority File. The contribution of names from our author records will expand the VIAF name clusters, adding different forms of a given author’s name, and assist in VIAF’s goal of building truly international authorities that are useful to libraries and scholars. This relationship will also help the catalog provide links to the VIAF clusters, so as to make as much information about an author available as possible and to further the development of the Semantic Web.
Works Cited
Blackwell, Christopher and Neel Smith (2012). An overview of the CTS URN notation. The Homer Multitext. Available at http://www.homermultitext.org/hmt-doc/cite/cts-urn-overview.html.
Heath, Tom and Christian Bizer (2011). Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. San Rafael: Morgan & Claypool.
©2014 Bridget Almas, Alison Babeu, and Anna Krohn. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.4 (2014)
The Herculaneum Graffiti Project
Rebecca Benefiel and Sara Sprenkle
The Herculaneum Graffiti Project has two main aims: to digitize hundreds of handwritten wall-inscriptions of the first century AD and to provide a resource that will enable these graffiti to be studied in the context of their location in this ancient city.
Herculaneum was destroyed in the year AD 79 by the volcanic eruption of nearby Mount Vesuvius. Before that catastrophe, it had been a thriving seaside town on the Bay of Naples in southern Italy. Messages written throughout the town attest to a lively interest among the area’s inhabitants in writing and communicating thoughts publicly. The archaeological site preserved roughly 250 of these texts, written on the walls of every type of building. The texts feature a remarkable variety of content, from greetings to friends to grocery lists, from drawings of gladiators to comments on philosophers. These were collected between 1929 and 1946 and were recorded in the Corpus Inscriptionum Latinarum, vol. IV, Supp. 3, but without a map of the site and with minimal to no illustration.
The objectives of the Herculaneum Graffiti Project include reexamining these inscriptions, recording their physical data, and making these inscriptions widely accessible via digital publication online. In designing a search engine that will allow scholars and the public to search for graffiti at Herculaneum, our aim is to move away from viewing a graffito as simply a brief text or isolated marking and instead move toward understanding these messages as textual artifacts within a larger social framework for communication. A concurrent aim is to record and highlight the presence of figural graffiti, drawings that were incised into wall-plaster in the same manner as textual messages but were not recorded systematically.
EAGLE, The Electronic Archive of Greek and Latin Epigraphy
The Herculaneum Graffiti Project works in collaboration with EAGLE, a federation of epigraphic databases that together will ultimately make all Greek and Latin inscriptions through the sixth century CE available and publicly accessible online (http://www.eagle-eagle.it; http://edr-edr.it). The fieldwork and analysis that will be accomplished in Herculaneum by HGP will allow for detailed contributions to EAGLE, where a comprehensive set of metadata for each inscription will be available. The HGP website will provide a complementary way to search for graffiti and will display basic information for these inscriptions, each with a link to the respective entry in EAGLE.
Inscriptions and Wall-inscriptions
The Epigraphische Datenbank Heidelberg (EDH), one of the four main databases of EAGLE, recently introduced a visual element on its homepage, which now displays a map marking the locations of the inscriptions it contains (http://edh-www.adw.uni-heidelberg.de). A map like this, which locates the provenance of inscriptions by town or site, is possible in Google Earth or Google Maps. At Herculaneum, we are working with a different scale of context, identifying not just the city where inscriptions are found, but where in the city inscriptions are found, from block to block and property to property, indoors and outdoors. This city-wide perspective provides a new vantage point and an approach that could eventually be applied to inscriptions in other ancient cities as well. Herculaneum provides a special opportunity to study the presence of writing throughout the city since these wall-inscriptions, inscribed into a non-moveable surface, were all found in situ. By using the availability of precise location information for this epigraphic material to create a spatial visualization of the presence of graffiti throughout the city, we hope to open up new types of questions about the epigraphic landscape of Herculaneum.
Text and Image
Our project also seeks to provide a solution to the problem of how to organize, store, and retrieve non-textual inscriptions. The search capacities for most databases are text-driven. Yet, text was just one part of the habit of writing on walls. Drawings were incised into wall-plaster in a similar manner, and these are more problematic to include in databases. The only way to search for and find such a graffiti drawing at present is to know how it is described (e.g. navis, viri facies). With the HGP, the researcher will have two other possibilities to track down such drawings. First, there will be a clickable hotspot at locations where ancient graffiti are present. By clicking, the researcher will be presented with all of the graffiti in that location, both textual and figural. Second, the scholar can search for all figural graffiti or can search by class of drawing (human motifs, animal motifs, boats, etc.). Results will list all appropriate examples and provide a map of their locations.
Search capabilities
We have therefore designed a search engine that allows a researcher to search for graffiti in multiple ways.
If one is interested in a particular house or property, it is possible to search for graffiti by location, either clicking one or more properties directly on a map of the site or selecting a property from a pull-down menu.
It is also possible to search by class of property (house, tavern, shop, or workshop). The search will return all examples found in that particular category of space, with individual locations displayed in both list form and highlighted on the city map.
As mentioned above, one can search by type of drawing. This search will return individual images, scenes with more than one image, or drawings that have text associated with them. Thumbnails will be available on the HGP site. By clicking on the thumbnail, the viewer will be taken to the larger, copyright-protected image on the EAGLE website.
Finally, there is the traditional search by text or keyword. Executing this search will result in a list of examples that match the search terms, as well as a map that displays the locations of all results.
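The four search modes just listed can be sketched as filters over a shared record set. Everything here is illustrative: the field names, the sample records, and the function names are invented for the sketch and do not reflect the HGP database schema.

```python
# Tiny invented record set; each record pairs a graffito with its location,
# the class of property it was found in, and an optional drawing motif.
GRAFFITI = [
    {"id": 1, "text": "feliciter", "property": "I.8.2", "kind": "house",  "drawing": None},
    {"id": 2, "text": None,        "property": "I.8.8", "kind": "tavern", "drawing": "boat"},
    {"id": 3, "text": "ave",       "property": "I.8.8", "kind": "tavern", "drawing": "gladiator"},
]

def by_location(prop):
    """Search mode 1: all graffiti in a given property."""
    return [g for g in GRAFFITI if g["property"] == prop]

def by_property_class(kind):
    """Search mode 2: all graffiti found in a class of property."""
    return [g for g in GRAFFITI if g["kind"] == kind]

def figural(motif=None):
    """Search mode 3: figural graffiti, optionally narrowed to one motif."""
    return [g for g in GRAFFITI
            if g["drawing"] and (motif is None or g["drawing"] == motif)]

def by_keyword(word):
    """Search mode 4: traditional text/keyword search."""
    return [g for g in GRAFFITI if g["text"] and word in g["text"]]
```

Each result list carries the location field, which is what lets the interface also plot the hits on the city map.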
Prototype of Graffiti within Insula I.8 at Pompeii
The Herculaneum Graffiti Project is under construction and will have its first field season in summer 2014. Currently we have a working prototype limited to one city-block or insula in Pompeii: Insula I.8, a city-block of mixed zoning located near the center of the city. This prototype illustrates the search capabilities and the format in which results will be displayed and draws on the graffiti of Pompeii, Insula I.8, which have already been entered in EAGLE.
Figure 1. Plan of Insula I.8 at Pompeii
Figure 2. Sample results screen.
©2014 Rebecca Benefiel and Sara Sprenkle. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.5 (2014)
The Homer Multitext and RDF-Based Integration
Christopher W. Blackwell and D. Neel Smith
The Project
The Homer Multitext (HMT) is an international collaboration aimed at recovering and documenting the history of Greek epic poetry based on primary source documents, particularly the fragmentary papyri from late Antiquity and the annotated Byzantine codices of the Homeric Iliad. The data generated by the project consists of image files, transcriptions and translations of Greek and Latin texts in poetry and prose, commentary texts, and relationships among these. The digital library architecture that the project has developed since 2001 to manage this work is called CITE for Collections, Indices, Texts and Extensions.
Through participation in the LAWDI workshop at New York University in the summer of 2012, the HMT’s editors recognized the significant potential of RDF triples not only as a means of linking between projects, but of capturing and integrating the project’s data for internal use. The simplicity of RDF, combined with its flexibility, freely available tools, and widespread support, moved us to integrate RDF into the heart of our workflow and architecture.
Separation of Concerns
Integrating a large, diverse, and evolving body of data, under development by a widely distributed group of scholars at many stages of their careers, from first-year students of Greek to tenured professors at major universities, requires rigorous attention to separation of concerns. We have tried to separate scholarly activities cleanly and to associate with each activity the archival data format most appropriate to it.
For example, it is most convenient to edit a transcription of a Greek text as a valid TEI-XML document. Subsequent analysis of that document, however, is made considerably more difficult by the arbitrarily deep hierarchical structure of any non-trivial XML text; for many kinds of analysis, processing a flattened, tabular format is preferable. For serving, querying, and sharing the archival material, an RDF triplestore is most efficient and broadly useful. For presentation of data delivered to web browsers for human or machine consumption, a combination of XML, XSLT, and CSS is most convenient. Since 2012, much of the development on the CITE architecture has focused on a test-driven environment for a publication cycle of edit, test, integrate, compile, serve, and format.
Task | Format | Tools | Validation

Editing:
Texts | TEI-XML | Oxygen, vel sim. | RelaxNG Schema
Collections | Plain text, comma- or tab-delimited | Google Fusion Tables, Git Online Editor, &c. | csv/tsv parsing libraries

Building & Integrating:
Testing Validity of Texts | flattened, tabular data | Gradle build-system | Custom scripts, Perseus Morphological Service
Integrating Texts, Data, and Images | RDF triples | Gradle build-system | Custom scripts

Publishing:
Serving Data | RDF triples | Fuseki SPARQL endpoint, vel sim.
Discovery, Query, Retrieval | RDF triples > XML | CITE Servlet
End-user display | Citations > XML Fragments > HTML | CITEKit
URN Citation
Every object in the HMT can be cited by a URN—CTS URNs for texts, CITE URNs for objects and images.1
A Model of “Text”
The most complex data in the HMT are the texts, and for this separation of concerns to work we have to be working with a conceptual model of “text” that allows us to move from hierarchical XML to tabular data to RDF and back to hierarchical XML without loss.
For our purposes, we define a “text” as an “ordered hierarchy of citation objects”, following the OHCO2 defined by Neel Smith and Gabriel Weaver.2 By prioritizing units of citation over any other hierarchy, we can guarantee the most important scholarly activity: citation and retrieval. Any other content elements or orthogonal views of a text can be accommodated by this model, to a greater or lesser degree of granularity, depending on editorial decisions about citation.
OHCO2 is implemented in the CITE architecture through the Canonical Text Services protocol, which defines a structure for a catalog, a small number of valid requests for discovery and retrieval, and the format of responses to those requests.
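The lossless round-trip that the OHCO2 model guarantees can be sketched with a toy nested structure. The real pipeline works on TEI-XML and citable text nodes; the book-and-line dictionaries below are simplified stand-ins for illustration only.

```python
# A two-book fragment keyed by citation units (book -> line -> content).
TEXT = {"1": {"1": "μῆνιν ἄειδε θεὰ ...", "2": "οὐλομένην, ..."},
        "2": {"1": "ἄλλοι μέν ῥα θεοί ..."}}

def flatten(text):
    """Nested citation hierarchy to ordered (citation, content) rows."""
    return [(book + "." + line, content)
            for book in sorted(text)
            for line, content in sorted(text[book].items())]

def unflatten(rows):
    """Rebuild the citation hierarchy from the tabular rows."""
    text = {}
    for cite, content in rows:
        book, line = cite.split(".")
        text.setdefault(book, {})[line] = content
    return text
```

Because the citation hierarchy is the organizing principle, flattening to a table (and, by extension, to RDF statements about each citation node) and rebuilding the hierarchy loses nothing.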
Collections and Images
CITE defines a “collection” as a group of data objects sharing a defined set of fields. Each object has a URN, and named fields defined in a catalog. Collections may be ordered or unordered; in an ordered collection each object has one field that defines its place in a sequence with an integer value.
Sub-references to URNs
CITE and CTS URNs define texts, collections, or images. Because scholarship demands citation to specific parts of objects—passages of text, regions-of-interest on objects, particular fields of a data-object—all CITE URNs may include a sub-reference, providing arbitrary granularity, specific to the object defined by the URN.
Type | URN | Points to… | Sub-reference?
CTS Text | urn:cts:greekLit:tlg0012.tlg001.msA | Homer, Iliad, edition of Manuscript A | none
CTS Text | urn:cts:greekLit:tlg0012.tlg001.msA:1.1 | Homer, Iliad, edition of Manuscript A, Book 1, Line 1 | none
CTS Text | urn:cts:greekLit:tlg0012.tlg001.msA:1.1@μῆνιν | Homer, Iliad, edition of Manuscript A, Book 1, Line 1 | the string “μῆνιν”
CITE Image | urn:cite:hmt:vaimg.VA052RN-0053 | hmt namespace, vaimg collection, image VA052RN-0053 | none
CITE Image | urn:cite:hmt:vaimg.VA052RN-0053@0.1381,0.4192,0.3954,0.0368 | hmt namespace, vaimg collection, image VA052RN-0053 | a rectangular region-of-interest
CITE Object | urn:cite:hmt:venAsign.10 | hmt namespace, venAsign collection, item 10 | none
CITE Object | urn:cite:hmt:venAsign.10@GreekName | hmt namespace, venAsign collection, item 10 | the contents of the field GreekName for this item
Doing the Work at Build-Time
Validation and Testing
We use Gradle to build our .ttl file, compiling XML texts, and collections and indices saved as .csv or .tsv files. Our build-scripts perform validation on XML files as well as other domain-specific tests. HMT-MOM (for “Mandatory Ongoing Maintenance”) includes scripts that enforce a specified canon of legitimate Unicode characters for Greek texts, ensure the referential integrity of URN values in indices and CITE Collection data structures, and support further manual review by providing visualizations of the state of completion for each folio our collaborators edit. HMT-MOM also does linguistic checking, matching each word-token against the Morpheus morphological parser; words that fail to match must be identified as non-standard forms actually present on a manuscript, non-lexical strings (numbers, case-endings, etc.), or new Greek vocabulary to be entered (ultimately) into a new lexicon of the language.
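The character-canon check can be sketched in a few lines. This is an illustration in the spirit of HMT-MOM, not its code: the real canon of legitimate characters for Greek texts is far larger and is maintained by the project, and the tiny set below is invented for the example.

```python
# Tiny illustrative canon: lowercase Greek letters plus a few accented forms.
ALLOWED = set("αβγδεζηθικλμνξοπρστυφχψωῆἄὰ")

def bad_characters(token):
    """Characters in a token that fall outside the canon."""
    return {c for c in token if c not in ALLOWED}

def validate(tokens):
    """Map each failing token to its offending characters, so an editor
    can decide whether each is a manuscript form, a non-lexical string,
    or an error to correct."""
    report = {}
    for t in tokens:
        bad = bad_characters(t)
        if bad:
            report[t] = bad
    return report
```

Run at build time over every word-token, a check like this turns a vague "the text has bad characters" failure into a precise list an editor can act on.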
Inferencing
The URN-syntax of all CITE citations captures hierarchical relationships between group/work/edition/citation (for texts) or namespace/collection/object (for data objects and images). The citations show us that urn:cts:greekLit:tlg0012.tlg001.msA:1.11 (Homer, Iliad, MS. A edition, Book 1, line 11), belongs to urn:cts:greekLit:tlg0012.tlg001 (Homer, Iliad), and so forth.
Earlier versions of our CITE services—Perl, Cocoon, eXist, AppEngine—were processor-intensive, sorting out hierarchical relationships at N levels of depth on the fly. Some of the solutions to implementing a generic architecture for complex data were clever.
In the current implementation of CITE/CTS services, we take advantage of RDF to avoid cleverness at all costs. The Gradle build citemgr processes our XML texts, tab-delimited and comma-delimited collections and indices, and their catalogues, and makes explicit at build-time every relationship necessary to capture the model of a text or collection-object.
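Making the hierarchy explicit at build time amounts to walking each URN's levels and emitting a containment statement for every step. The sketch below does this for CTS URNs; the function names and the "belongsTo" predicate are invented for illustration, not the project's actual RDF vocabulary.

```python
def ancestors(urn):
    """The URN followed by each containing URN, narrowest to broadest."""
    chain = [urn]
    head, _, tail = urn.rpartition(":")
    if head.count(":") == 3:                    # URN carries a passage citation
        segs = tail.split(".")
        for i in range(len(segs) - 1, 0, -1):   # e.g. 1.11 -> 1
            chain.append(head + ":" + ".".join(segs[:i]))
        chain.append(head)                      # the edition itself
        head, _, tail = head.rpartition(":")
    wsegs = tail.split(".")                     # e.g. tlg0012.tlg001.msA
    for i in range(len(wsegs) - 1, 0, -1):      # edition -> work -> group
        chain.append(head + ":" + ".".join(wsegs[:i]))
    return chain

def containment_triples(urn):
    """One explicit triple per containment step, computed once at build
    time so no query-time inferencing is needed."""
    chain = ancestors(urn)
    return [(child, "belongsTo", parent)
            for child, parent in zip(chain, chain[1:])]
```

For Iliad MS. A, Book 1, line 11, the chain runs from the line up through the edition and work to the textgroup, exactly the relationships the prose describes.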
A complete list of the RDF verbs used to describe the Homer Multitext data is available through this query:
SPARQL Endpoint: http://beta.hpcc.uh.edu:3030/ds/
Query: select distinct ?v where { ?s ?v ?o . }
Building the CITE Services, then, is an exercise in constructing sufficient SPARQL queries to retrieve triples, based on their URN subjects.
We have found that this accelerates the development and exposure of HMT data, using a CITE service written as a Java servlet with a Fuseki triple-store. The build process that constructs a .ttl file of 443,000 lines, containing all of our HMT data (texts, objects, images), currently takes one minute, 13 seconds on a three-year-old Mac Pro.
With this system, we have a very clean separation of concerns between archival data, served data, the network service, and end-user applications, with each standing entirely on its own. The archival texts and data are complete, of course; end-users can retrieve them by resolving citations through the CITE service, and our RDF storage captures and makes explicit all intrinsic relationships among objects in our digital library. This strikes us as a clean, open, and forward-looking approach.
The cost is the time spent building the .ttl file (which is inconsiderable), and inefficiency in the middle layers, between the CITE Service and the SPARQL endpoint. It takes 10 SPARQL queries to retrieve and re-assemble one citation-node of a text in response to a CTS GetPassagePlus request.
This approach may not scale. In our ongoing collaboration with the Department of Informatics at Leipzig University, which is working toward implementing a CTS library containing tens of thousands of books, we are finding that the computing cost of making an average of 10 SPARQL queries for each requested citation-node, when the SPARQL server is hosting millions of statements, might require a more efficient, more clever solution that works directly with URNs at query time. Alternatively, a client that connects to a SPARQL endpoint using WebSockets may solve what is primarily an I/O bottleneck. In either case, the value of having cleanly separated concerns—data, integration, storage, services, applications—will be even more apparent.
Source Code and Data
All data for the Homer Multitext is freely available.
Package | URL
Direct download of archived images | http://amphoreus.hpcc.uh.edu
Nexus Repository for versioned artifacts | http://beta.hpcc.uh.edu/nexus/index.html
HMT-XML: working repository for project data | https://github.com/neelsmith/hmtarchive
CITE-Manager: integrate, test, and build RDF from project data | https://github.com/neelsmith/citemgr
CITE-Servlet: CITE/CTS Services implemented as a Java servlet, querying a SPARQL endpoint | https://github.com/neelsmith/citeservlet
CITEKit: resolve CITE/CTS URNs to their objects via AJAX in HTML | https://bitbucket.org/Eumaeus/citekit
Notes
1 Blackwell, C., and D.N. Smith. “A Gentle Introduction to CTS & CITE URNs.” Homer Multitext Project Documentation (November 2012). http://www.homermultitext.org/hmt-doc/guides/urn-gentle-intro.html.
2 Smith, D. Neel, and Gabriel Weaver. “Applying Domain Knowledge from Structured Citation Formats to Text and Data Mining: Examples Using the CITE Architecture.” Text Mining Services (2009): 129.
©2014 Christopher W. Blackwell and D. Neel Smith. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.6 (2014)
Moving the Ancient World Online Forward
Tom Elliott and Chuck Jones
From its very beginning it has been the mission of the library of the Institute for the Study of the Ancient World not only to acquire traditional paper-based scholarship but also to develop digital resources as a fundamental component of its collections, integrating the world’s oldest languages, scripts, and cultures with the newest technologies. Under Charles Jones' leadership as ISAW's Head Librarian, it has pursued this agenda in partnership with not only NYU Library's Digital Library Technology Services team, but also ISAW's own Digital Programs department (directed by Tom Elliott), both of which are widely regarded as innovative thought leaders in the fields of digital scholarly publication and preservation.
In this context, a key concern remains the identification, description, cataloging, and accessibility of the growing global corpus of digital scholarly resources for the widest possible audiences. Both retrospective digitization and born-digital scholarly publishing are producing a vast, varied trove of documents, databases, web sites, and other digital resources of value for the study of antiquity. ISAW's research mission and spatio-temporal footprint demand that these be brought within the reach of its faculty, students, and visiting scholars in an organized and useful manner, continuously updated. Both the nature of the material, and the transformative vision pursued by ISAW, dictate that we address this concern not only for our own internal needs, but also for an extramural, global audience. That much of this material appears on the open web, outside established library acquisition and cataloging channels, reveals a tension with traditional library practices. What we need, therefore, is an easy way to capture and describe digital resources anywhere on the open web that integrates smoothly with existing and emerging library systems, with standard scholarly research management software, and with ISAW and NYU's public facing websites and digital publications.
ISAW took its first public step in support of this goal in 2009, when Jones launched AWOL: The Ancient World Online1 as an extension to Abzu2, his pioneering guide to networked open access data relevant to the study and public presentation of the Ancient Near East and the Ancient Mediterranean world. AWOL is now firmly established as the primary vector of information on digital and digitized antiquity, with more than five thousand subscribers to its daily update and more than 750 unique visitors daily over the past year. It indexes and describes more than twenty existing and emerging scholarly resources each week, and its running list of open access serial publications includes well over a thousand titles.3 It regularly lists emerging and existing born digital projects, and it publicizes repositories of digitized scholarship relating to antiquity with a cumulative content of thousands of volumes.
With the assistance of a grant from the Delmas Foundation, we now seek to expand upon AWOL's current, lightweight but labor-intensive blog-based format to realize the vision outlined above, i.e., a comprehensive and sustainable combination of software, process, and people that can help scholars and students around the world find and use the full richness of the new digital scholarly landscape. A short, illustrated description of the current and proposed workflows follows.
The Ancient World Online reaches its audience through a blog4 and via syndication of the news feed from the blog to email and social media outlets. As illustrated in Figure 1, Jones identifies web content suitable for inclusion in AWOL and then copies it manually into the blog via the standard web interface provided by Google's Blogger service, which hosts the blog.5 He edits the blog content as necessary and then publishes it, making it immediately available to AWOL's blog audience over the World-Wide Web. Blogger pushes summary information about each publishing update to Google's Feedburner service,6 which in turn automatically publishes both an update feed (to which users can subscribe using third-party feed reading tools like Feedly.com) and a daily email digest. AWOL has 6,096 current subscribers7 to the daily email syndication of the news feed; 277 persons connected to the Facebook syndication of the news feed8; and 585 persons connected to the Twitter syndication of the news feed9. On the blog page itself there have been 184,795 unique visits in the last six months (roughly 1,200 per day), of which 31,622 were returning visits.
Figure 1.
Our proposed enhancements to AWOL will maintain the core delivery mechanisms via the existing blog and Feedburner, while adding publication channels via NYU library systems and OCLC Worldcat,10 as well as ISAW's own website. We will also streamline and improve content capture and preparation through automation and the use of a structured, collaborative web database designed for web and bibliographic citation.
Figure 2.
As illustrated in Figure 2, instead of manually copying information about websites into the blog, the editor will use the free, open-source Zotero citation manager11 to capture snapshots of individual web resources and to annotate these with bibliographic information and categorical tags. This data will be automatically synchronized with a free group account on the Zotero server, thereby enabling the editor to involve interested third-party collaborators and the existing ISAW cataloging staff in capturing, refining, and updating data for publication. Zotero server's open Application Programming Interface (API)12 will automatically make the resulting data available in a variety of formats suitable for further use online and in other citation management systems like ProCite and EndNote, thereby creating a new dissemination channel for a broader set of audience use cases (not shown in diagram). One of these Zotero output formats – an open standard called Bibliontology RDF13 – will provide the input to two conversion programs ISAW developers will create: one to transform the AWOL bibliographic content into bibliographic data suitable for submission to NYU's library catalog systems and the other to use Blogger's open API14 to post new or updated content to the existing blog without manual intervention. ISAW's website and other digital publications will also take advantage of the Zotero API outputs to incorporate and reuse AWOL content as appropriate.
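One of the two proposed conversion programs, the one that posts captured resources to the blog, amounts to reshaping a bibliographic record into a post payload. The sketch below is illustrative only: the field names on both sides are simplified stand-ins, not the actual Zotero or Blogger API schemas, and the real converter would read Bibliontology RDF rather than a plain dictionary.

```python
def zotero_to_post(item):
    """Shape a captured web resource (simplified Zotero-style record)
    into a title/content/labels payload of the kind a blog API expects."""
    body = '<p><a href="{url}">{title}</a></p><p>{abstract}</p>'.format(**item)
    return {
        "title": item["title"],
        "content": body,
        # categorical tags applied during capture become blog labels
        "labels": item.get("tags", []),
    }
```

In the proposed workflow this step runs without manual intervention, so the blog stays in sync with the collaboratively maintained Zotero group library.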
Notes
1 ISSN 2156-2253 http://ancientworldonline.blogspot.com.
2 http://www.etana.org/abzubib
3 http://ancientworldonline.blogspot.com/2012/07/alphabetical-list-of-open-access.html.
4 http://ancientworldonline.blogspot.com.
5 http://www.blogger.com.
6 http://feedburner.google.com.
7 As of 23 September 2013.
8 As of 23 September 2013, https://www.facebook.com/AncientWorldOnline.
9 As of 23 September 2013, @AWOL_tweets
10 http://www.worldcat.org/.
11 http://www.zotero.org.
12 http://www.zotero.org/support/dev/server_api/v2/start.
13 http://bibliontology.com/.
14 https://developers.google.com/blogger/.
©2014 Tom Elliott and Chuck Jones. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.7 (2014)
Linked Open Data and the Ur of the Chaldees Project
William B. Hafford
Research, at its core, is the act of making connections among data – building up systematically to a supportable idea or conclusion. The basic components, therefore, are the data themselves, the individual points that demonstrate the concept that unites them. To restate the matter: Researchers are dependent on their data.
Yet, data are not always easy to acquire, and many different researchers may end up gathering the same or similar data many different times, slowly building toward their own conclusions. If the data were readily accessible, already linked to similar instances or searchable in such a way as to complete the larger scale grouping for analysis, research into that data would be faster, easier, and would allow for less duplication of effort.
Such is one major idea behind linked data on the web. Embedded hyperlinks in online documents have long led us to other documents that might be of use in finding more information, but data points within those documents have not been quickly extractable, and digital data repositories have for long periods existed in relative isolation. If computers can find and access similar data across many data stores, research becomes far more powerful.
As researchers of the ancient world, archaeologists face the problems of any researcher: often the act of gathering data from various reports and repositories, physical or virtual, takes far longer than the process of connecting those data in order to come to some understanding of the ancient concept or practice being investigated. One person studying, for example, figurines from an archaeological site may comb through field notes for occurrences of the objects, spending days, weeks, or months locating every one. Another person may later go through the same notes for occurrences of amulets or statuettes, covering many of the same items and spending their own days, weeks, or months. Perhaps the work of the earlier researcher has guided them somewhat to make their search more effective, but they likely still would have to go through every field note to see if items that meet the new criteria were missed. In a digital age, this sort of collecting of data can be done very quickly--if the data is arranged in a machine-readable way.
Such is the beauty of linked data on the web. They are published in their own self-defining schemas, related to other schemas wherever possible (Heath and Bizer 2011: 85–86, 99). This allows the computer to make connections across data and across different data stores. It would thus be possible to search not only figurines, statuettes, and amulets from one archaeological site, but from all sites published as linked data on the web.
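A toy illustration of that cross-schema search: two "data stores" below use different local terms, but because each term is linked to a shared concept, a single query finds matching objects in both. Every identifier here is invented for the example; real linked data would use full URIs and standard vocabularies.

```python
# Two invented stores of (subject, predicate, object) triples using
# different local typing vocabularies.
PENN = [("penn:obj1", "a", "penn:Figurine"),
        ("penn:obj2", "a", "penn:Bowl")]
BM   = [("bm:obj9", "a", "bm:Statuette")]

# Schema-level links relating each local class to a shared concept,
# the kind of mapping that relates one schema to another.
LINKS = {"penn:Figurine": "shared:Figurine",
         "bm:Statuette":  "shared:Figurine"}

def find(concept, *stores):
    """All subjects, across stores, whose type maps to the shared concept."""
    return [s for store in stores for s, _, o in store
            if LINKS.get(o) == concept]
```

Neither store had to adopt the other's terminology; the schema-level links alone let one query span both, which is the point of the paragraph above.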
To facilitate such research, archaeological data--especially those from early excavations that are at the foundation of our understanding of major portions of the ancient world--should be made accessible and machine-readable in a linked (and openly available) manner. This is one of the driving concepts behind the Linked Ancient World Data Initiative, and behind the project being conducted for a particular site entitled Ur of the Chaldees, a Virtual Vision of Woolley's Excavation. This project is jointly conducted by the University of Pennsylvania Museum and the British Museum with lead funding from the Leon Levy Foundation. Sir Leonard Woolley excavated at the ancient city of Ur from 1922 to 1934, uncovering huge amounts of private housing and public buildings as well as religious, industrial, and funerary areas from at least the Ubaid through the Persian periods--some 5,000 years of occupation. In so doing, he created an enormous amount of data, far more than just the tens of thousands of artifacts he recorded and sent to Baghdad, London, and Philadelphia.
During the excavation, more than 15,000 field catalogue cards were produced, covering more than 25,000 artifacts. Field photographs from the twelve years number 2,350, and at least 4,500 hand-written field notecards were also produced, used later to aid in publication and then stored in the British Museum. The actual publication record was good, with yearly reports appearing in the Antiquaries Journal, and eventually ten volumes on the excavations and nine volumes on the cuneiform texts, though the full series took some fifty years to produce.
These publications have long been held to be the definitive record of Ur and they do indeed hold much vital material. But the volumes, as extensive as they seem, say far from everything that can be said about the site. Moreover, they contain interpretations and only part of the data on which those interpretations were based. Woolley was a good archaeologist, but he could not publish every object nor completely explain every decision when limited to the slow process of paper publication. With the advent of linked open data on the web, it is possible to present it all in machine-readable formats. Furthermore, in this format any grouping made by Woolley or anyone else can be quickly deconstructed and new ones created to allow for still newer interpretations or reanalysis of the old.
The excavation of Ur led to much of our current understanding of the ancient Near East and as such is clearly important to analyze and reinvestigate. But there are many other reasons to aggregate the data from it and others like it. Not only will this allow for new visions, but it will also unite and protect the information.
First, unification: The current information is physically divided. Laws of the early 20th century typically allowed for finds to be split between host nation and excavating institutions. This meant that from Ur, half of the artifacts went to Baghdad and the other half was split between London and Philadelphia. Even the archival documents are relatively dispersed. Gathering the data from all of the institutions will reunite the excavated portions of the city in a virtual space.
Second, protection: In the wake of the second Gulf War the Iraq National Museum was looted, demonstrating the fragility of our hold on physical data even in the modern age. The loss of cultural heritage is tragic, but at least if the information were recorded in a virtual space, there would remain researchable data for the future. Much of the material looted from the museum was eventually returned, but even in the case of some items protected in the Rafidan Bank vaults or moved to other secret locations before the war, there was some damage due to the impromptu storage conditions (chiefly the Nimrud Ivories, see McCauley 7/2/2010).
Even in the western museums, some artifacts have been lost to environmental conditions over decades of storage and a few items are now listed as ‘Not Accounted For’ with no clear understanding of how they went missing. Misplacement, loss, damage, or theft can occur anywhere. This is not to say that such occurrences should be overlooked, but there will likely be a small percentage of loss no matter what actions are taken to prevent it. Thus, every object in the care of museums must be carefully recorded to mitigate loss and to demonstrate the importance of each piece. The data must then be made available to researchers for continued understanding of these objects, individually and in the aggregate.
After the looting of the Iraq National Museum in 2003, the British Museum and the Penn Museum began to look at their collections in hopes of helping Baghdad understand what might have gone missing. Ur provided many of the first entries into the Iraq Museum, since the modern country itself was being formed at the time those excavations began. The records of artifacts in the IM were not as complete as they might have been, but if Philadelphia and London could show what they held from this most important site, Baghdad could better assess its own collection. As it turned out, the recording in the two western cities was not as complete as might be hoped either, and thus began a long project to upgrade records with the goal of helping all three museums reunite Ur in a virtual space.
The project began by looking to a list of artifacts from the excavations but quickly expanded with the realization that much more information should be shared digitally. Archives, photos, and field notes all held information on how and why Woolley had reached the conclusions he did. Furthermore, the artifacts were often not connected back to their field data, having lost their field numbers. The potential was there to reconnect them and to put everything online for general use. But in 2003 digital scholarship had not advanced to the point where such data could easily be published. It was possible, but it would take a great deal of money and time. Funds were slow in coming and small in amount. Thus the earliest project years managed to gather only sporadic material, such as medium-resolution scans of field notes in the British Museum and field photos in the Penn Museum.
Finally, in 2011, with the increased capability of and interest in the digital humanities, the Leon Levy Foundation graciously granted funds for an exploratory year during which an assessment of the work to be done could be made. The project set about determining how much Ur material was outside of Iraq, how scattered it was, and how long it would take to digitize it all. The exploratory year began in 2012 by entering all of Woolley's 15,000 artifact cards, many of which covered multiple objects, into a database and separating them to create a list of, and primary data on, every object excavated and written up at Ur during the excavations. This assessment showed that around 40% of the objects in each museum had no clear connection to their original field data. It recommended that every artifact from Ur in both museums be examined and reconnected to field records wherever possible; that each be assessed for condition and any need for conservation or repair; and that the publication status of every item be confirmed, with any mistakes referenced in one easily accessible place.
After the initial work, the Leon Levy Foundation and the Hagop Kevorkian Fund provided continuation grants to begin the individual examination of artifacts and to continue scanning and transcribing all field notes and missives from the field, as well as all ancient texts found at the site. This work is currently underway in both London and Philadelphia. Baghdad is conducting its own inventories but will hopefully join when their work allows.
As soon as possible, and beginning with portions of the data so as not to delay release unduly, all of the information will be published with stable URIs and in RDF/XML, JSON, and/or other machine-readable formats, connected to some version of ArchaeoML and/or CIDOC-CRM wherever possible (see Open Context for the format we currently hope to emulate). The site thus created will be a record of everything from Ur, essentially a modern publication, but also a research tool that provides ways of interlinking and presenting core data so that more can continuously be learned and published about the site and its history. In this way it will be a growing record, a continually expanding work. It will be searchable textually, visually, and spatially (though this last aspect will not be possible within the first two years). In other words, any keyword, artifact number, or other indicator can be entered into the site and all occurrences found throughout field notes, transliterated and translated cuneiform texts, catalogues, and publications; photos can be browsed and similar objects called up; and maps can be searched by area, room, or tomb, with the site populating these spaces with artifacts in context wherever possible.
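To give a sense of what such a machine-readable record might look like, the sketch below renders a single hypothetical artifact entry as JSON. Every URI, field name, and value here is invented for illustration and does not reflect the project's actual schema:

```python
# A purely hypothetical artifact record; all URIs, field names, and values
# below are invented for illustration, not drawn from the Ur project itself.
import json

record = {
    "uri": "http://example.org/ur/artifacts/u-00000",  # hypothetical stable URI
    "field_number": "U.00000",                          # invented field number
    "description": "Example artifact",
    "context": {"area": "example area", "season": "1926-27"},
    "links": {
        "museum_record": "http://example.org/museum/object/00000",
        "field_note": "http://example.org/ur/notes/00000",
    },
}
print(json.dumps(record, indent=2))
```

The point of such a structure is that each record carries both its own data and links outward, so that field notes, museum records, and the artifact itself remain connected.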
The site will be open to all and linked to related sites, such as the online catalogues of relevant museums and the Cuneiform Digital Library Initiative. Most importantly, the RDF that describes the data will be part of the overall linked data on the web so that other connections can be made with any other information also published in linked open format. This means that new ways of envisioning the data, new ways of assembling it with similar information from other sites, will be possible, leading to more encompassing and more complete understandings of the Ancient Near East as a whole.
Truly this will be a virtual vision of Woolley’s excavation–and so much more.
©2014 William B. Hafford. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.8 (2014)
ISAW Papers: Towards a Journal as Linked Open Data
Sebastian Heath
Introduction
The present contribution to the set of essays published under the rubric of “ISAW Papers 7” is necessarily self-referential. ISAW Papers, the Institute for the Study of the Ancient World’s digital scholarly journal, is both its topic and its venue. These overlapping roles will prove useful by allowing direct illustration of the progress ISAW has made in implementing the goals with which the journal was initiated. By way of high-level overview, those goals are to publish article-length scholarship that (1) is available at no cost to readers, (2) can be reused and redistributed under a Creative Commons license, and (3) is stored in formats that are very likely to be readable far into the future. Additionally, articles in ISAW Papers should link to stable resources similarly available on the public Internet. This last goal is intended to increase the discoverability and utility of any individual article as well as of the growing network of digital resources available for investigating the ancient world.
In describing progress to date, the following paragraphs will not shy away from raising technical issues. They do not, however, offer complete instructions for deploying Linked Open Data in a journal context nor detailed introductions to the technologies described. The discussion is practice oriented and so makes reference to the articles published to date. This approach and the movement from overview to specifics is intended to introduce readers to some of the opportunities ISAW Papers has recognized and also to the challenges it faces.
To start broadly, the editorial scope of ISAW Papers is as wide as ISAW’s intellectual mission, which itself embraces “the development of cultures and civilizations around the Mediterranean basin, and across central Asia to the Pacific Ocean.” (ISAW n.d.) Temporally, ISAW is mainly concerned with complex cultures before the advent of early modern globalization, though it is important to note that ISAW does not try to impose strict limits on what falls within its intellectual purview. Indeed, the origins, development, and reception of all phases of the Ancient World are fair game at ISAW.
Review and Licensing
Two additional concerns of a scholarly journal - review and licensing - can also be addressed efficiently. ISAW Papers publishes anonymously peer-reviewed articles as well as articles read and forwarded for publication by members of the ISAW faculty. This aspect of the editorial process is made clear for each article. The goal here is to preserve the many benefits that peer review can provide to an author while ensuring that it is neither a barrier to new work nor an impediment to timely publication. In terms of licensing, ISAW asks authors to agree to distribution of their text under a Creative Commons Attribution (CC-BY) license. The same applies to images authors have created on their own or which ISAW creates during the editorial process. We consider such open distribution to be an important component of a robust approach to future accessibility. It is, however, the case that authors have needed to include images whose copyright is held by others. This situation remains a fact of public scholarly discourse. Accordingly, we ask that authors obtain permission for ISAW to publish such images in digital form but do not require explicit agreement to a CC license. As with peer review, a reasonable balance of current realities and future possibilities is the goal.
Partnership with the NYU Library
Initial public availability takes place in partnership with the New York University (NYU) Library. So for example, the text you are reading now will be accessible via the URI “http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/heath/”. While ISAW has complete responsibility for the editorial process, that is for shepherding an author’s intellectual content into a form that enables both long-term accessibility and immediate distribution, we rely on the Library to provide the infrastructure for that long-term preservation. Each party in this relationship brings its institutional strengths to the endeavor. In particular, it is very useful that the library assigns a Handle to each article (CNRI n.d.). For example, the URL “http://hdl.handle.net/2333.1/k98sf96r” will redirect to whichever URL the NYU Library is using to host ISAW Papers’ first article (Jones and Steele 2011). If a reader follows that link within a few years of the publication of this current discussion, it is likely she or he will be redirected to “http://dlib.nyu.edu/awdl/isaw/isaw-papers/1/.” Further out into the future, the handle may resolve to a different address. But we at ISAW are confident that an institution such as the NYU Library offers a very strong likelihood of ongoing availability. And it is of course the case that we encourage readers and other institutions to download and re-distribute any and all ISAW Papers articles. Such third-party use and archiving, enabled through initial distribution by the Library, will also contribute to the long-term preservation of this content.
An additional result of collaboration with NYU Library staff, particularly my colleagues in the ISAW library, is the creation of individual records in the NYU Bobcat library catalog for each article. This local initiative leads, in turn and automatically, to the creation of a Worldcat record for each article. Accordingly, “http://www.worldcat.org/oclc/811756919” is the Worldcat “permalink” for the record describing C. Lorber and A. Meadows’s 2012 review of Ptolemaic numismatics. The journal itself has a Library of Congress-issued International Standard Serial Number (2164-1471) as well as its own Worldcat record at “http://www.worldcat.org/oclc/756047783”.
Broad Strokes and Specific Citations
There is a future point at which the following short list will describe the main components of a born-digital article published in ISAW Papers:
An archival version in well-crafted XHTML5 that is available through the NYU Faculty Digital Archive (http://archive.nyu.edu).
Links to stable external resources encoded using RDFa, a widely supported standard that is discussed below.
A version of the document, hosted by the NYU Library, formatted for reading and with additional User Interface (UI) elements that encourage engagement with the content.
The two new abbreviations in the above list - XHTML and RDFa - can bear further explanation. As is probably well known to many readers, HTML, specifically its 5th version HTML5, is the standard published by the Worldwide Web Consortium (W3C) that specifies the format of text intended for transmission from Internet servers to web browsers. As a simple description, HTML allows content-creators to specify the visible aspects of a text: e.g., that titles and headings are in bold, that paragraphs are visually distinct by indentation or spacing, and other aspects such as italic or bold spans. For its part, the W3C has quickly become a standards-setting body with global impact. At this moment, HTML5 documents can be directly read - that is, rendered into human-readable form on screen - by many applications running on many different forms of computing devices, ranging from desktop and notebook computers to tablets and phones. It is likely that this easy readability of HTML documents will continue far into the future, and ISAW believes some degree of readability for such content is guaranteed in perpetuity, to the extent that such a thing can reasonably be foreseen.
XHTML is the variant of HTML that adheres strictly to the requirements of the Extensible Markup Language (XML). XML is in turn a standard that provides more explicit indications of the structure of a text than does HTML. For example, an item in a list in HTML can be indicated by “<li>An item in a list”, whereas XHTML requires that the markup be “<li>An item in a list</li>”. Note the terminating “</li>”, which is required in XML. While a full discussion of XML and XHTML would take up excessive room here, it is fair to say that their added requirements are geared towards enabling more reliable processing by automated agents, meaning the manipulation of the text and rendering of results by computer programs.
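The practical difference can be seen with any strict XML parser. In this sketch, Python's standard-library `xml.etree.ElementTree` (used here simply as a convenient strict parser) accepts the XHTML form and rejects the HTML form with its unclosed `<li>`:

```python
# The closing </li> required by XHTML is what makes the fragment well-formed
# XML; the HTML-style version is rejected by a strict parser.
import xml.etree.ElementTree as ET

xhtml_fragment = "<ul><li>An item in a list</li></ul>"
html_fragment = "<ul><li>An item in a list</ul>"   # no closing </li>

tree = ET.fromstring(xhtml_fragment)               # parses without error
print(tree[0].text)                                # -> An item in a list

try:
    ET.fromstring(html_fragment)
except ET.ParseError as err:
    print("not well-formed XML:", err)             # unclosed <li> is an error
```

It is exactly this predictability that makes XHTML more amenable to processing by automated agents than looser HTML.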
At this point in the discussion it is worth highlighting one particular aspect of XHTML that ISAW Papers utilizes extensively. On the public internet, the presence of a “pound sign” or “#” in a web address often indicates a reference to a particular part of a document. When used in this way, the exact part referenced is indicated in the HTML document itself by the presence of an ‘id’ attribute. This means that HTML’s ‘p’ element, which is used to mark paragraphs, can be identified by markup of the form ‘<p id="p10"> … </p>’. In ISAW Papers, all paragraphs in the main body of an article have such an id and can therefore be directly referenced via URLs. For example, “http://dlib.nyu.edu/awdl/isaw/isaw-papers/6/#p3” is a direct link to the third paragraph of M. Zarmakoupi’s (2013) article on urban development in Hellenistic Delos.
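As a toy illustration of how an automated agent might resolve such a fragment URL, the following standard-library sketch matches a “#pN” fragment against paragraph ids in a small invented XHTML string. A real agent would use a proper XML parser rather than a regular expression:

```python
# Toy example: resolve a "#p3"-style fragment against XHTML paragraph ids.
# The document and URL are invented; a regex stands in for a real XML parser.
import re

XHTML = ('<p id="p1">First paragraph.</p>'
         '<p id="p2">Second paragraph.</p>'
         '<p id="p3">Third paragraph.</p>')

def paragraph_for_fragment(doc, url):
    """Return the text of the paragraph whose id matches the URL fragment."""
    fragment = url.rsplit("#", 1)[-1]  # e.g. "p3"
    match = re.search(r'<p id="{}">(.*?)</p>'.format(re.escape(fragment)), doc)
    return match.group(1) if match else None

print(paragraph_for_fragment(XHTML, "http://example.org/article/#p3"))
# prints "Third paragraph."
```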
Towards Linked Open Data
Most of the discussion so far should be considered preliminary to a focus on ISAW Papers’ implementation of the principles of Linked Open Data (LOD), principles that were summarized in the Introduction to this set of articles. With that description in mind, ISAW Papers can make some claim to being “5 Star” linked data as defined in Berners-Lee’s fundamental note of 2006. Its articles are available at stable URLs that can be considered URI-based identifiers, and XHTML is a machine-readable and non-proprietary format. Furthermore, and as only suggested by the short list of “main components,” ISAW Papers does provide RDF. That is, each article has embedded within it statements in the form of ‘triples’ that describe particular aspects of that article’s content. An example will make this aspect of the journal clearer.
The URL “http://dlib.nyu.edu/awdl/isaw/isaw-papers/3/#p70” is a link to the 70th paragraph of G. Bransbourg’s article on market integration in the Roman economy during the imperial period. Looking at the source of that text shows the following XHTML markup:
… the presence of a Tyrian colony in <span class="reference" rel="dcterms:references" typeof="dcterms:Location"><a rel="rdfs:isDefinedBy" href="https://pleiades.stoa.org/places/432815" property="rdfs:label">Puteoli</a></span> …
The ‘<a href="...">’ component of that markup is “plain old” HTML that allows the text of Puteoli to be highlighted in a browser so that a user can follow the link to the Pleiades page. That is standard functionality on the world-wide web. It is the additional markup that makes the meaning of that link machine readable. In English, the semantics indicated here can be stated as, “ISAW Papers 3 makes reference to a location. That location is, in turn, defined at the webpage “https://pleiades.stoa.org/places/432815”, and has the label “Puteoli” in the context of this article.” Similar markup for references appears in this article and in other ISAW Papers articles. For example, the first paragraph of A. McCollum’s note on Syriac geographic knowledge - accessible via “http://dlib.nyu.edu/awdl/isaw/isaw-papers/5/#p1” - contains a reference to the scholar Gregory bar ‘Ebrāyā, with a definition of that individual provided by a link to Wikipedia.
And while the following is beyond the scope of this discussion, it should be noted that the link between Puteoli in this text and the Pleiades URI was entered by hand. It is hoped, even assumed, that such named entity recognition will become more automated in the future.
RDFa and Triples
Fundamental to the design principles of ISAW Papers is that the markup used here conforms to an existing W3C standard, specifically “HTML+RDFa 1.1” (Sporny 2013), which is itself part of the RDFa 1.1 group of standards (Adida et al. 2013). For its part, “RDFa” is the second abbreviation given in the brief list of “main components” above. It stands for “Resource Description Format in Attributes.” As a very short description, RDFa allows discrete machine-readable statements to be embedded in XHTML. These statements are called “triples” and take the form of:
A subject, in this case ISAW Papers 3.
A predicate or type of information being represented, in this case a relationship between the article and a website that represents a “reference”.
An object, in this case a Location.
To summarize and repeat, the triples indicated by the markup drawn from Bransbourg (2012) read “ISAW Papers 3 references a Location” and further specify that “The Location is defined at https://pleiades.stoa.org/places/432815”. Furthermore, these triples use publicly defined vocabularies. In the snippet above, “dcterms:references” indicates that ISAW Papers uses the vocabulary published by the “Dublin Core Metadata Initiative”. For a definition of the term “references” see “http://dublincore.org/documents/dcmi-terms/#terms-references”.
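To make the decomposition of that markup into triples concrete, the sketch below uses only Python's standard library to pull the RDFa attributes out of the Puteoli snippet. It handles just the handful of patterns present in this one example and is in no way a general RDFa processor; conformant tools such as the W3C Distiller should be used in practice:

```python
# Stdlib-only sketch: extract the RDFa attributes from the Puteoli snippet
# and reassemble them as subject / predicate / object triples. This covers
# only the patterns in this one example, not RDFa in general.
from html.parser import HTMLParser

SNIPPET = ('<span class="reference" rel="dcterms:references" '
           'typeof="dcterms:Location">'
           '<a rel="rdfs:isDefinedBy" '
           'href="https://pleiades.stoa.org/places/432815" '
           'property="rdfs:label">Puteoli</a></span>')

class RDFaSketch(HTMLParser):
    """Collects the handful of RDFa patterns used in the snippet."""
    def __init__(self):
        super().__init__()
        self.triples = []
        self._pending_property = None  # a 'property' waiting for its text

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "typeof" in a:                      # blank node typed by 'typeof'
            self.triples.append(("_:loc", "rdf:type", a["typeof"]))
        if a.get("rel") == "dcterms:references":
            self.triples.append(("<ISAW Papers 3>", "dcterms:references", "_:loc"))
        elif "rel" in a and "href" in a:       # link-valued predicate
            self.triples.append(("_:loc", a["rel"], a["href"]))
        if "property" in a:                    # literal-valued predicate
            self._pending_property = a["property"]

    def handle_data(self, data):
        if self._pending_property:
            self.triples.append(("_:loc", self._pending_property, data))
            self._pending_property = None

parser = RDFaSketch()
parser.feed(SNIPPET)
for triple in parser.triples:
    print(triple)
```

Running this prints four triples: the article references a blank node, the blank node is typed as a dcterms:Location, it is defined by the Pleiades URI, and it carries the label “Puteoli”.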
It is not the goal of this discussion to provide a full explanation of RDFa or triples. But it is worth stressing the strategic goal that the use of RDFa forwards. To state that goal simply, ISAW Papers intends its articles to represent links to stable resources in such a way that the meaning of those links can be read and used by automated agents. That progress towards this goal is actually being made is indicated by the ability of current tools to read and query the data inherent in the articles published to date. For example, the W3C tool titled “RDFa 1.1 Distiller and Parser,” available at the time of writing at “http://www.w3.org/2012/pyRdfa/Overview.html”, will recognize the triples in “http://dlib.nyu.edu/awdl/isaw/isaw-papers/3/”. Readers can try this themselves by pasting the article’s URI into the W3C Distiller’s “URI:” field. Doing so will show a large number of links to stable resources. In particular, using any such tool to list the triples in that article will reveal machine-readable information related to authorship and subject in addition to clear specification of links to geographic entities beyond Puteoli. Additionally, bibliographic information is shown to be specified using the BibliographicResource and bibliographicCitation terms of the Dublin Core.
Issues in the Implementation of Linked Open Data
The word "towards" in the title of this contribution is intended to communicate to readers that the process of defining how ISAW Papers will implement LOD is not yet finished. Articles are available at stable URIs and do provide machine-readable links to other URIs. Nonetheless, this “Linked Open Data” has not reached a final form.
Keep in mind the markup surrounding the reference to Puteoli in G. Bransbourg’s article, which was given as RDFa above, with the semantics of that RDFa “translated” into admittedly stilted English. Rendering that RDFa as Turtle - another common format for communicating triples - gives the following excerpted sequence:
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://dlib.nyu.edu/awdl/isaw/isaw-papers/3/> dcterms:references [
a dcterms:Location;
rdfs:isDefinedBy <https://pleiades.stoa.org/places/432815>;
rdfs:label "Puteoli"@en
] .
No deep expertise in reading Turtle should be needed to see that this is an alternate rendering of the English, “ISAW Papers 3 makes reference to a location. That location is, in turn, defined at the webpage “https://pleiades.stoa.org/places/432815”, and has the label “Puteoli” in the context of this article.” There are strengths here. No suggestion is made that the webpage https://pleiades.stoa.org/places/432815 is in fact the site of Puteoli itself. Visiting that page yields information about the site, a “definition” as it were, which is why the predicate rdfs:isDefinedBy is used. And again, that a W3C-sponsored tool can render the information in an ISAW Papers article is a demonstration of progress towards interoperability.
But it is very important to note that this specific use of the Dublin Core vocabulary in combination with the RDF Schema vocabulary published at http://www.w3.org/2000/01/rdf-schema# is idiosyncratic. And it is idiosyncratic because there is no universally accepted standard for deploying public vocabularies to describe relationships between documents. There are a number of vocabularies that could be used - some readers will be familiar with BIBO and CiTO terms - but their use is not fully settled.
There is also room for progress on the creation of stable URIs for named entities, although many solutions are appearing. It is clear that VIAF (OCLC 2010-13) will be the publisher of identifiers for authors. Pleiades provides URIs for geographic entities. The Perseus Catalog (Perseus Digital Library n.d.) will provide URIs for many ancient texts, particularly those drawn from Greco-Roman culture. Likewise, ISAW Papers will continue to link to identifiers for numismatic concepts established by Nomisma.org (Meadows and Gruber 2014) and welcome progress being made by disciplines such as Syriac studies with its developing portal at http://syriaca.org.
The exact form of references to all such resources should be standardized across projects, or rather, variation in form should be reduced. It is certainly the case that many of the papers in this collection show excellent progress towards that goal. And it is hoped that ISAW Papers can contribute to the development of such standards by highlighting the need for them with usable data. From the particular perspective of this one born-digital journal, agreement on basic issues such as how to specify the semantics of links to well known resources will be a large step towards enabling the deposition of archival versions of all ISAW Papers articles into NYU’s Digital Archive.
Conclusion
That last point can stand as a conclusion to this discussion, intended as it is to capture a particular moment in an ongoing process. ISAW Papers is achieving its motivating goal of distributing high-quality scholarship relevant to the Ancient World. While there is much more work to do, particularly on the experience of reading an article online, it is fundamental that readers have current access to this scholarship at no cost and that it is made available in such a way that ongoing access is likely. Those aspects of the journal themselves adhere to the principles of Linked Open Data. To the extent that articles provide machine-readable data, the specific patterns used should be considered models and suggestions. Their utility will become clear as the data is consumed and as the data conforms more fully to best practices developed by the wider Linked Open Data community, particularly those parts of that community focused on the Ancient World.
Works Cited
Adida, B., M. Birbeck, S. McCarron and I. Herman (2013). RDFa Core 1.1 - Second Edition. <http://www.w3.org/TR/rdfa-syntax/>
Bransbourg, G. (2012). Rome and the Economic Integration of Empire. ISAW Papers, 3. <http://dlib.nyu.edu/awdl/isaw/isaw-papers/3/>
CNRI (n.d.) Handle System. <http://handle.net>.
ISAW (n.d.). Institute for the Study of the Ancient World. <http://isaw.nyu.edu>.
Jones, A. and Steele, J. (2011). A New Discovery of a Component of Greek Astrology in Babylonian Tablets: The “Terms.” ISAW Papers, 1. <http://dlib.nyu.edu/awdl/isaw/isaw-papers/1/>
McCollum, A. (2012). A Syriac Fragment from The Cause of All Causes on the Pillars of Hercules. ISAW Papers, 5. <http://dlib.nyu.edu/awdl/isaw/isaw-papers/5/>
Meadows, A. and E. Gruber (2014). Coinage and Numismatic Methods. A Case Study of Linking a Discipline. ISAW Papers, 7. <http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/meadows-gruber/>
OCLC (2010-2013). VIAF: Virtual International Authority File. <http://viaf.org>.
Perseus Digital Library (n.d.). The Perseus Catalog. <http://catalog.perseus.org>.
Sporny, M., Ed. (2013). HTML+RDFa 1.1. <http://www.w3.org/TR/rdfa-in-html/>.
Zarmakoupi, M. (2013). The Quartier du Stade on late Hellenistic Delos: a case study of rapid urbanization (fieldwork seasons 2009-2010). ISAW Papers, 6. <http://dlib.nyu.edu/awdl/isaw/isaw-papers/6/>
©2014 Sebastian Heath. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.9 (2014)
Beyond Maps as Images at the Ancient World Mapping Center
Ryan Horne
Background
When the Barrington Atlas (Talbert and Bagnall 2000) reached publication in 2000, it represented the culmination of nearly a decade of development using both traditional cartography and modern GIS, providing the first comprehensive coverage of ancient Mediterranean geography since Smith and Grove (1872). The atlas features 102 maps and extensive metadata in a map-by-map directory, and it served as the catalyst for the establishment of the Ancient World Mapping Center (AWMC) at the University of North Carolina, Chapel Hill and for the subsequent development of the Pleiades Project.
The goals of the AWMC are in part to refine, expand, and curate the data created for the atlas, to continue research into ancient geography, and to present data in an accessible, instructive form. This includes the production of maps for publications and larger projects such as a series of wall maps. It was during the production of the wall maps that the limitations of the AWMC's previous approach of treating maps primarily as flat images became readily apparent. The sheer scale of the project, coupled with the myriad changes suggested by reviewers, revealed the necessity of an easily reproducible, readily searchable, and scalable system to display and sort the over 200,000 individual objects residing in the center's database. As an outgrowth of this realization, the AWMC began an initiative to create an end-user customizable web application to generate maps and replace the older static images provided by the center. The first result of these efforts, the Antiquity À-la-carte application, was launched in March 2012 with a major update following in October of the same year.
In order to complete these projects, the AWMC began to catalog, analyze, and offer its data to the larger scholarly community under a Creative Commons CC BY-NC 3.0 license. AWMC's data is primarily geographic and contained within Shapefiles, with coastlines, water coverage, water courses, and other natural and human constructed features adjusted for their state in the ancient world. All of the data was initially offered as Shapefile and GeoTiff downloads that presented easily accessible resources which nevertheless had severely limited potential for dynamic linking, searching, and other complex interactions. Further steps were needed to move beyond these largely static data sets.
As the AWMC is partnered with the Pleiades project, the association of geographic items with Pleiades IDs, coupled with a database-driven interface, was the obvious next phase in the evolution of the center's offerings. Key to this step was the creation of an interface and the modification of the AWMC's data to adhere to URI best practices and the 5-star principles outlined by Tim Berners-Lee (1998 and 2007), including permanence, simplicity, and a naming scheme that is agnostic to the underlying content of the data. The alignment of the center's resources with Pleiades IDs for all features, along with the addition of custom metadata where appropriate, became the foundation for the center's movement towards these goals. The first step at the AWMC was the creation of a PostGIS database to house and display Pleiades data, followed by the use of GIS software to identify and add appropriate metadata to the Shapefiles produced by the center (a task largely done by hand), and then the importation of the completed Shapefiles into the PostGIS database. As this work was underway, a further step towards deeper integration with the linked data community occurred in January 2013, when the AWMC created an API interface to its database using RESTful URIs and joined the Pelagios Project. Due to the earlier use of Pleiades IDs, creating the relevant RDF files and linking to the Pelagios system proved to be a trivial matter (for further technical discussion of the linkages between the AWMC and Pelagios, please consult the AWMC blog post on the Pelagios site).
Results
An example of the API and linkages can be found at http://awmc.unc.edu/api/omnia/168940, representing the urban area of Rome c. 200 CE. The default splash page is generated dynamically by PHP with a PostGIS backend, presenting a reference map, links to Pleiades and Pelagios, and general information on the entry in a human-readable format.
Any entry in the database can easily be displayed in JSON, RDF, or WKT by clicking on the provided buttons or appending the desired representation to the end of the base URI as follows:
http://awmc.unc.edu/api/omnia/168940/json
http://awmc.unc.edu/api/omnia/168940/rdf
http://awmc.unc.edu/api/omnia/168940/wkt
Each one of these links leads to a dynamically generated page that presents the desired format and content. This grants easy access to the center's data without creating a local data dump or mirroring data in a remote machine, and can be consumed directly by mapping libraries such as OpenLayers.
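This URI pattern is simple enough to capture in a few lines of code. The helper below reproduces the addresses listed above; `awmc_uri` is an invented name for illustration, not part of the AWMC API itself:

```python
# Hypothetical helper that reproduces the AWMC URI pattern shown above;
# the function name and interface are invented for illustration.
BASE = "http://awmc.unc.edu/api/omnia"

def awmc_uri(awmc_id, fmt=None):
    """Build an AWMC API URI; fmt may be 'json', 'rdf', or 'wkt'."""
    uri = "{}/{}".format(BASE, awmc_id)
    return "{}/{}".format(uri, fmt) if fmt else uri

print(awmc_uri(168940))          # -> http://awmc.unc.edu/api/omnia/168940
print(awmc_uri(168940, "json"))  # the JSON representation of the same entry
```

Because the naming scheme is agnostic to the underlying content, a client needs only an ID and a desired format to address any entry in the database.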
The current iteration of Antiquity À-la-carte deploys the results of the API work by providing a web-based GIS application to display, sort, search, and disseminate the center's data. The application is built upon a suite of GIS technologies, exports a user-created map into GeoJSON, KML, CSV, and PDF formats, and can import a custom map generated from GeoJSON objects. The application also allows for the selection of different display languages, time periods, and the creation of new features to meet a user's needs, along with offering a set of interactive tools and in-map linkages to Pleiades and Pelagios content.
To further integrate the application with the linked data community, À-la-carte maps can be customized by simply inputting Pleiades IDs into a JSON structure in the URL, as demonstrated here:
http://awmc.unc.edu/awmc/applications/alacarte/?jsonGet={"zoom":"6","center":"lon=26.865381755052,lat=38.848414981353","pids": [{"pid": "550812"},{"pid": "599612"},{"pid": "550893"},{"pid": "550908"},{"pid": "222192"},{"pid": "507469"},{"pid": "550898"},{"pid": "599947"},{"pid": "897849"},{"pid": "550595"},{"pid": "550696"},{"pid": "550497"}]}
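A URL of this shape can also be assembled programmatically. The sketch below builds the jsonGet payload from a list of Pleiades IDs; the parameter names mirror the example URL, while the helper function and its default center are our own assumptions:

```python
import json
from urllib.parse import quote

APP = "http://awmc.unc.edu/awmc/applications/alacarte/"

def alacarte_url(pids, zoom="6", lon=26.8654, lat=38.8484):
    """Build an À-la-carte URL whose jsonGet payload lists Pleiades IDs."""
    payload = {
        "zoom": zoom,
        "center": f"lon={lon},lat={lat}",
        "pids": [{"pid": str(p)} for p in pids],
    }
    # Percent-encode the JSON so it survives inside a query string.
    return APP + "?jsonGet=" + quote(json.dumps(payload))

url = alacarte_url([550812, 599612, 550893])
```

Any project holding a list of Pleiades IDs can thus generate a shareable custom map with one function call.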
More discussion and examples of the linking capabilities can be found on the application's site.
The lessons of À-la-carte and the API have significantly influenced current efforts at AWMC, as the Atlas of Ancient Waters (Benthos), the Asia Minor wall map, a new milestones initiative, and all other projects are now built from the very start with the needs of metadata and linked data clearly in mind. All features for AWMC maps are first associated with appropriate Pleiades IDs through name and location matching in the AWMC database, with the results then added to the API. This allows the AWMC to easily deploy the À-la-carte framework to quickly produce new web applications and provide resources to Pleiades, Pelagios, and any other projects which use Pleiades IDs. New features that are not currently contained within the Pleiades data set are posted to the AWMC API and proposed for inclusion into the Pleiades site, allowing other initiatives access to the current work of the center.
Towards the future
Both the À-la-carte and API projects have been extremely successful, but the center still has a significant amount of work to do before all of its geographic resources are fully aligned with the larger linked data community. Efforts are now underway to provide appropriate metadata for all rivers, inland water bodies, coastline changes, and other features which have not yet been aligned with Pleiades, a task which has proven to be extremely complex. There is also a need for more descriptive and verbose metadata for existing resources, which will necessitate further refinement of the center's database. The AWMC is also actively looking to simplify the process of accessing accurate background tiles and is exploring the use of MapBox to provide an easily accessible tile server to the wider community, along with a new release of the À-la-carte software to provide an extensible, reusable, and simple framework for the development of further mapping applications. Although the work of the AWMC is far from complete, the impact of linked data has already caused a paradigm shift in the operations of the center, and it is our belief that the simple act of coupling linked data principles with accessible and accurate visualizations of ancient geography will lead to new questions, new approaches, and exciting new developments in ancient world studies.
Works Cited
Berners-Lee, Tim. (1998). “Cool URIs don't change”. Style Guide for online hypertext. Available at http://www.w3.org/Provider/Style/URI.
Berners-Lee, Tim. (2007). “Linked Data”. W3 Design Issues. Available at: http://www.w3.org/DesignIssues/LinkedData.html
Smith, William, George Grove, and Karl Müller. (1872). An Atlas of Ancient Geography, Biblical and Classical, to Illustrate the Dictionary of the Bible and the Classical Dictionaries, the Biblical Maps from Recent Surveys, and the Classical Maps Drawn by Dr. Charles Müller. London. OCLC: 181885336.
Talbert, Richard J. A and Roger S. Bagnall. (2000). Barrington Atlas of the Greek and Roman World. Princeton, N.J: Princeton University Press. OCLC: 43970336.
©2014 Ryan Horne. Published under the Creative Commons Attribution 3.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.10 (2014)
Open Context and Linked Data
Eric C. Kansa
Introduction
Archaeologists have long grappled with the challenges inherent in data sharing. They have traditionally relied on monographs and site reports to communicate, in detail, the results of excavations and surveys. However, growing dependence on digital technologies has eroded the utility of these traditional dissemination strategies. Archaeologists now collect far more (digital) documentation than can be feasibly and cost-effectively shared in print. There is also more to digital data than sheer quantity. Archaeologists routinely organize data into structures (usually tables or relational databases) in order to use software to search, query, analyze, summarize, and visualize data. As interest in structured data grows, archaeologists need new venues to access and share structured data.
“Data sharing” usually means sharing structured data in formats that can be easily loaded into data management software (ranging from Excel, to a GIS, to something more specialized), then queried, visualized, and analyzed. New rules imposed by granting agencies, especially “data management plans”, as well as changing professional expectations are converging to make data dissemination a regular aspect of archaeologists’ scholarly communications. Archaeologists increasingly recognize the need to preserve the documented archaeological record by accessioning data into preservation repositories. At the same time, more researchers regard data sharing as an aspect of good professional practice, so that the data underlying interpretations and narratives of the past are available for independent reinterpretation.
The following discussion outlines Open Context’s current approach to publishing archaeological data. The discussion explores ways Open Context attempts to situate data dissemination in professional practice, particularly with respect to Linked Data approaches toward making data easier to understand and use.
Why a “Publishing” Metaphor for Data?
While we currently see increasing interest in the management, preservation and sharing of structured data, we still do not have well-established venues and processes to support these activities (Faniel et al. 2013). Many researchers focus on the need to preserve these data, especially because of the destructive nature of many archaeological field methods. Though data archiving is of critical importance, data management needs extend well beyond preservation for the sake of preservation. To be understood and useful in the future, and to be comparable to other datasets, datasets usually need rich documentation and alignment to standards and vocabularies used by other data sources. Though researchers often see integration as a desirable goal in data sharing, the challenges inherent in documenting and describing data for reuse, especially reuse that involves integrating data from multiple projects, need to be better understood.
Preparing data for reuse, especially integration with other data, can involve significant effort and special skills and expertise. Most archaeologists are not familiar with RDF, ontologies, controlled vocabularies, SPARQL or a whole host of other Web related technologies and standards. While wider appreciation and fluency in these technologies will be most welcome, not every archaeologist needs to become an expert Web technologist. Just as we do not expect every archaeologist to personally develop all of the expertise needed to run a print publication venue, a neutron activation analysis lab, or other specialization, we should not expect every archaeologist to become a Web technology guru. In other words, data dissemination can often benefit from collaboration with specialists that dedicate themselves to exploring informatics issues.
Collaborating with “informatics specialists” can take multiple forms. With Open Context, an open access data dissemination venue for archaeology, we are adapting a “publishing” model to help set expectations about what is involved in meaningful data dissemination supported by people specializing in data issues (Kansa and Kansa 2013). The phrase “data sharing as publication” helps to encapsulate and communicate the investment and skills needed to make data easier to reuse. It conveys the idea that data dissemination can be a collaborative undertaking, where data “authors” and specialized “editors” work together, contributing different elements of expertise and taking on different responsibilities. A publishing metaphor also communicates the effort and expertise involved in data sharing in terms widely understood by the research community, conveying that data publishing implies efforts and outcomes similar to conventional publishing. Ideally, offering a more formalized approach to data sharing can also promote professional recognition, helping to create the reward structures that make data reuse less costly and more rewarding, both in terms of career benefits and in terms of opening new research opportunities in reusing shared data.
Publishing Linkable and Linked Data
We initially launched Open Context in 2007, and the site has gone through a number of iterations reflecting both our growing understanding of researcher needs and larger changes in how scholars use the Web. Over the past few years, we have moved to a model of “data sharing as publication” in order to publish higher-quality and more usable data. Similar to the services conventional journals provide to improve the quality of papers, we provide data editing and annotation services to improve the quality of the data researchers share. Part of our shift toward greater formalism in sharing data centers on increasing our participation in the world of “Linked Open Data”.
Linked Open Data represents an approach to publishing data on the Web in a manner that makes it easier to combine data from different sources. It is an inherently distributed approach to promoting the wider interoperability and integration of structured (meaning easily computable) data. Open Context contributes to the larger body of Linked Open Data resources in two main ways (see also Kansa 2012):
First, Open Context mints a unique and stable Web identifier for every individual item contributors describe in their datasets. This “one URL per artifact” approach facilitates research by removing any ambiguity about exactly which item is being referenced. Because Open Context uses Web identifiers, readily recognizable by beginning with “http://”, users and software will have little trouble retrieving the information associated with Open Context identifiers. For readers familiar with relational databases, Open Context’s Web identifiers make it easier for others to “join” data from any source around the Web to Open Context records. This approach to Web identifiers represents a fundamental aspect of Open Context’s participation in a larger information ecosystem.
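The relational “join” idea can be made concrete with a small sketch: two tables produced by different parties, keyed on the same stable URIs, merge without ambiguity. The URIs and field values below are invented for illustration:

```python
# Sketch: stable Open Context-style URIs as join keys between two
# independently produced tables. URIs and values are invented.

measurements = [
    {"uri": "http://opencontext.org/subjects/aaa-111", "length_mm": 42.1},
    {"uri": "http://opencontext.org/subjects/bbb-222", "length_mm": 37.8},
]
contexts = {
    "http://opencontext.org/subjects/aaa-111": {"trench": "T5", "locus": "12"},
    "http://opencontext.org/subjects/bbb-222": {"trench": "T7", "locus": "3"},
}

# The shared URI makes the join unambiguous, like a foreign key.
joined = [{**m, **contexts.get(m["uri"], {})} for m in measurements]
print(joined[0]["trench"])  # T5
```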
Secondly, Open Context references Linked Data published by other expert communities. Like many of the projects discussed at the LAWDI meetings, Open Context references the Pleiades gazetteer (Elliott and Gillies 2009; http://pleiades.stoa.org/). This helps remove ambiguity about ancient places that may be referenced in data published by Open Context. Referencing Pleiades also makes it easier to relate Open Context content with content from other sources that also reference Pleiades. However, because of the nature of the content Open Context publishes, relatively few records link to Pleiades. Most of the data in Open Context comes from excavations focusing on prehistoric periods, or on subdomains of archaeology where Pleiades has less relevance. Besides Pleiades, Open Context increasingly references the British Museum’s controlled vocabulary and Wikipedia (for stable identifiers for relevant concepts), as demonstrated in this example of an object from Poggio Civitate (http://opencontext.org/subjects/AF3090B0-301C-41A0-D290-3F616AC074EF). In addition, Open Context has published several zooarchaeological datasets from prehistoric sites. To make these datasets more intelligible and more interoperable, Open Context references natural history vocabularies and ontologies, particularly the Encyclopedia of Life (http://eol.org, to annotate biological taxon classifications) and UBERON (http://uberon.org, to annotate anatomy classifications). This simple vocabulary alignment enables Open Context to offer simple map-based visualization features, such as Figure 1’s map of EOL-linked cattle.
Figure 1: EOL-Linked Cattle
See: http://opencontext.org/sets/?map=1&geotile=1&geodeep=7&eol=http%3A%2F%2Feol.org%2Fpages%2F34548
Ontology and Schema Mapping and the CIDOC-CRM
A last area where Open Context participates in Linked Open Data centers on referencing shared schemas (models for organizing data). We are currently experimenting with mapping data published by Open Context to the CIDOC-CRM (see: http://opencontext.org/about/services#rdf). Open Context started in 2007, and we initially chose “ArchaeoML” (the Archaeological Markup Language), developed by David Schloen (2001) with the OCHRE project (formerly XSTAR). We chose ArchaeoML because it provided a simple and very general organizational schema that we could readily apply to very diverse forms of archaeological data. The fact that Open Context now successfully publishes more than 35 different projects of wide geographic, chronological, and thematic scope illustrates the utility of ArchaeoML. For our purposes, ArchaeoML worked and continues to work. Also, in 2007 when we first launched Open Context, we found XML technologies to be relatively straightforward and easy to deploy, whereas RDF-based technologies seemed more experimental and challenging at the time.
However, since 2007 the landscape has changed dramatically. ArchaeoML never saw widespread adoption. The OCHRE project itself has since deprecated ArchaeoML, so its usefulness as a data interchange format was never realized. At the same time, more and more cultural heritage information systems began adopting the CIDOC-CRM as a standard for organizing data. The CIDOC-CRM became enshrined as an ISO standard, and is all but required by many funding agencies, particularly in the European Union. CIDOC-CRM therefore seems like a natural choice for the publication of archaeological data according to widely accepted standards.
Over the past two years, Open Context began experimenting with publishing RDF data organized according to the CIDOC-CRM. Our experience in doing so has made us somewhat ambivalent about the effort and returns involved in aligning data to a complex ontology like the CIDOC-CRM, at least at this stage. The CIDOC-CRM represents a tremendous intellectual achievement. It results from a great amount of effort and thought by leading experts in cultural heritage informatics. Recent archaeology extensions of the CIDOC-CRM, led by English Heritage (Tudhope et al. 2011), also represent important informatics contributions.
However, to paraphrase a famous meme, “one does not simply map to the CRM.” The CIDOC-CRM’s sophistication also makes it difficult to use in practice. For example, we recently had a discussion with a librarian trying to use the CIDOC-CRM to organize some archaeological survey data for publication in Open Context. The librarian used the CIDOC-CRM property “P3 has_note” as a predicate for Munsell color readings of potsherds. This raised some interesting issues. It is debatable whether a Munsell color reading is simply a descriptive “note” or more of a measurement. If the latter, then the CIDOC-CRM property “P43F has_dimension” would probably be a more appropriate predicate. In theory, Munsell can be seen as an objective measurement. In practice, many researchers take Munsell readings because they vaguely think they should, and then do not adequately control for all sorts of issues (lighting conditions, dampness, color blindness, etc.) that may impact a reading. The example above illustrates how difficult the CIDOC-CRM can be to use in practice. The CIDOC-CRM contains many conceptual nuances that can lead to different potential mappings. In addition, mapping to the CIDOC-CRM, or to any other vocabulary or ontology for that matter, carries with it interpretive decisions: one has to make a judgement call whether a Munsell reading measures a dimension or is simply a note. Finally, in many cases, one may not have sufficient information about a dataset to make these judgement calls. Sebastian Heath (https://github.com/lawdi/LAWD/issues/3#issuecomment-18934276) raised similar issues with respect to modeling archaeological contexts, especially from legacy excavations where the tacit knowledge behind excavation documentation may be lost.
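The two candidate mappings can be written out as plain subject–predicate–object triples to make the difference concrete. This sketch uses an invented sherd URI and represents triples as Python tuples; in the full CRM model the "dimension" option would additionally involve an E54 Dimension node with its own value and unit:

```python
# Sketch: two alternative CIDOC-CRM mappings for a Munsell reading,
# written as (subject, predicate, object) tuples. The sherd URI is
# invented; the property names follow the discussion above.

SHERD = "http://example.org/sherd/1001"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"

# Option 1: the reading is a descriptive note attached to the sherd.
as_note = (SHERD, CRM + "P3_has_note", "5YR 6/4 light reddish brown")

# Option 2: the reading is a measurement; the sherd has a dimension
# (an intermediate node), and the note describes that dimension.
DIM = SHERD + "/dimension/color"
as_dimension = [
    (SHERD, CRM + "P43_has_dimension", DIM),
    (DIM, CRM + "P3_has_note", "5YR 6/4 light reddish brown"),
]
```

The choice between the two options is exactly the interpretive judgement call described above.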
These issues would be easier to navigate if one could refer to established practice and look at other examples of the CIDOC-CRM in use as a guide. However, despite the prominence of the CIDOC-CRM, it is surprisingly hard to find actual CIDOC-CRM-organized datasets to use as examples, at least in archaeology. More real-world implementations of the CIDOC-CRM would provide invaluable guidance. Part of the value of referencing Pleiades comes from Pelagios (see Simon et al. 2012), a system that aggregates Pleiades annotations. The services provided by Pelagios make investing in Pleiades annotations worthwhile. Unfortunately, the CIDOC-CRM has no clear analog to Pelagios. We currently need to wait for mandarins of the CIDOC-CRM to review our mappings in order to get feedback, and even then our efforts would seem relevant to, and noticed by, only a narrow audience of CIDOC-CRM aficionados. Systems that aggregate CIDOC-CRM content would be ideal, since such systems could help provide feedback about which mappings make sense and which do not. Without implementations that give such feedback, our experiments with mapping Open Context data to the CIDOC-CRM will go untested and have mainly theoretical value. In other words, right now, Open Context’s mappings to the CIDOC-CRM feel a little bit like eating spinach: in theory, it is good for us, but in practice, it is hard to identify its tangible benefits.
Lessons on Linking Data in Practice
Our struggles with the CIDOC-CRM illustrate some of the tensions behind different visions of “Linked Data” and the “Semantic Web.” In my view, the CIDOC-CRM represents an approach that seems very much at home with the Semantic Web. I see the Semantic Web as much more of a totalizing vision that emphasizes ontology and schema alignment between datasets across the Web. By reference to common conceptual models, the Semantic Web could enable powerful inference capabilities that draw upon logical relationships between data and ontologies. The problem with this vision is that non-trivial ontologies like the CIDOC-CRM can be hard to use in practice. They can also be used inconsistently (as illustrated with the example about Munsell values above). Beyond these practical problems, the research community has yet to really grapple with the theoretical implications of ontology standards. Is the CIDOC-CRM really universally appropriate for all cultural heritage data? Should there be room for alternative ontologies that reflect different research priorities and assumptions? In enshrining the CIDOC-CRM as an ISO standard, are we enshrining and privileging one particular (and contingent) perspective on the past without first adequately exploring other options?
Again, to my knowledge, nobody has harvested CIDOC-CRM mapped data from Open Context. So I lack feedback about the quality of our implementation of the CIDOC-CRM and I lack examples of inferences made using the CIDOC-CRM and Open Context data. Thus, realizing the benefits of a Semantic Web vision of ontology aligned data seems remote in the areas of archaeology Open Context serves. Archaeological excavation data typically has relevance for very narrow research interests and communities. The highly specialized nature of excavation data makes it harder to build a critical mass of relevant data that would benefit from integration and comparative analysis.
It is only in a few cases that Open Context has published enough relevant data to make “data integration” useful. Open Context recently published zooarchaeological datasets from 13 sites in Turkey that help document the transitions between hunting / gathering and agriculture / pastoralism in Anatolia from the Epipaleolithic through the Chalcolithic. The Encyclopedia of Life (EOL) sponsored the publication, integration, and shared analysis of these data through a Computational Data Challenge award (see Table 1 below):
Table 1: EOL Computational Data Challenge Projects
Site | Data Contributor / Key Project Participant | Project DOI
Barçın Höyük | Alfred Galik | http://dx.doi.org/10.6078/M78G8HM0
Çatalhöyük (East and West Mounds) | David Orton | http://dx.doi.org/10.6078/M7G15XSF
Çatalhöyük (TP area) | Arek Marciniak | http://dx.doi.org/10.6078/M7B8562H
Çukuriçi Höyük | Alfred Galik | http://dx.doi.org/10.6078/M7D798BQ
Domuztepe | Sarah Whitcher Kansa | http://dx.doi.org/10.6078/M7SB43PP
Erbaba Höyük | Ben Arbuckle | http://dx.doi.org/10.6078/M70Z715B
Ilıpınar | Hijlke Buitenhuis | http://dx.doi.org/10.6078/M76H4FBS
Karain Cave | Levent Atici | http://dx.doi.org/10.6078/M7CC0XMT
Köşk Höyük | Ben Arbuckle | http://dx.doi.org/10.6078/M74Q7RW8
Öküzini Cave | Levent Atici | http://dx.doi.org/10.6078/M73X84KX
Pınarbaşı (1994) | Denise Carruthers | http://dx.doi.org/10.6078/M7X34VD1
Suberde | Ben Arbuckle | http://dx.doi.org/10.6078/M70Z715B
Ulucak Höyük | Canan Çakırlar | http://dx.doi.org/10.6078/M7KS6PHV
Open Context’s editors, in collaboration with the authors of the datasets, spent four months decoding and editing records of over 294,000 bone specimens from the twelve archaeological sites, and linked the data to Encyclopedia of Life and UBERON concepts. Incorporating Linked Data into editorial practices is not unique to Open Context. Sebastian Heath similarly includes Linked Data annotation in editorial work for the ISAW Papers publications, and Shaw and Buckland (2011) note similar editorial approaches in other humanities applications. In order to facilitate citation as well as search, browse, and retrieval features on Open Context, each dataset needed additional metadata documentation (Table 3). This documentation included authorship and credit information, basic project and site descriptions, keywords, relevant chronological ranges, and the geospatial information needed for basic mapping (site latitude / longitude coordinates). Open Context editors also asked contributing researchers to include information on data collection methods and sampling protocols and to provide documentation on each field (meaning of the field, units of measure, how determinations were made, etc.) of their submitted dataset.
Rather than having all participants in this study analyze the entire corpus of data, each participant addressed a specific research topic using a sub-set of the data. Participants met in April 2013 at the International Open Workshop at Kiel University1 to present their analytic results on the integrated data. Project Director Arbuckle assigned each participant a topic related to taxon and methodology. Participants presented on topics such as “sheep and goat age data” and “cattle biometrics” (see Table 2). Following the presentations, participants discussed the results and the implications of the presented analyses for addressing the potential research topics. These presentations formed the basis of the data and discussion presented in the forthcoming collaborative research publication (Arbuckle et al. forthcoming). We will also publish a more in depth discussion of the editorial workflow behind the project (Kansa et al. forthcoming).
The semantic issues inherent in schema mapping and the CIDOC-CRM seemed largely irrelevant to the analysis and interpretation of these aggregated zooarchaeological datasets. Instead, more prosaic issues of vocabulary control became more important. We mainly used EOL and UBERON as controlled vocabularies. Even though UBERON is a sophisticated ontology that can support powerful inferences, including inferences relating bone elements to developmental biology, embryology, and genetics, making such inferences remained outside the scope of this particular study. Instead, linking these different zooarchaeological datasets to common controlled vocabularies formed the basis for aggregation and comparison.
Open Context’s experience with zooarchaeology suggests that vocabulary alignment can help researchers more, at least in the near term, than aligning datasets to elaborate semantic models (via the CIDOC-CRM). Furthermore, the zooarchaeologists participating in the EOL Computational Data Challenge worked with their shared data using the simplest and most widely understood of data analysis technologies. Open Context simply made the vocabulary-aligned data available as downloadable tables in CSV format. CSV is a very simple and rigid format that lacks the power of XML or RDF formats to express sophisticated models or schemas. Nevertheless, one can easily open a CSV file in a spreadsheet application like Excel, so it greatly simplifies the use of shared data by researchers who lack sophisticated programming skills. In the case of the EOL Computational Data Challenge project, CSV’s ease of use trumped its modeling limitations.
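The practical payoff of vocabulary alignment is easy to illustrate: once every dataset identifies taxa with the same EOL URIs, cross-dataset aggregation reduces to counting on a shared column. The two tiny CSV fragments below are invented; the cattle URI (eol.org/pages/34548) is the one behind Figure 1:

```python
import csv
import io
from collections import Counter

# Two invented CSV exports that both identify taxa by EOL URI.
dataset_a = (
    "specimen,taxon_eol_uri\n"
    "a1,http://eol.org/pages/34548\n"
    "a2,http://eol.org/pages/34548\n"
)
dataset_b = (
    "specimen,taxon_eol_uri\n"
    "b1,http://eol.org/pages/311906\n"
    "b2,http://eol.org/pages/34548\n"
)

# Aggregating across datasets is just a count over the shared column.
counts = Counter()
for blob in (dataset_a, dataset_b):
    for row in csv.DictReader(io.StringIO(blob)):
        counts[row["taxon_eol_uri"]] += 1

print(counts["http://eol.org/pages/34548"])  # 3
```

No ontology machinery is needed for this kind of comparison; the shared URIs alone make the rows commensurable.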
Summary
The point of this discussion is not to dismiss the CIDOC-CRM or the need for intellectual investment in semantic modeling. Again, the CIDOC-CRM represents a tremendous intellectual achievement and informatics researchers need to thoughtfully engage with it (rather than blindly accept it). However, many of the benefits and applications that can come with elaborate semantic modeling are thus far aspirational, especially in the context of distributed systems deployed by many different organizations and people with different backgrounds and priorities. To aspire to certain goals, even if not readily achievable today, is perfectly acceptable.
However, long-term aspirational goals typically need to be complemented by shorter term objectives that can be realized with more incremental progress. This discussion suggests that there may be some lower-hanging, easier to reach fruit in our efforts to make distributed data work better together. The distinctions I see between the shared modeling emphasis of the “Semantic Web” and simpler cross-referencing approach of “Linked Data” can help identify the low hanging fruit. In Open Context’s case, we are currently using Linked Data to annotate datasets using shared controlled vocabularies. For now, that seems to meet more immediate research needs. And since applying any standard or technology involves time and effort, we see that the most cost-effective strategy to making more usable data centers on editorial practices that cross-reference Open Context data with vocabularies like EOL, UBERON and Pleiades.
The above discussion explores our response to the information environment as we see it in late 2013. For the past six years, Open Context has worked to make data dissemination a more normal and expected aspect of scholarly practice. During this time, we’ve changed our approach to emphasize more formalism and editorial processes that promote quality. At the same time, the technology landscape and the expectations of researchers have continually changed. I have no doubt that our approach toward Linked Open Data and semantic modeling will continue to evolve as expectations and needs evolve.
Notes
1 International Open Workshop: Socio-Environmental Dynamics over the Last 12,000 Years: The Creation of Landscapes III, April 16-19, 2013, Kiel University. These presentations took place in the session “Into New Landscapes: Subsistence Adaptation and Social Change during the Neolithic Expansion in Central and Western Anatolia.” The session, which was chaired by Benjamin Arbuckle (Department of Anthropology, Baylor University) and Cheryl Makarewicz (Institute of Pre- and Protohistoric Archaeology, CAU Kiel), included a panel of presentations followed by an open discussion.
Works Cited
Elliott, Tom, and Sean Gillies (2009). Digital Geography and Classics. Digital Humanities Quarterly 3(1). Available at http://digitalhumanities.org/dhq/vol/3/1/000031.html, accessed January 6, 2010.
Faniel, Ixchel, Eric Kansa, Sarah Whitcher Kansa, Julianna Barrera-Gomez, and Elizabeth Yakel (2013). The Challenges of Digging Data: a Study of Context in Archaeological Data Reuse. In Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries JCDL ’13 (295–304). New York, NY, USA: ACM. http://doi.acm.org/10.1145/2467696.2467712, Open Access Preprint: http://www.oclc.org/content/dam/research/publications/library/2013/faniel-archae-data.pdf, accessed September 30, 2013.
Kansa, Eric (2012). Openness and Archaeology’s Information Ecosystem. World Archaeology 44(4), 498–520. Open Access Preprint: http://alexandriaarchive.org/blog/wp-content/uploads/2012/Kansa-Open-Archaeology-Self-Archive-Draft.pdf
Kansa, Eric C., and Sarah Whitcher Kansa (2013). We All Know That a 14 Is a Sheep: Data Publication and Professionalism in Archaeological Communication. Journal of Eastern Mediterranean Archaeology and Heritage Studies 1(1), 88–97. Open Access Preprint: http://escholarship.org/uc/item/9m48q1ff
Schloen, J. David (2001). Archaeological Data Models and Web Publication Using XML. Computers and the Humanities 35(2), 123–152.
Shaw, Ryan, and Michael Buckland (2011). Editorial Control over Linked Data. Proceedings of the American Society for Information Science and Technology 48(1), 1–4.
Simon, Rainer, Elton Barker, and Leif Isaksen (2012). “Exploring Pelagios: a Visual Browser for Geo-tagged Datasets.” International workshop on supporting users' exploration of digital libraries [conference]. Paphos, Cyprus. 27 Sep. 2012. http://ixa2.si.ehu.es/suedl/index.php?option=com_content&view=article&id=53:program&catid=36:categoryhome&Itemid=63, accessed September 30, 2013.
Tudhope, Douglas, Ceri Binding, Stuart Jeffrey, Keith May, and Andreas Vlachidis (2011). A STELLAR Role for Knowledge Organization Systems in Digital Archaeology. Bulletin of the American Society for Information Science and Technology 37(4), 15–18.
©2014 Eric Kansa. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.11 (2014)
Geolat: Geography for Latin Literature
Maurizio Lana
URL: http://www.geolat.it
The aim of Geolat: to allow scholars and citizens to discover the geography contained in literary texts, offering new ways to assess and read the geographical relations which link texts. This overview describes the state of the project as of 2013.1
Concise description:
Step 1) A digital library of Latin texts is built, with philologically sound editions.
Step 2) Every geographical name is tagged (meaning that only places given a specific name are recognized as places: "the hut of Eurilocos the shepherd" does identify a place, but one without a specific geographical identity), both by drawing on existing resources (Pleiades is the main example) and by creating new ones (a geographical ontology for classical texts and the classical world).
Step 3) The geographically tagged texts can then be accessed through a geo/graphic interface (see for example GAPvis, http://nrabinowitz.github.io/gapvis/index.html#index); that is, where traditional access to texts goes through a "pages interface", here we have instead a "places interface": the scholar accesses the texts starting from a map and not from a summary, a name index, etc.
The principles which drive the project development:
Multidisciplinary team: Latin classical literature, ancient history, philosophy of science, philosophy of language, computer science, librarianship, geography, archaeology. Geolat started as an Italian-only group (its members are now Raffaella Afferni, Margherita Benzi, Fabio Ciotti, Maurizio Lana, Diego Magro, Cristina Meini, Roberta Piastri, Gabriella Vanotti), but we are building relationships in order to submit proposals to upcoming EU calls as an international group.
Open access: the system will be freely accessible to everyone.
Free software: we want to build a system which doesn't waste public money and which can run for years at minimal cost.
CC licenses: the openness of access is formally defined and regulated in order to protect both the creators and the users.
Crowdsourcing: the system must allow for continual evolution of the knowledge it contains, and this can't be done through the contributions of the research team and its collaborators alone.
URIs for identification of places: we want to build a stable resource, which reliably cites (links) other resources and which can reliably be cited (linked) by other resources; all data will be published according to the Linked Data principles.
What characterizes the project:
The geographical ontology is meant to start offering the user some minimal type of automatic reasoning.
The development of the geographical visualization is a sort of evolution from lexical co-occurrences to geographical/geometrical co-occurrences: for example, you circle an area on a map, optionally select a time span, and see which authors, in which works, mention places in that area, and which places are mentioned; or you choose a place and then ask the system to select all the places within one day's travel time (thanks, Orbis! http://orbis.stanford.edu/).
Possibly, an ontology of geographical assertions (it appears to be very difficult to build, but its usefulness and interest are probably directly proportional to that difficulty).
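At its core, the "circle an area" query described above reduces to a great-circle distance filter over gazetteer coordinates. The Python sketch below illustrates the idea; the place records and coordinates are purely illustrative, not drawn from Pleiades, and a real implementation would also filter each place's attestations by the selected time span and return the mentioning authors and works.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical gazetteer entries: (name, latitude, longitude).
places = [
    ("Roma",     41.89, 12.49),
    ("Ostia",    41.75, 12.29),
    ("Neapolis", 40.85, 14.27),
]

def places_within(places, lat, lon, radius_km):
    """Names of the places whose coordinates fall inside the circled area."""
    return [name for name, plat, plon in places
            if haversine_km(lat, lon, plat, plon) <= radius_km]

print(places_within(places, 41.89, 12.49, 50))  # Roma and Ostia, not Neapolis
```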
Technical aspects of development
The project is intended to put together 'existing pieces' in order to go beyond them, building a complete global system that multiplies the usefulness of its component parts: the digital library and annotation management of Perseus (thanks, Perseus! http://www.perseus.tufts.edu/), the geo/graphical visualization of GAPvis (thanks, GAPvis! http://nrabinowitz.github.io/gapvis/index.html#index), and the geographical gazetteer Pleiades (thanks, Pleiades! http://pleiades.stoa.org/) can be glued together in order to provide a completely new kind of access to classical Latin texts (though not only to them; see below). The main reason is to avoid reinventing the wheel in a time of scarce resources, which this is. Should the project obtain all the funds it needs, it will be possible to (re)think the system software from scratch.
Funding
In 2012 and 2013 funds were sought by submitting proposals to EU ERC Synergy calls: the project was twice awarded a B score, meaning 'top scientific level, but the money goes to other, more appealing projects'. In 2013, after a blind peer evaluation by the European Science Foundation, the project received a starting grant from Fondazione Compagnia di San Paolo, a bank foundation, allowing for a prototyping and investigative phase.
With the start of the EU research framework Horizon 2020 in January 2014, the Geolat research group will respond to new EU calls; private funding will be sought as well.
Future perspectives and meaning
This whole structure is in no way specifically bound to classical Latin. So one could envisage that the texts of all European literatures be geographically tagged, making it possible to read the geography of Europe through what writers said of its places, or conversely to read the texts starting from the places they mention, asking what the waves of presence and absence of places in the texts across different time spans might mean.
For Europe, an area whose languages and interests have many historical intersections (in the same geographical area many different languages were spoken over time; the same geographical place is mentioned by many different works in different languages), this would be a way to show in practice the deep intertwining of places, cultures, and languages, giving solid substance to theoretical, programmatic statements.
Notes
1 This research is financed by Compagnia di San Paolo.
©2014 Maurizio Lana. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.12 (2014)
The Europeana Network of Ancient Greek and Latin Epigraphy (EAGLE)
Pietro Maria Liuzzo
EAGLE is a Best Practice Network (BPN) that brings together several European institutions1 and archives in the field of Classical Latin and Greek epigraphy.2 This is a different Eagle from that of the Electronic Archive of Greek and Latin Epigraphy, although all members of this Eagle are also in the project.
The “E” in EAGLE project stands for Europeana,3 the “european catalyst for change in the world of cultural heritage” brought to us by The Europeana Foundation. Europeana is a Network, and so is Eagle, sharing a vision which is:
“making cultural heritage openly accessible in a digital way, to promote the exchange of ideas and information”4
This Eagle is therefore The Europeana Network of Ancient Greek and Latin Epigraphy and its main aim is to provide Europeana with a broad collection of inscriptions, which are a specific type of content currently only sparsely present. By the end of the project Eagle will supply inscriptions coming from 25 EU countries, providing more than 1.5 M sets of metadata regarding images of inscriptions, inscriptions, contextual material (monuments and materials) and translations (of a selected group of inscriptions).
Through a careful mapping of the existing data models in use by the content providers,5 EAGLE will implement an inscription-specific metadata model. This will be based on the EpiDoc Guidelines6 for the editions of texts and on existing standards in use at Arachne7 for the metadata of photographs, while following recommendations developed through sister Europeana projects. All edited texts and all images will be available in XML and interoperable with other epigraphic projects, and will then be represented in EDM (the Europeana Data Model)8 to enter the Linked Open Data world.9
EAGLE is thus an aggregator10 and will, for the same reason, also offer a platform to search and browse inscriptions. As for what EAGLE would like to link with, there is certainly possible future collaboration11 to develop an editing tool for inscriptions, which would allow further contributors and participants to join the BPN.12
Eagle will also develop two Applications:
a mobile application to enable users to get metadata about inscriptions they find on location by sending in a picture to the EAGLE portal.
a storytelling application to allow users to assemble epigraphy-based narratives (mostly for the benefit of non professional users).13
Finally, EAGLE is proud to work in partnership with Wikimedia Italia,14 with which it will set up a Wikibase15 MediaWiki for the enrichment and curation of epigraphic images and texts, with special emphasis on translations. This will allow Wikipedia users to link images and metadata of inscriptions from authoritative sources, and will make references in Wikipedia much richer, providing a basis for a future collaborative effort toward translations of all inscriptions into many languages.
EAGLE will make connections within its BPN, as part of Europeana and of Wikidata and Wikisource, with different kinds of users and with all those people, institutions and projects who decide to join. But EAGLE's main concern is to link its data to the Linked Ancient World Data, to enrich the possibility of discovering scholarly significant connections among scientific databases. EAGLE will take inscriptions out of isolation, opening them up for relations and connections.
A first way to do this will be to provide one stable Trismegistos URI16 for each inscription, whose value lies not only in providing a stable link and identifier, but also in giving a clear and univocal reference that can be used in printed publications, thus linking outside the web as well.
A second way will be to have all geographical information converge on Trismegistos Places, to find inconsistencies and obtain a stable ID,17 which will then be linked, following the Pelagios Cookbook,18 to Pleiades,19 which will in turn multiply the possible links to geographical information.
The third main way, among others, will be the definition of common structured thesauri.20 EAGLE is already taking up the major task of aligning and defining each term in use by the BPN in five fields of information: type of inscription; object type; material;21 writing technique; decoration.22 Each term will be associated with the values in use by each content provider and will be given definitions with examples. This is because there is a shared need for alignment and full definitions: the vocabularies in use in different projects often differ not only in their names and number, but also in their definitions and their attribution to a specific field, not to mention translations of values, which are often not univocal at all and can create ambiguities and problems. EAGLE will develop these thesauri23 and will keep them exposed and accessible in the long term with stable URIs.24
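The alignment task just described can be sketched as a lookup table from provider-specific values to shared concept URIs. Everything in this sketch is hypothetical: the provider names, the local terms, and the example.org URIs stand in for the actual EAGLE vocabularies and their stable URIs.

```python
# Hypothetical mapping: (content provider, local term) -> shared concept URI.
ALIGNMENT = {
    ("ProviderA", "Grabinschrift"):  "http://example.org/eagle/voc/typeins/funerary",
    ("ProviderB", "sepulcralis"):    "http://example.org/eagle/voc/typeins/funerary",
    ("ProviderC", "epitaph"):        "http://example.org/eagle/voc/typeins/funerary",
    ("ProviderA", "Ehreninschrift"): "http://example.org/eagle/voc/typeins/honorary",
}

def harmonize(provider, local_term):
    """Resolve a provider-specific value to a shared concept URI, or None."""
    return ALIGNMENT.get((provider, local_term))

# Three differently named local values resolve to the same stable concept,
# which is what makes cross-database search and LOD linking possible.
assert harmonize("ProviderA", "Grabinschrift") == harmonize("ProviderB", "sepulcralis")
print(harmonize("ProviderC", "epitaph"))
```

In the real thesauri each concept URI would additionally carry a definition, examples, and checked translations, as the text describes.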
EAGLE has a main focus on images, for two reasons. The first is that, to achieve the functions of the applications described above, CNR-ISTI will also develop within EAGLE a full image-recognition system, which will enable people to get more information from all these networks of data via a smartphone, while enriching the database of photographs of inscriptions.
Secondly, according to the DEA, all metadata (including all the EpiDoc XML editions of texts) will be licensed under the Creative Commons CC0 licence (1.0 Universal Public Domain Dedication),25 but for images it is not always the case that owners are willing to "give them up". Most of the images in EAGLE are already licensed under CC BY-SA, and EAGLE will put effort into renegotiating current permissions to foster CC BY-SA or CC0 licensing for the images of its BPN, and will in any case provide clear labelling for all materials according to Europeana guidelines and requirements.26 The EAGLE BPN believes that open licensing, together with LOD, is a very powerful tool to guarantee the continuous increase of openly accessible and reusable material.27
Works Cited
ATHENA 2009: ATHENA project (2009), “Overview of IPR legislation in relation to the objectives of Europeana (1 November 2008 – 30 April 2009)”.
Bodard 2010: Gabriel Bodard, EpiDoc: Epigraphic Documents in XML for Publication and Interchange, in Latin on Stone, ed. Francisca Feraudi-Gruénais (2010), 101ff.
Cayless, Roueché, Elliott, Bodard 2009: Hugh Cayless, Charlotte Roueché, Tom Elliott, Gabriel Bodard, Epigraphy in 2017, Digital Humanities Quarterly 3.1 (2009), http://www.digitalhumanities.org/dhq/vol/3/1/000030/000030.html.
Evangelisti 2010: Silvia Evangelisti, EDR: History, Purpose, and Structure, in Latin on Stone, ed. Francisca Feraudi-Gruénais (2010), 127ff.
Felle 2012: A.E. Felle 2012 Esperienze diverse e complementari nel trattamento digitale delle fonti epigrafiche. Il caso di Eagle ed EpiDoc in Diritto romano e scienze antichistiche nell’era digitale. Convegno di studio (Firenze, 12-13 settembre 2011), Torino 2012 [Collectanea Graeco-Romana. Studi e strumenti per la ricerca storico-giuridica, 10], 117-130.
Harpring 2010: Patricia Harpring, Introduction to Controlled Vocabularies: Terminology for Art, Architecture, and Other Cultural Works (2010). http://www.getty.edu/research/publications/electronic_publications/intro_controlled_vocab/what.html
Vincent and Wickham 2013: N. Vincent and C. Wickham, Debating Open Access (June 2013). http://www.britac.ac.uk/openaccess/debatingopenaccess.cfm
Notes
1 Please see the temporary project website for more information: http://www.eagle-network.eu/about/partners/ for a complete list of the initial members of the BPN, and http://www.eagle-network.eu/about/get-involved/ if you would like to join EAGLE.
2 The coordinator of the project is Silvia Orlandi (http://viaf.org/viaf/24736430 Sapienza Università di Roma), and the technical coordinator is Claudio Prandoni (http://www.digitalmeetsculture.net/people/claudio-prandoni/ Promoter Srl).
3 http://www.europeana.eu
4 See http://pro.europeana.eu/documents/900548/a03b2598-a3b2-4cb8-8cc5-33eeadbc7aa8.
5 Thanks mainly to the work of the University of Alcalà de Henares (task leader) and of the Cyprus Institute (work package leader). It is to be noted that some partners in the EAGLE BPN already have their data in EPIDOC XML, namely PETRAE http://petrae.tge-adonis.fr/ and the British School at Rome.
6 http://www.stoa.org/epidoc/gl/dev/. For the history and philosophy of, and an introduction to, EpiDoc, see Cayless, Roueché, Elliott, Bodard 2009 and Bodard 2010.
7 http://arachne.uni-koeln.de/
8 More details and documentation can be found at http://pro.europeana.eu/edm-documentation. EDM refers to major vocabularies and namespaces, including CIDOC-CRM.
9 See also http://pro.europeana.eu/linked-open-data
10 The structure of which is being designed by the Consiglio Nazionale delle Ricerche - Istituto di Scienza e Tecnologie dell’Informazione.
11 Perhaps with Perseus projects on Epigraphy (see http://www.dh2012.uni-hamburg.de/conference/programme/abstracts/digital-humanities-in-the-classroom-introducing-a-new-editing-platform-for-source-documents-in-classics/) and with the Papyrological Navigator (http://papyri.info)?
12 An approach which should guarantee an integration of different approaches to the study of epigraphy: the inscription as text or as inscribed monument. Cf. Felle 2012.
13 Using tools such as https://storify.com/
14 http://www.wikimedia.it/
15 The software used by Wikidata (https://www.wikidata.org/wiki/Wikidata:Pagina_principale), which is an extension of MediaWiki building on Semantic MediaWiki (http://www.semantic-mediawiki.org/) to improve it and solve several of its existing issues (http://semantic-mediawiki.org/wiki/Wikidata_to_Bring_Semantic_Data_to_Wikipedia).
16 http://www.trismegistos.org/about
17 And a thorough work of cleaning and harmonization.
18 https://github.com/pelagios/pelagios-cookbook/wiki
19 http://pleiades.stoa.org/
20 According to the definition in Harpring 2010. I call them all thesauri, following the more specific definition by the same author, although some EAGLE thesauri will be flat lists with synonym rings based on other current uses in the BPN, and others poly-hierarchical lists. Some general concepts will also be defined which might perhaps find their place in LAWD. There is no intention to provide arbitrary translations of terms, but the consistency of translations from content providers will be harmonized and checked by native speakers. For examples of translations and definitions see Evangelisti 2010 and http://edh-www.adw.uni-heidelberg.de/hilfe/liste/inschrifttraeger or http://edh-www.adw.uni-heidelberg.de/hilfe/liste/inschriftgattung
21 The petrology section of the material vocabulary will mostly refer to the "Simplified Petrography" of the University of Salzburg, a highly accurate thesaurus developed by natural scientists and classicists, which is already available with documentation at http://chc.sbg.ac.at/sri/thesaurus/
22 EAGLE will also put effort into harmonizing the dating criteria in use, with precise guidelines for a controlled format of date mark-up. These, like the previously mentioned vocabularies, will be defined and publicly accessible.
23 Although the testing phase is not yet finished, there is a very strong likelihood that TemaTres will be used to develop these vocabularies: http://www.vocabularyserver.com/.
24 The website and technology will be maintained by PROMOTER srl and by Consiglio Nazionale delle Ricerche - Istituto di Scienza e Tecnologie dell’Informazione.
25 A full description can be found at http://creativecommons.org/publicdomain/zero/1.0/
26 A useful overview can be found in ATHENA 2009, p. 6, with definitions from http://www.wipo.int/about-ip/en/. Useful "calculators" are available at http://www.europeanaconnect.eu/, http://www.outofcopyright.eu/ and now also at http://www.digitalmeetsculture.net/article/specialized-ipr-support-from-europeana-photography/.
27 An issue related to, but basically distinct from, the one faced in Vincent and Wickham 2013.
©2014 Pietro Maria Liuzzo. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.13 (2014)
Bryn Mawr Classical Review
Camilla MacKay
Bryn Mawr Classical Review or BMCR (bmcr.brynmawr.edu and www.bmcreview.org) was founded in 1990 as an electronic open access journal (the second such journal ever published in the humanities), and over 8000 book reviews have since been published. It is an international journal that publishes in English, French, German, Italian, and Spanish, and reviews books published anywhere in the world. Bryn Mawr Classical Review reaches a daily email and online readership of 13,000 people worldwide; it is well known internationally in the humanities and, because it has always been open access, is widely read not only within academic communities but by the interested public. Following LAWDI 2012 and my introduction to linked data, for which I'm particularly grateful to the organizers, Tom Elliott, Sebastian Heath, and John Muccigrosso, I obtained grant support in 2013 from the Trico (i.e., Bryn Mawr, Haverford, and Swarthmore Colleges) Digital Humanities initiative to begin publishing linked data for BMCR.
BMCR is a publication of Bryn Mawr Commentaries, Inc., a non-profit organization that sells Bryn Mawr Greek and Latin Commentaries, inexpensive Greek and Latin textbooks that provide text and explanatory notes geared toward a college-level readership. Expenses for BMCR include editorial assistance and postage costs (despite its place in the forefront of digital publications, reviews are almost exclusively of print books which must be sent to the reviewers, and over half the reviewers are outside North America). Bryn Mawr College provides space (and faculty and staff time), and the journal is supported by a large board of about 100 volunteer editors. Ongoing expenses cannot grow, and neither reviewers (who are already asked to assist with basic formatting) nor editors can be asked to take on significant additional responsibilities in order to enhance BMCR. The challenge of adapting the web publication of BMCR is to do so within the limited budget and limited staff and editorial time available.
All reviews are stored as TEI-encoded files, with the result that structured metadata exists behind each file. BMCR reviews have (with one or two exceptions from the early years in the 1990s) clean URIs. Tagged ISBNs and titles identify each book reviewed, and tagged names identify both authors and reviewers. A separate blog (on Blogger) was added in 2008 to allow comments on reviews; reviews are automatically published to the blog and linked to the main site.1
We investigated using RDFa to publish linkable data in BMCR reviews and, led by Karen Coyle, we plan to repurpose the existing TEI metadata as schema.org data. A particular challenge is to provide data that uses ontologies and vocabularies that are, and will remain, widely used. Using WorldCat identities for books seems obvious, and we hope we can thus enable linking to all formats of a particular book. We now usually publish one ISBN per review (the ISBN of the physical format of the book provided by the publisher), but there may be multiple ISBNs: hardback, paperback, e-book. For this reason, using the WorldCat identity rather than, or in addition to, the ISBN, where possible, may increase the utility of BMCR's data. Since BMCR has for several years used EndNote to pull citations from WorldCat, the WorldCat ID number exists in our EndNote files; with some effort we may be able to generate WorldCat identities (as published in worldcat.org) fairly easily for a couple of thousand reviews, and, going forward, can incorporate them in the schema.org data.
Another form of data perhaps useful for publishing as linked data consists of the author of the review, one of the tagged fields. But in this case, we have no automated way of providing, for example, the VIAF identity for the person (who may or may not exist in VIAF in any case). Would identifying the name as the reviewer provides it him or herself (usually without a middle name, sometimes only with initials) be useful? (Reviewers who review more than once for BMCR often provide different forms of their own names from review to review; we do not have our own authorities, so the same reviewer can be indexed differently even in BMCR.)
A particular problem is figuring out how BMCR's linked data might be useful in a real world environment.2 It is easy to imagine that publishing data to the effect that a particular BMCR URI is a review of a particular book, identified by the worldcat.org URI, could allow a library catalog, for example, to automatically pull in a BMCR review about a particular book, regardless of the format. For example, http://bmcr.brynmawr.edu/2008/2008-04-35 is a review of http://www.worldcat.org/oclc/314219734, which has both print and e-book ISBNs (but we only published the hardback ISBN). Having a partner in place to make use of this data would be most encouraging; I worry that we might otherwise make the wrong choice and risk publishing data that no one will use.
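As a sketch of what such data could look like, the snippet below builds a schema.org description of the review just cited as JSON-LD, using Python's standard library. The property choices (Review, itemReviewed, sameAs) are illustrative assumptions on my part, not BMCR's actual published markup.

```python
import json

# Illustrative schema.org JSON-LD for the BMCR review cited above; the
# structure is an assumption, not BMCR's published markup.
review = {
    "@context": "http://schema.org",
    "@type": "Review",
    "@id": "http://bmcr.brynmawr.edu/2008/2008-04-35",
    "itemReviewed": {
        "@type": "Book",
        # The WorldCat identity covers every format (hardback, e-book),
        # regardless of which single ISBN the review printed.
        "sameAs": "http://www.worldcat.org/oclc/314219734",
    },
    "publisher": {"@type": "Organization",
                  "name": "Bryn Mawr Classical Review"},
}

print(json.dumps(review, indent=2))
```

A library catalog that resolves the same WorldCat URI could harvest markup like this and attach the review to any edition of the book, which is exactly the use case imagined above.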
Given the limited resources of BMCR, and the constant editorial demands of continuously publishing some 60 reviews per month, implementing the publication of linked data has to require little additional work of the editors, with a guaranteed (or as close to guaranteed as we can make it) result. The process must be close to automatic, hence the pressure on us to make a few good choices up front. Will our publication of schema.org data enable this? It is unclear exactly what benefit we (or our readers, or searchers) will gain, but the cost is low.3
Another promising possibility for linking out from BMCR arises from recent discussions with Josh Sosin, Hugh Cayless, and Ryan Baumann of the new Duke Collaboratory for Classics Computing about linking papyrological and epigraphic texts cited in BMCR reviews, using Trismegistos numbers. Relatively few BMCR reviews cover books on these topics, so even retrospective linking for these reviews back to 1990 would be manageable. Reviewers of books on these topics (especially papyrology) might, moreover, add coding for these links on their own if we start requesting links in reviews. This project may help determine the feasibility and usefulness of adding more links from the content of BMCR reviews.
Notes
1 The experiment with comments on reviews has not been a resounding success: readers rarely comment on bmcreview.org, and prefer either to write directly to the reviewer (we hear anecdotally) or to publish formal responses in BMCR.
2 Eric Kansa's discussion of Open Context, CIDOC-CRM and linked data (http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/kansa/) is a more sophisticated example of the challenge of making linked data truly useful.
3 schema.org FAQs are vague about results: "over time you can expect that more data will be used in more ways. In addition, since the markup is publicly accessible from your web pages, other organizations may find interesting new ways to make use of it as well."
©2014 Camilla MacKay. Published under the Creative Commons Attribution 3.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.14 (2014)
Byzantine Cappadocia: Small Data and the Dissertation
A. L. McMichael
By incorporating Linked Data, a previously-siloed, solo project can become a connected, open, collaborative means for contributing to scholarship and public knowledge. Documenting Cappadocia is a website focused on Byzantine monuments from approximately the 6th to 11th centuries in Cappadocia, a region in central Turkey. I created the website while a PhD student in art history at the Graduate Center, City University of New York (CUNY), where it is hosted and supported by the New Media Lab in collaboration with GC Digital Fellows program. The project’s primary purpose is to offer a scholarly introduction to the area and facilitate an online community. While Byzantine monuments in Cappadocia have been the subject of extensive research, there are very few open access reference materials on the topic. The monuments are also visited by millions of tourists each year, yet there is a lacuna of photos with adequate and accurate captions describing them. Documenting Cappadocia began addressing these issues with a bibliography, photos, and links to open access resources.
Following LAWDI 2012, I used the project as a case study in how a solo researcher with limited resources can contribute to the wider Ancient World network. Documenting Cappadocia’s photographs are now available with a CC-BY license in the Ancient World Image Bank Flickr group and are annotated with Pleiades machine tags. Entries in the bibliography now link to permanent URIs in WorldCat or JSTOR, following the lead of Phoebe Acheson’s Ancient World Open Bibliographies. I have also had fruitful discussions with LAWDI participants about future contributions to the Pleiades gazetteer. Since the LAWDI network emphasizes both human connections and links between data sets it enabled me to collect and organize data with like-minded scholars and relevant projects in mind.
These implementations brought to light issues that many non-developers have in understanding Linked Data principles, and the website became a vehicle for advancing my own digital literacy. A blog post titled, “Linked Data for the Uninitiated” addresses jargon and introductory concepts. Not surprisingly, the most actionable principle of Linked Data is a focus on openness. For small projects, utilizing out-of-the-box content management systems (i.e. WordPress plug-ins or Omeka’s Dublin Core metadata standards) with an emphasis on openness and stable URIs can have a profound effect on digital scholarship.
The website remains a work in progress, generating small data sets that are constructed to be reused and remixed. This strategy has been crucial in the extension of Linked Data principles to my dissertation research, which is also on Byzantine Cappadocia. Referring to such models, Rufus Pollock insists that small data “packages” are scalable and solve problems, calling them the “real revolution” in democratizing data (Pollock 2013). Joris Pekel adds that experts such as curators and archivists should work alongside the wider public to create and enrich small data sets, granting them a significant stake in opening up public dialogue (Pekel 2013). Their commentary highlights the contributions that dissertations can make to the semantic web.
Recent academic dialogue calls into question the underlying value of dissertations, criticizing the lack of collaborative process and the isolated research environment required to produce a monographic text (Patton 2013). However, even the most traditional dissertations are built on data, much of which is relegated to appendices. My dissertation has a bibliography, a map of sites, and a catalog of images, all of which can be valuable data in their own right. Alongside my traditional art history dissertation, I am building a database of Byzantine monuments using Dublin Core elements, controlled vocabularies, and stable URIs for each. Since the dissertation provides the foundation of the bridge between student and early-career researcher, the goal is to experiment with the visualization and sharing of data in order to insert the research into the wider range of ancient and medieval scholarship.
In conclusion, Linked Data principles are a practical way to integrate small projects, including dissertation research, into a wider community and to develop collaborative methods surrounding a solo project. Structuring this data in such a way that it can be remixed by others offers a number of benefits for a solo researcher or small team. First, controlled vocabularies and metadata standards provide precedents and parameters for the scope of the work and encourage the use of best practice guidelines. Also, Linked Data expands the definition of collaboration, offering possibilities to network with projects of a similar subject matter or scope. It incorporates niche topics into the wider realm of scholarship and public knowledge, offering context. It also helps identify and widen the potential audience for the work.
Works Cited
[Patton 2013] Patton, Stacy. “The Dissertation Can No Longer Be Defended.” The Chronicle of Higher Education 11 February 2013. Available at: <http://chronicle.com/article/The-Dissertation-Can-No-Longer/137215/>
[Pekel 2013] Pekel, Joris. ‘Big Data vs. Small Data: What about GLAMs?’ 2 May 2013. OpenGLAM. Available at: <http://openglam.org/2013/05/02/big-data-vs-small-data-what-about-glams/>.
[Pollock 2013] Pollock, Rufus. ‘Forget Big Data, Small Data is the real revolution’ 22 April 2013. Open Knowledge Foundation Blog. Available at: <http://blog.okfn.org/2013/04/22/forget-big-data-small-data-is-the-real-revolution/>
©2014 A. L. McMichael. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.15 (2014)
Coinage and Numismatic Methods. A Case Study of Linking a Discipline
Andrew Meadows and Ethan Gruber1
1. The Opportunity
As a type of evidence for the ancient world, coinage is unique. Coins are monetary objects, and thus a key element in modern attempts to reconstruct the workings of the ancient economy. For example, through the process of 'die-study' it is possible to determine with some degree of accuracy how many dies were used to strike a given coinage. This provides us with a way to quantify ancient monetary production. Since coins can also be attributed to particular rulers or cities with some degree of certainty, it becomes possible to ascertain the monetary output of different cities, kingdoms and empires, and to compare them with one another. There now exists a substantial body of scholarship devoted to estimating the size of production, but comparatively little as yet to its broad analysis or its representation in interactive media such as timelines or maps.2
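To make the quantification step concrete: one simple, coverage-based approach (in the spirit of the Good-Turing-style estimators used in die-study statistics, though not any single published formula) scales the number of observed dies up by the estimated sample coverage. The sample below is a toy illustration, not real data.

```python
from collections import Counter

def estimate_original_dies(die_of_coin):
    """
    Estimate the original number of dies behind a coinage from a sample of
    coins, each attributed (by die-study) to a die. With n coins, d observed
    dies, and n1 dies represented by a single coin, the sample covers roughly
    1 - n1/n of the original output, so D is estimated as d / (1 - n1/n).
    """
    n = len(die_of_coin)
    counts = Counter(die_of_coin)
    d = len(counts)
    n1 = sum(1 for c in counts.values() if c == 1)
    if n1 == n:  # every die is a singleton: the sample says nothing yet
        return float("inf")
    return d / (1 - n1 / n)

# Toy sample: 10 coins from 6 observed dies, 3 of which are singletons,
# giving an estimate of 6 / 0.7, i.e. roughly 8.6 original dies.
sample = ["A", "A", "B", "B", "B", "C", "C", "D", "E", "F"]
print(round(estimate_original_dies(sample), 2))
```

Comparing such per-mint estimates across cities or kingdoms is what allows the relative monetary outputs mentioned above to be set side by side.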
Coins are also archaeological objects in that they have find spots. Coins within archaeological contexts have much to tell excavators about the contexts they are digging, but also more broadly about the monetary profile of the site they are excavating compared to others of similar or different types; from multiple sites a regional history may emerge (see e.g. Reece, 1982). But find spots also give coins a trajectory. If we know where a coin was made and where it was found, we have evidence for movement, connectivity and economic circulation (see Map 1).
Map 1. Hoard find spots of coins minted at Alabanda in Caria. From http://nomisma.org/id/alabanda.
Few archaeological objects from antiquity can be mapped from source to deposition with such certainty as coins, and yet again we are only beginning to exploit the possibilities of this evidence in analytical and representational tools. Moreover, with the advent of the metal detector, individual coin finds and their recording are no longer confined to excavation material. The Portable Antiquities Scheme in the United Kingdom, for example, now has recorded find spots for some 283,000 coins (Pett this volume; see Map 2). Already it is leading to new works of synthesis (e.g. Leins, 2012; Walton, 2012), but again work is just beginning. Only when it becomes possible to compare data sets across multiple modern source countries will it become possible to write the larger monetary history of ancient imperial spaces. With other coin-finds projects in other countries beginning to come online (see e.g. http://www.ecfn.fundmuenzen.eu/Partners.html), it will not be long before the quantity of such material available enters the realm of Big Data.
Map 2. Coin finds in the UK PAS database (http://finds.org.uk/database/search/map/objecttype/COIN)
Unlike most other forms of archaeological evidence, coins are official objects: their designs and inscriptions can tell us about the intentions of their issuers and, perhaps, the preconceptions of their users (see e.g. Fig. 1). The iconographic and epigraphic repertoire of ancient coinage is a huge and substantially unmined resource for examining areas from local religion to imperial economic policy, and from individual political ambition to communal statements of identity. And there is scope here, as recent work has shown (Kemmers, 2006; von Kaenel and Kemmers, 2009), to marry the evidence of findspots to that of the iconography of the objects, to expose patterns of administration invisible in other sources.
Fig. 1. Reverse of a denarius of Augustus depicting a Cippus inscribed: S(enatus) P(opulus)Q(ue) R(omanus) IMP(eratori) CAE(sari) QVOD V(iae) M(unitae) S(unt) EX EA P(ecunia) Q(uae) I(ussu) S(enatus) AD A(erarium) D(elata) E(st). (American Numismatic Society:http://numismatics.org/collection/1944.100.38334)
Finally, there is the sheer quantity of material that survives. As we have already noted, the United Kingdom's PAS, after 16 years of recording, contains information on 283,000 coins, and the rate of discovery, and thus of growth in material, shows no sign of abating. This is just one country. Initiatives exist to record individual finds as well as hoards in numerous other European countries. Moreover, the collections of museums, both local and national, raise the numbers of coins available for study literally into the millions (Callataÿ 1997b). To these sources we must also add the coins that appear in commerce every year, to a large extent since c. 2000 online (e.g. Fig. 2).3
Fig. 2 The first of 101,546 results for a search for ‘denarius’ in the Coinarchives Pro subscription website (http://pro.coinarchives.com/a/results.php?search=denarius&firmid=&s=0&upcoming=0&results=100)
Numismatic material thus presents an exciting set of opportunities to address questions unanswerable through other source material, and to do so with substantial quantities of data, a significant amount of which is already available online in a variety of forms.
2. The problems
However, in the sheer volume of data, and in its location in a variety of institutional and non-institutional settings, lie two of the principal barriers to its exploitation. Many, many coins are described online, but they are described in different systems, in different formats, according to different standards, with different aims and in different languages (compare Figs. 3a-d).
Fig. 3a. Tetradrachm in the name of Alexander the Great, mint of Alexandria. Bode Museum, Berlin, online catalogue (IKMK). http://www.smb.museum/ikmk/object.php?id=18202968
Fig. 3b. Tetradrachm in the name of Alexander the Great, mint of Alexandria. ANS, New York, online catalogue (MANTIS). http://numismatics.org/collection/1944.100.35623
Fig. 3c. Tetradrachm in the name of Alexander the Great, mint of Alexandria. Bibliothèque Nationale, Paris, online catalogue (Gallica). ark:/12148/btv1b8476481p http://catalogue.bnf.fr/ark:/12148/cb417462537
Fig. 3d. Tetradrachm in the name of Alexander the Great, mint of Alexandria. Freeman & Sear, sold 4.i.2011. (Coinarchives): http://pro.coinarchives.com/a/lotviewer.php?LotID=391000&AucID=707&Lot=32
To add to the complexity, numismatics has its own way of describing its objects, necessitated by and tailored to features specific to coinage. Coins, for example, have two sides, both of which must be described. They are both pictorial and textual. Physical characteristics such as material, weight, diameter and the directional relationship of obverse to reverse (die axis) can be recorded. Coins have denominational systems that vary with time and place; and those times and places may require systems of chronology and geography that depart from standard, modern formats: ‘154/3-153/2 BC’, ‘first quarter of the second century’, ‘Byzantium’, ‘Constantinople’. There is also a vast array of information about individuals involved in the production of coinage, whose names and titles may vary over time, but in any case probably find no easy analogue in any other discipline: ‘Augustus’, ‘Octavian’, ‘Tresvir’, ‘moneyer’, ‘die-engraver’. And, in a further twist, it is often necessary to record not just the details of an individual coin, but also its relationship to others. This may be in the context of its circumstances of discovery, a taxonomic arrangement, a commercial transaction, or its current physical disposition.
3. Solutions: Creating a Linked Discipline
Fig. 4. The linked nature of numismatic study
Coinage is thus a rich source for the study of the ancient world, and the study of Roman Imperial coinage in particular is now well established in the print medium. Roman numismatists have, over the past century, divided their discipline into four discrete areas of study. The basic structure of Imperial coinage has been the focus of a type corpus known as Roman Imperial Coinage (RIC). This is now complete in 10 print volumes, and provides a basic description of each of the 40,000+ recorded varieties of the coinage. It is the standard reference work for all who catalogue and publish Roman coins in any context. These contexts may be divided into three separate types, which have formed the subjects of the other three areas of numismatic study: collections, hoards and individual finds. Roman coins exist in the hundreds of thousands in the major public collections across the world. Hoards (coins buried together in antiquity) are found today in astonishing numbers across the former territory of the empire. Single finds are similarly common, both within archaeologically excavated contexts, where their scientific value is enhanced, and as chance or metal-detector finds. As Fig. 4 illustrates, all four of these areas of study are fundamentally interlinked, since collections may contain hoards and single finds, and any coin from any context must be described by RIC type for it to be properly published and usable in historical synthesis.
In a very obvious sense, therefore, Roman numismatics is a prime candidate for the introduction of a Linked Data approach to the entire range of publications required by the discipline. In 2011 the American Numismatic Society began, in collaboration with a number of strategic partners, the process of developing the necessary infrastructure for the creation of linked Roman numismatic data. In part this built on existing initiatives, and in part it required the establishment of major new projects.
a. Vocabulary and Ontology
A key element of this infrastructure is the set of stable URIs required to describe numismatic concepts. As noted above, there are elements of coin description peculiar to numismatics that require a tailored approach to the creation of a discipline-specific vocabulary. Here we were able to harness a project (http://nomisma.org/) established by Sebastian Heath and Andrew Meadows in 2010 to provide stable digital representations of numismatic concepts in the form of http URIs that also provide access to reusable information about those concepts, along with links to other resources. This allows us to build a graph of Roman coin data that is linked within Roman numismatics by the use of discipline-specific terms such as denominations (e.g. http://nomisma.org/id/denarius) or mints (http://nomisma.org/id/lugdunum), but also allows us to join the broader graph of ancient world data through the use of common identifiers such as Pleiades URIs (e.g. https://pleiades.stoa.org/places/167717), and a broader graph still through links to such resources as Wikipedia (e.g. http://en.wikipedia.org/wiki/Lugdunum) or geonames.org (e.g. http://www.geonames.org/2996944/lyon.html). By this simple decision we were able to ensure, in theory at least, not only the integration of numismatic data within the field of Roman numismatics, but also the permeation of numismatic material into other fields of study. The launch in July 2013 of a revised nomisma.org system, based on Apache Fuseki,4 with a SPARQL endpoint and improved APIs, has allowed us to integrate nomisma IDs fully into a number of numismatic projects.
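In practice, a client can ask a nomisma.org URI for a machine-readable representation rather than an HTML page. The short Python sketch below builds, but does not send, such a request using HTTP content negotiation; whether the server honors this exact Accept header, as opposed to serving format-specific URLs, is an assumption of the example.

```python
from urllib.request import Request

def rdf_request(concept_id: str) -> Request:
    """Build a request for the RDF/XML representation of a nomisma concept.

    Content negotiation via the Accept header is assumed here; the
    request object is constructed but never actually sent.
    """
    uri = "http://nomisma.org/id/" + concept_id
    return Request(uri, headers={"Accept": "application/rdf+xml"})

req = rdf_request("denarius")
print(req.full_url)              # http://nomisma.org/id/denarius
print(req.get_header("Accept"))  # application/rdf+xml
```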
b. A Type Corpus
The existence of a full print corpus of Roman Imperial coinage (RIC) provided a ready-made framework for the creation of an online type corpus. In 2011, therefore, a project was established jointly by the ANS and New York University’s Institute for the Study of the Ancient World (ISAW) to build an online adaptation of this resource on the principles of Linked Data. This project benefited from head-starts in two areas. First, the resource for the creation of the necessary URIs existed in the nomisma.org project; and second, the collection database of the ANS (http://numismatics.org/search/) already contained the descriptive elements of approximately one third of the c. 50,000 known types of Imperial coinage. From these two resources we were quickly able, with technical implementation by Gruber and data under the management of Dr. Gilles Bransbourg, to establish Online Coins of the Roman Empire (OCRE: http://numismatics.org/ocre/), a type corpus of Roman coinage. As work has progressed on the creation of type records within OCRE, so we have created the nomisma.org URIs necessary for their description. The Linked Data approach we have taken to the creation of OCRE has a number of obvious payoffs. An attractive feature is that, by providing alternative names for all nomisma.org concepts in multiple languages, we have been able quickly and easily to build a multilingual interface (compare Figs. 5a and 5b), derived from SKOS-defined preferred labels in RDF extracted in real time from nomisma.org’s APIs. To date eleven languages are supported: English, French, German, Spanish, Italian, Greek, Russian, Bulgarian, Romanian, Swedish and Dutch. But the advantages of the Linked Data approach run deeper too. Where the Roman Imperial Coinage type corpus can do nothing but describe the types themselves, and illustrate a single representative example of a selected few types, OCRE has the power to link to multiple examples of a given type from multiple contexts.
This opens up huge new possibilities for research.
Fig. 5a. OCRE browse page displayed in English
Fig. 5b. OCRE browse page displayed in Greek
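The mechanics behind such a multilingual interface can be sketched briefly: given SKOS preferred labels with language tags, choosing the display label is a simple lookup. The Python fragment below is a minimal illustration; the RDF snippet and its labels are invented stand-ins for nomisma.org's actual data.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of the kind of SKOS data nomisma.org exposes;
# the labels here are illustrative, not a verbatim export.
RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                  xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://nomisma.org/id/denarius">
    <skos:prefLabel xml:lang="en">Denarius</skos:prefLabel>
    <skos:prefLabel xml:lang="de">Denar</skos:prefLabel>
    <skos:prefLabel xml:lang="el">Δηνάριον</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>"""

SKOS_PREFLABEL = "{http://www.w3.org/2004/02/skos/core#}prefLabel"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def labels_by_language(rdf_xml: str) -> dict:
    """Map language tag -> preferred label for one concept."""
    root = ET.fromstring(rdf_xml)
    return {el.get(XML_LANG): el.text for el in root.iter(SKOS_PREFLABEL)}

labels = labels_by_language(RDF)
print(labels["de"])  # Denar
```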
c. Type corpus to hoards
In parallel to the development of OCRE, the ANS has also been working with Dr Kris Lockyear of University College London’s Institute of Archaeology to create (with implementation again by Gruber) an online database of Roman Republican coin hoards (CHRR: http://numismatics.org/chrr/).5 This is based on Dr Lockyear’s personal research database (originally in MS Access: see Lockyear 2007), which is a substantially enlarged version of a print volume (Crawford 1969). Although the bulk of the contents of this database are not directly relevant to the issues of the Imperial period represented in OCRE, the coin types defined in Crawford’s corpus, Roman Republican Coinage, are defined as concepts in nomisma. It is therefore possible to extract machine-readable data about these types from nomisma in order to render HTML pages, KML for maps and timelines, or to facilitate quantitative analyses. Moreover, while the majority of the hoards recorded in CHRR are from the Roman Republic, a significant number of the later hoards (about 10% of all hoards in the database) do contain issues of the Imperial period. Since we have used nomisma.org identifiers within CHRR, it has been possible both to enrich the CHRR database with type descriptions from OCRE (see e.g. Fig. 6a) and to enrich OCRE with findspots derived from the CHRR database (see e.g. Fig. 6b). From the interrelationship of these two projects, the benefits of using stable URIs for everything from findspots to coin types become immediately apparent.
Fig. 6a. OCRE typological data deployed in the description of the contents of the Cetateni (Romania) hoard (http://numismatics.org/chrr/id/CET)
Fig. 6b. CHRR findspot data for Roman Imperial Coins type Augustus 2B deployed in mapping interface in OCRE (http://numismatics.org/ocre/id/ric.1(2).aug.2B?lang=en)
d. Type corpus to collections
There are also obvious benefits to establishing links from OCRE typological records to individual specimens housed in public collections. The practice of assigning unique URIs to individual objects, now recommended by the International Council of Museums (ICOM: http://www.cidoc-crm.org/URIs_and_Linked_Open_Data.html), and adopted by the majority of major collections, allows for stable connections to be built between those objects and OCRE typological records that describe them. So, for example, a coin type record within OCRE can link to multiple instances of that coin type held in multiple collections. Fig. 7, for example, shows the OCRE record for RIC Augustus 4B, with references to 6 specimens of the type: 4 in the collection of the ANS in New York and 2 in the collection of the Coin Cabinet of the Bode Museum in Berlin.
Fig. 7. Multiple specimens of a coin type referred to and illustrated within OCRE.
Through the ingestion into OCRE of the specific characteristics, such as weight or die axis, of each specimen, it is possible to create tools to analyse the characteristics of individual types, or of coins issued in particular places or by particular emperors (see Fig. 8a). Analysis can also be carried out, needless to say, on the generic characteristics of types using a similar set of filters. See, for example, Fig. 8b, which compares the frequency of the depiction of deities on the coinages of Augustus and Nero.
Fig. 8a. The weights of denarii linked to OCRE, struck AD 43-135
Fig. 8b. OCRE-generated comparison of coin types of Augustus and Nero.
e. OCRE to coin finds
Given the huge numbers of coins now being recorded by schemes across Europe, there is obviously enormous potential to ingest the details of these coins into OCRE. In this case it is not just the physical characteristics of the specimens concerned, but also their find spots, which serve as the basis of enhanced analysis through mapping. Fortunately the price of admission to the Linked Data community for Roman numismatics is low. Existing projects, in some cases with well-developed and longstanding databases of finds, do not need to change their recording practices or software. So long as their data can be mapped to nomisma URIs when it is exposed to the web, it may enter the graph of numismatic data on the web. Already the UK's Portable Antiquities Scheme is using nomisma.org URIs. The database currently under development in Germany for the recording of finds there, and in Poland and Ukraine, will also use nomisma.org URIs. In time, all of this data too will become available to OCRE.
And the exchange of data is potentially a two-way street. OCRE type descriptions are fully and freely downloadable, and may serve to populate other databases, without the need for their creators to redo the work of cataloguing that has already gone into the creation of the OCRE record. Just as few librarians now catalogue books from scratch, so in the future there will be little need for the cataloguer of coins to generate new descriptions of coins long known.
f. Linked Open Data Architecture, Applied
How does Linked Data help in technical terms? The most efficient method for maintaining the relationships between coin types in OCRE and their associated coins and hoards is an RDF database with a SPARQL endpoint. OCRE queries the endpoint directly with SPARQL to deliver some types of services; in other cases, it interacts with REST APIs offered by nomisma (simple web service interfaces which conduct more complicated SPARQL queries in the background). The second, SPARQL-aware version of OCRE was released in October 2013.6 The RDF requires three components of data: first, RDF representations of coin types; second, RDF descriptions of physical coins or coin hoards; and finally, the RDF data from nomisma.org, which enables links to be made between types and coins/hoards via typological attributes (e.g., http://nomisma.org/id/ar for silver coins or http://nomisma.org/id/augustus for those of Augustus). The RDF model conforms to the numismatic ontology established by nomisma. It is relatively simple, especially compared with CIDOC-CRM, and lowers the barrier to participation in OCRE. To date, Berlin, the Fralin Museum at the University of Virginia, and CHRR have joined the ANS in contributing data to OCRE.
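The effect of this linking can be illustrated with a toy example. In the Python sketch below (with abbreviated, invented URIs), coin records carry only a link to their type, while the type record carries the material; a query for "all silver coins" succeeds by joining through the type records, which is essentially the join the SPARQL endpoint performs.

```python
# A toy in-memory triple store. URIs are abbreviated and the data is
# illustrative, not drawn from OCRE or any partner collection.
TRIPLES = [
    ("coin:berlin_18207926",   "nm:type_series_item", "ocre:ric.1(2).aug.1A"),
    ("coin:ans_1944.100.38334", "nm:type_series_item", "ocre:ric.1(2).aug.2B"),
    ("ocre:ric.1(2).aug.1A",   "nm:material",         "nm:ar"),
    ("ocre:ric.1(2).aug.2B",   "nm:material",         "nm:ar"),
]

def coins_of_material(triples, material):
    """Find coins of a given material by joining coin -> type -> material."""
    types = {s for s, p, o in triples
             if p == "nm:material" and o == material}
    return [s for s, p, o in triples
            if p == "nm:type_series_item" and o in types]

print(coins_of_material(TRIPLES, "nm:ar"))
```

The coin descriptions themselves never mention silver; the attribute is reached only through the type link, which is the point of the nm:type_series_item design.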
We can see the relationship between ‘ideal’ type description and individual specimen more clearly by looking at an example. The structure for RIC Augustus 1A is represented as an nm:type_series_item defined by the URI http://numismatics.org/ocre/id/ric.1(2).aug.1A.7 A relationship is established by the nm:type_series_item property in the RDF model, below, associating object number 18207926 in the Berlin Münzkabinett with Augustus 1A:
<rdf:Description rdf:about="http://www.smb.museum/ikmk/object.php?id=18207926">
<nm:type_series_item rdf:resource="http://numismatics.org/ocre/id/ric.1(2).aug.1A"/>
<dcterms:title>Augustus, ca. 25-23 v. Chr.</dcterms:title>
<dcterms:publisher>MK Berlin</dcterms:publisher>
<nm:collection rdf:resource="http://nomisma.org/id/mk_berlin"/>
<nm:axis rdf:datatype="xs:integer">6</nm:axis>
<nm:diameter rdf:datatype="xs:decimal">13</nm:diameter>
<nm:weight rdf:datatype="xs:decimal">1.32</nm:weight>
<nm:reverseReference rdf:resource="http://www.smb.museum/mk_edit/images/8099/rs_opt.jpg"/>
<nm:obverseReference rdf:resource="http://www.smb.museum/mk_edit/images/8099/vs_opt.jpg"/>
</rdf:Description>
The RDF contains a few metadata fields from Dublin Core (dcterms:title and dcterms:publisher) and a handful of Nomisma-defined concepts for encoding measurements (nm:axis, nm:diameter, and nm:weight). Image URLs are also recorded.
Note that the RDF description of this object in the Berlin collection does not explicitly denote its material (silver), denomination (denarius), or other typological attributes. Like a relational database, an RDF database enables queries by these attributes through the nm:type_series_item association between the coin and the coin type. Practically speaking, how does this affect OCRE? Rather than updating records in OCRE when the collections of new partners become available, the RDF database stands apart, and thus it is easier to update Fuseki with new collections and to manage changes or deletions in collections already contained in the system. Using SPARQL, OCRE (through the Numishare application code) can query Fuseki to display images of coins associated with particular types in record or search results pages. SPARQL query results can be serialized directly into KML and displayed in maps, making it possible to make use of findspots from online coin hoard catalogs (Fig. 7). Using mathematical functions inherent to SPARQL, average weights of coin types or specific typologies (e.g., denarii of Augustus) can be delivered to OCRE directly and rendered in the form of charts and graphs (Fig. 8a) by the JavaScript library Highcharts, which renders this data as HTML5 graphics. The updating of this RDF database is independent of progress made in adding new types to OCRE, and thus new findspots, images, and measurements become available immediately upon ingestion into Fuseki.
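SPARQL's aggregate functions can compute such averages on the server, but a client consuming the W3C SPARQL JSON results format can do the same. The Python sketch below follows the standard results structure; the weights themselves are invented for illustration.

```python
import json
from statistics import mean

# Hypothetical SPARQL SELECT results in the W3C JSON results format,
# as might be returned by a query for the weights of one coin type.
RESULTS = json.loads("""{
  "head": {"vars": ["weight"]},
  "results": {"bindings": [
    {"weight": {"type": "literal", "value": "3.79"}},
    {"weight": {"type": "literal", "value": "3.85"}},
    {"weight": {"type": "literal", "value": "3.70"}}
  ]}
}""")

# Extract the literal values and average them client-side, as the
# SPARQL AVG() aggregate would do on the server.
weights = [float(b["weight"]["value"])
           for b in RESULTS["results"]["bindings"]]
print(round(mean(weights), 2))  # 3.78
```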
Conclusion
Roman numismatics and, indeed, numismatics more broadly is exceptionally well suited to the implementation of a Linked Data model for the publication of collections, finds and reference works. There is much still to be done; but it need only be done once. Thereafter, all may benefit, at little or no cost, and with no need to surrender their own working practices.8
Works Cited
Callataÿ, F. (1997a). Recueil quantitatif des émissions monétaires hellénistiques. Wetteren, Belgique: Editions Numismatique romaine.
Callataÿ, F. (1997b). ‘Quelques estimations relatives au nombre de monnaies grecques: les collections publiques et privées, le commerce et les trésors’, RBN 143, pp. 21-94.
Callataÿ, F. (2003). Recueil quantitatif des émissions monétaires archaïques et classiques. Wetteren, Belgique: Editions Numismatique romaine.
Callataÿ, F. (2005). ‘A quantitative survey of Hellenistic coinages: recent achievements’ in Archibald, Z., Davies, J. K., & Gabrielsen, V. (2005). Making, moving and managing: The new world of ancient economies, 323-31 BC. Oxford: Oxbow Books, pp. 73-91.
Callataÿ, F. (2011). Quantifying monetary supplies in Greco-Roman times. Bari: Edipuglia.
Crawford, M. H. (1969). Roman Republican coin hoards. London: Royal Numismatic Society.
Gruber, E. (2013). “Recent Advances in Roman Numismatics.” M.A. thesis, University of Virginia.
Kemmers, F. (2006). Coins for a legion: An analysis of the coin finds from [the] Augustan legionary fortress and Flavian canabae legionis at Nijmegen. Mainz am Rhein: Philipp von Zabern.
Leins, I. (2012). Numismatic data reconsidered: coin distributions and interpretation in studies of late Iron Age Britain. Unpublished PhD Thesis, Newcastle University: http://hdl.handle.net/10443/1467
Lockyear, K. (2007). Patterns and process in late Roman Republican coin hoards, 157-2 BC. Oxford: Archaeopress.
Reece, R. (1982). ‘Economic history from Roman site-finds’, Proceedings of the Ninth International Congress of Numismatics (Berne), Vol. I, pp. 495-502.
von Kaenel, H.-M., Kemmers, F. (2009). Coins in context I: New perspectives for the interpretation of coin finds: colloquium Frankfurt a.M., October 25-27, 2007. Mainz am Rhein: Philipp von Zabern.
Walton, P.J. (2012). Rethinking Roman Britain: Coinage and archaeology. Wetteren: Moneta.
Notes
1 The American Numismatic Society, 75 Varick St, 11th Floor, New York, NY 10013; meadows@numismatics.org; ewg4xuva@gmail.com.
2 For recent compendia of evidence see Callataÿ, 1997a and 2003. For discussion of progress so far and the possibilities offered, see also eund., 2005 and 2011.
3 The commercial website Coinarchives, which archives images and descriptions of coins sold online by major dealers, contains in its subscription service, as of October 2013, information on 597,962 ancient coins from 1,084 auctions: http://www.coinarchives.com/a/. See Fig. 2 for a sample search.
4 http://jena.apache.org/documentation/serving_data/index.html
5 Project management was provided by Meadows and Rick Witschonke; data for Republican coin types was created and generously supplied by Ian Leins and Eleanor Ghey at the British Museum.
6 For more information on the evolution of OCRE from its initial release in July 2012 to its present, LOD-aware architecture, see Gruber (2013).
7 See http://numismatics.org/ocre/id/ric.1(2).aug.1A.rdf for further details of the model.
8 We are very grateful to Tom Elliott and Sebastian Heath for the opportunity to present this work at the two LAWDI workshops. To Sebastian also, who might and perhaps should have been a co-author of this paper, we owe much of the inspiration for the Linked Data path we have taken.
©2014 Andrew Meadows and Ethan Gruber. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.16 (2014)
Exploring an Opportunity to Link the Dead in Ancient Rome
Katy M. Meyers
Death is a constant; however, social responses to this event and subsequent treatments of the dead have varied widely through time and space. Since the beginnings of archaeology, there has been interest in burials and funerary monuments, with the hope of interpreting these different responses to death (Parker Pearson 1999). As theory, method and technology have developed, our studies of mortuary sites have produced more nuanced interpretations of why particular societies respond to death in particular ways (Rakita and Buikstra 2005). The physical remains of the dead, the burial context, and the funerary rites they received can reveal much about the lifestyles of people in the past. However, all of this interpretation relies on the presence of one type of evidence: primary data. Linked open data (LOD) provides a much-needed approach for archaeological studies of the deceased, as it would increase the sharing of data, improve links between archaeologists, and accelerate study within the discipline. Using Roman Imperial burial practices as an example, I argue here that mortuary archaeology would benefit greatly from the implementation of LOD.
Despite the popularity of the Roman Empire in modern academia, it is still not well understood how imperial expansion was felt by the provinces, and how this affected their mortuary programs (Hingley 2005). Across the period of Roman occupation, burial practices varied based on a number of factors including region, citizen status, social rank, family beliefs, and religious cult affiliation. At the beginning of the empire, cremation was the most popular form of burial, with inhumation a secondary option. By the mid-2nd century, the frequencies of the two forms had reversed, and by the 4th century cremation was a rare practice (Nock 1932). Owing to imperial tolerance, burial type was not state-mandated, and the only stipulation was that the burial itself must take place outside the walls of the city. The types of tombs, the funerary rites, and the choice to cremate or inhume were left to the mourning community. This meant there was great diversity in the types of burials found throughout the empire, despite the strong Roman cultural influence (Toynbee 1996). The varying extent of adoption of Roman identity across the territories meant that burial practices developed in a variety of ways, with some provinces changing their practices upon Roman invasion, some developing hybrid mortuary programs, and others maintaining their ancestral ways (Webster 2001).
Two questions stem from even a brief overview of Roman Imperial burials: what were the different mortuary practices, and how was Roman influence adopted, adapted, or rejected by different social groups across the provinces? Answering these questions would help us to better understand how Roman Imperial identity was differentially accepted or rejected by the provinces, and what happened to these practices over time as imperial power waned. The data required to answer these questions is immense and expansive, but it does exist. Excavations have been conducted at Roman Imperial and Roman Provincial sites around the Mediterranean, Europe and Britain (Keegan 2002, Killgrove 2005, Morris 1992, Murali and Girard 2000, Philpott 1991, Wahl 2008, Williams 2004).
Two primary problems prevent synthesis: there is no standard method for classifying cemeteries that would allow direct comparison of mortuary sites, and the data are rarely available in accessible formats, digitally or in print. Roberts and Mays (2011) found that, of more than 250 articles on bioarchaeology in Britain published in the top four journals, 79% were based on collections from only five locations. While this uneven use of skeletal collections can be attributed to a number of factors, the one they highlight is the availability and knowledge of collections.
Applying standardized LOD methods to the domain of Roman Imperial mortuary practices requires developing and reconfiguring data to fit four principles stated by Tim Berners-Lee (2007), the inventor of the World Wide Web. First, all data must have stable uniform resource identifiers (URIs) that allow them to be identified as unique objects at any time. Second, these URIs must be hypertext transfer protocol (HTTP) identifiers so that they can be accessed via the web. Third, each URI must return useful metadata, created using universal standards, that informs the individual looking up the data. Finally, the metadata must include links to other related URIs in order to build an information network. If the goal of archaeology is indeed to interpret the past from its physical remains, then linked open data provides an opportunity to gain free access to those physical remains and their interpretations through interconnected digital networks. The progress of archaeological work requires building upon the work of others, which would be aided by the introduction of LOD.
For example, in an ideal system I could look up a specific Romano-British burial site in York; the grave would have a stable HTTP URI so that it could easily be referenced and accessed online. Its record would contain useful data about the grave, such as what physical remains were found in it, the age and sex of the individual, and the types of grave goods found with it, but also who excavated it, what methods they used, and who put the information online. Finally, each piece of metadata would be linked to other stable HTTP URIs so that connections between evidence could be made. As regards openness, it also means that we would not have to pay, gain permission, or belong to a specific organization in order to download or access this information freely.
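Such a record might be serialized, for instance, in a JSON-LD-like form. The Python sketch below is purely hypothetical: every URI and field name is invented, to show the principle that values referring to things are themselves resolvable identifiers rather than free text.

```python
import json

# A hypothetical linked grave record, loosely in JSON-LD style.
# All URIs and field names are invented for illustration only.
grave = {
    "@id": "http://example.org/graves/york-0042",       # stable HTTP URI
    "burialType": "http://example.org/vocab/inhumation", # controlled term
    "age": "adult",
    "sex": "female",
    "graveGoods": ["http://example.org/vocab/ceramic-vessel"],
    "excavatedBy": "http://example.org/agents/excavation-team",
    "place": "http://example.org/places/eburacum",       # linked place URI
}
print(json.dumps(grave, indent=2))
```

Because each value that names a thing is a URI, another dataset describing the same burial type or place can link to the identical identifier, which is what makes cross-site comparison machine-tractable.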
Despite the increased use of digital applications and databases, we are far from the goal of a linked and open network of mortuary archaeology data for any region or era. The sharing of mortuary and skeletal material has only just begun, with sites like Open Context providing a framework for linked open bioarchaeology data. Other groups, like the Archaeology Data Service and the Museum of London's Centre for Human Bioarchaeology, are putting primary data from mortuary sites online in an open and accessible, though not linked, format. As archaeologists, we not only want to interpret the past; we have a responsibility to share our data so that others can build upon it. "If I have seen further it is by standing on the shoulders of giants" (Newton). The data itself is just as important as the interpretations we draw from it. If we had full access to all the primary data that has been previously collected, we could achieve far more. Increased access and improved connections between datasets within bioarchaeology and archaeology would reduce sampling bias and perhaps reveal new patterns. While this ideal may not be fully attainable, it is up to us to try. Even the small amount of data that has been placed online has already been shown (Roberts and Mays 2011) to be shared more, utilized more, and to have produced more nuanced interpretations because of its high availability.
Works Cited
Berners-Lee 2007 Berners-Lee, Tim, 2007. “Linked Data”. W3 Design Issues. Available at: http://www.w3.org/DesignIssues/LinkedData.html
Heath and Bizer 2011 Heath, Tom and Bizer, Christian, 2011. “Linked Data: Evolving the Web into a Global Data Space” (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology. Available at: http://linkeddatabook.com/editions/1.0/
Hingley 2005 Hingley, Richard, 2005. Globalizing Roman Culture: Unity, Diversity, and Empire. Routledge: London & NY.
Keegan 2002 Keegan, Sarah, 2002. Inhumation rites in Late Roman Britain: the treatment of the engendered body. Oxford: British Archaeological Reports.
Killgrove 2005 Killgrove, Kristina, 2005. “Bioarchaeology in the Roman World”. M.A. Thesis, Department of Classics, University of North Carolina. Available at: http://www.piki.org/~kristina/Killgrove-2005-classics.pdf
Morris 1992 Morris, Ian, 1992. Death ritual and social structure in Classical Antiquity. New York: Cambridge University Press.
Murali and Girard 2000 Murali, Pascal and Girard, Louis, 2000. “Biology and burial practices from the end of the 1st century AD to the beginning of the 5th century AD: the rural cemetery of Chantambre”. Burial, society, and context in the Roman World. Pearce, Millet and Struck, eds. Pp. 105-111. Oxford: Oxbow Books.
Newton 2006 [1676] Newton, Isaac, 2006 [1676]. Correspondence with Robert Hooke. Wikiquote. Available at: http://en.wikiquote.org/wiki/Isaac_Newton
Nock 1932 Nock, Arthur. “Cremation and Burial in the Roman Empire”. The Harvard Theological Review 25.4 (1932), pp. 321-35.
Parker Pearson 1999 Parker Pearson, Michael, 1999. The Archaeology of Death and Burial. College Station: Texas A&M University Press.
Philpott 1991 Philpott, Robert, 1991. Burial practices in Roman Britain: a survey of grave treatment and furnishing A.D. 43-410. Oxford: Tempus Reparatum.
Rakita and Buikstra 2005 Rakita, Gordon, and Buikstra, Jane, 2005. “Introduction”. Interacting with the Dead: Perspectives on Mortuary Archaeology for the New Millennium. Rakita, Buikstra, Beck and Williams, eds. Gainesville: University Press of Florida. Pp. 1-11.
Roberts and Mays 2011 Roberts, Charlotte and Mays, Simon. “Study and restudy of curated skeletal collections in bioarchaeology: A perspective on the UK and the implications for future curation of human remains”. International Journal of Osteoarchaeology, 21.5 (2011), pp. 626-630.
Toynbee 1996 Toynbee, Jocelyn, 1996. Death and Burial in the Roman World. Johns Hopkins University Press.
Wahl 2008 Wahl, Joachim, 2008. “Investigations on Pre-Roman and Roman Cremation Remains from SW Germany: Results, Potentialities and Limits”. Analysis of Burned Human Remains. Schmidt and Symes, eds. Pp. 145-162.
Williams 2004 Williams, Howard. “Potted histories–cremation, ceramics and social memory in early Roman Britain”. Oxford journal of archaeology (2004). Available at: http://works.bepress.com/cgi/viewcontent.cgi?article=1019&context=howard_williams
©2014 Katy Meyers. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.17 (2014)
RAM3D Web Portal
William Murray
In November 1980, the first authentic large-scale warship ram was pulled from the sea near Haifa, Israel ([Casson and Steffy 1991]). Since the discovery of the so-called Athlit ram (named from its findspot), an increasing number of these artifacts have been discovered in various contexts (for a list dated to 2011, see [Murray 2012, 49-50]). Thanks to the recent discovery of the battle zone where Rome and Carthage waged the last naval battle of the First Punic War (241 BCE), we have added 12 new rams to our collection (see [Tusa and Royal 2012, 12-25] for a discussion of seven of these weapons). At present (September 2013), the number of authentic weapons has risen to 16, representing rams that are either completely or partially preserved.
Despite their ubiquitous presence in antiquity throughout the Mediterranean, and the roles they play in our major historical texts, our knowledge of warships that carried these weapons is appallingly poor. Except for these rams, no securely identifiable part of an ancient warship survives in multiple numbers from antiquity. The rams are not only important for their uniqueness, but also for their ability to help us address such questions as: 1) What were the physical sizes and approximate weights of different kinds of warships? 2) What forces were generated during ancient naval battles when one ship purposefully collided with another? 3) What were important regional and chronological differences in warship design? 4) Were different technologies utilized in the production of these rams, and if so, why? Once we better understand these authentic rams, we might then use them to explain differences in ram iconography in sculpture, on coins and seals, and in paintings, graffiti and mosaics. We might even generalize from their physical properties to simulate collisions and battle maneuvers on computer through a process called Finite Element Analysis.
Because these authentic weapons now allow us to study REAL rams and not representations of rams, they essentially represent a new artifact class. This demands the development of a vocabulary to describe their intricacies, not only in English, but in the languages represented by all those who study these weapons. And since these new rams hold the key to answering (or at least addressing) many questions that are important to ancient naval historians and naval architects, I am developing a web portal called RAM3D at the University of South Florida to assist researchers with the study of these weapons. The goal is to make the site a repository for different kinds of graphic evidence such as detailed study photos, 3D models of authentic rams, and useful comparanda, like large-scale 3D representations (in stone sculpture) of warships and their bow structures. We also plan to include on this site a detailed descriptive vocabulary with equivalent terms in English, French, German, Greek, Hebrew, Italian and in any other language whose researchers care to advise us. The on-line nature of the database will allow for this list to expand as new artifacts are found, and new terms are developed to describe them.
Warships were built in many different shapes and sizes in order to fulfill different roles in the fleets they populated. Although the authentic rams we possess seem to come from smaller classes of warship, we can use these rams to help us make sense of evidence for other, larger rams. This evidence comes to us from a number of reverse-engineered weapons (currently 8), which I call “virtual rams” because they exist in 3D form solely on computer. These computer rams were developed from sockets or complex holes in a retaining wall that once held the back ends of authentic rams used in the Battle of Actium (31 BCE). The sockets can still be seen at Nikopolis in western Greece on a Victory Monument built by Augustus to commemorate his defeat of Antony and Cleopatra in a great naval battle fought nearby. The rams were large, if we may judge from their sockets, whose details allow for the recovery of the rams' shapes. Since the process involved in creating these rams is a complex one that has yet to be fully explained in print, we envision the web portal as a place where we can demonstrate the methodology used to create these weapons, and explore how best to present our results in a traditional 2D print format.
In conclusion, we hope that the RAM3D portal will serve as a useful node where researchers can share and exchange various kinds of information with one another and thus promote the study of ancient Mediterranean warships. Like others who attended the 2013 LAWDI workshop, I am now seeking the necessary resources that will allow for the development and maintenance of this site. As we move forward, I expect to draw on my connections with the LAWDI community so that we may code our site in ways that will most efficiently link its vocabularies and site references to others studying the ancient world. The web portal, currently hosted at the University of South Florida, can be found at the following URL: http://aist.usf.edu/ram3d/.
Works Cited
Casson, Lionel and J. Richard Steffy, eds. 1991. The Athlit Ram. College Station, TX: Texas A&M University Press.
Murray, William M. 2012. The Age of Titans. The Rise and Fall of the Great Hellenistic Navies. Oxford and New York: Oxford University Press.
Tusa, Sebastiano, and Jeffrey Royal. 2012. “The landscape of the naval battle at the Egadi Islands (241 B.C.).” Journal of Roman Archaeology (2012), pp. 7-48.
©2014 William Murray. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.18 (2014)
Assessing the Suitability of Existing OWL Ontologies for the Representation of Narrative Structures in Sumerian Literature
Terhi Nurmikko-Fuller
Originally a term from the realm of philosophical thought, the label of ‘ontology’ has been adopted and adapted by Computer Science, where it refers to a formalised structure used for organising knowledge. Together, a knowledge base (a triple store), an ontological structure and a reasoner (software) form an expert system which enables automated inference over a given dataset. Such inference is possible without publication via the Web, but by publishing the data there, the expert system effectively gains access to an enriched dataset, as further relevant information becomes available from separate external data-streams across the entirety of the Web. This linking of datasets is subject to the same challenges as any equivalent exchange between humans: in order to share knowledge, systems need an effective method of communication. Our task as human experts in charge of datasets is to publish that data in machine-readable, non-proprietary formats, with clear URIs. The adoption of existing ontologies and controlled vocabularies enables the linking of new datasets to existing projects via shared elements such as locations (the Pleiades gazetteer1, for example), people (perhaps through projects such as the Berkeley Prosopography Service2) or any other defined, identified element which occurs in more than one dataset.
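The expert-system idea described above (knowledge base, ontology, reasoner) can be illustrated in miniature. The following Python sketch is illustrative only: the triples, the instance names and the ex: prefix are invented for the example, and a real system would use a triple store and an OWL reasoner rather than plain Python sets.

```python
# A tiny "knowledge base" of (subject, predicate, object) triples, an
# "ontology" reduced to a subclass hierarchy, and a naive "reasoner"
# that infers types along that hierarchy. All names are invented.

triples = {
    ("ex:Enmerkar", "rdf:type", "ex:King"),
    ("ex:Enmerkar", "ex:rulesOver", "ex:Uruk"),
}

# Ontology fragment: ex:King is a subclass of ex:Person.
subclass_of = {
    "ex:King": "ex:Person",
}

def infer_types(kb, hierarchy):
    """Add the rdf:type triples implied by the subclass hierarchy."""
    inferred = set(kb)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p == "rdf:type" and o in hierarchy:
                new = (s, "rdf:type", hierarchy[o])
                if new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred

# The reasoner concludes that Enmerkar is a person, even though the
# knowledge base only states that he is a king.
kb = infer_types(triples, subclass_of)
```

Once such a dataset is published on the Web with stable URIs in place of the toy ex: names, the same inference can draw on triples contributed by entirely separate projects.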
The tools, the data
Ontologies have played a major role in my on-going doctoral research project, which combines elements from the domain of Semantic Technologies but focuses on Ancient World Data. This work in progress involves issues of Knowledge Representation, Description Logic and Coreference; its tools are SPARQL, RDF, URIs and OWL. The aim of the project is to assess the suitability of existing tools for the representation of the ambiguous, incomplete and at times unknown literary narratives that play out within compositions written in the ancient language of Sumerian and published by the Electronic Text Corpus of Sumerian Literature (henceforth ETCSL)3, an online resource from the University of Oxford which allows the public free and unlimited access to the lemmatised transliterations and English translations of some 400 composite texts.
Two OWL ontologies
This research project began with an extensive review, which led to the identification and subsequent evaluation of two OWL ontologies thought to be suitable for the representation of cultural heritage data and narrative structures: the CIDOC Conceptual Reference Model (CIDOC CRM)4 and OntoMedia (OM)5. The decision to include the CIDOC CRM was made for two reasons. Firstly, this event-based reference model is specifically designed as the “semantic glue”6 for linking cross-domain cultural heritage data, and has been implemented by institutions with large cuneiform collections, such as the British Museum.7 Although the ETCSL carries little object data directly, the print, electronic and cuneiform sources for each composite text are listed with each transliteration and serve as potential anchoring points for clusters of RDF triples that would allow the enrichment of the ETCSL data from other sources within Assyriology and the wider Digital Heritage community. Secondly, OM, an ontology designed for the representation of narrative, was purposely built to link to the CIDOC CRM (Lawrence 2008). Unlike the CIDOC CRM, which is one large, all-encompassing structure, OM consists of several interlinking sub-ontologies, from which the user-consumer can pick and choose any that bear relevance to their data. OM is essentially a domain-specific upper-level ontology: this oxymoron can be justified if one agrees that the ontological representation of narratives is itself a niche topic, but that since OM seeks to be applicable to all fictional narratives (regardless of sub-genre), it can be seen as an example of the class of upper-level ontologies.
mORSuL
The combination of the CIDOC CRM and OM, with the addition of a number of specific elements deemed necessary, resulted in the creation of mORSuL (the multi-Ontology Representing Sumerian Literature). Thus far, it has been implemented in the Stanford University ontology editor Protégé8 and serves as part of a proof-of-concept. Preliminary trials with mORSuL have led to two initial conclusions. Firstly, the ontology is to be extended further to allow for the mapping of bibliographical data as published by the ETCSL (BIBO9 and FRBRoo10 are likely candidates) and the representation of literary tools such as similes, metaphors and analogies. Secondly, the resulting large and complex structure ought to be reduced to contain only those classes and properties which truly match the data available from the ETCSL.
Sumerian humour as an example case study
The next step was to test mORSuL with a case-study example. The chosen composition, Three Ox-drivers of Adab, is thought to be a humorous one (Alster, 1991-1993; Foster, 1974) and has a narrative structure with repetitive patterns within a frame story. Due to the incomplete nature of the latter part of the piece (a result of the fracture and loss of the lower parts of the witness tablets on which the composite text is based), the representation was limited to the first 35 lines only. The composite text as published by the ETCSL is based on the witnesses AO 7739 (TCL 16 80), AO 9149 (TCL 16 83) and CBS 1601.
The story unfolds as follows: three friends, all citizens of Adab, are quarrelling. Unable to solve their dispute, they decide to seek justice and approach the king. They recount their story to the king: they are three ox-drivers, one of whom owns an ox, another a cow, and the third a wagon. They became thirsty and suggested that one of them should go and fetch water so that they could all drink. They asked each other in turn to go, but each refused, citing reasons relevant to his possessions: the owner of the ox is afraid it will be devoured by a lion if left unattended; the owner of the cow that his animal will wander into the desert; the owner of the wagon that the goods will be stolen from it. They agree to all go together, and in their absence the ox mounts the cow, the cow gives birth to a calf, and the calf eats the load on the wagon. Who, they ask the king, does the calf belong to? The king, unable to provide a solution, seeks the counsel of a “cloistered lady”, to whom he repeats the story of the ox-drivers verbatim. Sadly, the remainder of the composition, which presumably contained the solution to this riddle, is fractured and incomplete, and we, the audience, are left in suspense and without closure.
Ontological representation of inscription content
In terms of representing the narrative content via ontological structures, a number of points of interest arise. Although these issues are discussed in greater detail in my thesis (forthcoming), a few examples can be cited to exemplify the types of decisions which need to be made when representing the narrative and the fabula (as defined by Bal, 2009). Firstly, although the events depicted are fictional, the story is set in, or at least refers to, the historical city of Adab (modern Bismaya). The natural laws that govern our reality are applicable to the main ome:Context too (the “reality” in which the story takes place) and at no point is the reader required to suspend their disbelief or encounter entities or events that are magical, supernatural or within a dream. The events as told by the ox-drivers may appear farcical, but are not beyond the remit of plausible realism. The same can be said of the protagonists and their possessions, none of which are supernatural or anthropomorphised.
All the protagonists are considered instances of omb:Character – they are fictional but can be argued to have a perceivable personality. Each protagonist has at least one unique and definable quality (in the case of the ox-drivers, their possessions), and the bonds of friendship and allegiance are fairly straightforward, as is the fall into a dispute (over a given timeline, these omb:Characters who share a positive omb:Alliance bond acquire a negative omb:Enmity). The representation of the ox-drivers’ decision to seek justice is, however, more complex, as is the decision whether to treat it as an instance pertaining to an ome:Social subclass of Legal. It is also worth noting that, from the perspective of each ox-driver, they are (presumably) each seeking a decision favourable to themselves and not one which is truly fair. Furthermore, the only accounts of the events that form the main part of the narrative are the focalised memories of the ox-drivers and the subsequent retelling of these events by the king, who did not witness them first hand. It may also be argued that the focalised story told to the king is an amalgamation of three separate accounts of events that took place outside the narrative, making the story told to the king effectively a “composite memory”.
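To make the discussion above concrete, the relationships just described might be written down as triples. The class names follow those cited in the text (omb:Character, omb:Alliance, omb:Enmity, ome:Context), but the instance URIs, the ex: prefix and the ex:binds property are invented for illustration and do not reflect the actual mORSuL encoding.

```python
# A toy triple representation of the opening of Three Ox-drivers of
# Adab. Class names follow the text; instance names and the ex:binds
# property are assumptions made for this sketch only.

story = [
    # The three ox-drivers and the king are all instances of omb:Character.
    ("ex:owner_of_ox",    "rdf:type", "omb:Character"),
    ("ex:owner_of_cow",   "rdf:type", "omb:Character"),
    ("ex:owner_of_wagon", "rdf:type", "omb:Character"),
    ("ex:king",           "rdf:type", "omb:Character"),
    # The story's single, non-supernatural "reality".
    ("ex:adab_story", "rdf:type", "ome:Context"),
    # The friendship bond, and the dispute that replaces it.
    ("ex:friendship", "rdf:type", "omb:Alliance"),
    ("ex:dispute",    "rdf:type", "omb:Enmity"),
    ("ex:friendship", "ex:binds", "ex:owner_of_ox"),
    ("ex:friendship", "ex:binds", "ex:owner_of_cow"),
    ("ex:friendship", "ex:binds", "ex:owner_of_wagon"),
]

# A simple query over the toy graph: who are the story's characters?
characters = [s for s, p, o in story
              if p == "rdf:type" and o == "omb:Character"]
```

Even this toy graph shows why the decisions discussed above matter: whether the dispute is an omb:Enmity bond, a Legal event, or both changes which triples are asserted and therefore what can later be queried or inferred.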
Further work
Research into the representation of Sumerian literary narratives using OWL ontologies continues. The next, imminent stages of the project include the extension and reduction of mORSuL, as well as the addition of data from other Sumerian literary compositions, so that the structure can be queried in terms of intertextuality, recurring motifs and possible instances of literary allusion.
Author's note: This research was funded by the Research Councils UK Digital Economy Programme, Web Science Doctoral Training Centre, University of Southampton. EP/G036926/1.
Notes
1 http://pleiades.stoa.org/.
2 http://berkeleyprosopography.org/.
3 http://etcsl.orinst.ox.ac.uk/.
4 http://www.cidoc-crm.org.
5 http://www.contextus.net/ontomedia.
6 http://www.cidoc-crm.org/index.html.
7 http://collection.britishmuseum.org/.
8 http://protege.stanford.edu/.
9 http://bibliontology.com/.
10 http://www.cidoc-crm.org/frbr_inro.html.
Works Cited
Alster, B. (1991-93). "The Three Ox-Drivers from Adab". In Journal of Cuneiform Studies 43-45. 27-38.
Bal, M. (2009). Narratology: Introduction to the Theory of Narrative (3rd Ed), University of Toronto Press.
Black, J.A., Cunningham, G., Ebeling, J., Flückiger-Hawker, E., Robson, E., Taylor, J., and Zólyomi, G. (1998–2006). The Electronic Text Corpus of Sumerian Literature, Oxford. Available at <http://etcsl.orinst.ox.ac.uk/>.
Foster, B. R. (1974). "Humor and Cuneiform Literature". In JANES 6. 69-85.
Lawrence, F. (2008) “The Web of Community Trust Amateur Fiction Online: A Case Study in Community Focused Design for the Semantic Web”, PhD thesis, University of Southampton. Available at: <http://eprints.soton.ac.uk/264704/2.hasCoversheetVersion/thesis.pdf>.
©2014 Terhi Nurmikko-Fuller. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.19 (2014)
Berkeley Prosopography Services
Laurie Pearce and Patrick Schmitz
Berkeley Prosopography Services (BPS, berkeleyprosopography.org) is a complete package, an interactive tool-kit for analyzing and visualizing prosopographical datasets, available to researchers working in diverse disciplines and operating on data that derive from a variety of text sources and formats. BPS developed as a collaboration between University of California Berkeley researchers in Near Eastern Studies eager for digital tools to facilitate prosopographical research, and a central Research IT team working to develop digital resources that served actual research needs.
BPS innovates by providing a complete package of software tools to perform association and computation tasks for name disambiguation, by adding a new model for curation and collaboration, and by connecting Social Network Analysis (SNA) tools and visualizations. At the heart of the BPS productivity and visualization tools, and of the workspace support for exploration and collaboration, is an assertion model predicated on heuristics conventionally (and manually) applied by researchers working with onomastic and prosopographical data.
BPS tools include 1) functionality to import TEI documents and convert them to our data model, 2) a disambiguation engine to associate names with persons based upon configurable heuristic rules, 3) an assertion model that supports flexible researcher curation and tracks provenance, 4) social network analysis and 5) graph visualization tools to analyze and understand social relations, and 6) a workspace model supporting exploratory research and collaboration. The assertion model poses a challenge to the assignment of unique identifiers and the application of Linked Open Data (LOD).
The processing steps reflect the BPS architecture, which is divided into three major areas (diagram available at: http://berkeleyprosopography.org/docs/BPSarchitecture#FigureC):
1. In Text Preprocessing, a corpus is converted from some native format to TEI. The development corpus for BPS is a group of ~500 Akkadian cuneiform legal documents from Hellenistic Uruk, a corpus of the project Hellenistic Babylonia: Texts, Images and Names (HBTIN, oracc.org/hbtin). That project is a component of the Oracc consortium (On-line Richly Annotated Cuneiform Corpora, oracc.org), represented at LAWDI 2012 by Steve Tinney. HBTIN adheres to the shared standards and best practices of the Oracc community and the Cuneiform Digital Library Initiative (CDLI, cdli.ucla.edu).
The TEI markup (in the case of the HBTIN documents, a Unicode representation of transliterated Akkadian) includes elements denoting the individual documents, activities within each document, and persons that have roles in those activities. This markup may be generated by hand or by some semi-automated processes to recognize names, filiation, roles and activities (in any case, most of this happens external to the BPS system). Oracc generates the TEI for HBTIN texts used in BPS. Planned work includes the addition of services to support a broader range of corpora formats as input (e.g., direct from an existing database), and to support simple NLP plug-ins to enrich TEI (e.g., with role markup, based upon patterns).
2. In Disambiguation and Social Network Analysis, TEI is ingested and parsed by corpus services, and a native data model is built internally. The workspace services share this model, and leverage authentication and authorization components to support login and access controls on corpus and workspace resources. The disambiguation engine incorporates configurable rules that may be generic or corpus-specific, and associates the name citations in each document with actual persons depicted in the texts. It includes support for assertions that researchers make to confirm or reject the possibilities suggested by the engine. Finally, GraphML is passed to the SNA services to compute significant features of the social networks.
3. The Presentation, Visualization, and Reporting area presents results from various core model and analysis components, including the declared data model in each corpus (names, activities, etc.), assertions that the researcher has made or imported from others, family tree visualizations, as well as interactive network graphs for exploration and understanding.
The assertions model underlies several areas of BPS functionality, but is described in the primary context of making assertions about disambiguation.
A primary task in prosopography is to determine which real-world person corresponds to a given name instance. All name instances in a corpus, both within a single document (intra-document) and in documents across the corpus (inter-document), provide evidence for disambiguation. The algorithmic model is based upon the heuristics that researchers have long used, and so is familiar to BPS users. To begin, a unique person is posited for each name instance in each document. Then, the model attempts to collapse persons into one another, so that the persons posited for name instances that refer to a given real-world person are collapsed into a single person in the model as well. It does this according to user-configured rules that operate on various features (properties) of each original person. Filiation (declaration of parents and ancestors) is a primary feature used by the model. Additional features include the activity in which each associated name instance is cited, the roles that the citation had in the activity, the date of the respective activities, etc.
The rules of the model operate on these features and then can have one of three functions:
Shift rules shift weight from one person to another
Boost rules magnify the effect of applied shift rules
Discount rules reduce the effect of applied shift rules
A rule that produces a conclusive match between two person/name instances may shift 100% of the weight from one to the other. A rule that is only likely but not certain, may shift less weight. Name-matching rules are generally modeled as shift rules. Rules that provide additional evidence for a match are modeled as a boost, and tend to leverage features like location or activity. Rules that provide evidence of counter-indication are modeled as a discount; examples include date rules that consider the typical life-span and span of activity, along with the dates of respective activities (if two activities are 30 years apart, there is less likelihood that two person/name instances refer to the same real-world person, and so even if a name matches, a discount reduces the effect of the collapse).
Rules may apply only to person/names within a document (intra-document rules), or to persons across the corpus (inter-document rules). Many rules can operate in either mode, but function slightly differently in the two contexts. The end result of applying the rules is a set of probabilities, for each name instance, over the set of real-world persons to which that name instance may correspond (low-weight probabilities can be filtered out to simplify results). Each researcher can configure their confidence in each rule that is configured for their corpus, and thereby individually control how the heuristic proceeds. Changing these values also allows researchers to explore what-if scenarios.
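The interaction of shift, boost and discount rules described above can be sketched as a simple weight computation. This is a toy illustration under assumed numbers, not BPS's actual rule engine: the function name and the example weights (a 0.9 name-match shift, a 0.1 same-activity boost, a 0.5 thirty-year discount) are all invented.

```python
# Toy combination of the three rule functions described in the text:
# a shift rule moves probability weight between candidate persons,
# boost rules magnify that shift, and discount rules reduce it.

def collapse_weight(base_shift, boosts=(), discounts=()):
    """Combine one shift rule with applicable boost/discount factors."""
    weight = base_shift
    for b in boosts:
        weight *= (1 + b)       # a boost magnifies the shift
    for d in discounts:
        weight *= (1 - d)       # a discount reduces it
    return min(weight, 1.0)     # never shift more than 100% of the weight

# Hypothetical case: name + filiation match (strong shift of 0.9),
# the two citations share an activity (boost of 0.1), but the
# activities are ~30 years apart (discount of 0.5).
w = collapse_weight(0.9, boosts=[0.1], discounts=[0.5])
```

Letting each researcher set the numeric strength of each rule, as the text describes, amounts to letting them reparameterize this computation and rerun it, which is what makes the what-if exploration possible.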
Once the disambiguation model has run and produced weighted probabilities for each name-instance in a document, a researcher can review these in the UI, and then decide to confirm or discard the results of the model. These assertions are modeled as an action that the model can take to override the computed results, and so operate something like a boost or discount (in fact, the user can optionally set a confidence on each assertion, so it may be more of a hint than a conclusion on their part).
Within the BPS model broadly, assertions have (one or more) anchors in corpus document(s) that specify the resources (in the most general sense) upon which the assertion operates, an action that must be realizable in the model, and provenance (the researcher ID and date when it was created). The common assertions described above allow researchers to specify that a given name-citation is or is not the same person as another name-citation. However other types of assertions are also possible. Users may assert the date of a document for which metadata was missing, or where damage precludes conclusive dating in the original corpus. Users implicitly assert their confidence in each rule when they use the rules configuration interface.
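The three-part assertion structure just described (anchors, action, provenance, plus an optional confidence) might be modelled as a simple record. The field names and example values below are paraphrases of the text, not BPS's actual schema.

```python
# A sketch of the assertion record described in the text. Field names
# and the example identifiers are invented for illustration.
from dataclasses import dataclass
from datetime import date

@dataclass
class Assertion:
    anchors: list            # name-citation IDs in corpus document(s)
    action: str              # an action realizable in the model,
                             # e.g. "same-person" / "not-same-person"
    researcher: str          # provenance: who made the assertion
    created: date            # provenance: when it was made
    confidence: float = 1.0  # optional: a hint rather than a conclusion

a = Assertion(
    anchors=["doc12/name3", "doc47/name1"],
    action="same-person",
    researcher="researcher-01",
    created=date(2013, 9, 1),
    confidence=0.8,
)
```

Because such records carry no state beyond their anchors, action and provenance, they are straightforward to serialize and exchange between workspaces, which is what enables the peer-sharing described in the following paragraph of the original text.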
Since the assertions are abstract (although tied to a corpus or application model), they can be serialized and published from their workspace. Other researchers working with the same corpus can accept selected assertions from a peer and apply them in their own workspace, rejecting or ignoring those with which they do not agree. As each pass of the disambiguation process effectively reflects the computation of an algorithm defined by each user, the results (that is, the individuals disambiguated from multiple name instances) may differ.
Challenge for implementing LOD in the BPS model:
In the realm of prosopographical research, LOD should find immediate reception and application. In traditional prosopographical research, some attributes within any given domain (e.g., toponyms, names of rulers, and generally agreed-upon terms) may readily be assigned unique identifiers. Likewise, the promotion of the results of a single-authority disambiguation model may prompt the assignment of unique identifiers, in spite of the range of uncertainty that may surround a disambiguation. BPS differs from other prosopography tools and projects in formalizing and integrating the probabilistic heuristics prosopographers naturally apply in their research, and in providing a workspace environment in which individual or collaborating researchers may approach a single corpus with different assertions. Each modification may result in variant disambiguations, which the researcher explores and may accept or reject. In view of the mutability of the results that the probabilistic tools may generate, BPS faces a challenge with respect to assigning unique identifiers to disambiguated individuals. Recognizing the value of LOD, BPS researchers continue to investigate the application of unique identifiers to the results the tools generate.
©2014 Laurie Pearce and Patrick Schmitz. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.20 (2014)
Linking Portable Antiquities to a Wider Web
Daniel E.J. Pett
Abstract: This paper will discuss the impact that the two LAWDI events had on the digital work and output of the United Kingdom's Portable Antiquities Scheme, based at the British Museum, London. It discusses the progress of the author's work in developing the Scheme's online presence towards Berners-Lee's 5 Stars of Linked data following the two iterations of the LAWDI programme in 2012 and 2013. This article also gives examples of Linked Data principles being utilised by the Portable Antiquities Scheme website.
Subjects: Humanities--Study and teaching
Keywords: archaeology, antiquities, semantic web, linked data, classics, numismatics, portable antiquities scheme
The Portable Antiquities Scheme (hereafter the PAS) was one of the few projects attending the LAWDI events that did not truly fall under the Classical World umbrella which neatly encompassed many of the others. The PAS is a government-funded project in the United Kingdom which promotes the voluntary recording of archaeological objects found by members of the public in England and Wales and administers the Treasure Act process. The objects recorded on the PAS database range from the Prehistoric to Post-Medieval periods (as defined in the UK by English Heritage), with a large corpus of material that can be attributed to the Roman Empire, the majority of it coins. Indeed, numismatic material provides the greatest opportunity for the Scheme to link to other resources (or URIs) through regular attributes such as the mint (geography), era (time), issuer or moneyer (people), place of discovery (geography) and so on.
Since 2003 (Pett 2010a) these data have been placed online within a dynamic database that is updated in real time; over 900,000 objects have been recorded, with images and extensive metadata (over 200 possible fields, many using controlled and agreed vocabularies) collected for each. These metadata present an exciting and very practical opportunity for implementing linked data techniques and for collaborating with many of the attending projects; this paper will briefly touch on both.
The majority of attending projects were beginning their adventure into the world of Linked and Open Data (LOD), and many could provide data that the PAS software could and can consume and use for the enrichment of its own website (Pett 2010b, Gruber et al. 2012). In turn these linkages can be tied to external resources such as the Virtual International Authority File (VIAF) or DBpedia, or, as Kansa (2013) showed in his excellent presentations, to projects such as the Encyclopaedia of Life.
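The kind of linking described above can be pictured as a single record whose attributes are resolvable URIs. The values below are illustrative, not an actual PAS record; the Pleiades, Nomisma and DBpedia URIs are shown only as examples of the target vocabularies mentioned in the text.

```python
# A hypothetical coin record whose attributes point at stable external
# URIs rather than free-text strings. Record values are invented for
# this sketch and do not come from the PAS database.

coin_record = {
    "object_type": "coin",
    "broad_period": "ROMAN",
    # mint (geography): a Pleiades place URI
    "mint": "http://pleiades.stoa.org/places/423025",
    # issuer (person): a nomisma.org concept URI
    "issuer": "http://nomisma.org/id/augustus",
    # concept alignment to a general-purpose LOD resource
    "denomination": "http://dbpedia.org/resource/Denarius",
}

# Because each attribute is a URI, a consumer can follow any of them
# to pull in richer data published elsewhere on the Web.
external_links = [v for v in coin_record.values()
                  if str(v).startswith("http")]
```

The design choice is the essence of the five-star model discussed later in this paper: once the mint, issuer and denomination are URIs rather than strings, the record becomes a node in a wider graph instead of an isolated database row.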
Pre-LAWDI: where did the PAS website stand?
The PAS software is written entirely by the paper's author (all source code is available on GitHub) and builds on the original content management system that was provided by Oxford ArchDigital before their liquidation in 2007 (Pett 2010a). Like Kansa's superb Open Context, its choices of technology and implementation are very similar: PHP, Solr etc. The software provides a platform for the 'real-time' capture and publication of artefacts discovered by the public within England and Wales whilst pursuing their hobbies. From this period onwards, the author began to explore best practices for web implementation, and early in the development of the site (Pett 2012) a decision was made to begin the journey towards what was to become Berners-Lee’s 5 stars of linked data (see Berners-Lee 2006 and Hausenblas 2010), starting with the implementation of cool URIs (W3C 2008) which attempted to describe the resource that the consumer would find; for example, http://finds.org.uk/romancoins/personifications/named/as/Apollo leads to the page within the Roman coin guide describing the depiction of the personification of Apollo. Data-driven pages within the site could be obtained in various representations - for example as JSON, XML, CSV or KML - but content negotiation was still lacking, and remains so at present, as the author has not managed to find time to implement it. The lack of content negotiation on finds.org.uk has been highlighted by Light (2011) and does need resolving.
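Content negotiation, which the paragraph above notes is still missing from finds.org.uk, amounts to inspecting the HTTP Accept header and choosing among representations of the same cool URI. The following is a minimal sketch in Python (the PAS codebase itself is PHP); the mapping and function are invented for illustration and ignore q-value ordering.

```python
# Toy content negotiation: pick a representation of the same resource
# based on the client's Accept header. Real servers must also rank
# alternatives by q-value; this sketch simply takes the first match.

REPRESENTATIONS = {
    "application/json": "json",
    "application/xml": "xml",
    "text/csv": "csv",
    "text/html": "html",
}

def negotiate(accept_header, default="html"):
    """Return the first offered representation the client accepts."""
    for part in accept_header.split(","):
        mime = part.split(";")[0].strip()   # drop any ;q=... parameter
        if mime in REPRESENTATIONS:
            return REPRESENTATIONS[mime]
    return default

fmt = negotiate("application/json, text/html;q=0.8")
```

With negotiation in place, the same cool URI for Apollo could answer a browser with HTML and a harvester with JSON or XML, instead of requiring format-specific URLs.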
Within the site templates, structured data is used extensively, in particular microformats (now more than seven years old as a concept) and RDFa (Herman et al. 2013). For example, the FOAF vocabulary (Brickley and Miller 2010) is implemented within the HTML contact templates. This can be seen in the structured data section of Google's Webmaster Tools, shown in figure 1 below:
Figure 1: An example of structured data from the author’s profile page as seen by Google’s webmaster tools.
Post-LAWDI 2012
After the 2012 event at ISAW (Elliott et al. 2012), efforts were made to bring the PAS into the Pelagios family. This was quite a simple task to expedite, with expert advice available online (Barker, Isaksen and Simon 2012) and the PAS database already using Pleiades identifiers within its schema. A weekly compiled dump of the ever-changing PAS data that had Pleiades IDs (Pett 2012b) was produced and integrated into the Pelagios ecosystem. (At present, over 80,000 Roman coin records on the PAS database have attributions to Roman mints which have been aligned with Pleiades identifiers, leaving around 90,000 unattributed.) These data were used to great effect in Cayless' data-explorer visualisation, unveiled to the world at large at LAWDI 2013 (Cayless 2013), with Rome as the most frequently attributed mint.
In the spirit of collaboration and mutual benefit, the PAS also hosted a mirror tile store of the maps that Johan Åhfeldt (2012) created; these are available for anyone to use, with the PAS picking up the associated bandwidth costs (Pett 2012d). Further integration with LAWDI resources included the use of the Pelagios widget, Nomisma and Pleiades identifiers, and the ISAW JavaScript library (Rabinowitz and Heath 2012).
Attempts were also made to implement more structured data within HTML templates, for example the use of Schema.org and Facebook's OpenGraph metadata tags. A good example of structured data in action can be seen in the author's implementation of Twitter cards (Pett 2012c): a pretty simple process in which meta-tags are added to the head section of an HTML document, allowing the Twitter user interfaces to parse a concise preview of your content. Figure 2 below demonstrates this for the Roman cavalry helmet found with an Iron Age hoard in Leicestershire (PAS record PAS-984616; Leins and Hill 2012).
Figure 2: An example of a Twitter card, produced via the parsing of metadata tags applied within mark-up on finds.org.uk templates.
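The meta-tag mechanism is simple enough to sketch. The twitter:card, twitter:title, twitter:description and twitter:image property names are those defined by Twitter's card markup; the helper function itself is illustrative and is not the PAS's Zend Framework implementation:

```python
from html import escape

def twitter_card_tags(title, description, image_url, card="summary"):
    """Render the <meta> tags Twitter parses from a page's <head>.
    The tag names mirror Twitter's card markup; this helper is an
    illustration, not the finds.org.uk template code."""
    fields = {
        "twitter:card": card,
        "twitter:title": title,
        "twitter:description": description,
        "twitter:image": image_url,
    }
    return "\n".join(
        '<meta name="%s" content="%s" />' % (name, escape(value, quote=True))
        for name, value in fields.items()
    )
```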
Concerted efforts were also made to align the authority lists used for recording coins with external authority lists. To achieve this, the author used OpenRefine to reconcile terms against VIAF (see Page 2013 for details of how to do this) and dbPedia. Some of the results can be seen in Elliott's "About Roman Emperors" project (2013) and within the templates used in the PAS website's numismatic guides; this process of enhancement is ongoing.
Post LAWDI 2013 – production and consumption of linked data.
Following the 2013 event at Drew University, the author embarked on further development of the LOD capabilities of the PAS website (Pett 2013). A decision was made to integrate with resources provided by English Heritage, the Ordnance Survey, Nomisma, VIAF, dbPedia and the British Museum in a more regimented fashion throughout the records of objects held within the database.
Production of linked data via the PAS website
RDF is now produced by applying XSLT to XML returned from the Solr indexes that drive the search engine powering PAS pages. This RDF attempts to follow the British Museum's representation of the CIDOC-CRM model (building also on previous work conducted by the University of Vienna as part of the European-funded BRICKS project; Nussbaumer and Haslhofer 2007: 7-18), but with links out to the resources described above. Through personal correspondence and face-to-face meetings with the ResearchSpace project team, and via detailed consultation of their draft mapping document (Oldman et al. 2013), an attempted modelling has been produced (yet to be documented). The author has changed or ignored some aspects, adopted the nested RDF/XML (Klyne 2010) used by the CLAROS project at Oxford University, and also linked to external resources (something that the British Museum implementation does not yet do).
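As a rough illustration of the nested RDF/XML output described above, the following sketch builds a minimal record using generic CIDOC-CRM terms. The namespace, class and property choices here are illustrative assumptions (the actual PAS mapping, as noted, is yet to be documented), and the PAS itself uses XSLT rather than Python:

```python
# Minimal sketch of producing RDF/XML for a single find, with a link out to
# an external concept URI (e.g. a Nomisma or BM thesaurus term). Namespaces
# and CRM terms are illustrative of the style, not the real PAS mapping.
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("crm", CRM)

def record_to_rdfxml(record_uri, type_uri):
    root = ET.Element("{%s}RDF" % RDF)
    obj = ET.SubElement(root, "{%s}E22_Man-Made_Object" % CRM,
                        {"{%s}about" % RDF: record_uri})
    # The link out to the external resource, as a CRM typing statement.
    ET.SubElement(obj, "{%s}P2_has_type" % CRM,
                  {"{%s}resource" % RDF: type_uri})
    return ET.tostring(root, encoding="unicode")
```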
Integration with the British Museum thesauri was reasonably complicated, but was managed by querying their development endpoint and exporting the results as CSV. The resulting tables are now available on GitHub, and they were imported into our MySQL database for querying and joining with our existing thesauri. The same process was applied to the AHRC-funded Seneschal endpoint to obtain their URI structures, and these too were imported and linked to the PAS schema. These identifiers can then be compiled into the RDF ultimately produced from the PAS site, alongside the already integrated Nomisma and Pleiades identifiers. Linking to these resources provides a rich foundation on which to build, whether within the confines of the PAS website or on third-party sites.
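The joining step can be sketched as follows; the CSV column names and example URIs are hypothetical, standing in for the exported British Museum and Seneschal tables (in production this join happens in MySQL rather than in application code):

```python
# Sketch of aligning an exported concept table (CSV of label,uri rows) with
# a local thesaurus. Column names and URIs are illustrative placeholders.
import csv
import io

def align_terms(local_terms, concept_csv):
    """Return {local term: external URI} for every term with a match."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(concept_csv)):
        lookup[row["label"].strip().lower()] = row["uri"]
    # Case-insensitive match of local terms against the exported labels.
    return {term: lookup[term.lower()] for term in local_terms
            if term.lower() in lookup}
```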
As the PAS database is updated in real time, there are multiple changes to the dataset daily, and our RDF is regenerated nightly through a scheduled cron job calling a script that transforms the Solr XML and saves it to our server (http://finds.org.uk/rdf/ provides a list of available files as {date}.rdf or pelagios-{date}.rdf) and also to Amazon S3 for archiving purposes (incremental snapshots are retained for 15 days).
Consumption of linked data within the PAS website
With the integration of external identifiers into the PAS database schema, the enrichment of resources can be much improved. Prior to LAWDI 2012, the author had integrated data from various resources, usually via Application Programming Interfaces (APIs) but sometimes via consumption of RDF data. This was originally achieved using the ARC2 library (Nowack 2011), which has since been superseded by the EasyRDF PHP library (Humfrey 2012), leading to a wider consumption of RDF throughout the site. It is now possible to extract more data for the enrichment of our issuer and ruler biographical pages: for example Augustus, where, via the use of identifiers drawn from Nomisma, the British Museum and dbPedia, an aggregated biographical page can be produced and presented alongside dynamic data drawn directly from the PAS database. This principle has also been applied to the coin guides for other periods of British history, with the same enriching effect. Extra information can be gleaned from the structured data returned by dbPedia, with information relating to parents, titles, battle commands and wives readily available. The return on investment for the time spent tying these identifiers to our vocabulary and authority lists is therefore apparent! Other examples are easy to find within the PAS website: for instance, by combining Pleiades and Nomisma identifiers, enriched pages relating to Roman mints can be produced (for example, see Rome), with images obtained from Flickr when they have been machine-tagged appropriately (see Gillies 2012). Data can also be consumed from the excellent Pelagios project, as shown below in figure 3:
Figure 3: Data from Pleiades and Pelagios, used to enrich the mint page for Aquileia, Italy. By integrating this data, the PAS enables further resource discovery and collaboration.
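The aggregation underlying pages such as the Augustus biography can be reduced to a simple merge of statements already fetched from each endpoint, keyed by the identifiers stored against a PAS record. The property names and values below are illustrative placeholders, not live Nomisma or dbPedia responses:

```python
# Sketch of the aggregation step: merge per-source property dicts into one
# biographical view, recording which source supplied each value.

def aggregate(sources):
    """Merge {source: {property: value}} dicts; first source wins on clashes."""
    merged = {}
    for name, statements in sources.items():
        for prop, value in statements.items():
            merged.setdefault(prop, (value, name))
    return merged

# Hypothetical statements, standing in for parsed RDF from each endpoint.
profile = aggregate({
    "nomisma": {"label": "Augustus", "reign_start": "-27"},
    "dbpedia": {"label": "Augustus", "birth_place": "Rome"},
})
```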
One caveat that has emerged from ingesting these data is the PAS website's reliance on the uptime of the other resources: dbPedia is frequently slow or unavailable, and Pleiades has had periods of instability; both can have a significant impact on enriched resources.
Where does the road lead now?
Following on from LAWDI 2012 and 2013, it is intended to develop the LOD capabilities of the PAS further. Work is presently underway to provide a SPARQL endpoint onto a self-hosted triple store; two candidate platforms are currently being evaluated, Apache Marmotta and Fuseki (the latter used to good effect by Gruber 2013). It is hoped that this will allow PAS data to be used in more applications within the LAWDI family. Beyond this initial audience, there is also the possibility of feeding into the Mellon-funded ResearchSpace initiative and of providing data for other LOD projects.
It is also hoped that it will be possible to modify the Zend Framework-based PAS system to allow for full content negotiation and, ideally, to replace the 'Share-Alike' Creative Commons licence with a simple CC-BY version. Work will also continue to increase the amount of structured data available via the XHTML+RDFa pages of the website and database, and further resources to link to will be sought out and partnerships forged. With good fortune and some extra human resources in the offing, it is hoped that the PAS might reach the 5-star level by the first quarter of 2014. Funding has been forthcoming from the Arts and Humanities Research Council (AHRC) for two projects that will deal with the production and consumption of linked data: one project (looking at crowd-sourcing, crowd-funding and 3D visualisation) with UCL (£318,000) and a visualisation project with Tracemedia (£5,000). The results of these will be published in print and online during 2014 and the first quarter of 2015.
Works cited
Åhfeldt, J. (2012) "A digital map of the Roman Empire" Available at: http://pelagios-project.blogspot.co.uk/2012/09/a-digital-map-of-roman-empire.html (accessed 27th September 2013.)
Barker, E., Isaksen, L. & Simon, R. (2012) "Pelagios Cookbook Wiki" Available at: https://github.com/pelagios/pelagios-cookbook/wiki (accessed 3rd October 2013.)
Berners-Lee,T. (2006) "Linked Data" Available at: http://www.w3.org/DesignIssues/LinkedData.html (accessed 19th September 2013.)
Brickley, D. & Miller, L. (2010) "FOAF Vocabulary Specification 0.98" Available at: http://xmlns.com/foaf/spec/ (accessed 3rd October 2013.)
Cayless, H. (2013) "Visualisation of places attributed to Rome" Available at: http://isaw2.atlantides.org/lawdi/force-graph.html?s=http://nomisma.org/id/rome (accessed 29th May 2013.)
Elliott, T. (2013) "About Roman Emperors" Available at: http://www.paregorios.org/resources/roman-emperors/ (accessed 2nd October 2013.)
Elliott, T., Heath, S., Muccigrosso, J. (2012) "Report on the Linked Ancient World Data Institute" in "Information Standards Quarterly, 2012 Spring/Summer, 24(2/3)" pp: 43-45. Available at: http://dx.doi.org/10.3789/isqv24n2-3.2012.08 (accessed 2nd October 2013)
Gillies, S. (2012) "Pleiades: a guest post" Available at: http://code.flickr.net/2011/12/16/pleiades-a-guest-post/ (accessed 3rd October 2013.)
Gruber, E. (2013) "New and Improved Nomisma.org Released" Available at: http://numishare.blogspot.co.uk/2013/07/new-and-improved-nomismaorg-released.html (accessed 29th September 2013.)
Gruber, E., Heath, S., Meadows, A., Pett, D., Tolle, K. and Wigg-Wolf, D. (2012) "Semantic Web Technologies Applied to Numismatic Collections" in "CAA2012 Proceedings of the 40th Conference in Computer Applications and Quantitative Methods in Archaeology, Southampton, United Kingdom, 26-30 March 2012" Abstract available at: https://www.ocs.soton.ac.uk/index.php/CAA/2012/paper/view/707
Hausenblas, M. (2010) "5 Star Open Data." Available at: http://5stardata.info/ (accessed 19th September 2013.)
Herman, I., Adida, B., Sporny, M. and Birbeck, M. (2013) "RDFa 1.1 Primer - Second Edition: Rich Structured Data Markup for Web Documents" Available at: http://www.w3.org/TR/xhtml-rdfa-primer/ (accessed 3rd October 2013.)
Humfrey, N. (2012) "EasyRDF php library" Available at: https://github.com/njh/easyrdf (accessed 27th September 2013.)
Kansa, E. (2013) "A publication approach to linked data in archaeology" Available at http://www.slideshare.net/ekansa/lawdi-open-context-publishing-linked-data-in-archaeology (accessed 27th September 2013.)
Klyne, G. (2010) "CIDOC CRM RDF/XML" Available at: http://www.clarosnet.org/wiki/index.php?title=CIDOC_CRM_RDF/XML (accessed 3rd October 2013.)
Leins, I. & Hill, J.D. (2012) "The Iron Age hoard from Hallaton" Available at: http://finds.org.uk/database/artefacts/record/id/509553 (accessed 27th September 2013.)
Light, R. (2011) "Using the PAS REST framework" Available at http://light.demon.co.uk/wordpress/?p=26 (accessed 27th September 2013.)
Nowack, B. (2011) "ARC RDF Classes for PHP" Available at: https://github.com/semsol/arc2 (accessed 27th September 2013.)
Nussbaumer, P. & Haslhofer, B. (2007) "Putting the CIDOC CRM into Practice - Experiences and Challenges." Vienna: University of Vienna. Also available at: http://eprints.cs.univie.ac.at/404/1/covered.pdf (accessed 19th September 2013.)
Oldman, D., Mahmud, J. and Alexiev, V. (2013) "The Conceptual Reference Model Revealed: Quality contextual data for research and engagement: A British Museum case study" (internal discussion document, version 0.98; unpublished: the latest version is available on request from Dominic Oldman.)
Page, R.D.M. (2013) "Reconciling author names using Open Refine and VIAF" Available at: http://iphylo.blogspot.co.uk/2013/04/reconciling-author-names-using-open.html (accessed 23rd September 2013.)
Pett, D.E.J. (2010a) "The Portable Antiquities Scheme’s Database: its development for research since 1998." In "A Decade of Discovery: Proceedings of the Portable Antiquities Scheme Conference 2007", Worrell, S., Egan, G., Leahy, K., Naylor, J., & Lewis M. (eds.) pp 1-18. London: David Brown Book Company. Available at: https://docs.google.com/file/d/0B1zHuVdu5LYnYWUyMmQ0NTctOGI5Mi00NjY3LTg2MTQtNjdmMDFjNTdhNmRj/edit?usp=docslist_api&authkey=CKOk8tUE (accessed 4th October 2013.)
Pett, D.E.J. (2010b) "Distributing the Wealth: Digital knowledge transfer for Numismatics." in "The British Museum and the Future of Numismatics" edited by Cook, B. (pp 71-80) London: British Museum Press. Available at: http://www.academia.edu/2259658/Distributing_the_wealth (accessed 27th September 2013.)
Pett, D.E.J. (2012a) "Linked Portable Antiquities Data for #LAWDI" Available at: http://www.slideshare.net/dejp3/presentation-for-linked-ancient-world-data-institute (accessed 23rd September 2013.)
Pett, D.E.J. (2012b) "The Portable Antiquities Scheme joins Pelagios" Available at: http://pelagios-project.blogspot.co.uk/2012/10/the-portable-antiquities-scheme-joins.html (accessed 23rd September 2013.)
Pett, D.E.J. (2012c) "Implementing Twitter Cards with Zend Framework" Available at: http://finds.org.uk/blogs/labs/2012/12/04/twittercard-zend-framework/ (accessed 23rd September 2013)
Pett, D.E.J. (2012d) "Using the Imperium map layer from the Scheme server for Google maps" Available at: http://finds.org.uk/blogs/blog/2012/11/27/using-the-imperium-map-layer-from-the-scheme-server-for-google-maps/ (accessed 23rd September 2013.)
Pett, D.E.J. (2013) "#LAWDI: or how I learned to stop worrying and love linked data. (2012 – 2013)" Available at: http://bit.ly/lawdiPett2013 (accessed 23rd September 2013.)
Rabinowitz, N. and Heath, S. (2012) "Ancient World Linked Data JS Lib (awld.js)" Available at: http://isaw.nyu.edu/members/sebastian.heath-40nyu.edu/awld-js (accessed 29th September 2013.)
Sauermann, L. & Cyganiak, R. (eds.). (2008) "Cool URIs for the Semantic Web." Available at: http://www.w3.org/TR/cooluris/ (accessed 25th June 2012.)
©2014 Daniel Pett. Published under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
To the extent possible under law, Daniel Pett has waived all copyright and related or neighboring rights to Linking Portable Antiquities to a wider web. This work is published from:
United Kingdom.
This article is part of ISAW Papers 7.
ISAW Papers 7.21 (Preprint)
Pompeii Bibliography and Mapping Resource
Eric Poehler
The ancient city of Pompeii “is at once the most studied and the least understood of sites. Universally familiar, its excavation and scholarship prove a nightmare of omissions and disasters. Each generation discovers with horror the extent to which information has been ignored, neglected, destroyed and left unreported and unpublished” (Wallace-Hadrill 1994, 65). When Andrew Wallace-Hadrill published these words in 1994, they were merely the most succinct and cogent expression of what every scholar already knew: working within the ruins of the ancient city is significantly easier than digging through the equally vast and uneven archive of scholarship. Even what has been published is spread across the world in hundreds of libraries and archives, hidden in obscure and defunct journals, travel diaries, monographs and illustrations. Paradoxically, with the efficiency of interlibrary loan services and electronic means of transmission, it is easier to acquire the actual publication than it is to first discover that source as a relevant citation. Fortunately, in 1998 Laurentino Garcia y Garcia published his landmark compendium of 14,596 annotated citations, the Nova Bibliotheca Pompeiana (NBP), bringing these far-flung titles together for the first time. The arrival of the NBP has not, however, revolutionized the study of Pompeii as expected. The reasons for this are simple: the book’s high cost prevented wide distribution, and its physical format limited the organization of its contents to a single arrangement.
The Pompeii Bibliography and Mapping Resource
With funding from both an NEH Digital Humanities Start-Up grant and an ACLS Digital Innovation Fellowship, the Pompeii Bibliography and Mapping Project is working to create a unique resource for the study of ancient Pompeii that will overcome this failing and others: an exhaustive subject repository searchable through a GIS map. The Pompeii Bibliography and Mapping Resource (PBMR) is a web-based research tool composed of three parts: 1. Bibliographic Database and Full-Text Document Repository, 2. Geographical Information System (GIS) and 3. User Interface. Because Pompeii lacks both a single, searchable bibliography and a standard, up-to-date map, the creation of a resource that solves these problems and simultaneously offers new and powerful search methods will revolutionize research on the ancient city. More specifically, the addition of the Geographical Information System introduces the ability to easily broaden a bibliographic investigation to an adjacent building or related subject without apprehension of investing dozens of hours in new bibliographic searches. One simply moves the mouse and clicks. We believe this will encourage the exploration of imaginative new connections between central and tangential datasets, between what the user was looking for and what interesting side road she was tempted to go down. The GIS will also bring uniquely powerful search tools – spatial analysis tools, such as proximity, density, and distribution analyses – that will allow the user to begin or to refine their research based on the landscape of the city. Additionally, because the geographical files will also be available for download, individuals can conduct more advanced analyses and produce new interpretations of Pompeii without each bearing the prohibitive burden of digitizing the entire ancient city.
When users complete such advanced analyses, the PBMR will also provide a location to upload, maintain and serve the files they generate so that the carto-bibliography of the ancient city can continually improve.
Modalities
In its most basic functioning, the PBMR is a research tool that affords the user the ability to navigate Pompeii’s landscape and discover an extensive account of the information about that location, including (but not limited to) name, type, images, size and bibliography. Students and the public will need no special training to find and access information; they can simply search the subject repository or peruse the map as easily as they use familiar web services. Academics will gain the ability to quickly conduct exhaustive searches on multiple subjects, simultaneously producing both a series of comprehensive bibliographies for each location and maps illustrating their spatial relationships in the landscape of Pompeii. Because performing either of these tasks today – conducting an exhaustive search or mapping the results – is prohibitively time consuming, the PBMR will revolutionize research on Pompeii, even in its most basic functionalities.
The GIS component provides a powerful mapping tool that can generate custom maps for diverse user groups. Students working on a particular building can create both overview and detailed maps to illustrate their study, based on information provided by the PBMR as a research tool. Both visitors to and researchers at Pompeii will use the map via mobile devices to navigate the actual ruins, a utility of great value considering the regular shortages of paper maps available to visitors at the site. Similarly, in the academic realm, the perennial problems of poor quality, lack of standardization, and differing interpretations of space in published maps can be overcome by the PBMR’s free access to mapping data, mapping tools and data versioning archive.
The most powerful use of the PBMR is as an analysis tool, as a means to simultaneously ask a series of questions and receive data-rich answers. The combination of the bibliographic database and GIS allows the user to vacillate between spatial analysis tools and bibliographic analysis tools, a process that produces results impossible to achieve by any other method. To illustrate this point, imagine the results of a multilingual search for “House / Casa / Haus / Maison” in the PBMR: hundreds of citations will appear along with hundreds of locations highlighted on the map. Of course, these results are practically impossible to use. The spatial analysis tools, however, will permit a user to filter these results by the area each house occupied. Thus, a spatial query for houses between 100m² and 400m² will limit the results to only ‘average’ sized houses. These results can then be further refined by bibliographic criteria, such as year of publication, to find the houses investigated in the last twenty years. Finally, the user might once again choose a spatial characteristic to get at a still more nuanced picture of these houses. For example, she can search the refined results for those houses that ‘touch the boundary of’ or ‘contain’ a shop or workshop. The final results, produced in a matter of minutes rather than weeks, reveal the instances of and provide extensive documentation for those residences of average size that were most recently investigated and likely had a commercial profile.
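The query chain described above can be sketched in a few lines. The record structure here is hypothetical, standing in for the PBMR's bibliographic and spatial tables, and the area/year filters mirror the example in the text:

```python
# Sketch of the combined spatial/bibliographic query: filter house records
# by floor area, then by most recent publication year, then by a spatial
# predicate. The schema and the sample data are illustrative only.

houses = [
    {"name": "Casa A", "area_m2": 250, "last_published": 2005, "adjoins_shop": True},
    {"name": "Casa B", "area_m2": 850, "last_published": 2010, "adjoins_shop": False},
    {"name": "Casa C", "area_m2": 180, "last_published": 1978, "adjoins_shop": True},
]

def query(records, min_area, max_area, published_since, adjoins_shop=None):
    results = [r for r in records
               if min_area <= r["area_m2"] <= max_area
               and r["last_published"] >= published_since]
    if adjoins_shop is not None:
        # Stand-in for a GIS 'touches the boundary of' predicate.
        results = [r for r in results if r["adjoins_shop"] == adjoins_shop]
    return [r["name"] for r in results]
```

Running `query(houses, 100, 400, 1994, adjoins_shop=True)` yields only the average-sized, recently published houses with a commercial neighbour, exactly the narrowing described above.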
These three modes of use make the PBMR a powerful resource for users across the spectrum of interest. Although the bibliography is populated with citations relevant to Pompeii and the GIS is based on maps of the ancient city, applying the model created by the PBMR to other subjects or disciplines in the humanities can be as simple as populating the shell with bibliographic and spatial data and building the connections between them. For example, other large, densely occupied archaeological sites such as Teotihuacan in Mexico or Angkor Wat in Cambodia could benefit from this system. Broader geographies are certainly possible as well, such as historical mapping of the bibliography of the crusades. Moreover, more disparate fields, such as English and other modern languages, might map the literary landscapes of important genres: naturalists of the Romantic period, Victorian-era London, or the diffusion and impact of early Buddhist texts. We believe that even simply visualizing the spatial relationships among texts (and the locations within them) will lead to new ways of thinking about the subjects.
Works Cited
Wallace-Hadrill 1994: Wallace-Hadrill, Andrew. Houses and Society in Pompeii and Herculaneum. Princeton, NJ: Princeton University Press, 1994.
©2014 Eric Poehler. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.22 (2014)
It’s about time: historical periodization and Linked Ancient World Data
Adam Rabinowitz
Most of us who have taught students about the material remains of the ancient world would agree that time in the past is harder to conceptualize than space in the past. This is especially true of the conventional temporal divisions -- "periods" -- we use to group and categorize the art, archaeological remains, and historical events of the ancient world. When asked to mark ancient places on a map, students excel. When asked to place objects within a historical, archaeological, or art-historical period and to assign a corresponding date range, however, the same students sometimes seem to be picking terms and dates at random. We can see ancient places inscribed in present spaces, and we can imagine ourselves in them. But ancient time undergoes a curious flattening process, in which developments that took place over centuries appear to have happened all at once in a great jumble. Introduce into this confusion the idea that "periods" are essentially arbitrary conventions about which scholars disagree, and many students will decide that the periodization of the ancient world is simply impossible to learn.
Similar problems apply to the role of temporal divisions in the Linked Ancient World Data environment. Conceptual entities that we can associate with concrete physical manifestations -- people, places, texts -- have provided the foundations for a growing Linked Ancient World Data infrastructure in which it is increasingly easy to discover semantically-related information. Authoritative spatial gazetteers like the Pleiades platform have not only made it possible to extract maps from texts by matching place names to geographic coordinates, they have also provided a conceptual connection that initiatives such as the Pelagios Project (Simon, Barker and Isaksen 2012) have been able to use to bring together material associated with the same site in different databases (for example, material associated with ancient Athens). The use of Uniform Resource Identifiers (URIs) for people, places, and things sidesteps the barriers to discovery created by the use of different metadata schemata, different vocabularies, or different languages. At the same time, however, it requires general agreement about the identity of the underlying entity: Athens, Athenae, Athen, Afen, Athina have to refer to the same socio-political unit located at more or less the same geographic coordinates.
Periods have been modeled as abstract components of formal ontologies (e.g. the CIDOC-CRM: http://www.cidoc-crm.org/crm-concepts/#_Toc224984167; Doerr, Kritsotaki and Stead 2005; Doerr, Kritsotaki and Stead 2010), and directory or gazetteer-based approaches have been attempted on several occasions -- for example, in the Electronic Cultural Atlas Initiative (Petras et al. 2006) and the Common Eras project (Isaksen et al. 2009). But periods have proven to work poorly with Linked Data principles, which require well-defined entities for linking. As concepts of convenience, often invented by a single scholar in a seminal work but then modified and debated over long periods of time, deliberately left vague in temporal terms in order to protect their usefulness for relative chronologies, individual periods have no objective existence. Yet they are deeply entangled with both physical space and absolute time, the latter measured by concrete calendar systems and ultimately by the physical rotation of the earth and its movement around the sun. Particular period terms are associated with particular absolute dates in particular places; in turn, changes in the identity of a place at a particular geographic location can only be described in terms of the passage of time, often in the form of historical periodization.
The creation of an agreed-upon controlled vocabulary for periods therefore requires either highly local definitions or definitions that are so broad and vague as to be all-encompassing. Both types exist: local vocabularies in use in the UK heritage community, for example, define the "Iron Age" as a period (presumably only in Britain) lasting from 800 BC to AD 43, the year of the Roman conquest (e.g. http://www.fish-forum.info/i_apl_e.htm), while the Getty Art & Architecture Thesaurus (AAT), which seeks to offer more universal concepts, describes a global "Iron Age" in terms of the three-age system and notes generally that it has different dates in different places. It also offers a facet for the European Iron Age, which has several stylistic-chronological subgroups: here, the only value associated with Britain is the La Tène period/culture, which is placed in space in "Northern Europe and the British Isles" and in time from "the mid-fifth century BCE" to the "mid-first century BCE". Though this definition and that of the UK heritage community overlap partially in time, space and concept, they employ different terms; conversely, a single period term such as "Iron Age" can change dramatically in meaning according to where it is used, or by whom. For Linked Data purposes, then, this is not a simple question of using URIs from a shared gazetteer to align different terms for the same period concept. A solution is needed that recognizes the fact that different scholars can conceptualize the same time range or cultural phenomenon quite differently, while others use the same terms to describe time ranges or cultural phenomena that differ to varying degrees.
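The temporal half of this problem is at least easy to state formally. Using the two definitions quoted above (the UK heritage "Iron Age" of 800 BC-AD 43, and the AAT's La Tène facet, taken here as roughly 450-50 BCE), a sketch with signed years (negative = BCE) shows how two definitions of "the same" period partially coincide; the modelling is illustrative only:

```python
# Sketch of comparing two scholarly definitions of a period term.
# Years are signed integers: negative = BCE, positive = CE.

def overlap(a, b):
    """Return the shared year range of two (start, end) spans, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None

uk_iron_age = (-800, 43)   # UK heritage vocabulary: 800 BC - AD 43
la_tene = (-450, -50)      # AAT La Tène facet, approximated
```

Here the La Tène span falls entirely within the UK "Iron Age", yet the two definitions differ in term, spatial scope and intellectual lineage, which is precisely why term-level alignment alone fails.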
I first encountered this problem during the development of GeoDia, an interactive spatial timeline of ancient Mediterranean history and archaeology (Rabinowitz 2013). The original design was focused on archaeological and art-historical periodization, as a way to explain visually to students how ancient remains were grouped together (Figure 1).
Figure 1. View of Athens in GeoDia: location on map, periodization on timeline, images divided by period on right.
We began with the idea that we would have general, Getty-AAT-style periods that would then be attached to individual archaeological sites. But major problems with this approach soon began to emerge. Using a generic periodization meant, in some cases, that sites would be represented with periods that were not used to describe them in the literature. In other cases, it meant a conflict between the date-range of a period to be represented in the timeline and the history of a particular site. Pompeii is a good example: though Pompeii exists during the more general "Flavian period", that site ends abruptly in AD 79, in the middle of that period, with the eruption of Vesuvius. The idiosyncrasies of particular regions and sites therefore led us to a data-model in which the basic unit was the "site period", in most cases a local manifestation, with a site-specific date range, of a more general period concept (Figure 2).
Figure 2. The GeoDia period data model.
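The "site period" idea can be sketched as a general period clipped to a site's own history; the field and function names below are illustrative of the data model rather than GeoDia's actual schema. Pompeii's truncated Flavian period is the example from the text:

```python
# Sketch of the site-period model: a local manifestation of a general
# period concept, with a site-specific date range. Years are signed
# integers (negative = BCE); the schema is illustrative only.
from dataclasses import dataclass

@dataclass
class Period:
    name: str
    start: int
    end: int

@dataclass
class SitePeriod:
    site: str
    period: Period   # the general concept this localizes
    start: int       # site-specific bounds, which may differ
    end: int         # from the general period's own range

def localize(site, period, site_start=None, site_end=None):
    """Clip a general period to a site's occupation span."""
    start = period.start if site_start is None else max(period.start, site_start)
    end = period.end if site_end is None else min(period.end, site_end)
    return SitePeriod(site, period, start, end)

flavian = Period("Flavian", 69, 96)
# Pompeii's Flavian period ends with the eruption of Vesuvius in AD 79.
pompeii_flavian = localize("Pompeii", flavian, site_end=79)
```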
Because GeoDia, as a synchronized mashup of a map and a timeline, relies on places with geographic coordinates, it was relatively easy to join the Linked Ancient World Data ecosystem, at least peripherally, by attaching Pleiades URIs to its sites. But it quickly became apparent that allowing links back in was impossible, because the interface does not display "places": it displays a dynamic combination of location and "site periods" drawn from separate tables in the underlying database. We could consume URIs, but we couldn't produce them. Furthermore, while Pleiades URIs could be pulled in on a semi-automatic basis, it was impossible to connect GeoDia periods or site periods with controlled vocabularies. It was therefore impossible to take advantage of the possibilities for automated data integration offered by Linked Data approaches -- for example, the association of Flickr photos with ancient places in Pleiades through the use of URI-based machine tags. Frustration with this situation led me to join what I soon realized was a growing chorus of voices calling for the better integration of time in spatial approaches to history (Gregory 2007, Berman 2011, Janowicz 2012, Elliott 2013, Grossner 2013).
The addition of "when" to a Linked Ancient World Data landscape that currently focuses on "where" and "who" will not come, however, through the development of a consensus-based gazetteer for globally-recognized period concepts. Even if such a vocabulary could be developed, and even if a majority of data managers were to adopt it for future work (a big "if"), it could not be retrofitted to deal with the diversity of period definitions already present in existing print and digital resources without a significant loss of information. Furthermore, I do not think it is desirable to smooth out scholarly disagreements and differences regarding the definition of periods, for this discussion is a fundamental part of the rich history of scholarship on the ancient world.
A solution to this problem must therefore take into account the diverse ways in which we use and understand periodization in our scholarship, and it must have the flexibility to accommodate conflicting or changing definitions for periods. At the same time, it should facilitate the discovery of the sorts of information people are likely to want to find using Linked Period Data. Three come immediately to mind: material from the same absolute date-range across datasets employing a variety of periodizations; information from spatially-linked datasets filtered by period; and conflicting definitions of periods proposed by different scholars. In the first case, one might wish to find all records associated with the range 400-300 BC from a series of datasets that use various period terms for the same span (Hellenistic, Classical, Late Classical, Early Republican, Iron Age, etc.). In the second, one might wish to see only records associated with Classical Athens, or only references to the name used for a site in a given period (e.g. Cherson, the name used for Crimean Chersonesos in the Middle Byzantine period), or only records associated with period terms used to describe remains in a particular geographic region. And in the third case, a user might want to compare different temporal spans assigned to the same period term in the same place (Figure 3). So a Linked Data resource for periods should include both the terms used to refer to periods and the coordinates used to describe them in time and space. In order to follow scholarly practice, it should also provide references to the source(s) from which those coordinates were derived.
Figure 3. Various temporal definitions assigned to the period term or concept "Iron Age" in the Levant, from Kreuger 2013.
It would be difficult, if not impossible, to provide this information in a structured form in a top-down gazetteer of period concepts. Ryan Shaw, Eric Kansa and I, however, think that we have found a way to provide it in a bottom-up Linked Data gazetteer of period assertions -- that is, a set of stable references for what authorities say about periods, rather than a thesaurus that seeks to impose consensus about what periods are. This is the goal of the Periods, Organized (PeriodO) project, for which we are actively seeking funding. Beginning with the period assertions collected for sites and regions in GeoDia and a set of structured period vocabularies for a large group of Linked Ancient World Data projects, PeriodO will present period assertions in a JSON-LD schema that includes the term or label used by the source; a date range expressed in Julian Days in scientific notation, with the number of significant digits indicating the degree of precision of the start and end dates; an associated geographic entity (country, region, or site) from a Linked Data gazetteer (Pleiades, Wikidata, GeoNames, etc.); and an authoritative source, also linked to a URI, e.g. from VIAF, wherever possible (Figures 4 and 5).
Figure 4. A period assertion according to the PeriodO data model.
Figure 5. The same assertion expressed in JSON-LD with dates as Julian Days in scientific notation.
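A hypothetical assertion along these lines can be assembled as follows. The property names, context URL, and source placeholder are illustrative rather than PeriodO's published schema, and the year-to-Julian-Day conversion is deliberately rough:

```python
import json

def year_to_julian_day(astronomical_year: int) -> float:
    """Very rough conversion from an astronomical year (0 = 1 BC)
    to a Julian Day number; ignores month, day, and calendar details."""
    return (astronomical_year + 4712) * 365.25

def with_precision(jd: float, sig_digits: int) -> str:
    """Express a Julian Day in scientific notation; the number of
    significant digits signals the precision of the date."""
    return f"{jd:.{sig_digits - 1}e}"

# Illustrative assertion: a "Classical" period for Athens, ca. 480-323 BC.
assertion = {
    "@context": "http://example.org/periodo-context.jsonld",  # placeholder
    "label": "Classical",
    "start": with_precision(year_to_julian_day(-479), 4),
    "stop": with_precision(year_to_julian_day(-322), 4),
    "spatialCoverage": "http://pleiades.stoa.org/places/579885",  # illustrative Pleiades URI
    "source": "http://viaf.org/viaf/0000000",  # authority URI placeholder
}
print(json.dumps(assertion, indent=2))
```

Truncating to four significant digits, as here, expresses a start date known only to the nearest few centuries; a more precisely attested boundary would simply carry more digits.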
We plan to house the repository on GitHub, at least initially, and to mint DOIs, expressed as HTTP URIs, through the EZID system of the California Digital Library. These DOIs will resolve to structured, machine-readable definitions for individual period assertions (Heath and Bizer 2011). GitHub will also provide the user-management tools to begin to build a community of contributors to the gazetteer: if an authoritative user cannot find a URI for a period definition she prefers, she will be able to add her own. In the first phase of the project, we will create a visualization interface that makes it easier to find periods, see their spatial or temporal extent and the degree of precision of the latter, and compare different temporal definitions for the same period terms or the temporal overlap between different terms. Eventually, we hope to build a reconciliation service and a SPARQL endpoint that will make it easier to implement period assertion URIs and reuse the dataset itself.
As scholars, we do not talk about periods in the same way that we talk about places. With places, we generally tend to agree about the location of those that are well-known, and we generally seek to establish consensus on the location of those that are ambiguous. With periods, although we refer to them in a general way to communicate with each other, we prefer to use precise but idiosyncratic definitions that best fit our own material. We are also careful to cite the sources of those precise definitions, if we did not invent them ourselves: specificity about such matters is a basic tenet of scholarly communication. It seems reasonable, then, to attempt to mirror scholarly practice in the description of periods as Linked Data. The gazetteer PeriodO proposes to create would make room for the disagreement, imprecision, and multivocality that characterize scholarly discourse, even while providing URIs for structured assertions about periods that include spatiotemporal coordinates and authority citations. This approach will not solve all the problems that face the implementation of period concepts in the Linked Ancient World Data ecosystem, but it does offer a grassroots starting point from which we can begin to explore possibilities. The success of this effort, like the success of Linked Ancient World Data initiatives in general, will depend on the development of a scholarly community of practice, with members willing to implement period assertion URIs in their own datasets and to contribute new period assertions to a gazetteer. Periods are too important to the ways we talk about and conceptualize the past to leave out of the Linked Ancient World Data cloud: it really is about time.
Works cited
Berman 2011 Berman, Merrick. "Historical Gazetteer Elements: Temporal Frameworks". Paper delivered in the "Symposium on Space-Time Integration in Geography and GIScience" at the Annual Meeting of the American Association of Geographers 2011 in Seattle, Washington. Available at https://cga-download.hmdc.harvard.edu/publish_web/2011_AAG_Gazetteer/Berman.pdf.
Doerr, Kritsotaki and Stead 2005 Doerr, Martin, Athina Kritsotaki, and Steven Stead. "Thesauri of Historical Periods - A Proposal for Standardization". In Proceedings of the CIDOC Conference (2005).
Doerr, Kritsotaki and Stead 2010 Doerr, Martin, Athina Kritsotaki, and Steven Stead. "Which period is it? A methodology to create thesauri of historical periods". In Beyond the Artefact: Digital Interpretation of the Past. Proceedings of the Computer Applications and Quantitative Methods in Archaeology Conference 2004, eds. Franco Niccolucci and Sorin Hermon. Budapest: Archaeolingua.
Elliott 2013 Elliott, Thomas. "Stitching together ancient geography online". Paper delivered at the 2013 Digital Classics Association Conference in Buffalo, NY. Video available at http://www.youtube.com/watch?v=-apXSbU5O1A.
Getty Research Institute. "Art & Architecture Thesaurus® Online". http://www.getty.edu/research/tools/vocabularies/aat/. Accessed September 20, 2013.
Gregory 2007 Gregory, Ian. Historical GIS: Technologies, Methodologies, and Scholarship. Cambridge Studies in Historical Geography 39. Cambridge; New York: Cambridge University Press.
Grossner 2013 Grossner, Karl. "A space-time datatype for historical place?". Paper delivered at the Digital Humanities 2013 conference in Lincoln, Nebraska. Available at https://cga-download.hmdc.harvard.edu/publish_web/SpaceTime/DH2013_Grossner_SpaceTimeDatatype.pdf.
Heath and Bizer 2011 Heath, Tom and Christian Bizer. 2011. Linked Data: Evolving the Web into a Global Data Space (1st ed.). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. San Rafael, CA: Morgan & Claypool. Available at http://linkeddatabook.com/editions/1.0/.
ICOM/CIDOC Documentation Standards Group. "Definition of the CIDOC Conceptual Reference Model". http://www.cidoc-crm.org/crm-concepts/. Accessed September 20, 2013.
Inscription. "RCHME Archaeological Periods List". http://www.fish-forum.info/i_apl_e.htm. Accessed September 20, 2013.
Isaksen et al. 2009 Isaksen, Leif, Chris Gutteridge, Gianluca Correndo, Zurina Muda, Lin Xu, and Thanassis Tiropanis. 2009. CommonEras. University of Southampton. http://commoneras.ecs.soton.ac.uk/.
Janowicz et al. 2012 Janowicz, Krzysztof, Simon Scheider, Todd Pehle, and Glen Hart. "Geospatial semantics and Linked Spatiotemporal Data -- past, present, and future." Semantic Web 3:4 (2012), 1-13.
Kreuger 2013 Kreuger, Kristi. A Case-study of the Iron Age and Implications for Temporal Metadata Definition. MA Thesis, University of North Carolina.
Pelagios Project. "Pelagios". http://pelagios-project.blogspot.com/. Accessed September 20, 2013.
Petras et al. 2006 Petras, Vivien, Ray R. Larson, and Michael Buckland. "Time Period Directories: A Metadata Infrastructure for Placing Events in Temporal and Geographic Context". In Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL). Chapel Hill: ACM Press. http://dx.doi.org/10.1145/1141753.1141782.
Rabinowitz 2013 Rabinowitz, Adam. "GeoDia: or, Navigating archaeological time and space in an American college classroom". In CAA 2012: Proceedings of the 40th Annual Conference of Computer Applications and Quantitative Methods in Archaeology (CAA), Southampton, England, eds. G. Earl, T. Sly, A. Chrysanthi, P. Murrieta-Flores, C. Papadopoulos, I. Romanowska, and D. Wheatley. Oxford: Archaeopress, 263-272.
Simon, Barker and Isaksen 2012 Simon, R., E. Barker, and L. Isaksen. "Exploring Pelagios: a visual browser for geo-tagged datasets". Workshop presented at the International Workshop on Supporting User’s Exploration of Digital Libraries. Available at http://eprints.soton.ac.uk/343484/.
UT Liberal Arts Instructional Technology Services. "GeoDia". http://geodia.laits.utexas.edu. Accessed September 20, 2013.
©2014 Adam Rabinowitz. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.23 (2014)
Publishing Archaeological Linked Open Data: From Steampunk to Sustainability
Andrew Reinhard
Publishers of archaeology largely continue to follow the “Steampunk model” of publication: we use 21st-century technology to produce 18th-century books and journals in both print and digital editions. These digital editions look and behave like traditional publications, with their fixed layout, columnar text, and actual page-flipping. As in traditional printed archaeological monographs, there is no linking to external resources, images are most frequently black-and-white (and always two-dimensional), and the communication of information is one-way, the reader trusting an author without having access to the underlying data for fact-checking or to foster alternate interpretations of that data.
For the publications of the American School of Classical Studies at Athens (ASCSA), we continue to grow our digital editions while reassuring our more traditional readers that we are not scuttling print. We publish digital issues ahead of print with our quarterly journal Hesperia. As of June 2013, our distributor, the David Brown Book Company/Oxbow Books/Casemate Publishing, allows customers to buy print+digital bundles to satisfy readers who want their content that way. We continue to educate our authors, encouraging them to submit stable URIs for multimedia and data sets cited or linked within their manuscripts, and we are seeing an increase in that material upon final submission of their work. We have also been able to repurpose content through various export features in Adobe InDesign for use on the Web and on other platforms.
More recently, the ASCSA has been trending toward Open Access dissemination of its content. Hesperia is now largely Open Access outside of JSTOR's 3-year moving wall, with over 1,500 articles now freely available for download and sharing via the ASCSA's website. The Publications Committee of the ASCSA will vote on a policy change in January 2014 to make digital editions of monographs available for free download three years after publication. Ahead of this vote, however, I have been giving PDFs away for free to anyone upon request. I successfully negotiated with JSTOR to receive all metadata and files at the page level for all of our hosted content (Agora “Blue Books”, Corinth “Red Books”, Hesperia Supplements, and Hesperia issues), and have provided this data to the Athenian Agora so that the staff there can enable linking from scanned excavation notebooks and other places to the exact spot on the precise page that published an inventoried item. I created a Zotero account and now actively publish citation data for all new ASCSA publications; I have also uploaded legacy citation data for our backlist of monographs and articles published from 1932 to 2010 (https://www.zotero.org/adreinhard/items/collectionKey/CGEBIQJC).
I also created my first app, for the Agora Site Guide, on the Inkling Habitat platform for iOS devices (Android will come later in 2013). It offers a non-linear reading experience, usable both online and offline, that integrates links to Open Access content, including articles as well as data, additional images, and archaeological context. I have also been discussing with Google and its NianticLabs the possibility of integrating ASCSA content (data and images) into its free Field Trip app for Android and iOS, using native GPS functionality to inform users of nearby monuments while in Athens, Corinth, and elsewhere.
Four-Dimensional Archaeology: Publishing Openly Online
It would seem obvious that text is text, and is by its nature two-dimensional. The writer writes what the reader reads. Writing an article or a monograph is a one-way form of communication. However, if one extracts this text from its two-dimensional setting and places it online, that text has the native ability to become something more. The content gains context. One can embed links reaching out to Open Access data repositories for people- and place-data. Making this publication available online also facilitates linking in the opposite direction, making the author’s content discoverable by anyone in the world, provided the text is given a stable URI. Widgets are now available that enable readers to roll over a placename and retrieve a pop-up window with a map and data along with a clickable link. In time, I hope to see a similar widget crawl through bibliographies and citations in notes, allowing readers to reference cited material as they proceed through the book or article. How often have readers wished to check a reference or look up a place, but have instead put it off, not wanting to trek to the library or even run a Google search? Embedding these links and reading tools are a service to readers and are becoming increasingly easy to implement from an author’s/publisher’s perspective.
This “multi-dimensional” text takes what is good about the printed word, and adds practical improvements that help deliver more robust content more quickly to the reader:
Note-taking on the printed page is limited to the space in the margins or between the lines. Note-taking on a digital document allows for notes of massive length that can then be emailed/shared outside of that document. If you lose your book, you lose your notes. Digital editions allow you to save a “clean” copy as well as an annotated copy, and if you email/share your comments, losing your annotated copy is only an inconvenience, not a disaster.
What if we could go one step further, making the author’s primary text “four-dimensional”? In physics, three dimensions incorporate length, width, and depth. Add time to a three-dimensional thing, and it now has a fourth dimension. All objects exist in space-time, and as the arrow of time moves us forward year by year, those three-dimensional objects change. While this observation will be more readily applied to imaging artifacts, we can apply the four-dimensional concept to an author’s text as well.
A published monograph is like a finished temple. It’s as good as the makers can produce at the time. As time moves along, things happen to the building. It can receive additions. It can be shored up. It might be demolished, lending its parts as spolia to other structures in future times. As archaeologists, we can also reduce the structure to its individual parts, seeing how the whole was completed, and also understanding how that building changed over time, from realized vision to revered monument, or derelict footprint.
It is a misconception that a published monograph or article is the “final publication” of archaeological material. Upon publication, that text (and its related content of photos, maps, tables, etc.) becomes the starting point for rigorous discussion and dialogue. In the past, some journals have published rebuttals to earlier articles in later issues, a kind of time-delayed chess match. By integrating online publication with mature social networking/commentary technology, those discussions can be opened to a global audience. Should a counter-argument be made successfully, it is also possible for the author to make a change to the main text, or to add new bibliography, and to update notes over time, keeping current with future scholarship. The content of the published piece must change over time, and opening that content up to scrutiny can help to either preserve and promote excellent scholarship, or to mend, repair, or demolish research.
Seeing text as four-dimensional also allows the readers to uncover the foundations of an archaeological publication. In the instances of preliminary excavation reports or “final” reports of a class of objects from a site, I would strongly urge authors to provide their readers with links to complete data sets. This data can be checked, and can be used as a reference by readers. Should errors be discovered in the math and logic of tables, these can be corrected right away. And should there be a difference of opinion between author and reader, the data can be consulted, and a dialogue started. With traditional publication, the reader is presented with the author’s interpretation of the data, and that interpretation might or might not be reliable and might include biases, either conscious or unconscious. Opening up the data, and opening up the dialogue can help an author’s argument become more objective.
Next Steps
Putting theory into practice requires time, care, and attention, and it is my hope to apply the theory stated above to living and breathing archaeological publications produced by the ASCSA. To that end, I have finally found an author who will submit for peer review a born-digital, online-only “monograph” on pottery that will incorporate a data set, interactive maps and tables, linked data, and 3D reconstructions of pots. This publication is driving discussion in various committees as the ASCSA decides how peer review of online-only material will be conducted, how it will be published, and how it will be archived/preserved. I suspect this publication will be the first of many, creating a shareable template that can help standardize publications of this type without being too constricting, allowing for the messiness of archaeology.
I need to encourage the Athenian Agora, Corinth, or another affiliated ASCSA excavation to provide me with 3D printing specs to accompany traditional 2D plates and/or images for a testbed project. How much better and more useful would it be to allow readers to print out 3D copies of pots, bones, etc., featured in monographs and articles? Think of the study collections that could be created.
On a personal level, I do hope to partner up with Bill Caraher, Kostis Kourelis, and others to create a new kind of online publication that publishes archaeological work as-it-happens while at the same time writing about trends in archaeology, something that goes beyond a traditional blog.
I need to begin (finally!) to link from Hesperia articles and monographs to the data of others (including Pleiades), something I've promised but have yet to deliver.
I need to fundraise so that I can get the $1.3M required to endow Hesperia to make it completely Open Access for all time, following the new “Diamond” level of Open Access as described by Christian Fuchs and Marisol Sandoval in their September 2013 article published in tripleC.
Conclusion
As an archaeological publisher with feet set in both print and digital publications, I have to continually evaluate my readership and their needs. Educating both my authors and my readers about what's possible with publication remains a top priority. What we do with linked data is quite valuable as we forge these links and create environments (real, virtual, and social) receptive to this kind of linking, but we must keep the general reader in mind: the scholar or enthusiast who has little knowledge of what goes on behind the scenes, and who should be able to find and use online data and its underlying relationships without a second thought about how it all came to be. We build for ease of discoverability and for ease of use, and that is very difficult to achieve. But we're getting there.
Archaeology is messy, and it deals with three-dimensional artifacts in four-dimensional space-time. Its publications should reflect that. At our current level of technology, it is possible to create archaeological publications in an open, online environment that incorporates text, 2- and 3-D imagery, interactive 2- and 3-D maps, interactive data sets, and omni-directional links to content and context managed by others. Our new publications must incorporate all of these elements to create a record and interpretation of what we have discovered, leaving that data and interpretation open to criticism, dialogue, and growth over time. Universities, archaeological field schools, and publishers need to make a concerted effort to educate archaeologists about the potential of new media and existing technology to document the work they do. The editor’s role should be to apply standards and style, to fact-check, to clean up inconsistencies, and to verify and standardize notes and bibliography; at that point the work can be published and handed over to the crowd for the necessary, but until now missing, step of post-publication peer review.
There are two major issues that all publishers of archaeology (and of scholarship generally) must address now: 1) how to publish archaeology online, moving away from a traditional, two-dimensional, print-informed model, toward a multi-dimensional, interactive one that accepts that archaeological data is messy and continues to grow and change over time, and 2) how to publish archaeology in an open fashion that makes content easily discoverable and immediately accessible, promoting linking from external sources while linking itself to other open online resources.
Author’s Note
This article articulates my thoughts on archaeological publication originally expressed in the two LAWDI (un)conferences in 2012 and 2013. Although I have been the Director of Publications for the American School of Classical Studies at Athens (ASCSA) since 2010, the views expressed here are my own. Some elements presented here require policy changes on the part of the ASCSA; this entails voting by its Publications and Executive Committees, and I continue to work with both to ensure a healthy, sustainable, and open future for our journal and books. Policy changes for the past three years in this regard have been both encouraging and progressive.
Works Cited
Fuchs, C. and M. Sandoval (2013). The Diamond Model of Open Access Publishing: Why Policy Makers, Scholars, Universities, Libraries, Labour Unions and the Publishing World Need to Take Non-Commercial, Non-Profit Open Access Serious. tripleC 11(2), 428-443. <http://triple-c.at/index.php/tripleC/article/view/502/0>
©2014 Andrew Reinhard. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.24 (2014)
Mining Citations, Linking Texts
Matteo Romanello
Canonical citations are the standard way of citing primary sources (i.e., the ancient texts) in Classics: the ability to read them, which requires knowing what numerous abbreviations stand for, is part of the early training of any classicist. One aim of the project of which the research presented in this paper is part is an expert system that captures these citations, and their meaning, automatically. The desire for such a system has existed for a considerable amount of time (Crane, Seales, and Terras 2009, 26), but the problem has yet to be solved (Romanello, Boschetti, and Crane 2009; Romanello 2013).
Such a system has great potential both for scholars in Classics and for the study of Classics as a discipline: capturing the citations of ancient texts contained in journal articles, commentaries, monographs, and other secondary sources allows us, for example, to track over time how and how often texts were studied, essential information for a data-driven study of the discipline and its evolution.
Another possible use of the system is to display related bibliographic references within a reading environment for ancient texts. The examples used in this paper are taken from work that has been done to provide the GapVis interface of the Hellespont project1 with such functionality (see Fig. 1). One of the goals of the project is to create an enhanced virtual reading environment for one specific section of Thucydides’ Histories, the so-called “Pentecontaetia” (Thuc. 1,89 to 1,118). The references displayed in the secondary literature view of the reading interface are mined automatically from JSTOR and are shown together with links to the full text of the journal article as well as to the cited passage in the Perseus digital library (Romanello and Thomas 2012).
Figure 1: Secondary literature view of the GapVis-based reading interface of the Hellespont project.
Mining Citations: Extraction and Disambiguation
Extracting citations requires performing two different tasks. First, the strings that constitute the citation are captured. Second, the referent of that citation is established—the specific section of text to which the citation refers. In Natural Language Processing (NLP) jargon these two steps are called respectively Named Entity Recognition (or extraction) and Named Entity Disambiguation.
My approach to citation extraction (see Fig. 2, no. 1 and 2) is essentially based on state-of-the-art NER techniques with the only difference being what it takes to adapt these techniques to the new domain (Romanello 2013). Instead of considering only the usual named entities (NEs)–such as names of people, places and organizations–I treat as NEs the different components of a citation in addition to any mention of ancient authors and works occurring in the context that surrounds the citation itself. For this purpose four distinct entities were identified: aauthor, awork, refauwork and refscope. In its current definition, a citation is a relation between any two entities, where one is always the indication of the citation’s scope (i.e. refscope) and the other can be any of the other entities (i.e. aauthor, awork and refauwork).
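To make the extraction step concrete, here is a deliberately simplified, rule-based sketch; the actual system relies on trained NER models rather than hand-written patterns, and the toy patterns below capture only the refauwork and refscope entities for a narrow range of citation shapes:

```python
import re

# Toy patterns; a real NER system learns such cues from annotated data.
REFAUWORK = r"(?P<refauwork>[A-Z][a-z]+\.(?:\s?[A-Z][a-z]*\.)?)"
REFSCOPE = r"(?P<refscope>\d+(?:\.\s?\d+)*(?:-\d+)?)"
CITATION = re.compile(REFAUWORK + r"\s?" + REFSCOPE)

def extract_citations(text):
    """Return (refauwork, refscope) pairs found in running text."""
    return [(m.group("refauwork"), m.group("refscope"))
            for m in CITATION.finditer(text)]

print(extract_citations("On this see Xen. Hell. 3.3.1-4 and Thuc. 5.14.1."))
# [('Xen. Hell.', '3.3.1-4'), ('Thuc.', '5.14.1')]
```

Even this caricature shows why the surrounding context matters: the same surface pattern must be segmented into distinct entities before the citation relation between them can be asserted.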
Figure 2: Diagram showing the various phases of mining canonical citations from texts.
Once captured, citations need to be disambiguated: this is done by assigning to each citation its corresponding CTS URN. What this means in practice is that, for instance, the citation “Hell. 3.3.1-4” in the example shown in Fig. 2 (no. 3) is mapped to its corresponding URN, urn:cts:greekLit:tlg0032.tlg001:3.3.1-3.3.4. Designed to become the equivalent of canonical citations in a digital environment, CTS URNs are a kind of identifier that follows the Uniform Resource Name standard; they were developed within the Homer Multitext project as part of the CITE architecture to make it possible to “identify and retrieve digital representations of texts” (Smith and Blackwell 2012)2.
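The disambiguation step can likewise be sketched as a lookup against an abbreviation table (a toy stand-in for the knowledge base discussed in the next section), plus completion of the truncated endpoint of a range:

```python
# Toy abbreviation table; the real knowledge base records many
# abbreviations per work, in several languages.
ABBREVIATIONS = {
    "Hell.": "urn:cts:greekLit:tlg0032.tlg001",  # Xenophon, Hellenica
    "Thuc.": "urn:cts:greekLit:tlg0003.tlg001",  # Thucydides, Histories
}

def to_cts_urn(refauwork: str, refscope: str) -> str:
    """Map an extracted citation to a full CTS URN, expanding ranges
    such as '3.3.1-4' to fully qualified endpoints (3.3.1-3.3.4)."""
    base = ABBREVIATIONS[refauwork]
    if "-" in refscope:
        start, end = refscope.split("-")
        if "." in start:  # complete the truncated end of the range
            end = start.rsplit(".", 1)[0] + "." + end
        return f"{base}:{start}-{end}"
    return f"{base}:{refscope}"

print(to_cts_urn("Hell.", "3.3.1-4"))
# urn:cts:greekLit:tlg0032.tlg001:3.3.1-3.3.4
```

The real system must of course also resolve ambiguous abbreviations (the same string can abbreviate several works), which is where the contextual aauthor and awork entities come into play.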
A Knowledge Base of Canonical Texts
NER systems of this kind typically require and rely on a surrogate of domain knowledge, such as a gazetteer or a knowledge base, to support both the extraction and disambiguation of NEs. To support the disambiguation of canonical citations, such a knowledge base needs to contain, for example, all possible abbreviations of the name of an author or the title of a work, possibly in multiple languages if one is working on multi-lingual corpora. Since the texts we are dealing with are canonical, it is possible to use this knowledge base to store, in addition to abbreviations, detailed information about the citable structure of each text: for example, how many books are contained in Thucydides’ Histories, how many chapters are contained in book 1, and so on. Being able to query this sort of information allows one to validate the automatically extracted citations, thus making it possible to identify, if not to recover, those citations that are simply impossible. An example of this phenomenon is the string “Thuc. 5. 14. 1. 41.”: although it looks like a plausible citation, it is not a valid one, as the work referred to here–Thucydides’ Histories–has three, not four, citable hierarchical levels (i.e. book/chapter/section). Such errors are very common when working with OCRed texts, where the lack of structural markup causes, as in this case, a footnote number to be mistakenly interpreted as part of the canonical citation “Thuc. 5. 14. 1”.
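A minimal sketch of this validation step (a fuller knowledge base would record not just the depth of the citable hierarchy, but also how many books, chapters, and sections each work actually contains):

```python
# Depth of the citable hierarchy for each canonical work.
CITABLE_DEPTH = {
    "Thuc.": 3,  # Histories: book / chapter / section
}

def is_plausible(refauwork: str, refscope: str) -> bool:
    """Reject citations whose scope is deeper than the work's citable
    hierarchy, e.g. an OCR artifact like 'Thuc. 5.14.1.41'."""
    depth = len(refscope.rstrip(".").split("."))
    return depth <= CITABLE_DEPTH.get(refauwork, depth)

assert is_plausible("Thuc.", "5.14.1")
assert not is_plausible("Thuc.", "5.14.1.41")
```

A depth check of this kind flags the invalid citation; recovering the valid prefix ("Thuc. 5.14.1") is then a separate, harder decision.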
The content in the knowledge base is structured mostly using a combination of the CIDOC-CRM and FRBRoo ontologies3: the Functional Requirements for Bibliographic Records (FRBR) model, in particular, is suitable for modelling information related to Classical (canonical) texts, as was shown by Babeu et al. (2007), and has substantially influenced the design of the CTS protocol. In those few cases where these ontologies did not suffice to model the data, we have extended some of the classes they provide in what we call the HUmanities CITation Ontology (HuCit)4.
@prefix ecrm: <http://erlangen-crm.org/current/> .
@prefix efrbroo: <http://erlangen-crm.org/efrbroo/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
<http://data.mr56k.info/urn:cts:greekLit:tlg0003.tlg001> a efrbroo:F1_Work;
ecrm:P131_is_identified_by <http://data.mr56k.info/urn:cts:greekLit:tlg0003.tlg001#cts_urn>;
efrbroo:P102_has_title <http://data.mr56k.info/urn:cts:greekLit:tlg0003.tlg001#title>;
owl:sameAs <http://catalog.perseus.org/catalog/urn:cts:greekLit:tlg0003.tlg001> .
<http://data.mr56k.info/urn:cts:greekLit:tlg0003.tlg001#creation_event> a efrbroo:F27_Work_Conception;
efrbroo:R16_initiated <http://data.mr56k.info/urn:cts:greekLit:tlg0003.tlg001> .
<http://data.mr56k.info/urn:cts:greekLit:tlg0003.tlg001#cts_urn> a ecrm:E42_Identifier;
rdfs:label "urn:cts:greekLit:tlg0003.tlg001";
ecrm:P2_has_type <http://data.mr56k.info/urn:cts:greekLit:tlg0003.tlg001#type_CTS_URN> .
<http://data.mr56k.info/urn:cts:greekLit:tlg0003.tlg001#title> a efrbroo:E35_Title;
ecrm:P139_has_alternative_form <http://data.mr56k.info/urn:cts:greekLit:tlg0003.tlg001#abbr1>;
rdfs:label "Der Peloponnesische Krieg"@ger,
"History of the Peloponnesian War"@eng,
"La Guerra del Peloponneso"@ita,
"l’Histoire de la guerre du Péloponnèse"@fre .
<http://data.mr56k.info/urn:cts:greekLit:tlg0003.tlg001#abbr1> a ecrm:E41_Appellation;
rdfs:label "Thuc.";
ecrm:P2_has_type <http://data.mr56k.info/type_abbreviation> .
Figure 3: The knowledge-base record for Thucydides’ Histories serialized as RDF/Turtle.
As shown in Fig. 3, our record is linked to a record in the Perseus Catalog; the CTS URN associated with the work, as well as the abbreviations of its title, are explicitly modelled using, respectively, the CIDOC-CRM classes E42_Identifier and E41_Appellation.
Publishing Extracted Citations as Linked Open Data
Canonical citations are important not only because of their function; they are also interesting artifacts in themselves. They were designed, well before the advent of digital technologies, to refer to texts in a precise and interoperable way: precise because texts are the fundamental object of philological research, and a scholarly discourse about texts therefore needs an accurate way of referring to them; interoperable because, although texts may exist in different editions and translations, scholars need to be able to refer to specific sections of them without having to worry about the many possible variations in pagination or layout that each edition may present.
If we accept that canonical citations are already a way of linking objects–i.e., the citing text and the cited text–extracting citations reconstructs and makes explicit links that already exist in the text. The act of transforming citations into hyperlinks, however, may lead to a misrepresentation of their nature and specifically of their being designed to be interoperable: a canonical citation should not be tied to the referenced passage in a specific edition, but should rather work as a resolvable pointer, that can be resolved to a given portion of text in any available edition or translation.
Let us now consider how extracted citations are stored and published online as Linked Open Data (Heath and Bizer 2011). Following an approach largely inspired by the Pelagios Project5, extracted canonical citations are represented as annotations as defined by the Open Annotation Data Model6 (see Fig. 4). A new annotation is created for each extracted citation: the string containing the citation becomes its label, whereas the citing and the cited texts become, respectively, its target and body–to use the OAC terminology–as expressed by the oac:hasTarget and oac:hasBody properties. The property oac:motivatedBy is used here to clarify the reason for creating such annotations: I chose oac:identifying because extracting citations can be seen as the act of making explicit which object (i.e. text section) a given citation identifies.
<http://hellespont.org/annotations/jstor#16> a oac:Annotation;
rdfs:label "Thuc. 1. 101";
oac:motivatedBy oac:identifying;
oac:hasBody <http://data.mr56k.info/urn:cts:greekLit:tlg0003.tlg001:1.101>;
oac:hasTarget <http://jstor.org/stable/10.2307/268729> .
Figure 4: An extracted citation represented by means of the OAC ontology and serialized as RDF/Turtle.
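A sketch of how such an annotation might be generated programmatically: the template simply mirrors the structure of Fig. 4, and the function name and approach (plain string templating rather than an RDF library) are my own illustration.

```python
# Minimal sketch: serialize an extracted citation as an Open
# Annotation in Turtle, following the shape of Fig. 4.
ANNOTATION_TEMPLATE = """\
<{ann_uri}> a oac:Annotation;
    rdfs:label "{label}";
    oac:motivatedBy oac:identifying;
    oac:hasBody <{body}>;
    oac:hasTarget <{target}> ."""

def citation_annotation(ann_uri, label, body, target):
    """Fill the template with the annotation URI, the citation string
    (label), the cited passage (body) and the citing text (target)."""
    return ANNOTATION_TEMPLATE.format(
        ann_uri=ann_uri, label=label, body=body, target=target)

ttl = citation_annotation(
    "http://hellespont.org/annotations/jstor#16",
    "Thuc. 1. 101",
    "http://data.mr56k.info/urn:cts:greekLit:tlg0003.tlg001:1.101",
    "http://jstor.org/stable/10.2307/268729")
print(ttl)
```

A production pipeline would more likely build a proper RDF graph and let a serializer emit the Turtle, but the mapping of citation parts to annotation properties is the same.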
The RDF fragment that is returned when the body URI is resolved (see Fig. 5) shows how the citation is not linked directly to the digital text but points to an intermediate object called hucit:TextElement7. This abstract object identifies a citable element within the hierarchical structure of a text and is linked, via the hucit:resolves_to property, to digital representations of the cited passage, in this case the editions and translations available in the Perseus Digital Library and via the Classical Works Knowledge Base (CWKB) resolution service. It must be pointed out, however, that linking to these resources is not, strictly speaking, LOD-compliant, as these URIs do not (yet) resolve to an RDF representation of the identified resource. However, as emerged clearly during the LAWDI event at which this paper was presented, linking resources together is the necessary first step towards LOD, one that, it is hoped, will be followed by making the underlying technology compliant with LOD principles.
@prefix ecrm: <http://erlangen-crm.org/current/> .
@prefix hucit: <http://purl.org/net/hucit#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://data.mr56k.info/urn:cts:greekLit:tlg0003.tlg001:1.101> a hucit:TextElement;
rdfs:label "book 1, chapter 101 of Thucydides' Histories"@en;
ecrm:P1_is_identified_by [ a ecrm:E42_Identifier;
rdfs:label "urn:cts:greekLit:tlg0003.tlg001:1.101";
ecrm:P2_has_type <http://data.mr56k.info/CTS_URN> ];
hucit:is_part_of <http://data.mr56k.info/urn:cts:greekLit:tlg0003.tlg001:1>;
hucit:follows <http://data.mr56k.info/urn:cts:greekLit:tlg0003.tlg001:1.100>;
hucit:precedes <http://data.mr56k.info/urn:cts:greekLit:tlg0003.tlg001:1.102>;
hucit:resolves_to <http://data.perseus.org/citations/urn:cts:greekLit:tlg0003.tlg001.perseus-eng1:1.101>,
<http://data.perseus.org/citations/urn:cts:greekLit:tlg0003.tlg001.perseus-eng2:1.101>,
<http://data.perseus.org/citations/urn:cts:greekLit:tlg0003.tlg001.perseus-grc1:1.101>,
<http://cwkb.org/resolver?rft.au=Thucydides&rft.title=Historiae&rft.slevel1=1&rft.slevel2=101&rft_val_fmt=info:ofi/fmt:kev:mtx:canonical_cit&ctx_ver=Z39.88-2004> .
Figure 5: RDF/Turtle representation identified by the URI http://data.mr56k.info/urn:cts:greekLit:tlg0003.tlg001:1.101
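The CTS URNs used throughout these figures have a regular shape (urn:cts:namespace:work:passage) that makes them straightforward to process. The minimal parser below is an illustration of that structure only, assuming colon-separated components and dot-separated passage levels; the field names are my own, and the full CTS URN specification covers additional cases (version identifiers, passage ranges, subreferences) not handled here.

```python
# Illustrative sketch: split a CTS URN into its main components.
def parse_cts_urn(urn: str) -> dict:
    parts = urn.split(":")
    if parts[:2] != ["urn", "cts"] or len(parts) < 4:
        raise ValueError("not a CTS URN: " + urn)
    result = {"namespace": parts[2], "work": parts[3]}
    if len(parts) > 4:                           # passage is optional
        result["passage"] = parts[4].split(".")  # hierarchical levels
    return result

info = parse_cts_urn("urn:cts:greekLit:tlg0003.tlg001:1.101")
print(info)  # {'namespace': 'greekLit', 'work': 'tlg0003.tlg001', 'passage': ['1', '101']}
```

The depth of the parsed passage list is exactly what the validation step described earlier needs to compare against the knowledge base.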
Notes
1 The Hellespont Project: Integrating Arachne and Perseus, http://hellespont.dainst.org/.
2 To date, one of the main adopters of this technology is the Perseus project, which has built on top of it to provide several functionalities of its digital library and catalog (see Almas et al., this volume).
3 The Erlangen OWL implementations of both CIDOC-CRM and FRBRoo were used: they are available respectively at http://erlangen-crm.org/ and http://erlangen-crm.org/efrbroo.
4 The HuCit namespace is http://purl.org/net/hucit; the source code and some examples can be found in the code repository at https://bitbucket.org/56k/hucit/.
5 Pelagios: Enable Linked Ancient Geodata In Open Systems, http://pelagios-project.blogspot.com.
6 Open Annotation Data Model, http://www.openannotation.org/spec/core/.
7 For further details about the design of HuCit see Romanello and Pasin (2013).
Works Cited
Babeu, Alison, David Bamman, Gregory Crane, Robert Kummer, and Gabriel Weaver. 2007. “Named Entity Identification and Cyberinfrastructure.” In Research and Advanced Technology for Digital Libraries, ed. László Kovács, Norbert Fuhr, and Carlo Meghini, 259–270. Springer. http://dx.doi.org/10.1007/978-3-540-74851-9_22.
Crane, Gregory, Brent Seales, and Melissa Terras. 2009. “Cyberinfrastructure for Classical Philology.” Digital Humanities Quarterly 3. http://www.digitalhumanities.org/dhq/vol/3/1/000023/000023.html.
Heath, Tom, and Christian Bizer. 2011. Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web. Morgan & Claypool Publishers.
Romanello, Matteo. 2013. “Creating an Annotated Corpus for Extracting Canonical Citations from Classics-Related Texts by Using Active Annotation.” In Computational Linguistics and Intelligent Text Processing. 14th International Conference, CICLing 2013, Samos, Greece, March 24-30, 2013, Proceedings, Part I, ed. Alexander Gelbukh, 1:60–76. Springer Berlin Heidelberg. doi:10.1007/978-3-642-37247-6.
Romanello, Matteo, Federico Boschetti, and Gregory Crane. 2009. “Citations in the digital library of classics: extracting canonical references by using conditional random fields.” In Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries, 80–87. Morristown, NJ, USA: Association for Computational Linguistics.
Romanello, Matteo, and Michele Pasin. 2013. “Citations and Annotations in Classics: Old Problems and New Perspectives.” In DH-Case 2013. ACM. http://dx.doi.org/10.1145/2517978.2517981.
Romanello, Matteo, and Agnes Thomas. 2012. “The World of Thucydides: From Texts to Artefacts and Back.” In Revive the Past. Proceeding of the 39th Conference on Computer Applications and Quantitative Methods in Archaeology. Beijing, 12-16 April 2011, ed. Mingquan Zhou, Iza Romanowska, Wu Zhongke, Xu Pengfei, and Philip Verhagen, 276–284. Amsterdam University Press. http://dare.uva.nl/document/358465.
Smith, Neel, and Christopher Blackwell. 2012. “Homer Multitext Project: documentation. An overview of the CTS URN notation.” http://www.homermultitext.org/hmt-doc/cite/cts-urn-overview.html.
©2014 Matteo Romanello. Published under the Creative Commons Attribution 3.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.25 (2014)
Linked Data and Ancient Wisdom
Charlotte Roueché, Keith Lawrence, and K. Faith Lawrence
The SAWS Project, Sharing Ancient Wisdoms: Exploring the Tradition of Greek and Arabic Wisdom Literatures, was a joint project funded by HERA (Humanities in the European Research Area) under a call to explore Cultural Dynamics. The aim was to examine Greek and Arabic collections of 'sayings' and to relate them to their sources (largely in Greek and Arabic literature), to one another, and to texts which made use of them, with these links making up the core of the project.
The gnomologia, or collections of sayings, tended to be authored by the scribe of the manuscript in which they survive. One reason for the linked data approach was that the authoring scribe was free to modify a given collection as he thought fit, resulting in similar, but distinct, variations on the collection. As these works were translated and re-translated between languages, they provide an interesting entry point into both the philosophical traditions and the cultural interactions that they represent.
Our unit of analysis, therefore, was a 'saying', as identified and presented by the scribe of a particular manuscript. Our first challenge was to label every such item. Over the last 18 months we have worked very closely with the CTS project to generate unique ids; this has been slow work but, we believe, beneficial both for us and for CTS. See http://www.ancientwisdoms.ac.uk/method/using-cts/.
In linking to sources, we were constrained by what is available at present. We have been able to make extensive use of texts in Perseus, but we identified many other items for which there is not, at present, a text to which we could link. We had hoped to be able to link to items in the TLG canon, but this is not yet possible.
Figure 1. Folioscope displaying a document with links to other SAWS texts and external documents.
We were, however, able to enrich our texts with other kinds of linked data. For authors, we were able to include links to the new Perseus catalog, http://catalog.perseus.org/, or, failing that, to VIAF, http://viaf.org/. For places we were able to use Pleiades, and even add some new locations; this meant that we were able to join Pelagios. For people named as actors, rather than authors, who occur in our 11th century 'destination' text, we were able to use links to the Prosopography of the Byzantine World (http://pbw.kcl.ac.uk) and to the Paregorios list of Roman emperors (http://www.paregorios.org/resources/roman-emperors/). For the latter, we used the long URI for the document: http://www.paregorios.org/resources/roman-emperors/about-alexios-i-komnenos rather than the DBpedia URI http://dbpedia.org/page/Alexios_I_Komnenos, since for the average reader a DBpedia page may be rather too unfriendly. We also tried to enrich the bibliography http://www.ancientwisdoms.ac.uk/library/bibliography with as many links as possible to online versions of the materials being cited. We recorded, but did not have time to include, VIAF references for many of the authors cited.
One challenge for us was to create a resource which makes the fullest possible use of links in machine-readable form, but also offers an environment which is welcoming for human readers. In due course it is to be hoped that CTS references can easily be resolved back into conventional representations: we did not have time to develop this. We explored this aspect in the presentation of the commentary on our 11th century text.
Figure 2. Same document showing an Ancient World Linked Data popup. See http://isawnyu.github.io/awld-js/.
It will be very interesting to see how readers respond. One interesting outcome is that working in this way, when it is possible to go directly to the text to which one is making reference, can show up incorrect or mistaken references: the linking of data should, it can be hoped, lead to a slow steady improvement in accuracy.
©2014 Charlotte Roueché, Keith Lawrence, and K. Faith Lawrence. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.26 (2014)
Linked Open Data for the Uninitiated
Rebecca M. Seifried
Concepts like “linked” and “open” data are old hat for a handful of specialists working with archaeological data, but for the rest of us – those scholars who have never built a web page or coined a URI – the world of linked open data seems a shadowy and impenetrable veil. The data I encounter on a weekly basis violate so many best-practice guidelines that it is frightening even to consider publishing them online, let alone connecting them to the semantic web. Nevertheless, the amazing things that can be done with linked open data – the connections and meaningful relationships, the benefits to the public and scholars alike – are reason enough to encourage us “uninitiated” archaeologists to reconsider how we create, manage, and publish our data.
This short essay represents the personal musings of a non-technical web user, who sees the benefits of publishing archaeological data in a linked, open way, but who cannot (yet) speak the language of HTML or SPARQL. I have worked with born-digital data existing in different levels of technical sophistication, from projects as large and well funded as the Körös Regional Archaeological Project in Hungary, to my own ongoing dissertation research project in Greece, the Byzantine and Ottoman Settlement Study. While many of the contributors to this online volume demonstrate the wealth and variety of linked open data, my focus is turned toward the initial difficulties that will inevitably face the uninitiated, who see the potential for disseminating their data to a worldwide community but simply don’t know how to begin.
Although data recording has almost entirely shifted from a manual process to a digital one, many of the uninitiated fail to live up to best-practice guidelines for born-digital data (e.g., [ADS and DA 2011]). It is imperative that, even if we have no intention of coding RDF triples ourselves (see [RDF Working Group 2013]), we prepare our data to be archived in an online, open-access format. The issue isn’t just about using the proper file format; it is also about determining which specific types of data will be relevant to a broader audience. A file may contain thousands of entries, but oftentimes metadata is recorded for the entire file, rather than for individual records. Wouldn’t it be nice to know when each individual record is modified, and by whom? Aside from thinking ahead toward potential online publication, the uninitiated should try their best to understand the basic premises of linked open data. While we may not be skilled enough to create linked open data on our own, we can structure our datasets so that later on, specialists can do so in a relatively easy and painless process.
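As an illustration of record-level (rather than file-level) metadata, the sketch below attaches a "modified by"/"modified on" pair to each individual row of a dataset; the field names and values are hypothetical, not drawn from any particular project or standard.

```python
# Illustrative sketch: per-record modification metadata in a CSV export.
import csv
import io
from datetime import date

FIELDS = ["site_id", "description", "modified_by", "modified_on"]

# Each record carries its own provenance, not just the file as a whole.
rows = [
    {"site_id": "BOSS-001", "description": "fortified settlement",
     "modified_by": "rms", "modified_on": str(date(2013, 10, 22))},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Structuring data this way costs little at recording time and makes later conversion to linked formats (where provenance can be expressed as triples about each record) far easier.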
Present Options for Disseminating Archaeological Datasets
For case studies, I will mention a few specific archaeological projects and how they are currently disseminating their data. Information about the Körös Regional Archaeological Project is disseminated primarily on a subsidiary webpage of The Field Museum ([The Field Museum 2013]), although a more significant site is planned for the future. The current site includes blog entries, photos, and videos, but almost no original data. The project hasn’t even earned itself one star, according to Berners-Lee’s ([Berners-Lee 2006]) framework, although there is potential to use the server to publish the FileMaker Pro databases (to earn 2 stars) or non-proprietary versions of the data (to earn 3 stars). Online data depositories are a second option for archaeologists who do not have access to institutional servers. The Shala Valley Project provides an excellent example of online data dissemination, using the Archaeology Data Service ([Galaty et al. 2009]), although the Digital Archaeological Record ([Digital Antiquity 2013]) is another depository for projects to consider. Shala Valley data are available in the form of images, non-proprietary database files, scanned field notes and drawings, and an interactive GIS. A third option is to secure a non-institutional, non-depository, but reliable website of one’s own, such as a domain powered through WordPress. Data hosting is limited on these sites, but interactive maps can be powered through third-party software like MapsMarker and TileMill. A great example of this type of site is the Documenting Cappadocia project ([McMichael 2013]).
The type of online data hosting offered in these examples is a good way for the uninitiated to disseminate archaeological data; by and large, someone else does the work of website-building, allowing archaeologists to make their data somewhat accessible without getting lost in an utterly foreign world of languages they don’t know. The drawback of this approach is that the data can remain stagnant and isolated, unconnected from the myriad sources on the web that could highlight relevant and meaningful relationships – if only they were linked.
As I structure the databases for my own dissertation research and think ahead to the ideal way to publish them online as linked open data, I see two alternatives for achieving the coveted “5 stars” of Berners-Lee’s Linked Open Data guidelines. The first is to use a data depository, like OpenContext.org ([OpenContext 2013]), that issues individual URIs for each item within a dataset (see [Kansa 2013], this volume). This option would give me peace of mind to know that, so long as I follow best practice guidelines for data acquisition and compilation, a technical expert will be able to help me later on to get my data online in a meaningful way. The second option is to create my own website using RDF triples and cool URIs ([W3C 2008]). At the 2013 LAWDI gathering, a number of individuals – many of whom were never formally trained in web languages – showed off their successes after attempting this task on their own. As one of the many archaeologists foreign to HTML coding and the like, the experience tore down the veil that made linked open data seem obscure and impenetrable. I came away realizing that linked open data is not only possible, but that it is, in fact, the future of archaeological research and data dissemination.
Works Cited
[ADS and DA 2011] Archaeology Data Service and Digital Antiquity. Guides to Good Practice. 2011. Available at: http://guides.archaeologydataservice.ac.uk/
[Berners-Lee 2006] Berners-Lee, Tim. “Linked Data.” Design Issues. 27 July 2006. Available at: http://www.w3.org/DesignIssues/LinkedData.html
[Digital Antiquity 2013] Digital Antiquity. “The Digital Archaeological Record.” tDAR.org. Accessed 22 October 2013. Available at: http://www.tdar.org/
[The Field Museum 2013] The Field Museum. “Neolithic Archaeology: Körös Region, Hungary.” Expeditions at The Field Museum. Accessed 22 October 2013. Available at: http://expeditions.fieldmuseum.org/neolithic-archaeology/
[Galaty et al. 2009] Galaty, Michael L., Ols Lafe, Zamir Tafilica, Charles Watkinson, Wayne E. Lee, Mentor Mustafa, Robert Schon, and Antonia Young. The Shala Valley Project [data-set]. York: Archaeology Data Service [distributor] (doi:10.5284/1000103). Available at: http://archaeologydataservice.ac.uk/archives/view/svp_mellon_2009/
[Kansa 2013] Kansa, Eric C. “Open Context and Linked Data.” Forthcoming: LAWDI 2013.
[McMichael 2013] McMichael, A. L. “Cappadocia Landscape.” Documenting Cappadocia. Accessed 22 October 2013. Available at: http://www.nml.cuny.edu/documentingcappadocia/view/
[OpenContext 2013] OpenContext. “OpenContext’s Technologies.” OpenContext.org. Accessed 22 October 2013. Available at: http://opencontext.org/about/technology
[RDF Working Group 2013] RDF Working Group. “Resource Description Framework (RDF).” W3C Semantic Web. Published 10 February 2004. Modified 22 March 2013. Available at: http://www.w3.org/RDF/
[W3C 2008] W3C. “Cool URIs for the Semantic Web.” W3C Interest Group Note. 3 December 2008. Available at: http://www.w3.org/TR/cooluris/
©2014 Rebecca M. Seifried. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.27 (2014)
Pelagios
Rainer Simon, Elton Barker, Pau de Soto, and Leif Isaksen
Pelagios is a community-driven initiative whose goal is to facilitate better linking between online resources documenting the past, based on the places that they refer to. It is open to online content of any type and format: it currently connects data as diverse as text corpora, image archives, archaeological databases, museum collections, publication series, and online resources produced by specific research projects. By addressing the problems of discovery and reuse, Pelagios aims to help digital humanists make their data more discoverable, and to empower real-world users – scholars as well as the general public – to find information about particular ancient places and visualize it in meaningful ways. At the time of writing, the Pelagios network incorporates more than 830,000 place references in datasets from 27 partners. Users can start exploring it by searching for a place in the map interface1, or by browsing the list of currently available datasets2.
The key to connectivity in Pelagios is the use of a common vocabulary when referring to places, combined with a set of lightweight conventions on how to publish these place references as Linked Data. The common vocabulary is formed by the Pleiades Gazetteer of the Ancient World3, which provides unique URI identifiers for places in the Greco-Roman world. For example, https://pleiades.stoa.org/places/59672 is a unique URI for Alexandria Eschate in modern Tajikistan. It can serve as the basis for aggregating references to that site and for ensuring that it is not confused with the more famous Alexandria in Egypt. The publishing conventions established by the Pelagios community are based on an annotation paradigm: no matter what type or format a data resource is in, as long as it is available on the Web, under a stable URI, one can, from a conceptual point of view, annotate (or “tag”) it with a reference to one or more Pleiades identifiers. The collected set of annotations is then published as RDF according to the Open Annotation ontology4, and made available under an open license, chosen individually by each partner. Technically, the data is published as a simple dump file, hosted on each partner’s own Website. There is no need to set up specific infrastructure such as a triple store or a dedicated API.
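In practice such an annotation can be very small. The Turtle fragment below is a hypothetical example of the paradigm just described, tagging an imaginary partner resource (the example.org URIs are placeholders) with the Pleiades URI for Alexandria Eschate; real Pelagios dumps are published according to the Open Annotation ontology in this same spirit.

```turtle
@prefix oa: <http://www.w3.org/ns/oa#> .

# Hypothetical place annotation: a partner's web resource tagged
# with the Pleiades identifier for Alexandria Eschate.
<http://example.org/annotations/1> a oa:Annotation;
    oa:hasBody <https://pleiades.stoa.org/places/59672>;
    oa:hasTarget <http://example.org/my-dataset/item/42> .
```

A dump file is simply a collection of such annotations, one per place reference, hosted on the partner's own website.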
Previously funded by JISC, the UK’s digital education and research charity, Pelagios has entered its third stage of development (Pelagios 3), with generous support from the Andrew W. Mellon Foundation. While the focus of the project has so far been on classical antiquity, this new phase will significantly expand the scope of Pelagios in both space and time by annotating Early Geospatial Documents – documents that use written or visual representation to describe geographic space prior to the European discovery of the Americas in 1492. They include ancient and medieval geographic descriptions (geographiae, chorographiae and itineraries), world maps (mappaemundi) and sea charts, and are products of Greek, Roman, Christian, Islamic and Chinese traditions.
What makes Pelagios 3 different is not only the scope of the documents to be brought into the network but also the fact that we are now annotating documents ourselves. This means that a significant part of Pelagios 3 is being devoted to developing new tools and infrastructure, which will: create a semi-automatic annotation workflow for extracting place name data from digitized texts and maps; produce minimal requirements for enabling the linking and interoperability of multiple, domain-specific gazetteers; and provide useful analytics and visualizations for enabling the search and discovery of data within the growing Pelagios network.
Phase 3 of Pelagios runs from September 2013 until August 2015.
Notes
1 http://pelagios.github.io/pelagios-heatmap/.
2 http://pelagios.org/api.
3 http://pleiades.stoa.org/.
4 http://www.openannotation.org/spec/core/.
©2014 Rainer Simon, Elton Barker, Pau de Soto, and Leif Isaksen. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.28 (2014)
Linked data and the future of cuneiform research at the British Museum
Jon Taylor
ResearchSpace1 opens significant new possibilities for a wide range of projects based on the British Museum’s collections. As part of its work, ResearchSpace has converted the Museum’s collections catalogue (strengthened by descriptions, associations and taxonomies developed over three decades of digitisation) into RDF triples, mapped to the powerful CIDOC-CRM ontology. This dataset has been made available via a SPARQL Endpoint on the Museum website: http://collection.britishmuseum.org/. Based on this starting point, several projects with a cuneiform focus are exploring the potential of Semantic Web/Linked Data. Cuneiform is a small and largely isolated field, with much both to gain from, and to offer to, the wider community. The LAWDI 2013 event provided a wealth of useful theoretical and practical guidance to help me get to grips with semantic technologies and their potential.
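As a sketch of how such an endpoint might be queried programmatically, the snippet below builds a request URL using the standard SPARQL-protocol `query` parameter. The exact endpoint path, the result-format parameter, and the CIDOC-CRM class used in the query are assumptions for illustration, not documented specifics of the Museum's service.

```python
# Illustrative sketch: construct (but do not send) a SPARQL request URL.
from urllib.parse import urlencode

# Assumed endpoint path; the article gives only the site root.
ENDPOINT = "http://collection.britishmuseum.org/sparql"

# Example query over the Erlangen CIDOC-CRM mapping (class assumed).
query = """
SELECT ?object WHERE {
  ?object a <http://erlangen-crm.org/current/E22_Man-Made_Object> .
} LIMIT 10
"""

url = ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
print(url)
```

The same URL could then be fetched with any HTTP client; the point is that a CIDOC-CRM-mapped dataset becomes queryable with nothing more than standard web machinery.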
The Ashurbanipal Library Project
The Ashurbanipal Library Project2 is recreating in digital form one of the most important libraries ever assembled: that of Ashurbanipal, King of Assyria during the 7th century BC. Over 30,000 inscribed tablets and fragments are currently known. Most are in the British Museum collection, but others are found in collections around the world, and our partners at the University of Mosul are likely to find more during their excavations at the site of Nineveh. Texts from the Library have formed the central pillar of cuneiform research, and stimulated wider public interest, since the 1850s. The mass of material and of research carried out on it, and the many audiences interested in the Library, demand a detailed and structured digital presentation. A related initiative will combine research data with guide books and archival records to explore how the Library has been presented and interpreted in the Museum over the last 160 years.
There is clear potential to contextualise the Library in many ways: from relating place names to Open Context’s presentation of the Tübinger Atlas des Vorderen Orients (see Kansa, this volume) or personal names to prosopographical databases such as the Berkeley Prosopography Services (see Pearce and Schmitz, this volume), to thematic relations with cross cultural study of magic, religion, literature (perhaps building on research presented by Nurmikko-Fuller, this volume) and so on. A controlled bibliography would also be very useful (see Acheson, this volume).
Ur of the Chaldees: A Virtual Vision of Woolley's Excavations
The new Ur of the Chaldees Project is bringing together the complete set of all finds, inscriptions, images and excavation records of the first archaeological mission of the modern state of Iraq. These materials are divided between the Iraq Museum, British Museum and Penn Museum. The project is described in more detail by Hafford (this volume). Penn Museum has now also made their collections catalogue available in semantic form (see Williams, this volume).
Materialities of Assyrian Knowledge Production: Object Biographies of Inscribed Artefacts from Nimrud for Museums and Mobiles
The Nimrud project3 promotes awareness among specialists and non-specialists of how the past is reconstructed and understood through objects. It takes as its case study the inscribed artefacts from Nimrud, tracing them from their manufacture and use in antiquity to their current locations in museums, and their virtual representations on the web. The technical focus is on the development of Linked Open Data, especially for handheld devices such as mobile phones and tablet computers.
A complete catalogue of the cuneiform collections of the British Museum
The Museum’s online catalogue4 is a powerful resource. For cuneiform specialists, however, it has its limitations, such as the usual issues of “noise” from large volumes of unrelated material, problems sorting and displaying results effectively, and the inability to download results. A personal project of the author seeks to present a dedicated catalogue of every object inscribed with cuneiform in the British Museum (around 130,000 registered objects). The interface will allow manipulation of groups of records, rather than just finding individual records. It is hoped that as more cuneiform-based projects implement semantic technologies, the catalogue will be able to harmonise Museum data with those from the growing family of projects based around the world.
Notes
1 http://www.researchspace.org/. Funded by the Andrew W. Mellon Foundation; principal PI Dominic Oldman.
2 This project is a long term, international collaboration based at the British Museum.
3 http://oracc.museum.upenn.edu/nimrud. The project is a collaboration between University of Cambridge, the British Museum and Penn Museum.
4 http://www.britishmuseum.org/research/collection_online/search.aspx.
©2014 Jon Taylor. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.29 (2014)
Integrating Historical-Geographic Web-resources
Tsoni Tsonev
In recent years, archaeological websites with various data formats, videos, pictures, and maps have increased significantly not only in number but also in the diversity of their themes, geographical regions, and time spans. At the same time, demand for archaeological information from various web resources has also grown. In this situation a question naturally arises: how can this fast-growing domain be searched efficiently on the Web? Not long ago it was enough to search through Google to obtain satisfactory results. This option is still in use, but it is becoming increasingly difficult to find an appropriate site, museum exhibit, or text in only a few minutes, and as people get busier they cannot find much time to sit at the computer.
This rapid and haphazard growth in the number of web resources (not only archaeological ones, but also others relating to this or similar domains of knowledge) requires building an adequate system for sharing data and for linking representations of various kinds: texts, literature, and citations. At first glance the task of integrating such different resources seems difficult. Yet LAWDI's approach offers an elegant solution that reduces much of the effort and, at the same time, opens space for expansion into other domains of knowledge, other forms of virtual representation, and open access for wide audiences.
The basic building blocks of this integrating system are the Virtual International Authority File (VIAF, http://viaf.org/) and the FOAF Vocabulary (Friend of a Friend, http://xmlns.com/foaf/spec/), which describes basic characteristics and activities of persons and their relations on the web. Why do persons take the primary role in virtual communication? The answer is simple: all histories, objects, and texts relate to one or several related persons (authors). This approach is advantageous in that it allows one to traverse disciplinary boundaries and relate various resources. In this way the user directly accesses sufficiently defined, complete datasets that may be exhaustive (contain all the necessary data) in their respective domain.
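A minimal illustration of these building blocks is a FOAF description of a person linked to an external authority record. In the Turtle fragment below, the example.org URI is a placeholder and the VIAF identifier shown is illustrative only, not a real authority record.

```turtle
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# Hypothetical person record linked to a (placeholder) VIAF identifier.
<http://example.org/people/author1> a foaf:Person;
    foaf:name "Example Author";
    owl:sameAs <http://viaf.org/viaf/000000000> .
```

Once such links are in place, any dataset mentioning this person can be joined to any other dataset that points at the same VIAF URI.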
Most often, however, users do not need to access all the data in a dataset, or all the layers in a GIS or similar presentation. In archaeology, for example, most data acquire intuitive meaning when located on a geographical map. Moreover, archaeological knowledge can be used to ascribe differential values and social importance to various monuments, sites, and artefacts. The geographic map thus becomes a space laden with differentially distributed historical-geographic values, which is a necessary requirement for the successful execution of spatial analyses of archaeological data, and for their management. These values form locally important features that lack universal explanatory power: when moving to another region, the same spatial analytical methods will produce different results, or the nature of the data in the new region will require different methods to carry out archaeologically meaningful spatial analyses. For example, the distribution of megaliths and rock-cut tombs in two adjacent geographic regions in South Bulgaria, transformed by simple kriging, shows unique values that cannot be reproduced by applying the same method to other regions with megaliths (megaliths are known across the greater part of Eurasia). This universal type of monument thus creates local meanings in various similar geographic regions, a good illustration that human and social evolution lack spatio-temporal continuity.
It is these theoretical premises that create the requirement for selective access: integrating only fragments of otherwise complete datasets published on the Web. This goal requires changes in the organization of archaeological and historical data, which must support the core functionality needed to establish cross-domain relationships (e.g. between archaeology and anthropology). In second place come the descriptive languages of archaeology-specific analyses: lithics, pottery, raw materials, sedimentology, etc. The major problem lies in organizing the existing attribute data so that they are available for external access through the interfaces of remotely enabled function modules designed for archaeologically meaningful spatial and conceptual distribution or arrangement. In this approach, the unique ability of GIS and similar open-source systems to model complex spatial relationships such as networks offers valuable advantages for further analysis and research.
At the data level, the integration of data coming from various resources needs to be organized as graphic representations (area feature types in GIS). The synchronization of different data (data transformed into graphically represented objects) will then proceed through an exact correspondence between each of these data-objects and the respective feature in the GIS database. For this purpose, GIS features are linked to external data-objects through foreign-key mapping, which associates the unique ID of each data-object with the corresponding GIS feature. These relationships are necessary prerequisites for subsequent data access and transaction processing. For example, an archaeological project may work with ancient buildings (monuments) connected by a system of roads that correspond exactly to GIS database features.
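The foreign-key mapping described above can be sketched with a minimal relational example. All table and column names below are hypothetical illustrations, not taken from any real project database:

```python
import sqlite3

# In-memory sketch: one table of GIS features, one table of external
# data-objects, linked by a foreign key on the feature ID.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""CREATE TABLE gis_feature (
    feature_id   INTEGER PRIMARY KEY,
    feature_type TEXT,   -- e.g. an area feature representing a monument
    geometry_wkt TEXT)""")
conn.execute("""CREATE TABLE data_object (
    object_id   INTEGER PRIMARY KEY,
    description TEXT,
    feature_id  INTEGER REFERENCES gis_feature(feature_id))""")

# One monument (as a GIS area feature) and one external data-object linked to it.
conn.execute("INSERT INTO gis_feature VALUES (1, 'monument', 'POLYGON((0 0,1 0,1 1,0 0))')")
conn.execute("INSERT INTO data_object VALUES (101, 'excavation report', 1)")

# The foreign key lets us move from a data-object to its GIS feature.
row = conn.execute("""SELECT f.feature_type
                      FROM data_object o JOIN gis_feature f
                        ON o.feature_id = f.feature_id
                      WHERE o.object_id = 101""").fetchone()
print(row[0])  # monument
```

The same join works in the other direction, retrieving all external data-objects attached to a given feature.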
The general aim of this integration strategy is somewhat simpler than already developed business models, because it is not necessary to expose most of the external functionality of the corresponding information system. In business models, the availability of functionality is even more important than access to the data. The aim appropriate and specific to archaeological (and, more widely, humanities) studies is the ability to organize data into geo-referenced web resources that address specific sources of information. A typical task would then be for users to retrieve a list of all "connection object features" within a geographic area while working simultaneously with another resource to get a list of all bibliographic references related to the study of these monuments. Managing such tasks would require different geo-referenced data types and services to be maintained in a services repository. The required set of web services is thus confined to a services repository for creating and storing a limited number of process definitions and metadata, and a services registry for publishing, classifying, and discovering services.
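The typical task just described can be sketched as two cooperating services held in a small in-memory registry. The service names, sample features, and bounding-box convention below are hypothetical illustrations, not part of any existing system:

```python
# Hypothetical sample data: geo-referenced features and their bibliography.
FEATURES = [
    {"id": "F1", "name": "megalith A", "lon": 25.1, "lat": 41.8},
    {"id": "F2", "name": "rock-cut tomb B", "lon": 25.4, "lat": 41.9},
    {"id": "F3", "name": "megalith C", "lon": 26.9, "lat": 42.5},
]
BIBLIOGRAPHY = {"F1": ["Author 2001"], "F2": ["Author 2005"], "F3": []}

def features_in_area(min_lon, min_lat, max_lon, max_lat):
    """First service: all 'connection object features' within a bounding box."""
    return [f for f in FEATURES
            if min_lon <= f["lon"] <= max_lon and min_lat <= f["lat"] <= max_lat]

def references_for(feature_ids):
    """Second service: bibliographic references for the retrieved features."""
    return {fid: BIBLIOGRAPHY.get(fid, []) for fid in feature_ids}

# A services registry mapping published service names to implementations.
REGISTRY = {"features_in_area": features_in_area, "references_for": references_for}

found = REGISTRY["features_in_area"](25.0, 41.5, 26.0, 42.0)
refs = REGISTRY["references_for"]([f["id"] for f in found])
print(sorted(refs))  # ['F1', 'F2']
```

In a real deployment the registry entries would resolve to remote web-service endpoints rather than local functions, but the discovery-then-invoke pattern is the same.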
The scenario presented above describes only the general approach to integrating separate information systems and data. It is based on GIS but could equally well be developed with open-source software and other systems. With this presentation I want to underline that linking raw data is only the first step in advancing humanities studies of the past. Reorganizing archaeological and historical data into geo-referenced web resources turns raw data into technically (GIS or other) and conceptually defined information systems. In my view, it is engaging deeply with these data that poses the major challenge for the future development and integration of these resources, and for integrating the information systems that support them.
©2014 Tsoni Tsonev. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.
ISAW Papers 7.30 (2014)
Moving from cross-collection integration to explorations of Linked Data practices in the library of antiquity at the Royal Museums of Art and History, Brussels
Ellen Van Keer
Introduction
The Royal Museums of Art and History rank among the largest cultural heritage research institutes in Belgium. Thousands of artifacts and historical objects from around the globe, dating from prehistoric to modern times, are on display in the galleries or kept in the storerooms.1 In addition, the library keeps thousands of journal articles, books and other scholarly publications regarding the museum's collections. In the current infrastructure, the museum and library databases are two separate systems. However, the materials they contain overlap not only at the thematic level (e.g. "ancient Egypt") but also at the entity level (e.g. entries on objects in exhibition catalogs). As part of the project "Bridging Knowledge Collections" we will create a cross-domain integration.2
As a first objective, we will make the two datasets cross-searchable in a single online user interface, a major challenge because the museum and library sectors traditionally use totally different sets of standards (Prescott & Erway 2011). As a second objective, following the demands of the curators, we are researching a workflow that will link objects and documents at the record level and support the input of object bibliographies in the museum database. To avoid the manual creation of duplicate datasets, we will reuse existing bibliographic information from the library catalog in the museum database and link back to the full references by adding a system identifier. We will not use the library's internal database primary keys as identifiers; instead, we will use the permalinks produced by the library OPAC, because they are open to everyone and allow direct hyperlinked access from within the museum database to the online library application and its range of functionalities and user services.
Furthermore, we are considering a Linked Data implementation. As the museum system also generates permalinks for our objects, machine-actionable links identifying both citing publications and cited artifacts are already in place. We are not (yet) producing our own RDF, but we can rely on our partnership in the larger-scale Europeana community for initiatives in this direction. Last year, Europeana published its API and released 20 million objects from its providers as an RDF dump under an open license (CC0), including our entire collection of Egyptian objects.3 What actually happened is that we exported our SPECTRUM-compliant museum data as LIDO XML and mapped this to ESE, the earliest, Dublin Core-based Europeana data model.4 ESE is currently being replaced by the RDF-based EDM, and Europeana has now also transformed the content ingested in the older format into this new model.5 The RDF resulting from this procedure remains rather crude and lacks contextualization links to other sources. However, in newer Europeana projects we are transforming our LIDO export directly to EDM, and a semantic layer is being implemented in the ingestion process.6
Important for our present purpose is that the core semantic layer of EDM includes the "related works" element:
http://purl.org/dc/terms/isReferencedBy
This is an adequate predicate for linking cited resources in the museum database to citing resources in the library system, and thus for producing RDF triples. At least, that is the general idea. There are, of course, practical obstacles.
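The intended triple can be sketched concretely. The subject and object below are the museum and library permalinks quoted later in this article; the helper function is a hypothetical illustration and performs no URI escaping or validation:

```python
def ntriple(subject, predicate, obj):
    """Serialize one triple of plain URIs as a single N-Triples line (sketch only)."""
    return f"<{subject}> <{predicate}> <{obj}> ."

# Museum object (cited resource) isReferencedBy library record (citing resource).
triple = ntriple(
    "http://carmentis.kmkg-mrah.be/eMuseumPlus?service=ExternalInterface"
    "&module=collection&objectId=84853&viewType=detailView",
    "http://purl.org/dc/terms/isReferencedBy",
    "http://opac.libis.be:80/F/?func=direct&doc_number=007483927",
)
print(triple)
```

In production such triples would of course be generated by an RDF library rather than by string formatting, but the shape of the statement is the same.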
Museum Linked Data
On the museum side, we will have to make additions to our LIDO XML export, as this currently remains our basis for all further mappings, including to EDM. More particularly, we want to add a new relatedWorksWrap to the LIDO,7 and map this element to <IsReferencedBy> in the ESE/EDM transformation. Unfortunately, we depend on the museum system vendor (and the resources available to it) for any changes to our export. In this regard too, linking to bibliographic records with permalinks is more efficient than describing publications at the lower levels of author, title, year, etc. A new LIDO event type for "Publication" is a complex element and requires more modifications. However, in the Europeana context, tools are being developed to give content providers more control over their application data export in the future.8
As a side note here, but essential for the adoption of Linked Data, we would also want to improve the “permalinks” for the objects the museum system produces, e.g.
http://carmentis.kmkg-mrah.be/eMuseumPlus?service=ExternalInterface&module=collection&objectId=84853&viewType=detailView
The permalinks should rather be application- and vendor-independent HTTP URIs composed of the institutional namespace and object IDs (Heath & Bizer 2011), e.g.
http://kmkg-mrah.be/collection/id/84853
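Composing such URIs is deliberately trivial, which is the point: the identifier depends only on the institutional namespace and the object ID, not on any application. The namespace below is taken from the example URI above; the helper function itself is a hypothetical sketch:

```python
# Institutional namespace from the example URI above (assumed, not deployed).
NAMESPACE = "http://kmkg-mrah.be/collection/id/"

def object_uri(object_id):
    """Compose a vendor-independent object URI from the namespace and an ID."""
    return f"{NAMESPACE}{object_id}"

print(object_uri(84853))  # http://kmkg-mrah.be/collection/id/84853
```

A web server would then be configured to resolve these URIs to the current application's detail view, so the application can change without breaking the identifiers.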
Likewise, it is relevant that Europeana has established its own identifier for this object:
http://www.europeana.eu/portal/record/08566/8628ED4D8C180516185A720890C82B33B0145767.html
Of course, changing "permalinks" and maintaining multiple identities are very delicate matters, with implications far beyond the limited framework of this library-based project.
Library Linked Data
On the library side, identifiers pose the additional difficulty of having to decide which one(s) to use, both locally and globally. As we are part of a large library network and the "permalinks" produced by our library OPAC are actually queries into the shared database, they change for every library "view". A publication will therefore have a different permalink depending on whether it is accessed through the general catalog of the network or through our local library "view", e.g.
http://opac.libis.be:80/F/?func=direct&doc_number=007483927&local_base=OPAC01
http://opac.libis.be:80/F/?func=direct&doc_number=007483927&local_base=KMKG
Still, we can easily extract a better URI automatically by stripping everything after the last ampersand, which gives us
http://opac.libis.be:80/F/?func=direct&doc_number=007483927
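The extraction rule just described is a one-liner. The sketch below simply drops everything from the last ampersand onward, removing the library-"view" parameter; it assumes (as in the examples above) that the view parameter is always the final one:

```python
def strip_view(permalink):
    """Drop the trailing library-'view' parameter after the last ampersand."""
    return permalink.rsplit("&", 1)[0]

url = "http://opac.libis.be:80/F/?func=direct&doc_number=007483927&local_base=KMKG"
print(strip_view(url))
# http://opac.libis.be:80/F/?func=direct&doc_number=007483927
```

A more robust variant would parse the query string and remove the `local_base` parameter by name, so the rule would not depend on parameter order.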
Additionally, however, the newly implemented library discovery tool LIMO produces yet other sets of "permalinks" for the same publication, which we cannot refine manually, e.g.
https://services.libis.be/query?query=sys:LBS01007483927&view=LIBISnet&institution=LIBISNET&builder=primo_opensearch&host=limo.libis.be
https://services.libis.be/query?query=sys:LBS01007483927&view=KMKG&institution=KMKG&builder=primo_opensearch&host=limo.libis.be
However, this last link will lead our users directly to our local LIMO "view", with customized new features such as user recommendations and access to locally licensed e-content.9 For this practical reason, we will deploy it as our primary linking source. Nevertheless, we would also want users in the rest of the network and, ideally, the entire world to be able to discover objects from our collections through citing publications they have access to. LIMO should be able to achieve this functionality through server-side enrichment of related records. Moreover, this new tool is based on Ex Libris' Primo service, and the company is working on producing persistent, LOD-friendly URIs that will return RDF for all Primo PNX records (Koster & Harper 2013).
The global context also poses direct challenges to us. To start with, co-reference, the problem of multiple identifiers pointing to the same entity, is inherent in the global library context. While museum pieces are usually unique, libraries all over the world can hold (multiple) copies of the same publication. As a result, a publication will acquire numerous URIs through different library systems and aggregators, e.g.
http://www.worldcat.org/oclc/690395786
http://trove.nla.gov.au/version/50480843
http://www.theeuropeanlibrary.org/tel4/record/1000123701574
http://zenon.dainst.org/#book/000850955
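One common (if much-debated) way of recording such co-references is owl:sameAs links from a chosen canonical URI. The sketch below arbitrarily picks the WorldCat URI as canonical and uses a naive string-based N-Triples serialization; both choices are illustrative assumptions, not project decisions:

```python
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"

# The four co-referring URIs listed above, with one picked as canonical.
canonical = "http://www.worldcat.org/oclc/690395786"
aliases = [
    "http://trove.nla.gov.au/version/50480843",
    "http://www.theeuropeanlibrary.org/tel4/record/1000123701574",
    "http://zenon.dainst.org/#book/000850955",
]
triples = [f"<{canonical}> <{OWL_SAMEAS}> <{alias}> ." for alias in aliases]
print(len(triples))  # 3
```

As discussed below, owl:sameAs asserts strict identity, which is often too strong for bibliographic records describing different copies or editions; weaker predicates may be semantically safer.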
Moreover, in producing a scholarly bibliography of our museum objects, linking to a specific edition of a publication in another library is simply not an option. Library systems have traditionally served primarily as tools for locating physical copies (FRBR "items") of publications, whereas object citations disclose content and operate at a more general level. However, larger-scale projects are increasingly trying to serve more globally scoped, content-oriented (FRBR "manifestation/expression") identifiers, which suit our purpose better (Gatenby e.a. 2012).
https://www.worldcat.org/title/elkab-and-beyond-studies-in-honour-of-luc-limme/oclc/476143033/editions
And in the end, the development and implementation of RDA is intended as a remedy, since it (ideally) enables referencing entities at all four levels of FRBR group 1.10 Digital libraries likewise have the advantage of being independent of the physical item, but they proliferate the identity problem further (Hull e.a. 2008). Through publisher DOIs, commercial services such as JSTOR, Open Access repositories such as OpenDOAR, and bibliographic or subject-related databases such as Zotero, APh, OEB, papyri.info... many additional sets and types of permalinks and URIs (often closed, but nevertheless useful for researchers) can be assigned to the same publication or some part of it, e.g.
http://www.peeters-leuven.be/boekoverz.asp?nr=8662
http://papyri.info/biblio/20718?q=elkab+and+beyond
http://orbi.ulg.ac.be//handle/2268/23962
It is clearly impossible to trace, add and check all co-references of citing publications in all these systems manually. Implementing web-based systems and semantic technologies should allow automating this procedure in the future and addressing the issue on a larger scale.
Semantic enrichment and alignment of our datasets with external sources at a lower level of description can also be framed within the broader library and museum communities. In the Europeana context, for instance, we are transforming our thesauri to SKOS and aligning them with other reference terminologies in the Galleries, Libraries, Archives, and Museums (GLAM) sector, such as the Getty thesauri.11 Moreover, new tools and workflows are being developed to support Linked Data production and enrichment locally, before mapping to EDM (e.g. de Boer e.a. 2012). Direct mappings of LIDO aggregations as Linked Data sets are also being investigated, to achieve rich and interoperable resource descriptions (Tsalapati e.a. 2012). Ideally, implementing a fully-fledged top-level ontology such as CIDOC-CRM would allow us to describe, relate, and align our heterogeneous, distributed local datasets at the lowest and fullest level of detail, resolving the issue of cross-collection integration at the record level that we are (still) dealing with now. However, it would not immediately solve all problems. URI disambiguation is a more general challenge in the consumption of Linked Data (Jaffri e.a. 2008), and the ubiquitous use of <owl:sameAs> has even provoked an identity crisis at the lowest description level (Halpin e.a. 2010). The adoption of Linked Data involves embracing the global knowledge space in all its complexity.
Notes
1 http://www.kmkg-mrah.be.
2 http://www.belspo.be/belspo/fedra/proj.asp?l=en&COD=AG/LL/167. The project is coordinated by Wouter Claes, chief librarian of the RMAH. It is partnered by LIBIS, http://www.libis.be, the central IT service for libraries, museums and archives of KULeuven.
3 http://pro.europeana.eu/datasets.
4 http://pro.europeana.eu/ese-documentation.
5 http://pro.europeana.eu/edm-documentation.
6 http://www.europeanafashion.eu and http://www.partage-plus.eu.
7 http://www.lido-schema.org/schema/v1.0/lido-v1.0-schema-listing.html
8 http://www.europeana-inside.eu and http://dm2e.eu.
9 If at least the supplier agrees to have its data indexed in the system, which is not a matter of course, as was illustrated by a recent debate between Ebsco and Ex Libris (Pohl 2013).
10 http://www.loc.gov/aba/rda/
11 http://www.athenaplus.eu. We are actually the leader of Work Package 4 on “Terminologies and Semantic enrichment”.
Works Cited
De Boer e.a. 2012: Boer, V., Wielemaker, J., Gent, J., Hildebrand, M., Isaac, A., Ossenbruggen, J., & Schreiber, G. "Supporting Linked Data Production for Cultural Heritage Institutes: The Amsterdam Museum Case Study". In: E. Simperl, P. Cimiano, A. Polleres, O. Corcho, & V. Presutti (Eds.), The Semantic Web: Research and Applications, 7295 (2012), p. 733–747, doi:10.1007/978-3-642-30284-8_56.
Gatenby e.a. 2012: Gatenby, J., Greene, R. O., Oskins, M. W., Thornburg, G. "GLIMIR: Manifestation and Content Clustering within WorldCat". Code{4}lib Journal 17 (2012), http://journal.code4lib.org/articles/6812.
Halpin e.a. 2010: Halpin, H., Herman, I., Hayes, P. J., McGuiness, D.L., Thompson, H. S. "When owl:sameAs isn’t the Same: An Analysis of Identity Links on the Semantic Web". In: The Semantic Web – ISWC 2010. Lecture Notes in Computer Science 6496 (2010), p. 305-320, doi: 10.1007/978-3-642-17746-0_20, http://iswc2010.semanticweb.org/pdf/261.pdf.
Heath & Bizer 2011: Heath, T. & Bizer, Ch. Linked Data: Evolving the Web into a Global Data Space, Morgan & Claypool: 2011, http://linkeddatabook.com/editions/1.0/
Hull e.a. 2008: Hull, D., Pettifer, S. R., & Kell, D. B. "Defrosting the digital library: bibliographic tools for the next generation web". PloS computational biology 4/10 (2008), doi:10.1371/journal.pcbi.1000204
Jaffri e.a. 2008: Jaffri, A., Glaser, H. Millard, I. "URI disambiguation in the context of linked data". In: Linked Data on the Web, Beijing, April 2008, http://eprints.soton.ac.uk/265181/
Koster & Harper 2013: Koster, L. & Harper, C. "Linked Open Data at the IGelU conference in Berlin 2013". http://igelu.org/special-interests/lod/meetings/igelu-2013
Pohl 2013: Pohl, A. "Discovery silo's versus the open web". Blogpost http://openbiblio.net/2013/06/23/discovery-silos-vs-the-open-web/
Prescott & Erway 2011: Prescott, L. & Erway, R. Single Search: the quest for the holy grail, OCLC Research 2011, http://www.oclc.org/research/publications/library/2011/2011-17.pdf
Tsalapati e.a. 2012: Tsalapati, E. Simou,N. Drosopoulos, N., Stein, R. "Evolving LIDO based aggregations into Linked Data", In: CIDOC2012 - Enriching Cultural Heritage, Helsinki, Finland, June 2012, http://www.image.ntua.gr/php/pub_details.php?code=767
©2014 Ellen Van Keer. Published under the Creative Commons Attribution 4.0 license.
This article is part of ISAW Papers 7.