Happy to report that the DEEP project consortium agreement between KCL, Queens, Edinburgh (both LTG and EDINA) and Nottingham has been agreed and signed.
The DEEP project started on time in November last year. Our project plan has been finalised, and will shortly be available from our page on the JISC website.
As we anticipated, the Survey of English Place-Names (SEPN) is a complex and fascinating document. Produced by the English Place-Name Society (EPNS), the SEPN is a true community effort. Its 86 volumes document the names of some 40 English counties and have been compiled by many different place-name scholars over the years. A succession of people have thus moulded the text itself to fit and reflect England's ancient and rich toponymic landscape.
While this makes the SEPN an unrivalled resource for the place-name scholar, the historian, the geographer and the linguist, it also makes digitising it a challenge. Our aim is to put the forms into a structured gazetteer, but the structure of the text varies from county to county. The basic hierarchy runs from large units, such as counties and hundreds, down to smaller units, such as parishes, townships, settlements and minor names. Some conventions persist: parish names, for example, appear as headings, followed by townships and settlements. There are inevitable exceptions, however, and these make tagging such sections of text complex. We do not wish to impose artificial structures on anomalous portions of text, since each anomaly is there for a reason.
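The tension described above, between a typical hierarchy and the anomalies we don't want to flatten, can be illustrated with a small tree model. This is a minimal Python sketch under our own assumptions, not the DEEP data model: `PlaceNode`, `TYPICAL_ORDER` and the sample names are all hypothetical. Nodes can nest to any depth, and a unit that breaks the usual order is flagged for review rather than forced into place.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

# Typical nesting from the post, largest unit first. It is used only to
# *flag* departures from the usual order, never to reject them.
TYPICAL_ORDER = ["county", "hundred", "parish", "township", "settlement", "minor name"]


@dataclass
class PlaceNode:
    name: str
    unit_type: str  # free text, so unexpected unit types are still accepted
    children: List["PlaceNode"] = field(default_factory=list)

    def add(self, child: "PlaceNode") -> "PlaceNode":
        """Attach a child; no fixed number of levels is assumed."""
        self.children.append(child)
        return child

    def is_anomalous(self, parent_type: Optional[str]) -> bool:
        """True if this unit sits under a parent of the same or a smaller type.
        Skipped levels (e.g. a township directly under a hundred) are not flagged."""
        if parent_type not in TYPICAL_ORDER or self.unit_type not in TYPICAL_ORDER:
            return False
        return TYPICAL_ORDER.index(self.unit_type) <= TYPICAL_ORDER.index(parent_type)

    def walk(self, depth: int = 0, parent_type: Optional[str] = None) -> Iterator[Tuple[int, "PlaceNode", bool]]:
        """Depth-first traversal yielding (depth, node, anomalous?)."""
        yield depth, self, self.is_anomalous(parent_type)
        for child in self.children:
            yield from child.walk(depth + 1, self.unit_type)


# Hypothetical example data, not taken from the survey.
county = PlaceNode("Exampleshire", "county")
hundred = county.add(PlaceNode("Example Hundred", "hundred"))
parish = hundred.add(PlaceNode("Example Parish", "parish"))
township = parish.add(PlaceNode("Example Township", "township"))
township.add(PlaceNode("Odd Parish", "parish"))  # kept as-is, but flagged

for depth, node, flagged in county.walk():
    print("  " * depth + f"{node.name} ({node.unit_type})" + ("  [anomalous]" if flagged else ""))
```

The design choice here is to record the text's own structure faithfully and only mark departures from the usual order, leaving the decision about what an anomaly means to a human editor.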
OCRing the text is the responsibility of CDDA. This process has thrown up problems: in some cases, for example, matching Anglo-Saxon characters to their supported Unicode equivalents requires expert input from the team at Nottingham. Sometimes Anglo-Saxon characters are simply hard to read because of printing issues; sometimes the Unicode code points assigned to them need correcting. For example, a character initially assigned U+E624 was misread and reassigned to U+01ED (ǭ).
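As an illustration of the kind of correction involved, here is a minimal Python sketch that applies a reviewed table of code-point fixes to OCR output and lists any characters still in the Unicode Private Use Area (where U+E624 lives) for expert checking. The table, the function names and the sample strings are hypothetical; only the U+E624 to U+01ED reassignment comes from the example above, and this is not CDDA's actual workflow.

```python
import unicodedata

# Hypothetical table of reviewed corrections: OCR-assigned character -> confirmed
# character. The one entry is the example above: U+E624 (a Private Use Area
# code point) reassigned to U+01ED.
CORRECTIONS = {
    "\uE624": "\u01ED",  # LATIN SMALL LETTER O WITH OGONEK AND MACRON (ǭ)
}
_TABLE = str.maketrans(CORRECTIONS)


def correct_ocr_text(text: str) -> str:
    """Apply the reviewed corrections, then normalise to NFC."""
    return unicodedata.normalize("NFC", text.translate(_TABLE))


def needs_review(text: str) -> set:
    """Characters still in a Private Use Area (Unicode category 'Co'):
    these have no standard meaning and should go to an expert."""
    return {ch for ch in text if unicodedata.category(ch) == "Co"}


# Hypothetical OCR output containing one known and one unknown PUA character.
raw = "x\uE624y\uE999z"
fixed = correct_ocr_text(raw)
print(fixed, [f"U+{ord(ch):04X}" for ch in needs_review(fixed)])
```

Keeping fixes in a single reviewed table, rather than patching characters by hand, would mean a correction like the U+E624 one is applied consistently wherever that character turns up.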
Cheshire is now completed, and work is underway on Shropshire.
Here is our updated description for the DEEP project:
Place-names are not static. They change and evolve over time, in response to the development of language, wars and conquests, shifting administrative boundaries, or simply the vagaries of spelling in the days before dictionaries and atlases. They have complex etymologies derived from different languages, and they mean different things to different communities. As a result, historical documents, archives, ephemera and other sources contain different spellings (forms) of place-names, depending on their date and context. However, although we now take for granted the ability to search geographic data using web services such as Google Maps and GeoNames.org, there is no gazetteer documenting these historic name forms, and therefore no means of linking or cross-searching the geographic references those sources contain. In short, a search using a modern place-name will not currently return results for that name in all its many variant forms. This has led to major underuse of electronic resources.
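To make the problem concrete, here is a minimal Python sketch of variant-form lookup: every form, modern or historic, points to the same place record, so a search on any one of them can be expanded to all the others. `VariantGazetteer`, `normalise` and the sample forms are hypothetical, and stripping diacritics before matching is a simplifying assumption; real matching would need expert-informed rules.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def normalise(form: str) -> str:
    """Case-fold and drop combining marks, e.g. 'Ǭ' -> 'o'.
    A simplifying assumption; real matching would need expert-informed rules."""
    decomposed = unicodedata.normalize("NFD", form.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


class VariantGazetteer:
    """Maps every known form of a place to one record, so a search on any
    form can be expanded to all of them. If two places share a normalised
    form, the later one wins: a real gazetteer would need disambiguation."""

    def __init__(self) -> None:
        self._forms_by_place: Dict[str, Set[str]] = defaultdict(set)
        self._place_by_form: Dict[str, str] = {}

    def add(self, place_id: str, modern_name: str, *historic_forms: str) -> None:
        for form in (modern_name, *historic_forms):
            self._forms_by_place[place_id].add(form)
            self._place_by_form[normalise(form)] = place_id

    def variants(self, name: str) -> Set[str]:
        """All recorded forms of the place that `name` refers to (empty if unknown)."""
        place_id = self._place_by_form.get(normalise(name))
        if place_id is None:
            return set()
        return set(self._forms_by_place[place_id])

    def find_in(self, text: str, name: str) -> List[Tuple[int, str]]:
        """(offset, matched text) for every variant of `name` found in `text`.
        Matching here is case-insensitive but otherwise literal."""
        forms = sorted(self.variants(name), key=len, reverse=True)  # longest first
        if not forms:
            return []
        pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, forms)) + r")\b", re.IGNORECASE)
        return [(m.start(), m.group(0)) for m in pattern.finditer(text)]


# Hypothetical place and forms, for illustration only.
gaz = VariantGazetteer()
gaz.add("hyp-001", "Exampleton", "Exampletone", "Ecsamtun")
print(gaz.variants("exampleton"))
print(gaz.find_in("A charter mentions Ecsamtun; a later roll has Exampletone.", "Exampleton"))
```

A search on the modern name "Exampleton" then also finds "Ecsamtun" and "Exampletone" in the text, which is exactly what a modern-only gazetteer cannot do.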
Digitisation, however, offers a solution. In England, the historical development of place-names has been systematically surveyed since 1922 by the specialists of the English Place-Name Society (EPNS). Examining an extensive range of documentary sources in local and national archives, and gathering the knowledge of local communities and experts, the EPNS has built up an 86-volume, county-by-county survey of England's place-names, detailing over four million variant forms from classical sources, through the Anglo-Saxon and medieval periods, to the modern day. JISC's Digital Exposure of English Place-Names (DEEP) project will digitise all these forms and make them available as structured data. The corpus will form a gazetteer within JISC's Unlock service, so researchers will be able to cross-query the dataset and use it to search their own digital documents and databases for any historic place-name form. The gazetteer data will also be made available in structured XML, making it possible to experiment with methods of data mining and visualisation that the paper volumes do not allow. In addition to the digitisation, a network of experts will be convened to correct and enhance the dataset.
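As a rough idea of what structured XML for the gazetteer might look like, this minimal Python sketch uses the standard library's `xml.etree.ElementTree` to serialise one place and its attested forms. The element and attribute names (`place`, `name`, `forms`, `form`, `date`, `source`), the `Attestation` class and the sample data are all hypothetical and do not represent the DEEP schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attestation:
    form: str
    date: Optional[str] = None    # kept as text, since dates may be approximate
    source: Optional[str] = None  # hypothetical source abbreviation


def place_to_xml(place_id: str, modern_name: str, unit_type: str,
                 attestations: List[Attestation]) -> ET.Element:
    """Serialise one place and its historic forms. Element and attribute
    names are illustrative only, not the DEEP schema."""
    place = ET.Element("place", {"id": place_id, "type": unit_type})
    ET.SubElement(place, "name").text = modern_name
    forms = ET.SubElement(place, "forms")
    for a in attestations:
        # Only emit attributes that actually have a value.
        attrs = {k: v for k, v in (("date", a.date), ("source", a.source)) if v}
        ET.SubElement(forms, "form", attrs).text = a.form
    return place


# Hypothetical sample data, for illustration only.
entry = place_to_xml("hyp-001", "Exampleton", "settlement", [
    Attestation("Ecsamtun", date="c.1000", source="SRC-A"),
    Attestation("Exampletone", date="1300"),
])
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)

# Structured data can be queried directly, e.g. every attested form:
print([f.text for f in entry.findall("./forms/form")])
```

Once forms carry dates and sources as explicit fields, they can be filtered, counted or plotted programmatically, which is the kind of mining and visualisation the paper volumes cannot support.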
The completed resource will provide a key piece of electronic infrastructure for the discovery, clustering, use and analysis of e-content referenced by place. It will also be an important resource for scholars of place-names, and for scholars in cognate disciplines such as history, linguistics, archaeology and historical geography.
DEEP has had a long gestation period and is a logical extension of existing work. It builds on JISC's significant investment in gazetteers and geospatial web services such as GeoCrosswalk, GeoDigRef and Unlock. Principally, it grew from Connecting Historical Authorities with Linked data, Contexts and Entities (CHALICE), a project funded in 2010 under JISC's Information Environment Programme and led by EDINA. In that exemplar project, the current project team built a full pilot demonstrator. It digitised the place-names of Cheshire and a sample of those of Shropshire, extracted place-name, attestation and chronological data from them using the Edinburgh geoparser, and generated a gazetteer of historic place-names to link documents and authority files in Linked Data form. CHALICE proved the concept now being rolled out under DEEP but, as an exemplar, it was constrained by limited time and resources. As a result of that work, methodological challenges have already been resolved, and the team has a proven track record of working together.
It's probably worth starting with an overview of what this new JISC-funded project is all about, so here's a summary from the grant application to the JISC eContent Capital Programme:
Place-names are a fundamental concept in all academic collections: everything happens somewhere. Contemporary place-names are comprehensively represented in digital gazetteer and geospatial web services such as GeoNames. However, despite millions of pounds of investment by JISC and other agencies in historical online resources in recent years, there is currently no equivalent for historic place-names. This project will digitise the entire 86-volume corpus of the Survey of English Place-Names (SEPN), the ultimate authority on historic place-names in England, and make its 4 million forms available via the JISC-funded Unlock service, along with vigilantly curated crowd-sourced contributions from the expert community. The content will be published in XML, employing a data model. The platform could thus be generalised to incorporate any other historic place-name corpus in the UK or elsewhere, provided it is in digital form and adheres to appropriate standards.
Paul