The Turkology Annual
The Turkologischer Anzeiger/Turkology Annual (TA), founded by Andreas Tietze (†) and György Hazai, is an indispensable systematic bibliography for Turkology and Ottoman Studies. Experts from all over the world contribute to its compilation, which is funded by several institutions, including UNESCO. The volumes, edited by the Department of Oriental Studies of the University of Vienna, have until now appeared only in printed form.
Turkology Annual Online
Our project at Heidelberg University's Cluster of Excellence "Asia and Europe in a Global Context" has digitized the first 26 published TA volumes and for the first time provides an online "re-published" version of the resource with new and efficient search functionality. The entries of volume 27-28, which was published in 2010 after the start of our project, will be added as soon as possible.
The TA contains entries in a large number of languages, including transcriptions of Arabic and of languages written in the Cyrillic alphabet. Even a single entry may contain passages in several languages. We expected this to pose a serious problem for digitization with the Optical Character Recognition (OCR) software available at the Cluster of Excellence: even very good OCR results are hardly acceptable for building a database, as entries that contain recognition errors cannot be reliably retrieved in a search. However, once the OCR software had been fine-tuned accordingly, the results were of such high quality that the few remaining errors were mostly irrelevant for typical search queries. While this meant that the effort of developing automatic OCR correction software would not be justified for our project, we encountered problems of another kind: syntax analysis of the TA entries proved to be much harder than anticipated, as entry types and data structures are often marked only implicitly, and some of them change from volume to volume. In addition, syntax analysis (parsing) had to cope with structural errors in the entries: errors that human editors made and that human readers would not even notice, but that can cause serious problems for parsing. Our parsing software therefore had to be tailored to the data in order to be both comprehensive and robust.
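To illustrate what "comprehensive as well as robust" can mean for parsing, here is a minimal sketch of a tolerant entry parser. It is not the project's actual software. The entry layout (`<number>. <author>: <title>. <place> <year>.`), the sample entries, and all names such as `parse_entry` and `Entry` are invented for illustration, and the technique is plain regular-expression matching with a fallback. The idea is to try a strict pattern first and, when an entry is structurally damaged, to recover whatever fields can still be identified and record a warning instead of failing.

```python
import re
from dataclasses import dataclass, field

# Hypothetical entry layout, invented for illustration only:
#   "<number>. <author>: <title>. <place> <year>."
STRICT = re.compile(
    r"^(?P<number>\d+)\.\s+(?P<author>[^:]+):\s+(?P<title>.+?)\.\s+"
    r"(?P<place>[^\d.]+?)\s+(?P<year>\d{4})\.?$"
)
YEAR = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")   # plausible publication years
NUMBER = re.compile(r"^(\d+)[.,]?\s+")           # entry number, punctuation optional


@dataclass
class Entry:
    raw: str
    fields: dict = field(default_factory=dict)
    warnings: list = field(default_factory=list)


def parse_entry(line: str) -> Entry:
    """Parse one entry; fall back to partial recovery if the strict pattern fails."""
    text = " ".join(line.split())  # normalize whitespace left over from OCR
    entry = Entry(raw=text)

    match = STRICT.match(text)
    if match:
        entry.fields = match.groupdict()
        return entry

    # Fallback: the entry is structurally damaged, so recover what we can.
    entry.warnings.append("strict pattern failed; using fallback")
    number = NUMBER.match(text)
    if number:
        entry.fields["number"] = number.group(1)
        text = text[number.end():]
    years = YEAR.findall(text)
    if years:
        entry.fields["year"] = years[-1]  # assume the last year is the imprint year
    author, sep, rest = text.partition(":")
    if sep:
        entry.fields["author"] = author.strip()
        entry.fields["title"] = rest.strip()
    else:
        entry.fields["title"] = text.strip()
        entry.warnings.append("no author separator found")
    return entry


if __name__ == "__main__":
    for sample in ("12. Doe, Jane: A sample title. Wien 1999.",
                   "12, Doe, Jane A sample title Wien 1999"):
        parsed = parse_entry(sample)
        print(parsed.fields, parsed.warnings)
```

Keeping the warnings alongside the recovered fields means damaged entries stay searchable while remaining easy to list for later manual review.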
- Poster presented at the Annual Conference 2009 "Flows of Images and Media"
- Poster presented at the Annual Conference 2010 "Flows of Concepts and Institutions"
- Slides and Abstract presented at the conference Scientific Computing and Cultural Heritage 2013
- Heckmann, Dustin, Anette Frank, Matthias Arnold, Peter Gietz, and Christian Roth. "Citation Segmentation from Sparse & Noisy Data: A Joint Inference Approach with Markov Logic Networks." Digital Scholarship in the Humanities 31, no. 2 (2016; first published online 8 December 2014): 333-356. doi:10.1093/llc/fqu061.
- Department of Oriental Studies, University of Vienna (page in German)
(publisher of the TA)
- Research Institute for Linguistics, Hungarian Academy of Sciences, Budapest
- Department of Computational Linguistics, Heidelberg University
(TA Online conception and programming)
- Department of Languages and Cultures of the Near East (Islamic Studies), Heidelberg University
(project patronage, TA copies for scanning)
- Prof. Dr. Anette Frank: project director (Computational Linguistics)
- Prof. Dr. Michael Ursinus: project director (Islamic Studies)
- Matthias Arnold: coordination (image processing and user interface)
- Peter Gietz: coordination (integration into Heidelberg Research Architecture)
- Christian Roth: general coordination
- Arina Chitavong: scanning
- Jens Hansche: scanning
- Nicolas Bellm: programming (database programming)
- Mateusz Dolata: programming (syntax analysis)
- Dustin Heckmann: programming (user interface)
We are not involved in the actual editing of the TA. For any questions, please contact the editors or the University of Vienna.