When you approach a field of research that has not been studied much from certain perspectives, you are confronted with a number of challenges. Due to the lack of comparative projects, there is a certain absence of a blueprint on how to deal with specific issues. The ARITHMETIC project serves as a textbook example. Research into German arithmetic texts of the 15th and 16th centuries, especially from a linguistic and cultural-historical perspective, has not yet attracted much attention. The manuscript corpus is the centrepiece of the project. It is the basis for answering the linguistic and historical questions. The texts are the lock that needs to be opened in order to analyse the narrative storytelling as well as economic and socio-political phenomena in reckoning examples. The transcription of these texts acts as the key to this lock.

The corpus of the ARITHMETIC project contains a total of 115 manuscripts. These are to be transcribed with the help of the transcription software Transkribus. Transkribus operates based on machine learning principles and artificial neural networks, allowing researchers to train their own handwritten text recognition (HTR) models. Transkribus was chosen for the transcription of the arithmetic texts due to its usability and the possibility of training one’s own models.

In the meantime, 2 ½ years of the project have passed and accordingly a lot has already been transcribed. Working with Transkribus on arithmetic texts is very ambivalent. The beginning in particular is time-consuming. There were hardly any public models that were suitable for our text structure. At the same time, many changes took place at Transkribus in late 2022, such as the discontinuation of the HTR+ transcription engine and the launch of the Pylaia engine, which made work even more challenging. On the one hand, the transcriptions of the texts were inadequate, and on the other, Transkribus had great difficulties with the layout of some pages. As a result, the baseline was often split into two or even three parts after reading. Fractions could not be reproduced either. Thanks to the iterative training of our models, many initial difficulties were eliminated in the first year. This enabled us to train better models that were at least able to transcribe the text very well, which saved working time. Transkribus also learnt the digits quickly, even if fractional representations are still problematic. The heterogeneous layouts also pose difficulties to this day. Generic models, which we have now been able to train through intensive transcription, have already eliminated some of these shortcomings, but they will probably never be perfect. A paper on the detailed steps of model training with statistical information and a detailed discussion of the difficulties of transcribing arithmetic texts with Transkribus is in progress and will be published in 2026 at the latest.

Typical example of an interrupted baseline because of fractions – Heidelberg, Universitätsbibliothek, Pal. germ. 642 fol. 7r
Paris, Bibliotheque Nationale, Lat. 7197, fol 11r – ausgelesen mit „Generisches Modell Rechenbücher 15. Jahrhundert“

By Norbert Orbán



Schreibe einen Kommentar

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert