Department of Lexicology and Lexicography
Acting chair: Nóra Ittzés, Senior Research Fellow
The main research area of the department is the compilation of monolingual Hungarian dictionaries. The department's team created A magyar nyelv értelmező szótára 1–7. (‘Explanatory Dictionary of Hungarian’, abbr. ÉrtSz., chief editors: G. Bárczi and L. Országh; Akadémiai Kiadó, Budapest, 1959–1962), and its shortened version, Magyar értelmező kéziszótár (‘Concise Hungarian Explanatory Dictionary’, abbr. ÉKsz., edited by J. Juhász, I. Szőke, G. O. Nagy, and M. Kovalovszky; Akadémiai Kiadó, Budapest, 1972).
I. Comprehensive Dictionary of Hungarian
Our primary task at present is to compile A magyar nyelv nagyszótára (‘Comprehensive Dictionary of Hungarian’, abbr. Nszt.). The dictionary is written in an XML (Extensible Markup Language) database format, an international standard for marking up the structural units of a text. It will principally contain the vocabulary of the Hungarian literary and common language and, to a lesser extent, that of other language varieties (dialects, technical terminologies, slang, etc.) in approximately 110,000 entries in more than 20 volumes, processing data from printed texts from the time of the language reform (beginning of the 19th century) up to the present day. It is based on the electronic Hungarian Historical Corpus, which contains 30 million running words, on the archive of 6 million dictionary notes (created between the end of the 19th and the middle of the 20th century), and on other texts from CD-ROMs (a collection of more than 335 million words). In 2005, the electronic database (created between 1985 and 2000) was completed and corrected for the first time. By the end of 2015, a new phase of corpus expansion had been completed, which extended the upper year limit of the corpus to 2010.
The archival methods used for the different types of databases reflect the digital evolution of our age. The 5-6 million archival slips of the collection were first written by hand and later typed on A6-size paper slips. The building of the electronic corpus also began with manual data entry, by typing the selected texts into computers. In 2014-2015, optical character recognition (OCR) played the most important role in the corpus expansion. Since the OCR software (ABBYY FineReader) takes digital images as input, the pages of the books and journals containing the selected texts had to be digitized. A few hundred pages were photographed with a digital camera, but most of the material was digitized by scanning. The OCR software then recognized the text in the digital images. Owing to the recognition sensitivity of the software and its extensive built-in vocabulary, the text of an average page, including text styles and formatting, was recognized with nearly 100 percent accuracy. The completed and corrected material provides a valuable basis for research on the history of words in the Hungarian lexicon, as well as for compiling various types of dictionaries. The Hungarian Historical Corpus is also available to external users for research purposes. The XML database of the philologically verified bibliographic data of the corpus sources (more than 30,000 titles) is also available on the internet.
This comprehensive dictionary, which belongs to the category of explanatory dictionaries, not only presents the vocabulary of the period between 1772 and 2010 with a larger number of lexemes and a richer meaning structure than any previous dictionary, but also sketches the historical development of the lexemes. Word meanings are illustrated with examples whose sources are indicated. For each meaning, the first occurrence in the corpus is always cited. In accordance with current international practice in lexicography, we place special emphasis on distinguishing between collocations, idioms, and words, and on handling, interpreting, and illustrating collocations as individual linguistic units. The dictionary contains a notable number of technical terms, and, for the first time in the history of Hungarian lexicography, the entries of these lexemes are peer-reviewed by authorities and experts in the relevant fields of science or profession. The database format makes it possible to expand and update the dictionary continuously, to search it according to different criteria, and to classify or compare various data.
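The kind of criterion-based search that an XML database format allows can be illustrated with a minimal sketch. The element and attribute names (entry, headword, pos, sense, example, year), the sample years, and the source labels are all invented for illustration; they are assumptions, not the actual schema or data of the Comprehensive Dictionary of Hungarian.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified entry markup. The tag names, years, and
# source labels are invented for this sketch and do not reproduce the
# dictionary's real schema or citations.
SAMPLE = """
<dictionary>
  <entry>
    <headword>ablak</headword>
    <pos>noun</pos>
    <sense n="1">
      <definition>window</definition>
      <example year="1805" source="invented source A">(invented example text)</example>
      <example year="1932" source="invented source B">(invented example text)</example>
    </sense>
  </entry>
  <entry>
    <headword>ad</headword>
    <pos>verb</pos>
    <sense n="1">
      <definition>give</definition>
      <example year="1790" source="invented source C">(invented example text)</example>
    </sense>
  </entry>
</dictionary>
"""


def first_attestations(root):
    """Map each headword to the earliest example year found in its entry."""
    result = {}
    for entry in root.iter("entry"):
        years = [int(ex.get("year")) for ex in entry.iter("example")]
        if years:
            result[entry.findtext("headword")] = min(years)
    return result


def headwords_by_pos(root, pos):
    """Return the headwords whose part-of-speech label equals `pos`."""
    return [
        entry.findtext("headword")
        for entry in root.iter("entry")
        if entry.findtext("pos") == pos
    ]


root = ET.fromstring(SAMPLE)
print(first_attestations(root))
print(headwords_by_pos(root, "verb"))
```

Because every structural unit is tagged, the same data can be filtered by part of speech, by date of first attestation, or by any other marked-up property without changing the entries themselves.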
Since 2006, six volumes of the Comprehensive Dictionary of Hungarian have been published. Volume 1 contains supplementary material: it gives information about the general nature, the structure, and the lexicographic principles of the dictionary, the composition of the corpus, the technical background of the electronic database, and also a short overview of the history of the project. In this volume the reader can find the bibliography of the sources (30,000 titles), the description of the philological methods used in the dictionary, and the list of the authors of the texts in the database (including the original authors in the case of translations). This first volume also contains useful tables with the paradigms of Hungarian words, and a list of abbreviations that enumerates the grammatical and usage labels and the abbreviations used in the lexicographic description. Volume 2, the first dictionary volume, contains more than 5,500 entries for words beginning with a and á; the third and fourth volumes contain more than 6,000 entries for words beginning with b. Volume 5 contains the entries for words beginning with c and cs and the first half of the d-material, while the rest of the d-material and the entries beginning with e-ek can be found in Volume 6. Each volume contains a list of the abbreviations occurring in the example sentences, as well as the current supplements to the bibliography.
Since early 2017, the web version of the Comprehensive Dictionary of Hungarian has been available at http://nagyszotar.nytud.hu.
Ittzés, Nóra (chief ed.) A magyar nyelv nagyszótára. 1. Segédletek. [‘Comprehensive Dictionary of Hungarian. 1. Supplements’] MTA Nyelvtudományi Intézet, Budapest, 2006. 1119 pp.
Ittzés, Nóra (chief ed.) A magyar nyelv nagyszótára. 2. A–azsúroz. [‘Comprehensive Dictionary of Hungarian. 2.’] MTA Nyelvtudományi Intézet, Budapest, 2006. 1550 pp.
Ittzés, Nóra (chief ed.) A magyar nyelv nagyszótára. 3. B–bes. [‘Comprehensive Dictionary of Hungarian. 3.’] MTA Nyelvtudományi Intézet, Budapest, 2011. 1039 pp.
Ittzés, Nóra (chief ed.) A magyar nyelv nagyszótára. 4. Besz–by. [‘Comprehensive Dictionary of Hungarian. 4.’] MTA Nyelvtudományi Intézet, Budapest, 2011. 1020 pp.
Ittzés, Nóra (chief ed.) A magyar nyelv nagyszótára. 5. C–dézs. [‘Comprehensive Dictionary of Hungarian. 5.’] MTA Nyelvtudományi Intézet, Budapest, 2013. 1247 pp.
Ittzés, Nóra (chief ed.) A magyar nyelv nagyszótára. 6. Di–ek. [‘Comprehensive Dictionary of Hungarian. 6.’] MTA Nyelvtudományi Intézet, Budapest, 2016. 980 pp.
II. New Etymological Dictionary of Hungarian
The department also works on a new etymological dictionary called Új etimológiai szótár (‘New Etymological Dictionary of Hungarian’, abbr. ÚESz.).
The last internationally acclaimed Hungarian etymological dictionary, the Etymologisches Wörterbuch des Ungarischen 1–2. (abbr.: EWUng., chief editor: L. Benkő; Akadémiai Kiadó, Budapest, 1993–1995), was published more than 15 years ago. As the title suggests, the metalanguage of the dictionary was German, so the latest results of Hungarian etymological research were accessible to etymologists in other countries as well. Although the international reception of the dictionary was good, it became less well known in Hungary than expected. Three factors contributed to this: the German language of the dictionary, its relatively high price, and its limited print run (the possibility of a new edition never even arose). As a result, many think that the (otherwise outstanding) A magyar nyelv történeti-etimológiai szótára 1–4. (‘Historical-Etymological Dictionary of Hungarian’, abbr.: TESz., chief ed.: L. Benkő; Akadémiai Kiadó, Budapest, 1967−1984) is the only Hungarian etymological dictionary.
Because of this situation, and given the results of etymological research since the mid-1990s (mostly in Turkology and Slavic studies, and in the research connected to the second edition of the Magyar értelmező kéziszótár [‘Concise Hungarian Explanatory Dictionary’], abbr. ÉKsz.2, chief ed. F. Pusztai; Akadémiai Kiadó, Budapest, 2003), it has become both possible and necessary to compile a new Hungarian etymological (and, to a certain extent, word-historical) dictionary that also satisfies scientific requirements.
The project commenced in February 2011 and is scheduled to end in January 2015. It is financed by an OTKA Research Grant and coordinated by Károly Gerstner, senior research fellow.
The New Etymological Dictionary of Hungarian is planned to contain approximately 15,000 main headwords and subheadwords in 10,000 entries. This exceeds the number of entries in TESz. and EWUng.; beyond the difference in quantity, the content also differs. The selection of the entries is based on TESz., EWUng., the second edition of the Concise Hungarian Explanatory Dictionary, and the published and planned volumes of the Comprehensive Dictionary of Hungarian, as well as on the Hungarian Historical Corpus and the Hungarian National Corpus.
The entry structure and its elements are basically the same as those of the earlier etymological dictionaries: this clear structure is well known among Hungarian linguists. The dictionary is compiled in electronic format, in accordance with the expectations of the present day and the future. The XML-based database format ensures the structural unity of the dictionary and also makes it easy to search the whole material and to update it.
ÚESz. will be published in digital (DVD) format and in traditional printed format (in two volumes). It is planned to be a dictionary not only for researchers but also for well-read general readers, with rich content and fluent, fully developed scholarly prose, as opposed to the necessarily compact German metalanguage of EWUng.
Last modified: 22.03.2017