Ossetic National Corpus

This site hosts the written corpus of the Ossetic language (Ossetic National Corpus, ONC). The total size of ONC is about 12 million tokens. ONC was created in 2011–2014 with the financial support of the Presidium of the Russian Academy of Sciences program “Corpus linguistics” (leader Arseniy P. Vydrin) and RFBR project No. 11-06-00512a (leader Michael A. Daniel). The corpus comprises published texts in the Iron dialect of Ossetic, which is the basis of standard Ossetic.

Structure

Two thirds of ONC consists of issues of the literary journal Makh dug (‘Our epoch’) for 1996–2014. Makh dug is the major journal in standard Ossetic and focuses on fiction, poetry and criticism. Other texts included in ONC are issues of the online newspaper “Sputnik”, fiction and poetry published during 1990–2014, issues of the literary journal Nogzaw (‘Pioneer’) for 2010–2013, and some works of the most famous Ossetic writers of the 20th century, such as Izmail Aylarov, Šamil Djikaev, Azamat Kaytukov, Arsen Kotsoev and Muzafer Dzasohov. ONC also contains the comprehensive edition of the Narty epic published by the North Ossetic Institute for humane and social studies in 2003–2010. The complete list of the texts included in ONC is downloadable from here.

Corpus specifics

All the texts included in the corpus are automatically annotated in English and in Russian (the annotation consists of grammatical information and translation). After automatic annotation, approximately 90% of all wordforms are annotated. The corpus uses a modified version of the search engine of the Eastern Armenian National Corpus (EANC), which allows searching by lexeme, by wordform, and by particular grammatical features. To prevent copyright infringement, access to the full texts is not available. When searching for a particular wordform, lemma, or set of grammatical tags, the platform displays all sentences containing the requested wordforms. Every result sentence can be expanded to show the 3 sentences before it and the 3 sentences after it.
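The way a result is displayed can be pictured with a short sketch. It is only an illustration of the behaviour described above, not the code of the EANC search engine: the function names `search_wordform` and `expand_context`, the whitespace tokenisation and the placeholder sentences are all assumptions made for the example.

```python
def search_wordform(sentences, wordform):
    """Return the indices of sentences containing `wordform` as a token.

    Assumption: tokens are separated by whitespace; the real corpus
    relies on morphological annotation instead.
    """
    return [i for i, s in enumerate(sentences) if wordform in s.split()]


def expand_context(sentences, hit_index, window=3):
    """Return the hit sentence with up to `window` sentences on each side.

    The window is clipped at the start and end of the text, so a hit
    near a boundary yields fewer than 2 * window + 1 sentences.
    """
    start = max(0, hit_index - window)
    end = min(len(sentences), hit_index + window + 1)
    return sentences[start:end]


# Placeholder text: ten one-token "sentences" (hypothetical data).
sentences = [f"tok{i} filler" for i in range(10)]

hits = search_wordform(sentences, "tok5")
context = expand_context(sentences, hits[0])
```

Because only a fixed window around each hit is ever returned, a user can read a result in context without being able to retrieve a full text.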

Credits

We are grateful to the editorial boards of the journals Makh dug and Nogzaw and to Ir publishers for providing us with electronic versions of their publications. We appreciate the collaboration with Akhsar M. Kodzati (editor-in-chief of Makh dug), Irida A. Kodzati (desktop publishing specialist at Makh dug), Zhanna G. Kozyreva (head of Ir publishers) and Totraz A. Kokaev (editor-in-chief of Ir publishers). We thank M.V. Darchieva and I.M. Mirikova for scanning some of the texts included in the corpus.

Creators of the Corpus

The corpus was jointly created by Oleg Belyaev and Arseniy Vydrin. UniParser, the system of automatic morphological analysis, was developed by Timofey Arkhangelskiy. Some texts included in the corpus were scanned by M.V. Darchieva and I.M. Mirikova. Since 2014, the corpus has been maintained and developed by Arseniy Vydrin (email: senjacom@gmail.com).

Contact details

Comments can be sent by email: ossetic.studies@gmail.com

Announcement

We will be grateful for any published texts in Iron Ossetic sent to us. Texts are accepted in any text format (doc, docx, rtf, txt, odt) at the following email addresses: ossetic.studies@gmail.com and senjacom@gmail.com. We guarantee copyright protection. The texts we receive will be used only for corpus purposes.