
Texts in TEI format – University of Copenhagen


Texts in TEI format

The first version of the CLARIN-DK data centre was developed in the Danish DK-CLARIN project, 2008-2011 (see https://dkclarin.ku.dk/). In the subproject on written language resources, a variety of written texts were collected and annotated: contemporary as well as older texts, general language as well as language for specific purposes, fiction as well as non-fiction, and parallel corpora with Danish as one of the languages. Furthermore, a common TEI format was developed for all files. This version of the CLARIN-DK data centre will no longer be expanded.

For a description of the TEI format, see Asmussen: Text formatting. The RNG schema can be downloaded here: https://clarin.dk/schemas/tei/TEIDKCLARIN.rng
You can search for the old TEI files here: https://clarin.dk/clarindk/find.jsp

Instructions on how to search for TEI files (in Danish): https://info.clarin.dk/clarin-dk-infrastrukturen/vejledninger/Frems_gTeiResurser_v2.pdf/

Automatic generation of the TEI format

To upload text files to the old data centre, the files had to be in the common TEI format. For a general user, this conversion was a difficult task, so DK-CLARIN created an automatic procedure for preparing texts for upload.

This procedure can still be used to create files in TEI format from simple text files or from RTF files (created in Word): https://clarin.dk/clarindk/toolchains-upload.jsp
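
To give a rough idea of what a conversion from plain text to TEI involves, here is a minimal Python sketch. It is not the DK-CLARIN procedure and does not produce the DK-CLARIN header defined in TEIDKCLARIN.rng. It only wraps blank-line-separated paragraphs in a generic TEI skeleton. The function name text_to_tei and the placeholder header texts are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace; the DK-CLARIN schema may impose further requirements.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def text_to_tei(title: str, text: str) -> str:
    """Wrap plain text in a generic TEI skeleton (hypothetical helper)."""
    ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

    def q(tag: str) -> str:
        return f"{{{TEI_NS}}}{tag}"

    tei = ET.Element(q("TEI"))

    # Minimal header: title, publication statement, source description.
    file_desc = ET.SubElement(ET.SubElement(tei, q("teiHeader")), q("fileDesc"))
    title_stmt = ET.SubElement(file_desc, q("titleStmt"))
    ET.SubElement(title_stmt, q("title")).text = title
    pub_stmt = ET.SubElement(file_desc, q("publicationStmt"))
    ET.SubElement(pub_stmt, q("p")).text = "Placeholder publication statement"
    source_desc = ET.SubElement(file_desc, q("sourceDesc"))
    ET.SubElement(source_desc, q("p")).text = "Converted from a plain text file"

    # Body: one <p> per block of text separated by blank lines.
    body = ET.SubElement(ET.SubElement(tei, q("text")), q("body"))
    for block in re.split(r"\n\s*\n", text):
        paragraph = " ".join(block.split())  # collapse line breaks and extra spaces
        if paragraph:
            ET.SubElement(body, q("p")).text = paragraph

    return ET.tostring(tei, encoding="unicode")


if __name__ == "__main__":
    print(text_to_tei("Demo", "First paragraph\ncontinues.\n\nSecond paragraph."))
```

Files produced this way would still need the metadata required by the DK-CLARIN schema before they could be treated as conforming, which is the part the automatic procedure above takes care of.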

Instructions on how to generate the TEI format (in Danish): Vejledning i konvertering til TEI (Guide to conversion to TEI)

Hosted by the University of Copenhagen