NLP Tools

CLARIN-DK offers a range of NLP tools online: you can use them directly from the user interface, with no download necessary. The tools work on Danish and English texts and accept the formats TXT, RTF (from Word), and PDF.

The tools can perform the following tasks:

  • Sentence segmentation and tokenisation of your text (separation of the words, digits and punctuation marks in the text).
  • Part-of-speech identification for all words in your text (POS tagging).
  • Lemmatisation of your text (each word in the text is assigned its base form).
  • Creation of a frequency list of the words in your text, giving you an overview of the most frequently used words.
  • Identification and classification of the names in the text (named entity recognition).
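
To give a feel for what tokenisation and a frequency list produce, here is a minimal Python sketch. It is not the CLARIN-DK implementation: the function names `tokenise` and `frequency_list` are hypothetical, and the regular expression is a deliberately simple stand-in for a real tokeniser.

```python
import re
from collections import Counter


def tokenise(text):
    """Split text into word/digit tokens and single punctuation marks."""
    # \w+ matches runs of letters and digits (including Danish æ, ø, å);
    # [^\w\s] matches one punctuation character at a time.
    return re.findall(r"\w+|[^\w\s]", text)


def frequency_list(tokens):
    """Count lower-cased word tokens, most frequent first."""
    # Skip punctuation tokens by checking the first character.
    words = [t.lower() for t in tokens if t[0].isalnum()]
    return Counter(words).most_common()


tokens = tokenise("The cat saw the dog.")
print(tokens)
print(frequency_list(tokens))
```

The online tools handle many cases this sketch ignores, such as abbreviations, sentence boundaries, and lemmatisation before counting.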

This is how you use the tools online:

  • Upload the file you want to work with
  • Choose file format (TXT, RTF, PDF)
  • Check the box for the tool you want to use
  • Press ’Submit’
  • Download the zip file, which contains the result (the file with the longest name), the output from the intermediate stages, and your input file.
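
If you process many files, you may want to pick the result out of the downloaded zip file automatically. The following Python sketch relies only on the rule stated above, that the result is the file with the longest name. The function name `longest_named_member` and the archive name `result.zip` are assumptions made for illustration.

```python
import posixpath
import zipfile


def longest_named_member(zip_path):
    """Return the archive member whose file name is longest."""
    with zipfile.ZipFile(zip_path) as archive:
        # Ignore directory entries; zip member names always use "/".
        names = [n for n in archive.namelist() if not n.endswith("/")]
        # Compare file names only, not any folder prefix.
        return max(names, key=lambda n: len(posixpath.basename(n)))


if __name__ == "__main__":
    # "result.zip" is a hypothetical name for the downloaded archive.
    with zipfile.ZipFile("result.zip") as archive:
        result_name = longest_named_member("result.zip")
        archive.extract(result_name, path="output")
        print("Extracted", result_name)
```

The remaining members of the archive are the intermediate outputs and your input file, so they can be kept for inspection or discarded.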

Use NLP Tools: