Annif - automated subject indexing using Finna as a corpus

Annif is a statistical automated indexing tool for libraries, archives and museums. It is trained on a SKOS vocabulary and on existing, openly available metadata from Finna, the search engine for library, archive and museum collections. Once trained, it can assign subjects to new documents.
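As a rough illustration of the idea, here is a toy sketch of statistical subject assignment. It is not Annif's actual algorithm. It assumes the training metadata is a list of (text, subjects) pairs, uses plain word counts with cosine similarity, and the names `train` and `suggest` as well as the sample records are made up for this example. In a real setup the subjects would be concepts from the SKOS vocabulary rather than bare strings.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def train(records):
    """Build one word-count profile per subject.

    records: iterable of (text, [subject, ...]) pairs, standing in for
    already-indexed metadata records (hypothetical format).
    """
    profiles = defaultdict(Counter)
    for text, subjects in records:
        words = tokenize(text)
        for subject in subjects:
            profiles[subject].update(words)
    return profiles


def suggest(profiles, text, limit=3):
    """Rank subjects by cosine similarity between the new text and each profile."""
    doc = Counter(tokenize(text))
    doc_norm = math.sqrt(sum(v * v for v in doc.values()))
    scores = {}
    for subject, counts in profiles.items():
        dot = sum(doc[w] * counts[w] for w in doc)
        if dot:
            profile_norm = math.sqrt(sum(v * v for v in counts.values()))
            scores[subject] = dot / (doc_norm * profile_norm)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:limit]


# Made-up training records; real input would come from existing metadata.
records = [
    ("Cats and dogs as household pets", ["pets"]),
    ("Training dogs for hunting", ["dogs", "hunting"]),
    ("History of Finnish libraries", ["libraries", "history"]),
]
profiles = train(records)
suggestions = suggest(profiles, "A guide to library history in Finland")
```

With these records, only the subjects attached to the third record share a word ("history") with the new text, so they are the only ones suggested. A production tool would add stemming or lemmatization, term weighting and far more training data.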

New version coming up!

The first version of Annif (2017, presented in the video below) was a hackish prototype. In 2018 a new version, supporting multiple backends/algorithms and aiming for production use, is being developed at the National Library of Finland. The code is in the NatLibFi/Annif GitHub repository (Apache License 2.0) and a test instance is running at


Annif has a REST API and a mobile web app that can analyze physical documents such as books. With Annif, we can add semantics to documents in three projects (Finnish, Swedish and English) using our own indexing vocabulary YSO.
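A minimal sketch of how a client might talk to such a REST API is shown below. Everything specific in it is an assumption: the base URL, the project identifier `yso-en`, the `/v1/projects/<id>/suggest` path (modelled on later Annif releases), the `text` and `limit` form fields, and the shape of the JSON response may all differ from the instance described here, so check the API documentation of the service you actually use. The code only builds the request and parses a sample response; it does not contact any server.

```python
import json
from urllib import parse, request


def build_suggest_request(base_url, project_id, text, limit=10):
    """Build (but do not send) a POST request asking for subject suggestions.

    The endpoint path and form field names are assumptions, not confirmed
    by this page.
    """
    url = f"{base_url.rstrip('/')}/v1/projects/{parse.quote(project_id)}/suggest"
    body = parse.urlencode({"text": text, "limit": limit}).encode("utf-8")
    return request.Request(url, data=body, method="POST")


def parse_suggestions(payload):
    """Extract (label, score) pairs from an assumed JSON response body."""
    return [(r["label"], r["score"]) for r in json.loads(payload)["results"]]


# Hypothetical local instance and project identifier.
req = build_suggest_request("http://localhost:5000/", "yso-en", "cats and dogs")

# Hand-written sample response in the assumed format.
sample = '{"results": [{"uri": "http://example.org/c1", "label": "cats", "score": 0.5}]}'
pairs = parse_suggestions(sample)
```

To actually send the request, pass `req` to `urllib.request.urlopen` and feed the response body to `parse_suggestions`.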

Code for Annif is available on GitHub (CC0 license).

Watch the video

Try it!