Choose a controlled subject vocabulary and train Annif on already indexed documents – it can then suggest subjects for new documents!
Annif uses a combination of existing natural language processing and machine learning tools including TensorFlow, Omikuji, fastText and Gensim. It is multilingual and can support any subject vocabulary (in SKOS or a simple TSV format). It provides a command-line interface, a simple Web UI and a microservice-style REST API.
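As a rough illustration of the REST API mentioned above, the following Python sketch builds a suggestion request and parses the reply. It is a minimal sketch, not official client code: the endpoint path `/v1/projects/<project_id>/suggest`, the form fields `text` and `limit`, and the `results` / `label` / `score` response keys are assumptions about the API shape and should be checked against the API documentation of your own Annif instance. The function names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_suggest_request(base_url, project_id, text, limit=5):
    """Build (but do not send) a POST request asking a project for subjects."""
    # Assumed endpoint layout: <base_url>/v1/projects/<project_id>/suggest
    url = f"{base_url.rstrip('/')}/v1/projects/{project_id}/suggest"
    # Assumed form fields: "text" (the document) and "limit" (max suggestions)
    body = urlencode({"text": text, "limit": limit}).encode("utf-8")
    return Request(url, data=body, method="POST")


def parse_suggestions(payload):
    """Turn a JSON response body into (label, score) pairs, best score first."""
    # Assumed response shape: {"results": [{"label": ..., "score": ...}, ...]}
    results = json.loads(payload).get("results", [])
    pairs = [(r["label"], r["score"]) for r in results]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


def suggest(base_url, project_id, text, limit=5):
    """Send the request to a running Annif instance and return the suggestions."""
    request = build_suggest_request(base_url, project_id, text, limit)
    with urlopen(request) as response:
        return parse_suggestions(response.read().decode("utf-8"))
```

With an Annif instance running locally, a call might look like `suggest("http://localhost:5000", "my-project", "Text of a new document")`, where both the address and the project identifier `my-project` are placeholders for your own setup.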
Code and documentation for Annif are available on GitHub (Apache 2.0 license). Annif can also be installed from PyPI and as a Docker image from Quay.io. Annif is mainly being developed at the National Library of Finland, but others are welcome to join in!
To get hands-on experience with Annif, work through the Annif tutorial materials, which include example data sets, exercises and short video presentations.
There is also extensive usage documentation in the wiki on GitHub.
The annif-users mailing list and web forum is available on Google Groups. The forum is meant for general discussion about Annif, asking for help, and announcements of new versions. All messages are public and anyone is welcome to join!
Please use the forum instead of sending personal e-mail to the Annif developers.
The first article about Annif was published in 2019 in LIBER Quarterly.