The Education and Youth Board (Harno) has consulted with researchers from Tallinn University. They are investigating whether native language exam papers could be assessed using artificial intelligence in the future. The researchers have already conducted a study, which showed that language models give grades similar to those given by human assessors.
Merilin Aruvee is a lecturer at Tallinn University. She has participated in the testing of 9th-grade e-exams. Together with researcher Katarin Leppik, Aruvee developed new assessment criteria. They ask: can exam papers be assessed using artificial intelligence?
The researchers studied how to assess exam papers. They collaborated with Harno's assessors, who were experienced native language teachers. The assessors tried assessing exam papers according to the new criteria. The researchers then adjusted their model based on the teachers' recommendations.
Inspiration came from Kais Allkivi, who works at the TLÜ Institute of Digital Technologies. She has studied how machine learning can help assess Estonian language skills.
The researchers made an agreement with Harno. They gained access to the 2024 and 2025 trial exam papers. These papers were coded and anonymized. The researchers could see only the text and the assessors' grades.
The study found that language models give grades similar to those given by human assessors. Aruvee says that the models' grades do not differ much from human grades. In 60% of cases, the AI's grade stays within the same range as the human grades.
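To make the 60% figure concrete, here is a minimal sketch of how such a share could be counted. The article does not say how the researchers defined "the same range", so this sketch assumes one possible reading: the model's grade counts as a match when it lies between the lowest and highest grade given by the human assessors for that paper. The function names and the sample grades are hypothetical and are not taken from the study.

```python
def within_human_range(human_grades, model_grade):
    """Return True if the model's grade lies between the lowest and
    highest human grade (an assumed definition of 'same range')."""
    return min(human_grades) <= model_grade <= max(human_grades)


def agreement_rate(papers):
    """Share of papers where the model's grade falls in the human range.

    `papers` is a list of (human_grades, model_grade) pairs.
    """
    if not papers:
        raise ValueError("at least one paper is required")
    hits = sum(within_human_range(h, m) for h, m in papers)
    return hits / len(papers)


# Hypothetical data: grades from two human assessors, then the model's grade.
sample_papers = [
    ([4, 5], 5),
    ([3, 4], 4),
    ([2, 3], 4),
    ([5, 5], 4),
    ([3, 3], 3),
]

rate = agreement_rate(sample_papers)
print(f"Model within human range: {rate:.0%}")
```

With these invented grades, three of the five papers match, which gives the same 60% share mentioned above. A different definition of "range", for example allowing a one-point difference, would change the result.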
Artificial intelligence is useful in assessing the use of source texts. A source text is a text that the student uses in their writing. The machine can compare the source text and the student's text. The machine shows which sentences have been copied from the source text.
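The comparison described here can be illustrated with a short sketch. It is not the researchers' tool, which the article does not describe. It assumes a simple approach: split both texts into sentences, compare each student sentence with each source sentence using Python's standard `difflib` module, and flag sentences that are nearly identical. The function names, the 0.9 threshold, and the sample texts are hypothetical.

```python
import re
from difflib import SequenceMatcher


def split_sentences(text):
    """Split text into sentences at ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def normalize(sentence):
    """Lowercase and collapse whitespace so small differences are ignored."""
    return re.sub(r"\s+", " ", sentence.lower()).strip()


def find_copied_sentences(source_text, student_text, threshold=0.9):
    """Return (student sentence, similarity) pairs that closely match
    some sentence in the source text. The threshold is an assumption."""
    source_sentences = [normalize(s) for s in split_sentences(source_text)]
    copied = []
    for sentence in split_sentences(student_text):
        candidate = normalize(sentence)
        best = max(
            (SequenceMatcher(None, candidate, src).ratio()
             for src in source_sentences),
            default=0.0,
        )
        if best >= threshold:
            copied.append((sentence, best))
    return copied


# Hypothetical example texts.
source = "Reading improves vocabulary. Young people read less than before."
student = ("I think books matter. Reading improves vocabulary. "
           "My friends prefer films.")

for sentence, score in find_copied_sentences(source, student):
    print(f"Copied ({score:.2f}): {sentence}")
```

A character-based comparison like this finds only copied or lightly changed sentences. It does not recognize paraphrase, which is one reason why a human assessor would still need to judge how well the source text was used.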
Aruvee says that humans are necessary in assessment. For example, the introduction and conclusion of a text require human judgment. A person assesses whether the text seems substantial and meaningful.
Aruvee says that language models can also assess texts written in Estonian. This was a surprise to the researchers. Language models usually rely on internet texts, but texts written in Estonian are different.
Artificial intelligence cannot make the final decision in assessing exam papers. European Union directives and good practice do not allow this.
In February, the researchers will present their study results to Harno. Harno's press representative says that the agency has not yet taken a position on the study.
Aruvee says that students need more practice in using source texts. For this, it is necessary to develop learning materials that help students write better.