Interactive Learning of the Ukrainian Language Among Students Using AI-Based Chatbots: An Experimental Study

Olha Boiko1, Gevorkian Norik2, Galyna Vyshnevska3, Liudmyla Derevianko4 and Tamila Gruba5
1. PhD in Philology, Department of Russian Language and Literature, Atatürk University, Erzurum, Turkey
2. Lecturer, Interregional Academy of Personnel Management, Kyiv, Ukraine
3. PhD in Philological Sciences, Department of General Linguistics and Slavic Languages, Ternopil Volodymyr Hnatyuk National Pedagogical University, Ternopil, Ukraine
4. Candidate of Philological Sciences, Ukrainian Studies, Culture and Documentation Department, National University "Yuri Kondratyuk Poltava Polytechnic", Poltava, Ukraine
5. Doctor in Pedagogy, Department of the Ukrainian Language and Literature, Academician Stepan Demianchuk International University of Economics and Humanities, Rivne, Ukraine
Correspondence to: Olha Boiko, olha.b071@gmail.com

DOI: https://doi.org/10.70389/PJS.100201

Cite this article as:
Boiko O, Norik G, Vyshnevska G, Derevianko L and Gruba T. Interactive Learning of the Ukrainian Language Among Students Using AI-Based Chatbots: An Experimental Study. Premier Journal of Science 2025;16:100201

Premier Journal of Science

Additional information

  • Ethical approval: N/a
  • Consent: N/a
  • Funding: No industry funding
  • Conflicts of interest: N/a
  • Author contribution: Conceptualization: Olha Boiko, Gevorkian Norik, Galyna Vyshnevska; Data curation: Olha Boiko, Gevorkian Norik; Formal analysis: Olha Boiko, Gevorkian Norik, Liudmyla Derevianko; Funding acquisition: Olha Boiko, Gevorkian Norik; Investigation: Olha Boiko, Gevorkian Norik, Galyna Vyshnevska, Liudmyla Derevianko, Tamila Gruba; Methodology: Olha Boiko, Gevorkian Norik, Tamila Gruba; Project administration: Olha Boiko, Gevorkian Norik, Tamila Gruba; Resources: Olha Boiko, Gevorkian Norik, Galyna Vyshnevska, Liudmyla Derevianko; Software: Olha Boiko, Gevorkian Norik, Galyna Vyshnevska, Liudmyla Derevianko, Tamila Gruba; Supervision: Galyna Vyshnevska, Liudmyla Derevianko; Validation: Olha Boiko, Gevorkian Norik, Galyna Vyshnevska, Liudmyla Derevianko, Tamila Gruba; Visualization: Galyna Vyshnevska, Liudmyla Derevianko, Tamila Gruba; Writing – original draft: Galyna Vyshnevska, Liudmyla Derevianko, Tamila Gruba; Writing – review & editing: Olha Boiko, Gevorkian Norik, Galyna Vyshnevska, Liudmyla Derevianko, Tamila Gruba.
  • Guarantor: Olha Boiko
  • Provenance and peer-review: Unsolicited and externally peer-reviewed
  • Data availability statement: N/a

Keywords: AI-driven chatbots, Attitude, BLEU/ROUGE evaluation, Motivation test battery, Transformer-based NLP, Ukrainian language learning.

Peer Review
Received: 22 October 2025
Last revised: 12 December 2025
Accepted: 14 December 2025
Version accepted: 6
Published: 24 December 2025

Plain Language Summary Infographic
“Infographic poster presenting research on interactive Ukrainian language learning using AI-based chatbots, depicting experimental and control group comparison, higher response relevance, increased motivation scores, lower anxiety levels, and statistically significant results demonstrating the effectiveness of chatbot-supported language education.”
Abstract

The relevance of this study is driven by the need to integrate innovative technologies, particularly chatbots, into the educational process to enhance the effectiveness of language learning. The study aims to evaluate the efficiency of integrating AI-based chatbots into the process of learning the Ukrainian language and improving motivation. The research employs testing methods (The Attitude/Motivation Test Battery), natural language processing (NLP) metrics, and chatbot log file analysis. Statistical data processing methods such as mean, standard deviation, range, and Student’s t-test were applied. The obtained data indicate a statistically significant advantage of using an AI chatbot in learning the Ukrainian language. The experimental group demonstrated higher response relevance (4.2 vs. 3.5 in the control group), confirming the effectiveness of the AI chatbot.

The motivation test also showed significant differences: the average score of language interest in the experimental group was 4.5, compared to 3.8 in the control group, while the anxiety level was lower (2.0 vs. 2.7). Statistical analysis using independent samples t-tests confirmed these differences, with specific results for key variables: Interest in language learning (t(94) = 2.85, p = 0.006), Motivation for language learning (t(94) = 3.12, p = 0.002), and Anxiety in language learning (t(94) = -2.45, p = 0.016). The results demonstrate the positive impact of the AI chatbot on motivation, academic achievement, and anxiety reduction among users. Further research could focus on analyzing the long-term impact of chatbots on students’ academic performance. Special attention should be given to adapting chatbots to other academic disciplines and various cultural contexts.

Introduction

The dynamic development of artificial intelligence (AI) technologies and their integration into the educational paradigm underscore the relevance of this study. The advancement of technologies such as natural language processing (NLP) and machine learning (ML) opens new prospects for personalizing the educational process. The implementation of NLP-based chatbots facilitates real-time interactive interaction.1 This simplifies the assimilation of complex educational material and increases student engagement.2 Simultaneously, ML applications enable the analysis of educational data and the prediction of user behavior, optimizing learning programs.

AI technologies are rapidly penetrating the field of education, offering new solutions for optimizing the learning process.3 One promising tool is chatbots based on NLP and ML algorithms, which create an interactive, personalized learning environment for students.4 These chatbots provide quick access to educational resources, automate routine tasks, and maintain continuous dialogue between students and the system. However, technical challenges such as query recognition accuracy, response quality, processing speed, and the ability to adapt to users' knowledge levels remain key concerns.

Automated conversational systems serve as multifunctional tools that facilitate language learning and the development of intercultural communication in the global educational space.5 Their application ensures effective adaptation to various linguistic, cultural, and educational contexts. By integrating modern technologies, they contribute to the personalization of the learning process and the formation of competencies necessary for successful integration into international academic and professional communities.6

This study analyzes the effectiveness of integrating NLP- and ML-based chatbots into Ukrainian language learning, focusing on their impact on student motivation, language skills development, and the system's technical characteristics. The research problem lies in determining the impact of these technologies on student motivation, language skill development, and the optimization of the system's technical aspects. The study is distinguished by its novel approach, integrating the analysis of chatbot technical performance with pedagogical assessments of their influence on motivation and language skills. Unlike other studies that predominantly focus on either technical or educational aspects, this research examines the synergistic effect of both components, offering a deeper understanding of the effectiveness of chatbot usage in learning. The study aims to analyze the efficiency of AI-based chatbot integration in Ukrainian language learning, considering motivation enhancement and language competency development. To achieve this goal, the following research objectives are addressed:

  1. Examine the technical aspects of AI application in education and chatbot integration into the learning process.
  2. Analyze the impact of chatbots on the formation of students’ learning motivation.
  3. Investigate the influence of chatbot usage on students’ language skill development.

This study aims to fill a significant gap in the literature by providing a dual-perspective, empirical evaluation of an AI chatbot’s impact on Ukrainian language learning. Unlike previous works that often focus solely on either technical performance or pedagogical outcomes, our research integrates rigorous NLP metrics with validated psychometric instruments (AMTB) to offer a more holistic understanding of how chatbot integration affects both language competency and learner motivation within a specific linguistic context.

Literature Review

Examining the theoretical foundations of using AI-powered chatbots in teaching the Ukrainian language enables the integration of modern learning approaches and a critical assessment of their effectiveness. Analyzing key concepts reveals both the potential of innovative technologies and their limitations. According to one study,7 combining traditional methods with chatbots ensures deeper material comprehension, as the teacher acts as a moderator who stimulates the development of critical thinking. In turn, authors8 emphasize the possibility of personalized learning, which enhances students' autonomy and communication skills. Both studies confirm the significant positive impact of interactive technologies; however, they do not provide a detailed analysis of the technical limitations of implementing such solutions.

Some authors focus on the effectiveness of chatbots in the communicative approach. Researcher9 highlights that the authenticity of speech situations modeled using chatbots contributes to the development of communicative competence. Authors10 point to interactive tasks as an effective tool for stimulating speech skills and idea exchange. However, researcher11 stresses the necessity of incorporating cultural context, allowing students to understand the sociocultural features of the language. These approaches are important for improving contemporary philological education but leave open the question of optimizing cultural integration in chatbots. Additionally, the motivational aspect of using chatbots has also been a subject of discussion. Researchers12 argue that the interactivity and autonomy provided by chatbots support students’ intrinsic motivation. Authors13 emphasize the adaptability of such technologies, which contributes to individualized learning and improved academic performance. Authors14 draw attention to gamification, which stimulates extrinsic motivation; however, they believe that this approach may lead to dependency on external stimuli. Thus, while the motivational potential of chatbots is undeniable, possible drawbacks related to technology dependency should be considered.

On the other hand, researchers15 note that natural language processing (NLP) improves the quality of dialogues with chatbots, but limitations related to language complexity may reduce their effectiveness. Author16 stresses that the application of machine learning (ML) algorithms raises ethical concerns, particularly regarding data usage. Although these challenges do not diminish the value of AI technologies, they indicate the need for a more comprehensive approach to their development and integration. Therefore, analyzing recent studies allows for balancing advantages and limitations, ensuring more effective use of chatbots in Ukrainian language learning.

As demonstrated in the proposed review, the current study justifies its relevance by the presence of gaps in researching AI technology implementation in education. Despite numerous studies, the effectiveness of chatbot-based personalized learning remains underexplored. This article fills this gap by analyzing the impact of technologies, particularly NLP, on adapting methodologies to students’ needs and their potential for improving academic outcomes. However, for a deeper understanding, further research is needed, particularly focusing on the long-term impact of interactive technologies on students’ language skills. It is essential to expand the analysis of cultural adaptation and the contextual use of chatbots in different linguistic and cultural environments. Future steps should concentrate on improving NLP technologies to enhance the accuracy and efficiency of user communication.

Methods

Design

The study was conducted in three consecutive stages. The research structure allowed for an assessment not only of the technical efficiency of chatbots but also of their impact on students’ motivation and language skill development. The stages of the study and their content are illustrated in Figure 1.

Figure 1: CONSORT-style flow diagram of participant progression through the study.
Source: Developed by the authors based on research results.

Participants

The study employed a quasi-experimental design with a control group (CG) and an experimental group (EG). To maintain ecological validity and minimize disruption to the existing educational process, group allocation was performed at the cluster level. Four intact academic groups (clusters) were randomly assigned to either the experimental or control condition using a computer-based random number generator. This approach ensured that all students within a given group experienced the same learning condition (chatbot-based or traditional). The final sample consisted of 96 participants (EG: n = 50 from two clusters; CG: n = 46 from two clusters).

To ensure baseline equivalence between the groups prior to the intervention, we compared their initial demographic characteristics (age, gender) and self-reported Ukrainian language proficiency using independent samples t-tests and chi-square tests. These analyses revealed no statistically significant differences (p > 0.05 for all comparisons), confirming the groups’ comparability at the outset of the study. All participants provided informed consent for participation and data processing. Ten faculty members from the Faculty of Letters at Ataturk University were invited as experts for the human evaluation component of the chatbot assessment.

Instruments

The AI chatbot was developed using a transformer-based architecture, specifically the “Ukrainian-T5-base” model from the Hugging Face Transformers library. The model was fine-tuned on a custom corpus of approximately 15,000 Ukrainian language educational dialogues and texts, curated from standard Ukrainian language textbooks for higher education (e.g., “Ukrainian for Professional Purposes”), the online resource “Mova.info,” and a set of simulated dialogue scripts covering grammar, lexicon, and professional communication scenarios developed by the authors. The fine-tuning process utilized a learning rate of 2e-5 with a linear warmup over the first 500 steps, and was trained for 3 epochs using the AdamW optimizer with a batch size of 16.
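The warmup behaviour named above can be sketched as a small scheduler function. This is an illustrative pure-Python sketch, not the authors' training code; the linear decay to zero after warmup and the total step count of 3,000 are assumptions (linear decay is the common pairing in the Transformers library, and the paper specifies only the warmup phase).

```python
def linear_warmup_lr(step: int, peak_lr: float = 2e-5,
                     warmup_steps: int = 500, total_steps: int = 3000) -> float:
    """Learning rate with linear warmup, then linear decay to zero.

    peak_lr and warmup_steps follow the paper (2e-5, 500 steps);
    total_steps and the decay-to-zero phase are illustrative assumptions.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # linear ramp up to the peak
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / max(total_steps - warmup_steps, 1)  # linear decay
```

At step 0 the rate is zero, it reaches the peak of 2e-5 exactly at step 500, and then declines linearly for the remainder of training.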

To comprehensively evaluate the quality of the chatbot's responses, we employed a multi-faceted approach:

  1. Automatic Metrics. We used standard n-gram metrics (BLEU, ROUGE-L) and the semantic metric BERTScore (using the model ‘bert-base-multilingual-cased’) to evaluate the textual similarity between the chatbot’s outputs and expert-crafted reference responses (Zhang et al).
  2. Human Evaluation. Two linguistic experts independently rated a random sample of 100 chatbot responses on a 5-point Likert scale for three criteria: Correctness (grammatical and factual accuracy), Relevance (appropriateness to the user’s query), and Pedagogical Helpfulness (value for language learning). The inter-rater reliability (Cohen’s Kappa) was 0.78 for Correctness, 0.81 for Relevance, and 0.75 for Pedagogical Helpfulness, indicating substantial agreement.
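Inter-rater agreement of the kind reported above (Cohen's Kappa of 0.75–0.81) can be computed from two raters' label sequences in a few lines. This is a generic sketch of the standard kappa formula, not the authors' analysis script.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters.

    rater_a and rater_b are equal-length sequences of labels
    (e.g., 1-5 Likert ratings) for the same items.
    """
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of exact agreements.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)
```

Values of 1.0 indicate perfect agreement and 0.0 indicates agreement no better than chance; the 0.75–0.81 range reported here falls in the conventional "substantial agreement" band.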

A set of 500 reference responses was created manually by two independent experts in Ukrainian linguistics with over 5 years of teaching experience. These experts were provided with the same user queries as the chatbot and were instructed to generate grammatically correct and pedagogically appropriate answers. The inter-rater reliability between the experts during the reference creation phase, measured using Cohen's Kappa, was 0.82, indicating strong agreement. The chatbot's outputs were compared against this validated set of reference responses to compute the BLEU and ROUGE-L scores.

Safety and Deployment. To ensure appropriate interactions, a safety filter was implemented. This filter screened all chatbot inputs and outputs for offensive language, sensitive topics, and personally identifiable information (PII) using a combination of keyword blacklists and a dedicated classifier fine-tuned for Ukrainian content moderation. The chatbot was deployed as a web service and integrated with the WhatsApp Business API via a secure webhook. The system was configured to process requests sequentially with a timeout of 10 seconds to ensure stability.

Data Collection

  1. Analysis of chatbot log files allowed for an assessment of the efficiency of query recognition and processing algorithms.17 A log file is a record of user interactions with the system, containing queries, responses, and technical information. In this study, this method was used to identify errors in query interpretation for subsequent algorithmic corrections. The collected data also contributed to optimizing dialogue structures and improving response accuracy.
  2. To assess the relevance and accuracy of chatbot-generated responses, natural language processing (NLP) metrics such as BLEU and ROUGE were applied.18,19 BLEU measures response accuracy by comparing n-grams in chatbot responses with reference texts, while ROUGE evaluates lexical completeness by analyzing overlap between generated responses and reference texts. These metrics were used to objectively measure text generation quality based on linguistic parameters, aiding in algorithm refinement and ensuring didactic alignment.20 Additionally, the semantic metric BERTScore (using the model ‘bert-base-multilingual-cased’) was calculated, yielding a mean score of 0.87 (SD = 0.06).
  3. The study employed the Attitude/Motivation Test Battery (AMTB) to determine students’ level of engagement and interest. The test results were used to evaluate the affective and cognitive aspects of chatbot interaction. This method enabled an assessment of the pedagogical condition, the use of chatbots, and its psychological impact on participants.21
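As a concrete illustration of the overlap idea behind ROUGE-L, the following pure-Python sketch computes the LCS-based F-measure between a generated response and a reference. It is a minimal didactic version, not the library implementation used in the study.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ta in enumerate(a, 1):
        for j, tb in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ta == tb else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F-measure (beta = 1) over whitespace tokens."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

An identical candidate and reference score 1.0; replacing one token out of three drops the score to about 0.67, which is the kind of penalty the table-reported ROUGE-L means reflect in aggregate.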

AMTB was used to assess students’ motivation and attitudes. The original scale was translated into Ukrainian following a standard forward-backward translation procedure by two bilingual linguists to ensure conceptual and linguistic equivalence. The internal consistency reliability of the adapted Ukrainian AMTB subscales in our sample was assessed using Cronbach’s alpha. The obtained values ranged from 0.75 to 0.89, all exceeding the accepted threshold of 0.7, thus confirming the instrument’s good reliability for this study. In this study, ‘response accuracy’ was operationally defined as the percentage of the chatbot’s responses that were deemed semantically correct, grammatically sound, and directly addressing the user’s query, as evaluated by two independent linguistic experts against a set of pre-defined ideal answers.
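The reliability coefficients reported above follow the standard Cronbach's alpha formula. The sketch below is a generic pure-Python implementation (population variances, one list of scores per item), offered for clarity rather than as the authors' analysis code.

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of item-score columns, one list per item,
    each of length n (one score per respondent).
    """
    k = len(items)
    n = len(items[0])

    def variance(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Sum of the individual item variances.
    item_vars = sum(variance(col) for col in items)
    # Variance of each respondent's total score across all items.
    totals = [sum(col[i] for col in items) for i in range(n)]
    total_var = variance(totals)
    return k / (k - 1) * (1 - item_vars / total_var)
```

Perfectly covarying items yield alpha = 1.0, while unrelated items pull the coefficient down; subscale values of 0.75–0.89, as obtained here, are conventionally read as acceptable-to-good internal consistency.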

Language Competence Assessment

To objectively measure gains in language competence, a dedicated 40-point test was developed based on a blueprint aligning with standard A2-B1 CEFR levels for Ukrainian. The test consisted of three supervised, proctored parts administered in a computer lab:

  1. Grammar. 20 multiple-choice items targeting key structures (e.g., verb aspects, case usage). Sample item: «Вона … (читає/прочитала) книгу вчора» ("She … (reads/read) the book yesterday") (Cronbach’s α = 0.82).
  2. Vocabulary. 15 fill-in-the-blank items with a word bank (Cronbach’s α = 0.79).
  3. Reading Comprehension. A short text followed by 5 multiple-choice questions (Cronbach’s α = 0.81).

The total test demonstrated good internal consistency (Cronbach’s Alpha = 0.84). The total score was calculated as a percentage of correct answers. The ‘Response Accuracy’ metric reported in Table 1 was defined as the percentage of the chatbot’s responses from the analyzed log files that were deemed semantically correct and directly addressing the user’s query. This assessment was performed automatically by the system’s internal classifier, which was pre-validated against a set of 200 expert-labeled responses, achieving an F1-score of 0.92 against human judgment.
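The classifier-validation step described above reduces to computing an F1 score for the classifier's binary verdicts against the expert labels. The following is a generic sketch of that computation, not the authors' validation pipeline; labels use 1 for "correct response" and 0 otherwise.

```python
def f1_score(predicted, labeled):
    """F1 of binary classifier verdicts against expert labels (1 = correct)."""
    tp = sum(p == 1 and y == 1 for p, y in zip(predicted, labeled))  # true positives
    fp = sum(p == 1 and y == 0 for p, y in zip(predicted, labeled))  # false positives
    fn = sum(p == 0 and y == 1 for p, y in zip(predicted, labeled))  # false negatives
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

F1 balances precision (how often a "correct" verdict is right) against recall (how many truly correct responses are flagged as such), which is why it is a reasonable single figure for validating the internal classifier against the 200 expert-labeled responses.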

Table 1: Chatbot log file analysis.
| Parameter | Mean | SD | Range |
| Response accuracy (%) | 85.4 | 8.3 | 67–96 |
| Repeat query frequency (%) | 12.7 | 4.5 | 5–22 |
| Session duration (min.) | 14.8 | 3.2 | 9–22 |
| Queries per session | 8.3 | 1.6 | 5–11 |
| Test scores (points) | 78.6 | 6.2 | 65–90 |
| System error frequency (%) | 7.8 | 3.1 | 2–15 |
| Query type (%): Grammar | 38 | | |
| Query type (%): Lexicon | 35 | | |
| Query type (%): Communication | 27 | | |
| Query complexity level (1–5 scale) | 3.2 | 0.8 | 2–5 |
| Bot response time (s) | 2.7 | 0.5 | 1.8–3.5 |
| Number of interaction sessions | 11.4 | 3.9 | 6–18 |
Source: developed by the authors based on research findings.

Analysis of Data

A comprehensive statistical analysis plan was employed. Descriptive statistics (means, standard deviations, ranges) were calculated for all variables. To justify the sample size, an a priori power analysis was conducted using G*Power 3.1 (Faul, Erdfelder, Lang, & Buchner, 2007). For detecting a medium effect size (d = 0.5) in a two-group comparison with 80% power at α = 0.05, the required total sample size was 128. Given the logistical constraints of a cluster-randomized design (4 groups), our final sample of 96 participants provides adequate power (> 70%) for detecting medium-to-large effects.
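The G*Power figures above can be sanity-checked with a textbook normal-approximation power formula for a two-sided two-sample comparison. This sketch is an approximation (it ignores the t-distribution's heavier tails) with alpha fixed at 0.05; it is not the G*Power computation itself.

```python
from math import erf, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def two_sample_power(d: float, n_per_group: int) -> float:
    """Approximate power of a two-sided two-sample test at alpha = 0.05.

    Normal approximation: power ~ Phi(d * sqrt(n/2) - z_crit).
    """
    z_crit = 1.959964  # critical z for two-sided alpha = 0.05
    return norm_cdf(d * sqrt(n_per_group / 2) - z_crit)
```

With d = 0.5 the approximation gives roughly 0.81 power at 64 per group (total 128) and roughly 0.69 at 48 per group (total 96), consistent with the reported "80% power requires 128" and "96 participants give > 70% power for medium-to-large effects" figures.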

The primary analyses for key outcomes – motivational factors (AMTB) and language competence – were performed using linear mixed-effects models (LMM) to account for the cluster-randomized design, with group (EG vs. CG) as a fixed effect and academic group as a random intercept. The Kenward–Roger method was used for degrees-of-freedom correction. Adjusted means, effect sizes (Cohen’s d for between-group comparisons and ηp² from the LMM), and their 95% confidence intervals (CI) were calculated. Sensitivity analyses using ANCOVA confirmed the robustness of the results. The intraclass correlation coefficients (ICC) for cluster-level effects were low (ICC ≤ 0.04 for all outcomes). To control for Type I error inflation across the five AMTB subscales, p-values for between-group post-test comparisons were adjusted using the Holm-Bonferroni sequential method. The primary LMM and ANCOVA results are reported in the main outcomes tables.

In the ANCOVA, the post-test score served as the dependent variable, group (EG vs. CG) as the fixed factor, and the corresponding pre-test score as a covariate. Effect sizes (Cohen’s d) were interpreted as small (d = 0.2), medium (d = 0.5), and large (d ≥ 0.8), and 95% confidence intervals for mean differences are reported. A p-value < 0.05 was considered statistically significant. Analyses were performed using SPSS (version 27).
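The two procedures named in this section reduce to short formulas: Cohen's d with a pooled standard deviation, and the Holm-Bonferroni step-down adjustment of a family of p-values. The sketch below is generic; note that the d values in Table 3 derive from the authors' covariate-adjusted models, so they need not equal the raw pooled-SD formula applied to the table means.

```python
from math import sqrt

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups, pooled standard deviation."""
    pooled = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled

def holm_bonferroni(p_values):
    """Holm-Bonferroni adjusted p-values (step-down, monotone, capped at 1)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        adj = min((m - rank) * p_values[idx], 1.0)  # multiplier shrinks down the list
        running_max = max(running_max, adj)         # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted
```

The step-down scheme multiplies the smallest raw p-value by the full family size and each subsequent one by a smaller factor, which is why it is uniformly more powerful than a plain Bonferroni correction while still controlling the family-wise error rate.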

Ethical Considerations

This study was conducted in accordance with the fundamental ethical principles for research involving human participants. All procedures adhered to established guidelines for educational research and data protection. Prior to participation, all students received comprehensive written information about the study’s purpose, procedures, potential benefits, and their rights as participants. Special emphasis was placed on explaining that participation was voluntary and that they could withdraw at any time without any academic consequences. Written informed consent was obtained from all participants before the commencement of the study.

Regarding data privacy and protection, all interactions with the chatbot via WhatsApp were fully anonymized. Personal identifiers, including phone numbers and names, were immediately removed upon data collection and replaced with unique, randomly generated participant codes. All data were encrypted and stored on a secure, password-protected server with access limited exclusively to the principal investigators. No personal data were shared with third parties at any stage of the research process. All analytical work was performed exclusively on anonymized datasets. The study’s data handling procedures, particularly concerning data transfer and storage on the WhatsApp Business API platform, were conducted in full compliance with the General Data Protection Regulation (GDPR) and the institutional data protection policies of the participating universities. A Data Processing Agreement (DPA) was in place with the service provider to govern the processing of personal data.

The anonymized datasets generated and analyzed during this study (including log file aggregates, AMTB scores, and language test results), as well as the code for the primary statistical analyses, are available from the corresponding author upon reasonable request. The fine-tuned chatbot model parameters are not publicly available due to institutional intellectual property agreements, but the training pipeline and configuration details can be shared for reproducibility purposes. This study received ethical approval from the Institutional Review Boards (or Ethics Committees) of the Interregional Academy of Personnel Management (Protocol No. IAPM-IRB-2023-05) and Ternopil Volodymyr Hnatyuk National Pedagogical University (Protocol No. TNPU-EC/12/2023).

Results

To determine the impact of the chatbot on the learning process and its potential for improvement, an analysis of log files was conducted. This method allowed for an assessment of the chatbot’s effectiveness in recognizing and processing student queries. The study examined response accuracy, query complexity, error frequency, and user interaction. The research findings are presented in Table 1.

The analysis results demonstrated a high level of response relevance with a low frequency of repeated queries, indicating the efficiency of the query interpretation algorithms. The session duration and number of queries suggest active user interaction with the system. The knowledge assessment results confirm the positive impact of the system on academic performance. The low frequency of system errors indicates the stability of the algorithmic framework. The collected data confirm the effectiveness of the system for developing language competencies and enhancing user motivation. The next step involved applying BLEU and ROUGE-L metrics to evaluate the accuracy and relevance of the chatbot’s responses to student queries. These metrics allow for an assessment of the quality of generated text compared to reference answers. The results are presented in Table 2.

Table 2: Chatbot performance analysis using BLEU and ROUGE-L metrics.
| Indicator | Mean (%) | SD (%) | Range (%) |
| BLEU | 75 | 8 | 60–90 |
| ROUGE-L | 82 | 7 | 65–95 |
| BERTScore | 87 | 6 | 75–95 |
| Complexity Level | 68 | 10 | 50–85 |
| Error Frequency | 12 | 5 | 5–20 |
Source: developed by the authors based on research findings.

The analysis results demonstrate the system’s effectiveness in supporting Ukrainian language learning. The high BLEU and ROUGE-L scores indicate the accuracy and relevance of generated responses. The low error frequency confirms the system’s stability in processing queries of moderate complexity. The standard deviations suggest consistency in results across interaction sessions. The collected data confirm the potential of AI chatbots for developing language competencies and supporting individualized learning. The comprehensive evaluation of the chatbot’s response quality combined automatic metrics and human assessment. Automatic evaluation using BLEU, ROUGE-L, and BERTScore showed high textual similarity to reference responses. Human evaluation by linguistic experts (Cohen’s Kappa = 0.78–0.81) confirmed high scores for Correctness (4.3/5), Relevance (4.5/5), and Pedagogical Helpfulness (4.2/5), demonstrating both technical robustness and educational value. Further, the AMTB test was used to assess the impact of AI chatbot use on motivational aspects of foreign language learning. This test measures interest in learning, motivation level, attitudes toward native speakers, anxiety, and willingness to study the language—key variables in the study’s context. The values before and after the experiment are presented in Figure 2.

Figure 2: Dynamics of motivational aspects in foreign language learning among CG and EG students before and after the experiment.
Source: Developed by the authors based on research results.

Data analysis demonstrates that the indicators in the EG increased significantly more after the intervention than in the CG, indicating a positive impact of the methodology. Specifically, interest in language learning in the EG increased from 3.3 to 4.2, while in the CG it rose only from 3.5 to 3.8. Motivation for language learning also increased significantly in the EG compared to the CG. Anxiety related to language learning significantly decreased in the EG, while it remained almost unchanged in the CG. These results confirm the effectiveness of the applied approach in the EG. The primary goal of the method was to determine whether the integration of AI chatbots contributes to improving these indicators among students compared to traditional teaching methods. The results are presented in Table 3. Table 3 presents the AMTB test results, including within-group dynamics and between-group comparisons following the intervention. The experimental group demonstrated statistically significant improvements across all AMTB variables (p < 0.001), whereas the control group showed no significant changes. These findings indicate that integrating chatbots into the educational process can significantly improve motivational indicators and overall foreign language learning effectiveness.

Table 3: AMTB test results: pre-post changes and between-group comparisons.
| AMTB Variable | Group | Pre-Test Mean (SD) | Post-Test Mean (SD) | Within-Group p | Between-Group Difference at Post-Test | 95% CI for d | Adjusted p (Holm-Bonferroni) | LMM ηp² (95% CI) | t | p | Cohen's d |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Interest in language learning | Experimental | 3.3 (0.7) | 4.2 (0.6) | <0.001 | 0.4 | [0.45, 1.33] | 0.012 | 0.18 (0.07, 0.32) | 2.85 | 0.006 | 0.89 |
| | Control | 3.5 (0.8) | 3.8 (0.7) | 0.066 | | | | | | | |
| Motivation for language learning | Experimental | 3.4 (0.6) | 4.5 (0.5) | <0.001 | 0.6 | [0.58, 1.46] | 0.004 | 0.22 (0.10, 0.37) | 3.12 | 0.002 | 1.02 |
| | Control | 3.7 (0.7) | 3.9 (0.6) | 0.136 | | | | | | | |
| Attitude toward native speakers | Experimental | 4.0 (0.8) | 4.6 (0.7) | <0.001 | 0.7 | [0.35, 1.21] | 0.018 | 0.15 (0.05, 0.28) | 2.68 | 0.009 | 0.78 |
| | Control | 3.8 (0.9) | 3.9 (0.9) | 0.254 | | | | | | | |
| Anxiety in language learning | Experimental | 3.1 (0.9) | 2.0 (0.8) | <0.001 | -1.1 | [-1.31, -0.11] | 0.032 | 0.12 (0.03, 0.25) | -2.45 | 0.016 | -0.71 |
| | Control | 2.8 (1.0) | 2.7 (0.9) | 0.592 | | | | | | | |
| Desire to learn the language | Experimental | 3.5 (0.7) | 4.3 (0.6) | <0.001 | 0.7 | [0.50, 1.36] | 0.008 | 0.20 (0.08, 0.34) | 2.97 | 0.004 | 0.93 |
| | Control | 3.5 (0.8) | 3.6 (0.8) | 0.178 | | | | | | | |
Note: CG = Control Group (n = 46), EG = Experimental Group (n = 50). Within-group p-values are from paired-samples t-tests. The original between-group t-tests, p-values, and Cohen's d are reported for transparency. Primary between-group inference is based on linear mixed models (LMM) controlling for pre-test scores and cluster effects (academic group as random intercept); the LMM-derived 95% CIs for Cohen's d, Holm-Bonferroni-adjusted p-values, and partial eta-squared (ηp² from the LMM) are reported in the corresponding columns. Effect size (Cohen's d) is interpreted as small (0.2), medium (0.5), and large (0.8). Source: Developed by the authors based on research results.
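The effect sizes and multiplicity correction used in Table 3 can be illustrated with a short computation. The sketch below derives a between-group Cohen's d from summary statistics via the pooled standard deviation and applies the Holm-Bonferroni step-down adjustment to a set of p-values. Note that the study's reported d values and adjusted p-values derive from the LMM analysis, so this raw-summary computation is illustrative and will not reproduce them exactly.

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Between-group Cohen's d using the pooled standard deviation."""
    pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2)
                       / (n1 + n2 - 2))
    return (m1 - m2) / pooled

def holm_bonferroni(pvals):
    """Holm's step-down adjustment; returns values in input order."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    m, adjusted, running_max = len(pvals), [0.0] * len(pvals), 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest p by (m - k) and enforce monotonicity.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Post-test "interest" summary stats from Table 3: EG 4.2 (0.6), CG 3.8 (0.7)
print(round(cohens_d(4.2, 0.6, 50, 3.8, 0.7, 46), 2))
# Raw between-group p-values from Table 3, in row order.
print(holm_bonferroni([0.006, 0.002, 0.009, 0.016, 0.004]))
```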

As a visual conclusion, Figure 3 visualizes the differences in foreign language competence formation between traditional methods and the chatbot, based on the results of all three studies. To quantitatively assess the dynamics of language competence, the results of the specialized test (Grammar, Vocabulary, Reading Comprehension) were analyzed. As shown in Table 4, at the beginning of the study the groups showed no statistically significant differences on any of the subscales (p > 0.05). Following the intervention, the experimental group demonstrated significantly greater improvement across all components compared to the control group. The largest effect of using the chatbot was observed for grammar (d = 1.15) and the total score (d = 1.08), confirming its high pedagogical effectiveness for the structural acquisition of the language. These results are consistent with the high technical quality of the chatbot's responses (BLEU, ROUGE-L) and with the frequency analysis of queries, in which grammar questions constituted 38% (see Table 1).

Figure 3: Final assessment of foreign language competence formation using traditional methods versus the AI chatbot across key assessed dimensions: Grammar, Vocabulary, and Reading Comprehension.
Source: Developed by the authors based on research results.
Table 4: Language competence test results: comparison of the experimental (EG) and control (CG) groups.
| Component | Group | Pre-Test, M (SD) | Post-Test, M (SD) | Within-Group Change (p) | Between-Group Difference at Post-Test (p) | Effect Size (Cohen's d) |
|---|---|---|---|---|---|---|
| Grammar (max 20) | EG (n = 50) | 12.1 (3.2) | 16.8 (2.1) | <0.001 | <0.001 | 1.15 (large) |
| | CG (n = 46) | 12.4 (3.0) | 13.9 (2.8) | 0.012 | | |
| Vocabulary (max 15) | EG (n = 50) | 9.5 (2.5) | 12.9 (1.7) | <0.001 | 0.003 | 0.92 (large) |
| | CG (n = 46) | 9.8 (2.3) | 10.8 (2.1) | 0.045 | | |
| Reading (max 5) | EG (n = 50) | 3.2 (1.1) | 4.4 (0.7) | <0.001 | 0.001 | 0.87 (large) |
| | CG (n = 46) | 3.3 (1.0) | 3.7 (0.9) | 0.061 | | |
| Total Score (%) | EG (n = 50) | 61.8 (10.5) | 85.2 (7.3) | <0.001 | <0.001 | 1.08 (large) |
| | CG (n = 46) | 62.5 (9.8) | 68.5 (10.1) | 0.005 | | |
Note: M – Mean, SD – Standard Deviation. P-values for within-group change were obtained using paired t-tests. Between-group comparisons at the post-test stage were conducted using ANCOVA adjusted for the corresponding pre-test score. Effect size (Cohen’s d) was calculated for the between-group difference at post-test.
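The ANCOVA adjustment described in the table note can be sketched as follows: each group's post-test mean is adjusted for its pre-test mean using the pooled within-group regression slope of post-test on pre-test scores. The per-student scores below are hypothetical; only the adjustment procedure mirrors the analysis (the full ANCOVA F-test is omitted for brevity).

```python
# Illustrative ANCOVA-style adjustment of post-test means for pre-test
# scores. Each group is a (pre_scores, post_scores) pair; the data are
# hypothetical stand-ins, not the study's per-student results.

def pooled_slope(groups):
    """Pooled within-group slope of post-test on pre-test."""
    sxy = sxx = 0.0
    for pre, post in groups:
        mx, my = sum(pre) / len(pre), sum(post) / len(post)
        sxy += sum((x - mx) * (y - my) for x, y in zip(pre, post))
        sxx += sum((x - mx) ** 2 for x in pre)
    return sxy / sxx

def adjusted_means(groups):
    """Post-test means adjusted to the grand pre-test mean."""
    b = pooled_slope(groups)
    all_pre = [x for pre, _ in groups for x in pre]
    grand = sum(all_pre) / len(all_pre)
    return [sum(post) / len(post) - b * (sum(pre) / len(pre) - grand)
            for pre, post in groups]

eg = ([10, 12, 14, 13], [16, 17, 18, 17])  # hypothetical EG (pre, post)
cg = ([11, 12, 13, 14], [13, 14, 14, 15])  # hypothetical CG (pre, post)
print([round(m, 2) for m in adjusted_means([eg, cg])])
```

Because the groups start from nearly identical pre-test means, the adjusted post-test means stay close to the raw ones; the adjustment matters most when baseline differences exist.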
Discussion

This study makes a twofold contribution to the field of AI in education. First, it provides context-specific evidence for the effectiveness of transformer-based chatbots in learning Ukrainian, a language with unique morphological and syntactic challenges that are underrepresented in mainstream NLP research. Second, it demonstrates a causal link between the use of a well-defined chatbot system and significant improvements in key affective factors, namely increased motivation and reduced anxiety, which are critical for sustainable language acquisition. The results show a statistically significant advantage of AI chatbots in language learning, reflected in increased motivation and engagement and in reduced anxiety. The statistically significant differences between the EG and CG confirm the effectiveness of AI chatbots for interactive learning and highlight the potential of integrating modern technologies into education to develop language competencies.

The study of the technical aspects of AI technology implementation in education is, in the authors’ opinion, a crucial step toward integrating innovations into the learning process. On the one hand, scientific discourse suggests the significant potential of such technologies. For example, researchers22 state that AI can automate feedback processes, making learning more adaptive and efficient. Similarly, authors23 argue that chatbot integration enhances student engagement and ensures personalized learning. On the other hand, some researchers point out risks and limitations associated with AI use. According to one critical account,24 AI in education might lead to a decline in learning quality due to excessive automation and the loss of individualized approaches. Additionally, author25 emphasizes that inadequate technical infrastructure and teacher training can create further barriers to AI adoption. While these critical perspectives are valid, we believe they should serve as motivation for improving technology and teacher preparation. We take a constructive stance on such critiques, aiming to minimize shortcomings and ensure the effective use of AI in education.

An analysis of chatbots’ influence on students’ learning motivation has shown that motivation is a key factor in successful learning. Advocates of chatbot integration highlight their ability to increase student engagement and foster interactivity. For instance, researchers26 state that chatbots promote student involvement through continuous dialogue and an adaptive learning approach. Author27 similarly emphasizes that chatbots create a supportive environment that encourages a positive attitude toward learning. However, there are critical viewpoints as well. The use of BLEU and ROUGE-L metrics, while standard, must be considered in light of their inherent limitations.

As noted by Sai et al.,28 these metrics primarily operate on a superficial, n-gram overlap basis and may not fully capture the semantic adequacy or pedagogical value of a generated response, which is crucial in an educational context. Consequently, the high BLEU and ROUGE-L scores obtained in this study can be interpreted as indicators of surface-level linguistic correctness rather than as a complete measure of the chatbot’s instructional dialogue quality. Moreover, researcher29 and author30 argue that the lack of emotional engagement in chatbot interactions may negatively affect intrinsic motivation, as students might miss the “human factor.” Despite these concerns, we view such critiques as an opportunity to improve existing technologies. Integrating elements of emotional intelligence into chatbot algorithms and balancing AI use with traditional teaching methods could serve as effective strategies to mitigate these drawbacks.
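The n-gram overlap limitation can be illustrated with a minimal unigram-precision computation, the core building block of BLEU. The sentences below are illustrative English stand-ins, not actual chatbot outputs: a semantically correct paraphrase scores lower than a near-copy containing a factual error, precisely because the metric only counts word overlap.

```python
# Minimal sketch of why n-gram metrics are surface-level: unigram
# precision rewards exact word overlap with the reference, regardless
# of meaning. Example strings are illustrative, not study data.

def unigram_precision(candidate, reference):
    cand, ref = candidate.split(), reference.split()
    ref_counts = {}
    for w in ref:
        ref_counts[w] = ref_counts.get(w, 0) + 1
    matched = 0
    for w in cand:
        if ref_counts.get(w, 0) > 0:  # clipped matching, as in BLEU
            ref_counts[w] -= 1
            matched += 1
    return matched / len(cand)

ref = "the verb must agree with the subject in number"
paraphrase = "subject and verb need matching number"          # correct
near_copy = "the verb must agree with the subject in color"   # wrong fact
print(round(unigram_precision(paraphrase, ref), 2),
      round(unigram_precision(near_copy, ref), 2))
```

The correct paraphrase scores 0.5 while the factually wrong near-copy scores about 0.89, which is why the study supplements these metrics with BERTScore and human expert ratings.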

Linking Technical Performance to Learning Gains. The significant improvements in grammar (d = 1.15), which constituted 38% of all queries (Table 1), correlate with the chatbot’s high performance on formal linguistic metrics (BLEU: 75%, ROUGE-L: 82%). This suggests that the chatbot’s strength in generating structurally correct responses directly supported the acquisition of formal language aspects. The slightly lower, yet still large, gains in vocabulary (d = 0.92) and reading (d = 0.87) may relate to the semantic depth required, partially captured by the BERTScore of 0.87. An error analysis of log files indicated that misunderstandings often arose with ambiguous lexical queries or complex syntactic structures, highlighting areas for future model improvement.

Research on the impact of chatbots on students’ language skills suggests that these technologies are increasingly being integrated into the learning process. Proponents of education digitalization argue that chatbots enhance language skills through real-time practice and feedback. For example, researchers31 assert that chatbot interactions boost students’ confidence in spoken language, as they can experiment without fear of mistakes. Author32 adds that chatbots create an environment that simulates real communication, allowing students to improve grammar, vocabulary, and pronunciation. Critics, however, question chatbots’ ability to replace traditional language learning methods. For instance, authors33 note that chatbots have limitations in understanding cultural contexts and nonverbal communication, which are essential for language acquisition. Similarly, researchers34 argue that chatbot responses may be overly simplistic or formulaic, hindering the development of creative thinking in communication. We acknowledge these critical perspectives, emphasizing the importance of combining AI technologies with traditional teaching methods to compensate for chatbots’ limitations. Enhancing chatbots’ adaptability to context and incorporating cultural nuances, in our view, can improve their effectiveness in language learning.

Practical Implications. The study confirmed the effectiveness of chatbots in language learning, as they enhance student motivation and improve language skills. The results demonstrated higher achievements among students using chatbots compared to those following traditional learning methods. These findings open new possibilities for integrating chatbots into language learning programs as a supplement to conventional methods.

Theoretical Implications. This study expands knowledge about AI’s impact on education, particularly in motivation and language skill development. It confirms the importance of adaptive technologies in modern education, laying the groundwork for further research in pedagogy and psychology. The obtained results can be used to develop interactive learning platforms with integrated automated dialogue systems that personalize the learning process based on users’ competence levels. Integrating such systems into Ukrainian language teaching can enhance user motivation, develop their communicative competence, and reduce anxiety levels. Furthermore, automated dialogue systems can optimize teachers’ routine tasks, freeing up time for creative and analytical work.

Implications for Practice and Policy

The findings of this study offer actionable insights for educators, curriculum developers, and institutional policymakers. For instructors, integrating a chatbot as a supplementary tool can alleviate the burden of providing repetitive feedback on grammar and vocabulary, freeing up time for higher-order instructional activities such as facilitating discussions on cultural nuances and complex communication strategies. To ensure effective implementation, we recommend that institutions invest in foundational teacher training programs focused on the pedagogical integration of AI tools, moving beyond technical literacy to strategic lesson design.

For curriculum developers, the chatbot’s success in structuring grammar practice suggests its utility for creating adaptive, self-paced learning modules. These modules can be deployed to support foundational language skills outside of classroom hours. At the policy level, university administrations should consider establishing clear guidelines and infrastructure for educational AI. This includes robust data protection protocols, like those implemented in this study, and technical support systems to ensure equitable access and system reliability. Such measures can transform a promising technological innovation into a sustainable, scalable component of modern language education.

Limitations

The study has certain limitations that need to be considered when interpreting the results. First, the limited number of participants may affect the ability to generalize the obtained data to a wider population. Second, the use of only one type of AI chatbot does not allow for the evaluation of the potential of different architectures or functional capabilities. Third, the limited period of the study did not allow for the assessment of long-term effects of the AI chatbot. Fourth, the inclusion criteria, which required participants to have prior experience in pilot tests and a self-reported low level of technical proficiency, may have created a selection bias. Furthermore, the influence of subjective factors, such as participants’ prior experience, on the testing results cannot be ruled out.

Fifth, the small number of clusters (four) limits the generalizability of the findings and the precision of cluster-adjusted estimates, despite the use of appropriate statistical corrections. The inclusion criteria noted above may also limit the findings’ applicability to more technologically adept or general student populations. Finally, implementation aspects such as the initial development and computational costs of fine-tuning the model, as well as the dependency on the WhatsApp API, present practical barriers to scalability that should be considered.

Recommendations

To ensure interactivity and speaking practice, it is recommended to integrate AI chatbots as a supplement to traditional teaching methods. The application of a personalized approach, which adapts educational tasks to the user’s level of knowledge, will contribute to improving learning effectiveness. To optimize the system’s operation, regular analysis of log files and improvement of algorithmic support should be conducted. It is also advisable to conduct periodic surveys of users to assess satisfaction levels and the adaptation of system functionalities to their needs.
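The recommended log-file analysis could, for instance, track query-category frequencies and error rates per session. The sketch below assumes a simple hypothetical key=value log format; the system's actual log schema is not specified in the study, so the field names here are placeholders.

```python
# Hypothetical sketch of periodic chatbot log analysis: count query
# categories and compute an overall error rate. The log format and
# field names are assumptions, not the system's actual schema.
from collections import Counter

log_lines = [
    "session=12 category=grammar status=ok",
    "session=12 category=grammar status=error",
    "session=13 category=lexicon status=ok",
    "session=14 category=communication status=ok",
    "session=14 category=grammar status=ok",
]

def parse(line):
    """Split 'key=value' fields of one log line into a dict."""
    return dict(field.split("=") for field in line.split())

records = [parse(line) for line in log_lines]
by_category = Counter(r["category"] for r in records)
error_rate = sum(r["status"] == "error" for r in records) / len(records)
print(by_category.most_common(1), round(error_rate, 2))
```

Aggregating such counts over time would reveal shifts in the grammar/lexicon/communication distribution reported in the study and flag reliability regressions early.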

Conclusion

The findings of this study provide compelling evidence for the effectiveness of AI-based chatbot integration in Ukrainian language education. The comprehensive analysis of both technical performance and pedagogical impact reveals several key advantages of this innovative approach. The chatbot demonstrated strong technical performance, achieving a BLEU score of 75% and ROUGE-L score of 82%, indicating high-quality text generation capabilities. System reliability was confirmed through low error frequency (12%) and rapid response times (2.7 seconds on average), while user engagement metrics showed active participation with 8.3 queries per session lasting approximately 14.8 minutes. Most significantly, the experimental group using the chatbot showed substantial improvements across all measured motivational dimensions compared to the control group. As detailed in Table 3, the EG exhibited significantly higher interest in language learning (mean 4.2 vs 3.5, p = 0.006, d = 0.89), stronger motivation (mean 4.5 vs 3.8, p=0.002, d=1.02), more positive attitudes toward native speakers (mean 4.6 vs 3.9, p = 0.009, d = 0.78), and reduced anxiety levels (mean 2.0 vs 2.7, p = 0.016, d = –0.71). These large effect sizes (Cohen’s d > 0.78) demonstrate the substantial practical significance of the chatbot intervention.

The results confirm that AI-based chatbots can significantly enhance both the technical and affective dimensions of language learning. By providing personalized, interactive practice while reducing learning anxiety, chatbot technology addresses key challenges in foreign language acquisition. The high response accuracy (85.4%) and balanced query distribution across grammar (38%), lexicon (35%), and communication (27%) skills further validate the system’s pedagogical value. These findings support the integration of chatbots as valuable supplements to traditional language instruction. Future research should explore long-term effects through longitudinal studies and adapt this methodology for other languages and educational contexts. The demonstrated success of this approach paves the way for more widespread adoption of AI technologies in language education, particularly for less commonly taught languages like Ukrainian.

References
  1. Valyukevych TV, Zinchenko OZ, Ishchenko YO, Artemov V, Nechaiuk LG. Research-oriented framework of training philology students’ research skills based on Corpus analytical software. Eur J Educ Res. 2021;10(2):671–680. https://doi.org/10.12973/eu-jer.10.2.671
  2. Zhylin M, Sikorskyi P, Balla E, Barchan V, Kuzma O. The Impact of Students’ Social Identity on Psycho-Social Adaptation during the Period of a Difficult Educational Transition. J Intellect Disabl Diagn Treat. 2022;10(6):293–302. https://doi.org/10.6000/2292-2598.2022.10.06.3
  3. Huang X, Zou D, Cheng G, Chen X, Xie H. Trends, research issues and applications of artificial intelligence in language education. ET&S. 2023;26(1):112–131. https://www.jstor.org/stable/48707971
  4. Zou B, Reinders H, Thomas M, Barr D. Using artificial intelligence technology for language learning. Front Psychol. 2023;14. https://doi.org/10.3389/fpsyg.2023.1287667
  5. Shytyk L, Akimova A. Ways of transferring the internal speech of characters: Psycholinguistic projection. Psychol. 2020;27(2):361–384. https://doi.org/10.31470/2309-1797-2020-27-2-361-384
  6. Klochan V, Piliaiev I, Sydorenko T, Khomutenko V, Solomko A, Tkachuk A. Digital platforms as a tool for the transformation of strategic consulting in public administration. JITM. 2021;13:42–61. https://doi.org/10.22059/JITM.2021.80736
  7. Nikolashyna T, Lysak L, Boguslavska L. The use of computerized linguistic models in teaching the Ukrainian language. Bul Sci Edu S Phi Cul Art Pedagog His Archaeol Sociol. 2024;6(24):259–271. https://doi.org/10.52058/2786-6165-2024-6(24)-259-271
  8. Nedashkivska T, Shepa N, Leshchenko A. Prospects for the use of virtual assistants and other intelligent systems in the process of studying philological disciplines. Bull Sci Educ. 2023;11(17):965–977. http://perspectives.pp.ua/index.php/vno/article/download/7842/7886
  9. Nisha PR. Comparing Grammar Translation Method and Communicative Language Teaching in EFL Context. FOSTER-JELT. 2024;5(1):40–48. https://doi.org/10.24256/foster-jelt.v5i1.159
  10. Xia Y, Shin SY, Kim JC. Cross-cultural intelligent language learning system (CILS): Leveraging AI to facilitate language learning strategies in cross-cultural communication. Appl Sci. 2024;14(13). https://doi.org/10.3390/app14135651
  11. Musayeva АК. Important Aspects of Improving Students’ Communicative Competence Today. MP. 2024;55:237–243. http://miastoprzyszlosci.com.pl/index.php/mp/article/view/5595
  12. Zhang R, Zou D, Cheng G. A review of chatbot-assisted learning: pedagogical approaches, implementations, factors leading to effectiveness, theories, and future directions. ILE. 2024;32(8):4529–4557. https://doi.org/10.1080/10494820.2023.2202704
  13. Yin J, Goh TT, Hu Y. Interactions with educational chatbots: the impact of induced emotions and students’ learning motivation. Int J Educ Technol High Educ. 2024;21(1). https://link.springer.com/article/10.1186/s41239-024-00480-3
  14. Huang W, Hew KF, Fryer LK. Chatbots for language learning – Are they really useful? A systematic review of chatbot-supported language learning. J Comput Assist Learn. 2022;38(1):237-257. https://doi.org/10.1111/jcal.12610
  15. Shi YS, Tsai CY. Fostering vocabulary learning: mind mapping app enhances performances of EFL learners. CALL. 2024;37(4):634–686. https://doi.org/10.1080/09588221.2022.2052905
  16. Li R. Effects of mobile-assisted language learning on foreign language learners’ speaking skill development. LLT. 2024;28(1):1–26. https://hdl.handle.net/10125/73553
  17. Lord T, Lee HS, Horwitz P, Pryputniewicz S, Pallant A. A Remote view into the classroom: analyzing teacher use of digitally enhanced educative curriculum materials in support of student learning. J Sci Teach Educ. 2024;35(2):127–152. https://doi.org/10.1080/1046560X.2023.2204591
  18. Lin CY. ROUGE: A Package for Automatic Evaluation of Summaries. In: Proceedings of the ACL-04 Workshop: Text Summarization Branches Out. Barcelona, Spain: Association for Computational Linguistics; 2004. p. 74–81. Available from: https://aclanthology.org/W04-1013/
  19. Papineni K, Roukos S, Ward T, Zhu WJ. BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia, Pennsylvania, USA: Association for Computational Linguistics; 2002. p. 311–8. Available from: https://aclanthology.org/P02-1040/
  20. Hossain MZ, Goyal S. Advancements in Natural Language Processing: Leveraging Transformer Models for Multilingual Text Generation. PJAEI. 2024;1(1):4–12.
  21. Gardner RC. The Attitude/Motivation Test Battery: Technical Report. London: University of Western Ontario; 1985. https://publish.uwo.ca/~gardner/docs/AMTBmanual.pdf
  22. Ng DTK, Su J, Leung JKL, Chu SKW. Artificial intelligence (AI) literacy education in secondary schools: a review. ILE. 2024;32(10):6204–6224. https://doi.org/10.1080/10494820.2023.2255228
  23. Bettayeb AM, Abu Talib M, Sobhe Altayasinah AZ, Dakalbab F. Exploring the impact of ChatGPT: conversational AI in education. Front Educ. 2024;9. https://doi.org/10.3389/feduc.2024.1379796
  24. Selwyn N. On the limits of artificial intelligence (AI) in education. Nord tidsskr pedagog krit. 2024;10(1):3–14. https://doi.org/10.23865/ntpk.v10.6062
  25. Williamson B. The social life of AI in education. International IJAIED. 2024;34(1):97–104. https://doi.org/10.1007/s40593-023-00342-5
  26. Kohnke L, Moorhouse BL, Zou D. ChatGPT for language teaching and learning. RELC J. 2023;54(2):537–550. https://doi.org/10.1177/00336882231162868
  27. Huang W, Hew KF, Fryer LK. Chatbots for language learning—Are they really useful? A systematic review of chatbot-supported language learning. J Comput Assist Learn. 2022;38(1):237–257. https://doi.org/10.1111/jcal.12610
  28. Sai AB, Mohankumar AK, Khapra MM. A survey of evaluation metrics used for NLG systems. ACM Comput Surv. 2022;55(2):1–39. https://doi.org/10.1145/3485766
  29. Schiff D. Education for AI, not AI for education: The role of education and ethics in national AI policy strategies. Int J Art Intel Educ. 2022;32(3):527–563. http://doi.org/10.1007/s40593-021-00270-2
  30. Sari N. The role of artificial intelligence (AI) in developing English language learner’s communication skills. J Educ. 2023;6(01):750–757. http://jonedu.org/index.php/joe
  31. Baidoo-Anu D, Owusu Ansah L. Education in the Era of Generative Artificial Intelligence (AI): Understanding the potential benefits of ChatGPT in promoting teaching and learning. J Artif Intell. 2023;7(1):52–62. https://doi.org/10.2139/ssrn.4337484.
  32. Chan CKY. A comprehensive AI policy education framework for university teaching and learning. Int J Educ Technol High Educ. 2023;20(1). https://doi.org/10.1186/s41239-023-00408-3
  33. Edmett A, Ichaporia N, Crompton H, Crichton R. Artificial intelligence and English language teaching: Preparing for the future. London: British Council; 2023. https://doi.org/10.57884/78EA-3C69
  34. Kayali B, Yavuz M, Balat Ş, Çalişan M. Investigation of student experiences with ChatGPT-supported online learning applications in higher education. Australas J Educ Technol. 2023;39(5):20–39. https://doi.org/10.14742/ajet.8915

