Abstract
This study compares the effectiveness of two vectorization approaches, TF-IDF and Sentence-BERT, for automatically grouping texts from Telegram channels by topic. Channel messages were first cleaned and normalized to a standard form. TF-IDF produced high-dimensional vectors based on statistical term features, whereas Sentence-BERT generated contextual embeddings that capture the semantic content of short Telegram messages. The K-Means algorithm was applied to both representations, and the results were evaluated using the silhouette score, the Davies–Bouldin index, and manual semantic analysis.
The results indicate that Sentence-BERT substantially outperforms TF-IDF in forming semantically coherent and topically homogeneous clusters.
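To make the two internal validity measures named above concrete, the following minimal sketch computes the silhouette score and the Davies–Bouldin index from their standard definitions. The toy two-dimensional points and both label assignments are assumptions made purely for illustration and are not data from this study; in practice the input vectors would be the TF-IDF or Sentence-BERT representations, and a library implementation (for example, scikit-learn's `silhouette_score` and `davies_bouldin_score`) would typically be used.

```python
import math
from collections import defaultdict


def group_by_label(points, labels):
    """Collect points into a dict: cluster label -> list of points."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    return clusters


def silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean distance); higher is better."""
    clusters = group_by_label(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin(points, labels):
    """Davies-Bouldin index (Euclidean distance); lower is better."""
    clusters = group_by_label(points, labels)
    centroids = {
        key: tuple(sum(axis) / len(members) for axis in zip(*members))
        for key, members in clusters.items()
    }
    # scatter: mean distance of cluster members to their centroid
    scatter = {
        key: sum(math.dist(p, centroids[key]) for p in members) / len(members)
        for key, members in clusters.items()
    }
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst_ratios) / len(worst_ratios)


# Hypothetical toy data: two well-separated groups of four points each.
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11)]
coherent_labels = [0, 0, 0, 0, 1, 1, 1, 1]  # matches the true groups
mixed_labels = [0, 1, 0, 1, 0, 1, 0, 1]     # cuts across both groups

for name, labels in [("coherent", coherent_labels), ("mixed", mixed_labels)]:
    print(name, silhouette(toy_points, labels), davies_bouldin(toy_points, labels))
```

On this toy example the labeling that matches the two groups yields a higher silhouette score and a lower Davies–Bouldin index than the mixed labeling, which is the direction in which both measures are read when comparing clusterings built on different representations.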