Application of the TextRank Algorithm for Indonesian News Text Summarization Using Word2Vec and LSA

Prana Wijaya Pratama Nandana, Muhammad Faisal

Abstract

In the digital era, the growing volume of online news makes it difficult for readers to identify relevant and trustworthy information. This study proposes an extractive summarization system for Indonesian news articles that combines the TextRank algorithm with Word2Vec and Latent Semantic Analysis (LSA). Previous methods such as LexRank, YAKE, and the Genetic Algorithm (GA) have reported F1-scores of only around 0.453; this study instead introduces a novel integration of semantic and graph-based techniques to improve summary relevance and coherence. Experiments were conducted on the IndoSum dataset, comprising 5,000 news articles, and the system was evaluated using ROUGE metrics and manual validation. The best performance was achieved at a 30% compression level, yielding a ROUGE-1 of 0.4808, a ROUGE-2 of 0.3433, and a ROUGE-L of 0.4675. Manual evaluation also confirmed that the summaries generated at this level were more informative and readable than those produced at other compression levels. These results indicate that the proposed approach contributes to the advancement of automatic summarization in natural language processing, particularly for the Indonesian language.
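The abstract describes ranking sentences with TextRank over a graph whose edges come from Word2Vec-based semantic similarity, then keeping a fixed share of sentences. The following is a minimal, dependency-free Python sketch of that general idea, not the authors' implementation. It assumes that each sentence is represented by the average of its word vectors, that edge weights are cosine similarities, and that the "30% compression level" means keeping 30% of the sentences. The toy `emb` dictionary is hypothetical and stands in for a trained Word2Vec model, and the LSA component is not shown. A simple ROUGE-N F1 is included to illustrate how overlap scores like the reported ones are computed; it skips stemming and stopword handling and is not the official ROUGE toolkit.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on word characters (no stemming or stopword removal).
    return re.findall(r"\w+", text.lower())


def sentence_vector(sentence, embeddings, dim):
    # Assumption: a sentence is the mean of the vectors of its known words.
    vecs = [embeddings[t] for t in tokenize(sentence) if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def textrank(vectors, damping=0.85, iterations=100, tol=1e-6):
    # Weighted PageRank over a sentence graph; edges are cosine similarities
    # between distinct sentences, with negative values clipped to 0.
    n = len(vectors)
    if n == 0:
        return []
    w = [[max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = []
        for i in range(n):
            rank = sum(w[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if out_sum[j] > 0)
            new.append((1 - damping) / n + damping * rank)
        delta = sum(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


def summarize(sentences, embeddings, dim, ratio=0.3):
    # Keep the top `ratio` share of sentences (at least one), in original order.
    if not sentences:
        return []
    scores = textrank([sentence_vector(s, embeddings, dim) for s in sentences])
    k = max(1, round(len(sentences) * ratio))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def rouge_n_f1(candidate, reference, n=1):
    # F1 of clipped n-gram overlap between a candidate and a reference summary.
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    c, r = ngrams(tokenize(candidate)), ngrams(tokenize(reference))
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    p, rc = overlap / sum(c.values()), overlap / sum(r.values())
    return 2 * p * rc / (p + rc)


if __name__ == "__main__":
    # Hypothetical 2-d vectors standing in for a trained Word2Vec model.
    emb = {"banjir": [1.0, 0.1], "jakarta": [0.9, 0.2], "hujan": [0.8, 0.3],
           "sepak": [0.1, 1.0], "bola": [0.0, 0.9]}
    doc = ["Banjir melanda Jakarta.", "Hujan deras memicu banjir di Jakarta.",
           "Hujan turun sejak pagi.", "Tim sepak bola menang."]
    print(summarize(doc, emb, dim=2, ratio=0.3))
```

In this toy document the off-topic sentence about football is only weakly connected to the other three, so it receives the lowest TextRank score and is never selected. Swapping the averaged vectors for LSA-derived sentence vectors would leave the graph and ranking steps unchanged.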

Article Details

How to Cite
NANDANA, Prana Wijaya Pratama; FAISAL, Muhammad. Application of the TextRank Algorithm for Indonesian News Text Summarization Using Word2Vec and LSA. Proceedings of the International Conference on Green Technology, [S.l.], v. 15, n. 1, mar. 2026. ISSN 2580-7099. Available at: <https://conferences.uin-malang.ac.id/index.php/ICGT/article/view/3832>. Date accessed: 04 may 2026. doi: https://doi.org/10.18860/icgt.v15i1.3832.
Section
Technology Information