Application of the TextRank Algorithm for Indonesian News Text Summarization Using Word2Vec and LSA

Prana Wijaya Pratama Nandana; Muhammad Faisal

doi:10.18860/icgt.v15i1.3832

Published Mar 11, 2026

Prana Wijaya Pratama Nandana

Department of Informatics Engineering, UIN Maulana Malik Ibrahim Malang

Muhammad Faisal

Department of Informatics Engineering, UIN Maulana Malik Ibrahim Malang

Abstract

In the digital era, the growing volume of online news makes it difficult for readers to identify relevant and trustworthy information. This study proposes an extractive summarization system for Indonesian news articles that combines the TextRank algorithm with Word2Vec and Latent Semantic Analysis (LSA). Previous methods such as LexRank, YAKE, and Genetic Algorithm (GA) have shown limited F1-scores of around 0.453; this study instead introduces a novel integration of semantic and graph-based techniques to improve summary relevance and coherence. Experiments were conducted on the IndoSum dataset, using 5,000 news articles. The system was evaluated with ROUGE metrics and manual validation. The best performance was achieved at a 30% compression level, yielding a ROUGE-1 of 0.4808, a ROUGE-2 of 0.3433, and a ROUGE-L of 0.4675. Manual evaluation also confirmed that summaries generated at this level were more informative and readable than those generated at other compression levels. These results indicate that the proposed approach contributes to the advancement of automatic summarization in natural language processing, particularly for the Indonesian language.
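
As a rough illustration of the graph-based ranking step described in the abstract, the following minimal Python sketch scores sentences with a TextRank-style iteration over a cosine-similarity graph and keeps the top-ranked ones. It is not the authors' implementation. The sentence vectors are assumed to come from some upstream embedding step (for example, averaged Word2Vec vectors or LSA components); the toy vectors, the function names, the damping factor of 0.85, and the reading of "30% compression" as "keep 30% of the sentences" are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def textrank_scores(vectors, damping=0.85, iterations=50):
    """PageRank-style scores over a weighted sentence-similarity graph."""
    n = len(vectors)
    # Edge weight = cosine similarity between distinct sentences, clipped at 0.
    weights = [
        [max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional
            # to the weight of the edge j -> i.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            updated.append((1 - damping) / n + damping * rank)
        scores = updated
    return scores


def summarize(sentences, vectors, compression=0.3):
    """Keep the top-scoring share of sentences, in original document order.

    Assumption: `compression` is the fraction of sentences kept.
    """
    k = max(1, math.ceil(compression * len(sentences)))
    scores = textrank_scores(vectors)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


# Hypothetical toy input: four sentence labels with 2-d stand-in embeddings.
toy_sentences = ["A", "B", "C", "D"]
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
toy_summary = summarize(toy_sentences, toy_vectors, compression=0.3)
```

In this toy input, sentence "D" points in a different direction from the other three, so it receives little weight from its neighbours and is left out of the summary. Returning the selected sentences in document order, rather than rank order, is one common choice for keeping an extractive summary readable.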

How to Cite

NANDANA, Prana Wijaya Pratama; FAISAL, Muhammad. Application of the TextRank Algorithm for Indonesian News Text Summarization Using Word2Vec and LSA. Proceedings of the International Conference on Green Technology, [S.l.], v. 15, n. 1, mar. 2026. ISSN 2580-7099. Available at: <https://conferences.uin-malang.ac.id/index.php/ICGT/article/view/3832>. Date accessed: 27 july 2026. doi: https://doi.org/10.18860/icgt.v15i1.3832.

Issue

Vol 15 No 1 (2025): Proceeding of 15th International Conference on Green Technology

Section

Technology Information

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
