Citation-Enhanced Retrieval-Augmented Generation For Automated Scientific Literature Review: A Novel Multi-Factor Ranking Approach

Ida Bagus Kresna Sudiatmika; Made Adi Paramartha Putra

doi:10.37676/jki.v5i1.1618

Authors

Ida Bagus Kresna Sudiatmika, Primakara University
Made Adi Paramartha Putra, Primakara University


Keywords:

Retrieval-Augmented Generation, Automated Literature Review, Citation-Enhanced, Multi-Factor Ranking, Large Language Model

Abstract

Reviewing scientific literature is a fundamental part of academic research, and it demands significant time and effort. This study proposes a framework that combines Retrieval-Augmented Generation (RAG) with a citation-enhancement mechanism and a multi-factor ranking algorithm to automate the scientific literature review process. The approach integrates three main components: (1) a semantic document retrieval module based on dense vector embeddings, (2) a citation augmentation system that analyzes the citation network among scientific papers, and (3) a multi-factor ranking algorithm that weighs semantic relevance, citation impact, publication recency, and author authority. Experiments were conducted on the S2ORC (Semantic Scholar Open Research Corpus) dataset containing over 200,000 scientific papers across various domains. Evaluation with ROUGE-L, BLEU-4, BERTScore, and Citation F1 shows that the proposed approach yields significant improvements over conventional RAG methods. The system achieves a ROUGE-L score of 0.612 and a BERTScore of 0.847, improvements of 8.3% and 6.1%, respectively, over a standard RAG baseline. These results indicate that integrating citation information into retrieval and text generation substantially improves the quality, accuracy, and completeness of automatically generated literature reviews.
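The abstract names the four ranking factors but does not give the scoring formula, so the following is only a minimal sketch of how such a multi-factor score could be combined. Everything specific here is an assumption made for illustration and is not taken from the paper: the `Candidate` fields, the use of an h-index as a proxy for author authority, the log-scaled citation term, the exponential recency decay with a five-year half-life, the normalization caps, and the weights.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A retrieved paper. All fields are hypothetical, chosen for this sketch."""
    paper_id: str
    embedding: list[float]   # dense vector of the paper text
    citation_count: int
    year: int
    author_h_index: float    # assumed proxy for author authority


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def multi_factor_score(query_vec, cand, current_year=2025,
                       weights=(0.5, 0.2, 0.15, 0.15),
                       half_life=5.0, max_citations=1000, max_h_index=100.0):
    """Weighted sum of four factors, each scaled to [0, 1].

    The weights and scaling constants are illustrative assumptions.
    """
    w_sem, w_cit, w_rec, w_auth = weights
    # Semantic relevance: cosine similarity, clipped at zero.
    semantic = max(0.0, cosine(query_vec, cand.embedding))
    # Citation impact: log-scaled so highly cited papers do not dominate.
    citation = min(1.0, math.log1p(cand.citation_count) / math.log1p(max_citations))
    # Publication recency: halves every `half_life` years.
    age = max(0, current_year - cand.year)
    recency = 0.5 ** (age / half_life)
    # Author authority: capped linear scaling of the h-index proxy.
    authority = min(1.0, cand.author_h_index / max_h_index)
    return w_sem * semantic + w_cit * citation + w_rec * recency + w_auth * authority


def rank(query_vec, candidates, **kwargs):
    """Return candidates sorted by descending multi-factor score."""
    return sorted(candidates,
                  key=lambda c: multi_factor_score(query_vec, c, **kwargs),
                  reverse=True)
```

Because every factor is scaled to [0, 1] and the assumed weights sum to 1, the combined score also stays in [0, 1], which keeps the factors comparable and makes the weights easy to interpret. In practice the weights would need to be tuned against held-out relevance judgments.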

