Comparison of Compact Generative Models for Automatic Question Answering in Spanish via Retrieval-Augmented Generation

Jean Phol A. Curi Garrafa
Victor R. Ortega Marocho
Wilson Mamani Rodrigo

Abstract

This study compares five compact generative models (≤ 8 billion parameters) for Spanish question answering under a retrieval-augmented generation (RAG) pipeline executed locally. We assess response quality using F1, BLEU-4, and an external semantic judge (LLM-Judge), alongside efficiency indicators (P95 latency, memory, GPU/CPU). Results show Mistral 7B achieves the highest average F1 and semantic scores, whereas OpenHermes 7B attains nearly identical accuracy with the lowest memory footprint. Zephyr 7B-β performs well on very long documents, and Phi-3 Mini minimizes tail latency under adverse conditions. A Pareto analysis of F1–RAM identifies Mistral 7B and OpenHermes 7B as non-dominated solutions, yielding practical guidelines depending on operational goals (maximum accuracy vs. resource efficiency). The paper contributes a reproducible Spanish-language comparison under RAG and actionable criteria for local deployments.
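The Pareto analysis mentioned above can be made concrete with a small non-dominance check: a model is dominated if some other model is at least as accurate (F1) and at least as memory-efficient (RAM), and strictly better on one of the two. The sketch below uses illustrative placeholder scores, not the paper's reported values; the model names follow the study.

```python
# Minimal sketch of the F1-RAM Pareto analysis described in the abstract.
# F1 and RAM (GB) figures below are illustrative placeholders only.

def pareto_front(models):
    """Return the names of models not dominated by any other model.

    A model is dominated if another model has F1 >= its F1 and
    RAM <= its RAM, with at least one strict inequality.
    """
    front = []
    for name, f1, ram in models:
        dominated = any(
            (f2 >= f1 and r2 <= ram) and (f2 > f1 or r2 < ram)
            for n2, f2, r2 in models
            if n2 != name
        )
        if not dominated:
            front.append(name)
    return front

candidates = [
    ("Mistral 7B",     0.78, 14.2),  # highest F1 (illustrative numbers)
    ("OpenHermes 7B",  0.77, 11.5),  # near-top F1, lowest RAM (illustrative)
    ("Zephyr 7B-beta", 0.74, 14.0),
    ("Llama 3 8B",     0.73, 15.1),
    ("Phi-3 Mini",     0.70, 12.5),
]

print(pareto_front(candidates))  # the non-dominated accuracy/memory trade-offs
```

With these placeholder numbers, only Mistral 7B and OpenHermes 7B survive the check, mirroring the trade-off the study reports between maximum accuracy and resource efficiency.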

Article Details

How to Cite
Comparison of Compact Generative Models for Automatic Question Answering in Spanish via Retrieval-Augmented Generation. (2025). C&T Riqchary Science and Technology Research Magazine, 7(2), 9-18. https://doi.org/10.57166/riqchary.v7.n2.2025.2
Section
Articles
Author Biographies

Jean Phol A. Curi Garrafa, Micaela Bastidas National University of Apurímac

He is a Computer Science and Systems Engineering student at the Micaela Bastidas National University of Apurímac. His training focuses on the development of information systems and the optimization of academic processes through the use of technological tools. He has participated in academic activities related to the design and implementation of IT solutions for university management.

Victor R. Ortega Marocho, Micaela Bastidas National University of Apurímac

He is a Computer Science and Systems Engineering student at the Micaela Bastidas National University of Apurímac. His academic interests focus on software engineering, process automation, and data analysis applied to education. He has contributed to research projects focused on improving academic management systems.

Wilson Mamani Rodrigo, Micaela Bastidas National University of Apurímac

He is a Systems Engineer and Civil Engineer, with a Master's degree in Systems Engineering and a PhD in Environmental Civil Engineering Sciences. He is a teaching assistant at the Micaela Bastidas National University of Apurímac and has taught at the National University of the Altiplano. He has extensive experience in the preparation of technical files, feasibility studies, and civil infrastructure projects, in addition to serving as a consultant and researcher in civil engineering projects.
