Performance and Scalability Evaluation of an Apache Spark and Hadoop Cluster in a Low-Cost Environment

Natalie Cruz Tumba; Alex Lancho Ramos; Henry Leon Hurtado; Rafael Ricardo Quispe Merma; Evelyn Naida Luque Ochoa

doi:10.57166/riqchary.v7.n2.2025.6

Published: Aug 19, 2025

Keywords:

apache spark, data processing, hadoop, distributed cluster, distributed computing, low-cost cluster

Natalie Cruz Tumba

Universidad Nacional Micaela Bastidas de Apurímac

https://orcid.org/0009-0006-3110-2087

Alex Lancho Ramos

Universidad Nacional Micaela Bastidas de Apurímac

https://orcid.org/0009-0008-5493-397X

Henry Leon Hurtado

Universidad Nacional Micaela Bastidas de Apurímac

https://orcid.org/0009-0001-8216-1232

Rafael Ricardo Quispe Merma

Universidad Nacional Micaela Bastidas de Apurímac

https://orcid.org/0000-0002-8980-4560

Evelyn Naida Luque Ochoa

https://orcid.org/0000-0002-8386-9806

Abstract

This article presents the design, configuration, and implementation of a distributed computing cluster using Apache Spark and Hadoop on Ubuntu Server 24.04.1 LTS. The architecture consists of one master node and multiple worker nodes connected over a local Ethernet network. The installation and configuration process is described in detail, along with performance tests run with PySpark. The results show that, while a local configuration is more efficient for small datasets (<100 MB), the distributed cluster offers significant improvements for data volumes above 1 GB, which supports its scalability and its viability for educational and research environments with limited resources.
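The trade-off reported in the abstract (local execution wins on small inputs, the cluster wins on large ones) is what a simple fixed-overhead cost model would predict. The sketch below is only an illustration of that reasoning, not the authors' benchmark: the function names, the throughput figures, the worker count, and the startup overhead are all hypothetical placeholders, not values taken from the article.

```python
# Illustrative cost model only. All numeric parameters used below are
# hypothetical assumptions, NOT measurements from the article.

def local_time(size_mb, local_rate_mb_s):
    """Estimated seconds to process size_mb on a single machine."""
    return size_mb / local_rate_mb_s


def cluster_time(size_mb, workers, worker_rate_mb_s, overhead_s):
    """Estimated seconds on a cluster: a fixed coordination cost
    (job scheduling, task serialization, network setup) plus the
    work split evenly across the workers."""
    return overhead_s + size_mb / (workers * worker_rate_mb_s)


def crossover_size_mb(workers, local_rate_mb_s, worker_rate_mb_s, overhead_s):
    """Input size at which both estimates are equal.

    Solves size/local_rate = overhead + size/(workers*worker_rate).
    Returns None when the cluster is never faster under this model.
    """
    gain_per_mb = 1.0 / local_rate_mb_s - 1.0 / (workers * worker_rate_mb_s)
    if gain_per_mb <= 0:
        return None
    return overhead_s / gain_per_mb


if __name__ == "__main__":
    # Assumed values for illustration: 3 workers, 50 MB/s locally,
    # 40 MB/s per worker, 10 s of fixed cluster overhead.
    for size in (100, 1000, 2000):
        print(size, local_time(size, 50.0), cluster_time(size, 3, 40.0, 10.0))
    print(crossover_size_mb(3, 50.0, 40.0, 10.0))
```

Under these assumed parameters the fixed overhead dominates for a 100 MB input, while for inputs well above the crossover size the parallel throughput more than pays for it. Real measurements would also depend on factors the model ignores, such as data skew, shuffle volume, and HDFS replication.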

How to cite

Evaluación del Rendimiento y la Escalabilidad de un Clúster Apache Spark y Hadoop en un Entorno de Bajo Costo. (2025). C&T Riqchary Revista De investigación En Ciencia Y tecnología, 7(2), 49-53. https://doi.org/10.57166/riqchary.v7.n2.2025.6

Issue

Vol. 7 No. 2 (2025): COINCITEC 2025

Section

Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.

When an author writes an article and publishes it in a journal, copyright passes to the journal as part of the publication agreement. The journal therefore becomes the owner of the rights to reproduce, distribute, and sell the article. The author retains certain rights, such as the right to be recognized as the creator of the article and the right to use it for their own academic or research purposes, unless the publication contract states otherwise.
