Distributed Graph Processing with Depth-First Search in Apache Hadoop
DOI: https://doi.org/10.65000/jfkhmg23

Keywords: Apache Hadoop, Depth-First Search, Big Data, Graph Processing, Scalability

Abstract
The rapid expansion of large-scale datasets has underscored the need for scalable graph processing methods in distributed computing environments. Apache Hadoop, through its integration of the Hadoop Distributed File System (HDFS) and MapReduce, provides a foundation for meeting this need. This study explores the incorporation of Depth-First Search (DFS) into Hadoop for efficient big data graph processing. The work outlines the design of Hadoop-compatible graph structures and a MapReduce-based DFS framework optimized for large-scale traversal. Advanced implementations, including iterative, randomized, and parallel DFS, are evaluated for their impact on execution efficiency, resource allocation, and scalability. The proposed integration enables applications in web graph analysis, computational biology, and social network exploration, while also providing a generalized foundation for adapting other graph algorithms to Hadoop. Quantitative evaluations demonstrate DFS's ability to process large adjacency matrices and to traverse graphs of 56 to 101 vertices efficiently, and they highlight trade-offs in execution time, memory handling, and scalability relative to sequential DFS, confirming the benefits of distributed parallelization in Hadoop-based environments.
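The abstract does not reproduce the implementation, so the sketch below is only a minimal illustration of the iterative, MapReduce-based traversal pattern it describes: one map/reduce round that expands a traversal frontier over an adjacency-list encoding of the graph. All names and formats here (the TraversalStep class, the tab-separated records, the U/F/V state codes) are illustrative assumptions, not the paper's code, and a round-based frontier expansion of this kind is breadth-like by default; enforcing strict DFS visit order would require carrying additional ordering metadata between rounds.

// Minimal sketch (not the paper's code): one iteration of a Hadoop
// MapReduce traversal step over records of the form
// "vertex<TAB>n1,n2,...<TAB>state", where state is
// U (unvisited), F (frontier), or V (visited).
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class TraversalStep {

  public static class ExpandMapper
      extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = value.toString().split("\t");
      String vertex = parts[0], adj = parts[1], state = parts[2];
      if (state.equals("F")) {
        // Expand the frontier: flag each neighbor for visiting next round.
        for (String n : adj.split(",")) {
          if (!n.isEmpty()) {
            ctx.write(new Text(n), new Text("F"));
          }
        }
        state = "V"; // this vertex has now been visited
      }
      // Re-emit the vertex with its adjacency list and (possibly new) state.
      ctx.write(new Text(vertex), new Text(adj + "\t" + state));
    }
  }

  public static class MergeReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text vertex, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      String adj = "";
      char state = 'U';
      for (Text v : values) {
        String s = v.toString();
        if (s.contains("\t")) {          // full record: adjacency list + state
          String[] parts = s.split("\t", -1);
          adj = parts[0];
          state = strongest(state, parts[1].charAt(0));
        } else {                         // bare state from a neighbor expansion
          state = strongest(state, s.charAt(0));
        }
      }
      ctx.write(vertex, new Text(adj + "\t" + state));
    }

    // V beats F beats U, so a visited vertex is never re-queued.
    private static char strongest(char a, char b) {
      String order = "UFV";
      return order.indexOf(a) >= order.indexOf(b) ? a : b;
    }
  }
}

A driver (omitted here) would seed the start vertex with state F and rerun this job, feeding each round's output back as input, until no F records remain, which is typically detected with a Hadoop counter incremented whenever the reducer emits a frontier vertex.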