
An Efficient Parallel Algorithm for Frequent Itemsets Mining Using BitTable on Spark

Manwika Kittipron, Chuleerat Jaruskulchai

Abstract


A variety of techniques have been used to improve the performance of algorithms for finding frequent itemsets, which is one of the key steps in frequent pattern mining. Today's technology produces an ever-increasing amount of data that should be analyzed for various benefits. Efforts have therefore been made to improve the algorithms' efficiency so that they can keep up with this growth, in particular when the data is held and processed in main memory. This research uses the BitTable data structure to improve the efficiency of the algorithm. Moreover, the principle of parallel frequent itemset mining based on the Map-Reduce design was applied in an algorithm named the Adaptive Hybrid Parallel Algorithm (AHP), and its performance was assessed. Additionally, the performance of the AHP algorithm was investigated on Apache Spark with data held in main memory during processing.
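As a rough illustration of the BitTable idea mentioned above, the following single-machine Python sketch stores one bit column per item and counts support with a bitwise AND. It is not the AHP algorithm and does not use Spark or Map-Reduce; the function names (`build_bittable`, `support`, `frequent_itemsets`) and the simple level-wise candidate join are assumptions made only for this example.

```python
from itertools import combinations


def build_bittable(transactions):
    """Map each item to an int whose bit t is set when transaction t contains it."""
    table = {}
    for tid, items in enumerate(transactions):
        for item in items:
            table[item] = table.get(item, 0) | (1 << tid)
    return table


def support(table, itemset):
    """Support of a non-empty itemset: AND the item columns, then count set bits."""
    items = list(itemset)
    bits = table.get(items[0], 0)
    for item in items[1:]:
        bits &= table.get(item, 0)
    return bin(bits).count("1")


def frequent_itemsets(transactions, min_support):
    """Level-wise search (hypothetical helper): keep candidates meeting min_support."""
    table = build_bittable(transactions)
    frequent = {}
    level = [frozenset([item]) for item in sorted(table)]
    while level:
        survivors = []
        for candidate in level:
            count = support(table, candidate)
            if count >= min_support:
                frequent[candidate] = count
                survivors.append(candidate)
        # Join surviving k-itemsets that differ in exactly one item into (k+1)-itemsets.
        level = list({a | b for a, b in combinations(survivors, 2)
                      if len(a | b) == len(a) + 1})
    return frequent
```

Because each support count reduces to integer AND operations over per-item columns, the transactions are scanned only once, when the table is built. In a parallel setting the table or the candidate set could be partitioned across workers, but how AHP does this is not described in the abstract.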






DOI: 10.14416/j.asep.2022.01.005
