Abstract
Traditional machine learning (ML) algorithms build models from static datasets. Today, there is growing demand for ML-based solutions that can handle very large volumes of data arriving as continuous streams. The Very Fast Decision Tree (VFDT) is one of the most widely used data stream mining (DSM) algorithms, yet it wastes a large amount of energy on trivial calculations. When designing algorithms of this kind, the machine learning community has put accuracy and execution time first, even though various studies consider energy consumption a crucial factor in assessing data mining algorithms. This thesis proposes two new techniques for optimizing the VFDT algorithm so that it wastes less energy while maintaining its accuracy. In the first method, certain fixed parameters of the algorithm were made dynamic. Each parameter was first analyzed separately to determine how much it helps reduce energy consumption in different cases within the algorithm. The second method is based on identifying the most energy-consuming functions in the algorithm. For the first method, experiments were run on both the baseline algorithm and the proposed version, using several different types of datasets in the same application environment. Compared with the baseline, the proposed method significantly reduced energy consumption while maintaining accuracy, especially on large datasets without noise. For the second method, experiments were conducted on real-world benchmark and synthetic datasets to compare the proposed method with state-of-the-art algorithms from previous work.
The proposed algorithm performs considerably better and runs faster while using less energy and maintaining accuracy, especially on datasets with large numbers of instances and attributes.
Keywords: Big Data; Data stream mining; Classification; Very fast decision tree algorithm; Hoeffding bound; Energy consumption; Massive Online Analysis
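The abstract does not say which VFDT parameters were made dynamic. As background, the sketch below shows the standard VFDT split test at a leaf and its three usually fixed parameters: the split confidence `delta`, the tie threshold `tau`, and the grace period `n_min`. It also shows the Hoeffding bound the test relies on. The function names are hypothetical, and the default values (1e-7, 0.05, 200) are assumed to match common defaults such as those of MOA's Hoeffding tree. The grace-period check is simplified to a modulo test. This is an illustration of the baseline algorithm, not the thesis's proposed method.

```python
import math
from typing import Callable, Dict, Optional


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)): with probability 1 - delta,
    the true mean lies within epsilon of the mean observed over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def try_split(
    compute_gains: Callable[[], Dict[str, float]],
    n_seen: int,
    n_classes: int,
    delta: float = 1e-7,   # split confidence (fixed in standard VFDT)
    tau: float = 0.05,     # tie threshold (fixed in standard VFDT)
    n_min: int = 200,      # grace period (fixed in standard VFDT)
) -> Optional[str]:
    """Return the attribute to split a leaf on, or None to keep waiting."""
    # Grace period (simplified): only run the costly gain computation
    # once every n_min examples observed at this leaf.
    if n_seen == 0 or n_seen % n_min != 0:
        return None
    gains = compute_gains()
    if len(gains) < 2:
        return None
    ranked = sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
    (best_attr, g_best), (_, g_second) = ranked[0], ranked[1]
    # Information gain over n_classes classes lies in [0, log2(n_classes)].
    eps = hoeffding_bound(math.log2(n_classes), delta, n_seen)
    # Split when the best attribute is clearly ahead, or when the top two
    # are so close (eps < tau) that waiting longer is not worth it.
    if g_best - g_second > eps or eps < tau:
        return best_attr
    return None
```

With two classes and 200 examples, `eps` is about 0.20. A gain gap of 0.4 triggers a split, while a gap of 0.05 does not. Every call that returns before `compute_gains()` skips the expensive evaluation. Tuning `n_min`, `delta` or `tau` at runtime, instead of fixing them, is one way a dynamic-parameter scheme could trade computation (and so energy) against split speed. The specific adaptation rule used in the thesis is not described in this abstract.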