To start conceptualizing my project, I’d like to document some of the ideas I have had. I am also currently taking a Machine Learning course, and I have been thinking of ways to intertwine the two courses.
Distributed Machine Learning is quickly becoming a popular topic, and I have found myself interested in both distributed computing and machine learning. So, how can I bring the two together? Distributed machine learning technologies such as MLBase already exist.
My overall goal is to experiment with Apache Ignite (see other posts for an introduction), together with some sort of Machine Learning technology for building decision trees/random forests for classification. Classification is the process of predicting which “class” an item in a dataset belongs to based on its features, often by estimating the probability of each class. This is commonly done using decision trees/random forests.
A large dataset can quickly cause a single decision tree to become very large. Many people have found that random forests work more effectively for large datasets. A typical random forest trains multiple decision trees, each on a random sample of the dataset. To classify an item, every tree makes its own prediction, and the forest returns the class that the most trees voted for.
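To make the random forest idea a bit more concrete, here is a minimal sketch in plain Python. It assumes the common formulation in which each tree is trained on a bootstrap sample (rows drawn with replacement) and the forest predicts by majority vote. To keep it short, each "tree" is only a one-split decision stump, and the per-split random feature selection that a full random forest uses is left out, so this is closer to simple bagging. The function names and the toy dataset are my own inventions for illustration, and none of this uses Apache Ignite yet.

```python
import random
from collections import Counter


def train_stump(rows, labels):
    """Fit a one-split decision tree (a "stump") by trying every
    feature/threshold pair and keeping the one with the fewest errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            # Each side of the split predicts its most common label.
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return f, t, left_label, right_label


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest


def predict_forest(forest, row):
    """Every tree votes; the class with the most votes wins."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Toy dataset (made up): two numeric features, two classes.
rows = [(1.0, 5.0), (2.0, 4.0), (3.0, 6.0), (7.0, 5.5), (8.0, 4.5), (9.0, 6.5)]
labels = ["a", "a", "a", "b", "b", "b"]

forest = train_forest(rows, labels)
print(predict_forest(forest, (0.5, 5.0)))
print(predict_forest(forest, (9.5, 5.0)))
```

Since every tree only needs its own sample of the data, the loop in `train_forest` is the part that looks like a natural candidate for distributing across nodes, which is the direction I want to explore.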