A collection of machine learning algorithms: from Bayes to deep learning and their respective advantages and disadvantages(part one)
In the applications that we use in our daily lives, such as recommendation systems, smart image beautification applications and chatbots, a variety of machine learning and data processing algorithms are doing their part. In this article, we filter and briefly introduce some of the most common algorithm categories, and for each category, we list some actual algorithms and briefly introduce their advantages and disadvantages.
It is an extension of another method (usually a regression method) that penalizes models based on their complexity, preferring models that are relatively simple and can be better generalized.
• Its penalty will reduce overfitting • There will always be a solution
Disadvantages: • Penalties will cause underfitting • Hard to calibrate
Ensemble algorithms
The Ensemble Algorithms is an integration model, in which multiple weaker models are integrated into a model set, where the models can be trained individually and their predictions can be combined in some way to make an overall prediction.
The main problem of the algorithm is to find out which of the weaker models can be combined and the way of the combination. This is a very powerful set of techniques and is enjoy great popularity.
• Boosting
• Bootstrapped Aggregation(Bagging)
• AdaBoost
• Stacked Generalization(blending)
• Gradient Boosting Machines,GBM
• Gradient Boosted Regression Trees,GBRT
• Random Forest
Advantages: • Almost all of the most advanced predictions available today use algorithm integration. It is much more accurate than predictions with individual model
Disadvantages: • Requires a lot of maintenance work
Decision Tree Algorithm
Decision trees learn and use a decision tree as a predictive model that maps observations on an item (characterized on branches) into conclusions about the target value of that item (characterized on leaves).
The target in a tree model is variable and can take a finite set of values, and is called a classification tree; in these tree structures, the leaves represent class labels and the branches represent features that characterize the connections of these class labels. Examples: • Classification and Regression Tree,CART • Iterative Dichotomiser 3(ID3)
• C4.5 and C5.0 (two different versions of a powerful method)
Advantages:
• Easy to interpret
• Non-parametric type
Disadvantages:
• Tends to be overfitting
• May or may not be caught in local minimums
• No online learning
Regression Algorithms
Regression is a statistical process used to estimate the relationship between two variables. The algorithm provides many techniques for modeling and analyzing multiple variables when it is used to analyze the relationship between the dependent variable and one or more independent variables. To be more specific, regression analysis can help us understand when any of the independent variables change and the other independent variable remains constant, the typical value of the change in the dependent variable. Most commonly, regression analysis can be used to evaluate the conditional expectation of the dependent variable given the conditions of the independent variable.
Regression algorithms are the main algorithms in statistics, which have been incorporated into statistical machine learning.
Artificial neural network is an algorithm model inspired by biological neural networks.
It is a pattern matching, often used for regression and classification problems, but has a huge subdomain consisting of hundreds of algorithms and variants for various types of problems.
Example:
• Perceptron
• Counterpropagation
• Hopfield networks
• Radial Basis Function Network (RBFN)
Advantages:
• Performs extremely well in tasks for voice, semantics, vision, and various games (e.g., Go-chess).
• Algorithms can be quickly adapted to new problems.
Disadvantages:
Requires large amounts of data for training
Training requires high hardware configuration
The model is in a "black box state" and it is difficult to understand the internal mechanism
Metaparameter and topology of networks are difficult to choose.