Week #5 in Machine Learning

Eliud Nduati
4 min read · Feb 19, 2022


Improving the accuracy of Machine Learning Models


Model performance is measured with the evaluation metrics we have covered in earlier weeks. These metrics let us compare models and decide whether one performs better or worse than another.

Many data scientists give up when it comes to improving a model's performance, because it is a challenging activity. Ways to improve a model's performance include:

Data

  • More data generally yields more accurate models. This may not be feasible with competition data, where the dataset is fixed, but it usually is in the enterprise setting where most data scientists work. So if you are working on a model, ask for more data to help it perform better.
  • Nonlinear ML approaches such as deep learning continue to improve as more data is added.

Handle missing data and outliers

  • Whichever method you select for handling missing values and outliers, it should not distort the balance of the data. Where possible, impute missing values; methods such as KNN imputation reduce the chance that the model will be biased.
  • Cleaner data generally gives better performance.
  • Check this article on how to deal with outliers
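As a small sketch of the KNN imputation mentioned above, assuming scikit-learn is available (the tiny matrix here is made up for illustration):

```python
import numpy as np
from sklearn.impute import KNNImputer

# A toy feature matrix with two missing values (np.nan)
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# Each NaN is filled from the 2 rows most similar to it,
# rather than a blanket column mean that could skew the data
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
```

Because each gap is filled from its nearest neighbours, the imputed values respect the local structure of the data instead of pulling everything toward the column average.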

Feature engineering

  • Feature engineering helps extract more information from the data you have. It can help explain some of the attributes of the training data, such as the variance. Hypothesis generation is a good way to arrive at better features.
  • Some feature engineering approaches, such as data normalization and standardization, improve performance for algorithms that use weighted inputs or distance measures.
  • Other features can be derived from existing variables. This approach helps reveal relationships in a dataset that might otherwise stay hidden.
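The normalization and standardization mentioned above can be sketched with scikit-learn's scalers (the data is invented for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two columns on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Standardization: each column gets zero mean and unit variance
X_std = StandardScaler().fit_transform(X)

# Normalization: each column is rescaled to the [0, 1] range
X_norm = MinMaxScaler().fit_transform(X)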

Feature selection

  • Selecting useful features is important when working on data science projects. Feature selection means finding the attributes that best explain the relationship between the independent variables and the target variable.
  • Some approaches that help with selecting useful features include:

— Domain knowledge

— Visualization

— Statistical techniques such as PCA, p-values, and information value.
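The statistical approaches above can be sketched with scikit-learn, using its bundled iris dataset as a stand-in for your own data:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# p-value route: keep the 2 features with the strongest
# ANOVA F-statistic (i.e. the lowest p-values) against the target
X_selected = SelectKBest(f_classif, k=2).fit_transform(X, y)

# PCA route: project onto the 2 directions of highest variance
X_pca = PCA(n_components=2).fit_transform(X)
```

Note the difference: `SelectKBest` keeps two of the original, interpretable columns, while PCA builds two new composite features; domain knowledge often decides which is appropriate.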

Multiple algorithms

  • Some algorithms suit certain types of datasets better than others. It is therefore worth applying all relevant models and selecting the one with the best performance.
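A minimal sketch of trying several algorithms and keeping the best, again assuming scikit-learn and its iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate models spanning linear and nonlinear families
models = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
}

# Score each candidate with 5-fold cross validation, then pick the winner
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in models.items()}
best = max(scores, key=scores.get)
```

Using cross-validated scores rather than a single train/test split makes the comparison between algorithms less sensitive to one lucky or unlucky split.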

Parameter tuning

  • Parameters in an algorithm influence the outcome of the learning process.
  • Parameter tuning helps find the optimal value for each parameter, improving the accuracy of the model.
  • However, to benefit from parameter tuning, you need to understand each parameter: what it means and how it affects model performance.
  • Grid search helps enumerate a grid of standard hyperparameter values to find a better configuration.
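The grid search idea above can be sketched with scikit-learn's `GridSearchCV`, tuning a KNN classifier on the iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# The grid: every combination of these values is tried
param_grid = {
    "n_neighbors": [1, 3, 5, 7],
    "weights": ["uniform", "distance"],
}

# Each combination is scored with 5-fold cross validation
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # the winning combination
```

This is also why understanding each parameter matters: the grid you write down is only as good as the value ranges you thought to include in it.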

Algorithm approach

  • This approach is closely related to the last two. It involves choosing evaluation metrics that better capture the requirements of the problem and the domain.
  • Spot check both linear and nonlinear algorithms. Linear algorithms carry more bias, while nonlinear approaches require more data.
  • Research which algorithms are recommended for your type of problem.
  • Configure your algorithms well. This is not the same as tuning; it simply means investigating how to configure each algorithm properly so that it has a fair chance to perform well.

Ensemble methods

  • This approach harnesses the results of several weak models and produces better results by combining them.
  • Its two main flavours are bagging and boosting.
  • Ensemble methods trade extra complexity for accuracy, which often makes them a strong choice when accuracy improvements matter most.
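Bagging and boosting can both be sketched in a few lines with scikit-learn; both ensembles below use the library's default decision-tree base learner:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Bagging: train many trees on bootstrap samples of the data
# and combine their votes, which reduces variance
bagging = BaggingClassifier(n_estimators=50, random_state=0)

# Boosting: train trees sequentially, each one focusing on the
# examples its predecessors got wrong, which reduces bias
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

bag_score = cross_val_score(bagging, X, y, cv=5).mean()
boost_score = cross_val_score(boosting, X, y, cv=5).mean()
```

The key difference: bagging trains its weak models independently and in parallel, while boosting trains them in sequence, re-weighting the data after each round.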

Cross validation

  • Cross validation helps you understand why a model is not performing as expected, for example because it is overfitting: it estimates performance on held-out data rather than on the data the model was trained on.
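A small sketch of using cross validation to spot overfitting, assuming scikit-learn: an unpruned decision tree scores nearly perfectly on its own training data, and the gap to its cross-validated score is the tell.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)
train_acc = model.score(X, y)  # accuracy on the data it memorized

# Accuracy on 5 held-out folds the model never saw during training
cv_acc = cross_val_score(model, X, y, cv=5).mean()

# A large gap between train_acc and cv_acc suggests overfitting
print(f"train: {train_acc:.3f}  cross-val: {cv_acc:.3f}")
```

If the training score is high but the cross-validated score is noticeably lower, the model is memorizing rather than generalizing, and remedies such as regularization, pruning, or more data are worth trying.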
