Week #5 in Machine Learning

Eliud Nduati
4 min read · Feb 19, 2022


Improving the accuracy of Machine Learning Models


Model performance is measured with the evaluation metrics we have covered in earlier weeks. These metrics let us compare models and decide whether one performs better or worse than another.

Many data scientists give up when it comes to improving a model's performance, because it is a challenging activity. Ways to improve a model's performance include:

Data

  • More data generally yields more accurate models. This may not be feasible with competition data, where the dataset is fixed, but it usually is in the enterprise setting where most data scientists work. So if you are working on a model, ask for more data to help it perform better.
  • Nonlinear ML approaches such as deep learning continue to improve as more data is added.

Handle missing data and outliers

  • Whichever method you select for handling missing values and outliers, it should not distort the balance of the data. Where possible, impute missing values; methods such as KNN imputation reduce the chance that the model will be biased.
  • Cleaner data generally gives better performance.
  • Check this article on how to deal with outliers
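As a small sketch of the KNN imputation mentioned above, assuming scikit-learn is available (the tiny matrix here is made up for illustration):

```python
import numpy as np
from sklearn.impute import KNNImputer

# A toy feature matrix with two missing values (np.nan)
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# Each NaN is filled from the 2 rows most similar to it,
# rather than a blanket column mean that could skew the data
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
```

Because each gap is filled from its nearest neighbours, the imputed values respect the local structure of the data instead of pulling everything toward the column average.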

Feature engineering

  • Feature engineering helps extract more information from the data you have. It can help explain some of the attributes of the training data, such as the variance. Hypothesis generation is a good way to arrive at better features.
  • Some feature engineering approaches, such as data normalization and standardization, improve performance for algorithms that use weighted inputs or distance measures.
  • Other features can be derived from existing variables. This approach helps reveal relationships in a dataset that might otherwise stay hidden.
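The normalization and standardization mentioned above can be sketched with scikit-learn's scalers (the data is invented for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two columns on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Standardization: each column gets zero mean and unit variance
X_std = StandardScaler().fit_transform(X)

# Normalization: each column is rescaled to the [0, 1] range
X_norm = MinMaxScaler().fit_transform(X)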

Feature selection

  • Selecting useful features is important when working on data science projects. Feature selection means finding the attributes that best explain the relationship between the independent variables and the target variable.
  • Some approaches that help with selecting useful features include:

— Domain knowledge

— Visualization

— Statistical techniques such as PCA, p-values, and information value.
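The statistical approaches above can be sketched with scikit-learn, using its bundled iris dataset as a stand-in for your own data:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# p-value route: keep the 2 features with the strongest
# ANOVA F-statistic (i.e. the lowest p-values) against the target
X_selected = SelectKBest(f_classif, k=2).fit_transform(X, y)

# PCA route: project onto the 2 directions of highest variance
X_pca = PCA(n_components=2).fit_transform(X)
```

Note the difference: `SelectKBest` keeps two of the original, interpretable columns, while PCA builds two new composite features; domain knowledge often decides which is appropriate.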

Multiple algorithms

  • Some algorithms suit certain types of datasets better than others. It is therefore worth applying all relevant models and selecting the one with the best performance.
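A minimal sketch of trying several algorithms and keeping the best, again assuming scikit-learn and its iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate models spanning linear and nonlinear families
models = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
}

# Score each candidate with 5-fold cross validation, then pick the winner
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in models.items()}
best = max(scores, key=scores.get)
```

Using cross-validated scores rather than a single train/test split makes the comparison between algorithms less sensitive to one lucky or unlucky split.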

Parameter tuning

  • Parameters in an algorithm influence the outcome of the learning process.
  • Parameter tuning helps find the optimal value for each parameter, improving the accuracy of the model.
  • However, to benefit from parameter tuning, you need to understand each parameter: what it means and how it affects model performance.
  • Grid search helps enumerate a grid of standard hyperparameter values to find a better configuration.
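The grid search idea above can be sketched with scikit-learn's `GridSearchCV`, tuning a KNN classifier on the iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# The grid: every combination of these values is tried
param_grid = {
    "n_neighbors": [1, 3, 5, 7],
    "weights": ["uniform", "distance"],
}

# Each combination is scored with 5-fold cross validation
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # the winning combination
```

This is also why understanding each parameter matters: the grid you write down is only as good as the value ranges you thought to include in it.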

Algorithm approach

  • This approach is closely related to the last two. It involves choosing evaluation metrics that better capture the requirements of the problem and the domain.
  • Spot check both linear and nonlinear algorithms. Linear algorithms carry more bias, while nonlinear approaches require more data.
  • Research which algorithms are recommended for your type of problem.
  • Configure your algorithms well. This is not the same as tuning; it simply means investigating how to configure each algorithm properly so that it has a fair chance to perform well.

Ensemble methods

  • This approach harnesses the results of several weak models and produces better results by combining them.
  • Its two main flavours are bagging and boosting.
  • Ensemble methods trade extra complexity for accuracy, which often makes them a strong choice when accuracy improvements matter most.
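Bagging and boosting can both be sketched in a few lines with scikit-learn; both ensembles below use the library's default decision-tree base learner:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Bagging: train many trees on bootstrap samples of the data
# and combine their votes, which reduces variance
bagging = BaggingClassifier(n_estimators=50, random_state=0)

# Boosting: train trees sequentially, each one focusing on the
# examples its predecessors got wrong, which reduces bias
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

bag_score = cross_val_score(bagging, X, y, cv=5).mean()
boost_score = cross_val_score(boosting, X, y, cv=5).mean()
```

The key difference: bagging trains its weak models independently and in parallel, while boosting trains them in sequence, re-weighting the data after each round.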

Cross validation

  • Cross validation helps you understand why a model is not performing as expected, for example because it is overfitting: it estimates performance on held-out data rather than on the data the model was trained on.
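A small sketch of using cross validation to spot overfitting, assuming scikit-learn: an unpruned decision tree scores nearly perfectly on its own training data, and the gap to its cross-validated score is the tell.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)
train_acc = model.score(X, y)  # accuracy on the data it memorized

# Accuracy on 5 held-out folds the model never saw during training
cv_acc = cross_val_score(model, X, y, cv=5).mean()

# A large gap between train_acc and cv_acc suggests overfitting
print(f"train: {train_acc:.3f}  cross-val: {cv_acc:.3f}")
```

If the training score is high but the cross-validated score is noticeably lower, the model is memorizing rather than generalizing, and remedies such as regularization, pruning, or more data are worth trying.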
