
Data Science Enthusiast || Poet {in my own interpretation} || I read || I write || I code || Grey Wolf || Toxic to Vampires || Very Curious ||

Here’s the next post in our Data Cleaning Journey: dealing with outliers. When we talk about outliers here, we aren’t talking about the people discussed in Malcolm Gladwell’s “Outliers,” though the definition is similar.

In this post, we will discuss what outliers are, how they arise, how they affect our machine learning models, how to identify them, and how to deal with them.

Image from an Analytics Vidhya post

To understand what outliers are, we will answer three questions.

What are outliers in Data?

Statistically, outliers are observations that are distant from other observations.

Let’s try that again: An outlier is an observation that deviates significantly from the rest of…
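One common way to make "distant from other observations" concrete is the interquartile range (IQR) rule. The sketch below is only an illustration under stated assumptions: the 1.5 multiplier is a conventional default, the function name `iqr_outliers` and the `ages` sample are hypothetical, and nothing here is claimed to be the method used in the full post.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points (Q1, median, Q3).
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical sample: nine typical values and one extreme value.
ages = [22, 24, 25, 25, 26, 27, 28, 29, 30, 95]
print(iqr_outliers(ages))  # → [95]
```

A larger `k` flags fewer points; the right threshold depends on the data and on why the extreme values occur.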


Our Data Cleaning Journey is coming along pretty well. Here we are in part 4, where we look at how to deal with categorical variables.

Photo by v2osk on Unsplash

What are categorical variables?

Suppose you are participating in a survey and come across a question asking how often you use the bus. The answers provided are likely to be options such as: “Never,” “Rarely,” “Most days,” and “all days.” The data being collected is categorical since the responses fall within fixed categories. Another scenario would be asking people what make of phone they like or use. In such a case, the responses would be names such…
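The two survey scenarios differ in one useful way: the bus-usage answers have a natural order, while phone makes do not. A minimal sketch of how each kind might be turned into numbers follows. The helper names (`encode_ordinal`, `one_hot`) and the placeholder makes "Brand A" and "Brand B" are assumptions made for illustration, not taken from the post.

```python
# Ordinal categories: the order of the labels carries meaning,
# so each label is mapped to its rank.
BUS_USE_ORDER = ["Never", "Rarely", "Most days", "all days"]
RANK = {label: i for i, label in enumerate(BUS_USE_ORDER)}

def encode_ordinal(answers):
    """Map each ordered survey answer to its integer rank."""
    return [RANK[a] for a in answers]

def one_hot(values):
    """Nominal categories: one 0/1 column per distinct value, no order implied."""
    categories = sorted(set(values))
    rows = [[int(v == c) for c in categories] for v in values]
    return categories, rows

print(encode_ordinal(["Rarely", "all days", "Never"]))  # → [1, 3, 0]

# "Brand A" and "Brand B" are hypothetical phone makes.
cats, rows = one_hot(["Brand B", "Brand A", "Brand B"])
print(cats)  # → ['Brand A', 'Brand B']
print(rows)  # → [[0, 1], [1, 0], [0, 1]]
```

Ranking the phone makes the same way would invent an order that is not in the data, which is why the nominal case gets separate indicator columns here.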


Part 2 in A Data Cleaning Journey

Last week we discussed dealing with missing values in our dataset. Today we are back at it again with the data cleaning series.

Remember, the cleaner the data, the better our insights will be

Photo by Matthew Henry on Unsplash

There are two things I forgot to mention before that I want to note.

  1. When do we do the data cleaning: before or after the EDA (Exploratory Data Analysis)?
  2. Why do we have to clean our data?

When do we do the data cleaning: before or after the EDA (Exploratory Data Analysis)?

We do data cleaning after the EDA. However, this is not applicable in the case of text data. …


Part 1 in A Data Cleaning Journey

You made it to the first article in “A data cleaning journey” series!

Image from Hub research

No one likes data cleaning. It can be very frustrating, especially if you are dealing with missing values and are on a deadline.

You need to give yourself time to handle the data cleaning part of your data science process.

Sometimes you will wonder what to do with the missing values in your dataset.


Whether you are a data engineer or a data scientist, you will spend most of your time cleaning data! It is often estimated that data scientists spend about 80% of their time cleaning data. This leaves only about 20% of the time to analyze the data and create insights from it. Data cleaning enhances data quality.

Image from https://www.bizprospex.com/four-reasons-clean-your-crm-data/

Data cleaning ensures that the quality of data is preserved and enhanced to meet business needs. Insights drawn from quality data help make the best business decisions.

Data quality

Data quality is the measure of how adequate and how…


Photo from: International Union for Conservation of Nature (IUCN)

When people hear about gender violence, most assume that men perpetrate it. As will be seen, however, this is not always the case. Gender violence is manifested in various forms in society. In most cases, its victims are women and girls.

- Sexual violence: rape, forced sexual acts, unwanted sexual advances, child sexual abuse, forced marriages, street harassment, stalking, cyber-harassment

- Partner violence: battering, psychological abuse, marital rape, femicide

- Human trafficking for sexual exploitation

- Female genital mutilation (FGM)

- Child marriages

The UN General Assembly in 1993 defined violence against women as “any act of gender-based violence that results in…


The idea of data privacy trickles down to how, as individuals, we work towards keeping our data safe. This is, however, a challenge for most. In an era where almost everyone is on social media platforms and feels compelled to share their life there, it can be hard to ensure one’s privacy. Data regulation policies have been implemented to ensure that the platforms we share our data on safeguard that data and protect our privacy. But is this the case?

While data regulation policies focus on making data usage clear, organizations have found loopholes to use…


Some time back, whenever I would receive a call or a text from an unknown number, I would rush to Facebook and run a search based on the number to find out who it belonged to. Truecaller also offers the same service, which allows people to identify the caller or the texter.

How many loyalty programs have you registered for? Most local stores have invested significantly in collecting their clients’ data to understand the purchase behaviors of their clients. We, the clients, however, readily offer the data without really thinking about data privacy. Another interesting scenario is on…


On May 25th, 2018, the European Union implemented the GDPR. While most people did not understand the aim or role of this policy at first, it has prompted numerous discussions and growing interest in data privacy and data security. Lately, many countries are rushing towards implementing a data privacy act. But what is this all about? What does it mean to have a data privacy act or law? What does it mean to have data security? The terms have been used interchangeably over time by different people. I bet you have too. So, what does each of these terms mean…

Eliud Nduati
