Week #6 in Machine Learning
Unsupervised Machine Learning
In unsupervised machine learning, the data is unlabeled, so the system attempts to learn from it without relying on labels. It tries to identify patterns in the data and assign labels based on how the attributes are related.
A large portion of the available data is unlabeled, and labeling it by hand is costly, tedious, and time-consuming.
Unsupervised learning algorithms exploit this unlabeled data without needing humans to label it.
Some of the most common algorithms include:
Clustering Algorithms: these are mainly used to identify groups of similar objects or items in a data set. The algorithm finds connections in the data without any help.
Clustering algorithms help in data analysis, customer segmentation, recommender systems, search engines, dimensionality reduction and image segmentation. They include:
- K-means clustering
- DBSCAN
- Hierarchical Cluster Analysis (HCA)
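To make the idea concrete, here is a minimal sketch of K-means in plain Python. It is not a production implementation: the `kmeans` helper, the toy `points`, and the choice to seed the centroids with the first `k` points are all assumptions made for illustration (real libraries pick starting centroids more carefully).

```python
import math

def kmeans(points, k, iterations=10):
    """Naive k-means. Returns (centroids, labels)."""
    # Assumption: seed with the first k points so the demo is deterministic.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

# Two visually obvious groups: one near (1, 1), one near (8.5, 9).
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

Notice that the code never sees a label: the groups come purely from the distances between points.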
Anomaly Detection and Novelty Detection algorithms: anomaly detection flags unusual instances, such as suspicious credit card transactions in fraud detection. Novelty detection flags new instances that look different from every instance in the training set. These algorithms include:
- One-class SVM
- Isolation Forest
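One-class SVM and Isolation Forest need a machine learning library, so as a rough illustration of the idea, here is a much simpler stand-in: a modified z-score rule based on the median and the median absolute deviation (MAD). The transaction amounts, the `flag_anomalies` helper, and the 3.5 threshold are assumptions for this sketch, and this is not how the algorithms listed above work internally.

```python
import statistics

def flag_anomalies(amounts, threshold=3.5):
    """Return the amounts whose modified z-score exceeds the threshold."""
    median = statistics.median(amounts)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median([abs(a - median) for a in amounts])
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores.
    return [a for a in amounts if abs(0.6745 * (a - median) / mad) > threshold]

# Hypothetical card transactions: mostly small purchases plus one odd one.
transactions = [25.0, 40.0, 31.5, 28.0, 35.0, 30.0, 2500.0, 27.5]
suspicious = flag_anomalies(transactions)
print(suspicious)
```

The median and MAD are used instead of the mean and standard deviation because a single huge transaction would otherwise inflate the very statistics used to judge it.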
Visualization and Dimensionality Reduction algorithms: they take large, complex, unlabeled data and output a 2D or 3D representation of it. They try to keep separate clusters from overlapping in the visualization, which makes it easier to understand how the data is organized. Some of these algorithms include:
- Principal Component Analysis (PCA)
- Kernel PCA
- Locally Linear Embedding (LLE)
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
Dimensionality reduction allows you to simplify the data without losing too much information. A good way to achieve this is to merge several correlated features into one. For example, a car’s mileage and age are strongly correlated, so dimensionality reduction can combine them into a single feature that represents the car’s wear and tear. This process is known as feature extraction.
It is often a good idea to reduce the dimensions of the training data with a dimensionality reduction algorithm before feeding it to another machine learning algorithm. The data takes up less disk and memory space, the algorithm runs faster, and in some cases it performs better.
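The car example can be sketched in a few lines. The numbers below are made up, and `extract_wear_feature` is a hypothetical helper: it standardizes both features and projects them onto their first principal component, using the closed-form angle of the main axis of a 2x2 covariance matrix (so it only works for exactly two features).

```python
import math
import statistics

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def extract_wear_feature(mileage, age):
    """Project two features onto their first principal component."""
    zx, zy = standardize(mileage), standardize(age)
    n = len(zx)
    var_x = sum(a * a for a in zx) / n
    var_y = sum(b * b for b in zy) / n
    cov_xy = sum(a * b for a, b in zip(zx, zy)) / n
    # Angle of the direction with the largest variance (2x2 case only).
    theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    direction = (math.cos(theta), math.sin(theta))
    scores = [a * direction[0] + b * direction[1] for a, b in zip(zx, zy)]
    # Share of the total variance that the single new feature keeps.
    kept = statistics.pvariance(scores) / (var_x + var_y)
    return scores, kept

# Made-up cars: older cars have been driven further.
mileage_km = [20000, 45000, 60000, 90000, 120000, 150000]
age_years = [1, 3, 4, 6, 8, 10]

wear, kept = extract_wear_feature(mileage_km, age_years)
print([round(w, 2) for w in wear])
print(round(kept, 3))
```

Because mileage and age move together so closely in this toy data, the single `wear` feature keeps almost all of the variance of the original two.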
Association (rule) Learning Algorithms: these algorithms dig into large amounts of data and discover relations between attributes. Sales data is a good example, where they might reveal that certain products or items are often purchased together. These include:
- Apriori
- Eclat
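As a rough sketch of what this looks like on sales data, the snippet below runs the first two passes of an Apriori-style search in plain Python: it keeps the items that appear often enough, then counts only pairs made of those items. The baskets, the `frequent_pairs` helper, and the 50% support threshold are invented for illustration; full Apriori continues to larger itemsets and derives rules from them.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support=0.5):
    """Return {(item_a, item_b): support} for pairs meeting min_support."""
    n = len(transactions)
    # Pass 1: find items that are frequent on their own.
    item_counts = Counter(item for basket in transactions for item in set(basket))
    frequent_items = {item for item, count in item_counts.items() if count / n >= min_support}
    # Pass 2: count pairs built only from frequent items.
    # A pair cannot be frequent unless both of its items are.
    pair_counts = Counter()
    for basket in transactions:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))
    return {pair: count / n for pair, count in pair_counts.items() if count / n >= min_support}

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]
pairs = frequent_pairs(baskets)
print(pairs)
```

Here bread and butter show up together in three of the five baskets, which is the kind of relation a shop could use when deciding what to place side by side.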
Since there are no labels, most unsupervised learning methods have no specific way of comparing model performance.
Disadvantages of Unsupervised Learning
- It might not consider the spatial relationships in the data.
- Interpreting the spectral classes can take too much time.
- It is sometimes a challenge to determine whether the algorithm learned something useful.
Look out for the next article, where we take some of these algorithms for a spin! 😁