Skip to main content

Unsupervised Learning

Updated May 04, 2023 ·

Overview

Unsupervised learning finds patterns in data without needing a target column. It helps uncover structure, relationships, or anomalies in datasets.

  • Learns patterns from data automatically
  • Clustering, anomaly detection, and association.

Clustering

Clustering groups similar observations together, making it easier to understand patterns or segments in the data.

  • Animals grouped by breed or color
  • Customer segments based on behavior

Below are some examples of clustering models:

  • K-Means

    • Number of clusters must be defined in advance
    • Groups points based on similarity or distance
    • Useful for organizing or segmenting data
  • DBSCAN

    • Density-based clustering that can handle noise
    • Defines clusters based on point density
    • No need to specify number of clusters
    • Requires setting criteria for what counts as a cluster

Anomaly Detection

Anomaly detection focuses on identifying outliers, which are observations that significantly differ from others.

  • Detecting Outliers

    • Highlights unusual patterns or errors that may need attention
    • Example: A data point far from a cluster of normal values
  • Removing Outliers

    • Easier in low-dimensional data, harder in high-dimensional data
    • Algorithms help identify complex anomalies
  • Use Cases

    • Fraud detection
    • Faulty device identification
    • Error spotting
    • Unusual medical cases

Association

Association involves finding relationships between observations, often used in market basket analysis to determine which items are bought together.

  • Finding events that happen together.
  • Helps discover hidden relationships in data

It is commonly used in market basket analysis to find patterns in what items are purchased together.

  • Jam is often bought with bread, beer with peanuts, wine with cheese
  • Helps retailers plan product placement
  • Reveals hidden product patterns