What is Object Detection?

Object detection means identifying and locating specific objects in an image. Object detection machine learning (ML) models take various approaches to do this, but their output is typically a bounding box around each detected object, often paired with a class label and a confidence score (e.g., a rectangle drawn around Waldo in a “Where’s Waldo” image).
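
To make the idea of a bounding-box output concrete, here is a minimal sketch of how a single detection might be represented. The `Detection` class, its field names, and the corner-coordinate convention (`x_min`, `y_min`, `x_max`, `y_max` in pixels) are assumptions chosen for illustration; real libraries use a variety of layouts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical container for one detected object (illustrative only)."""
    label: str      # what the model thinks the object is
    score: float    # model confidence, assumed to be in [0, 1]
    x_min: float    # left edge of the box, in pixels
    y_min: float    # top edge of the box, in pixels
    x_max: float    # right edge of the box, in pixels
    y_max: float    # bottom edge of the box, in pixels

    @property
    def width(self) -> float:
        return self.x_max - self.x_min

    @property
    def height(self) -> float:
        return self.y_max - self.y_min

    @property
    def area(self) -> float:
        return self.width * self.height


# Made-up example: a box around Waldo in a crowded scene.
waldo = Detection(label="waldo", score=0.91,
                  x_min=120, y_min=80, x_max=160, y_max=140)
print(waldo.label, waldo.width, waldo.height, waldo.area)
```

A detector run on one image would then return a list of such records, one per object it found.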

Detecting common objects (e.g., cat, dog, human) in plain images is relatively easy for ML models, but they often stumble when detecting less common objects (e.g., pottery wheel, printing press, phonograph), more granular objects (e.g., cat’s whiskers, dog’s tail, human mullet), and objects in noisy, complex images (e.g., thousands of sports or music fans, an industrial construction site, a dense rainforest).

Closed-Set Object Detection vs. Open-Set Object Detection

Object detection applications fall into the following two broad categories.

  1. Closed-set object detection is for when we know exactly which types of objects we want to find; it is typically suited to static applications.
  2. Open-set object detection is for when we don’t know upfront which objects we want to detect; it is typically best for dynamic applications.

If you’re developing autonomous vehicles, for example, you wouldn’t want to use closed-set object detection because it wouldn’t be robust to objects the models weren’t trained on (engineers can’t foresee and include every object type that a self-driving vehicle might encounter). Ditto for robotics, surveillance systems, and many more applications.

Closed-set object detection is suited to applications that focus on a limited set of objects, such as facial recognition, retail analytics (e.g., tracking specific products purchased), industrial automation (e.g., quality control via flagging common defects), or medical imaging (e.g., spotting known malignancies).

    Object Detection Approaches

    Here are a few object detection approaches, along with key concepts associated with each:

    • R-CNN, Sliding Window, and Selective Search
    • Fast R-CNN, Region Projection, and Region of Interest (RoI) Pooling Layer
    • Faster R-CNN, Region Proposal Network, and Intersection over Union
    • Mask R-CNN, Mask Prediction Branch, and Region of Interest Align (RoIAlign)
    • You Only Look Once (YOLO), YOLOv1 Architecture
    • Hungarian Matching Algorithm, Tracking, Bounding Box Matching
    • Detection Transformers (DETR), Object Queries
    • Grounding DINO, Open-Set Object Detection
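
Intersection over Union (IoU), named in the list above, measures how well two bounding boxes overlap: the area of their intersection divided by the area of their union. Below is a minimal sketch, assuming boxes are given as `(x_min, y_min, x_max, y_max)` tuples; the function name `iou` and that box layout are illustrative choices, not taken from any particular library.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Each box is assumed to be (x_min, y_min, x_max, y_max).
    Returns a value in [0, 1]; 0 means no overlap, 1 means identical boxes.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Corners of the overlapping rectangle (if the boxes overlap at all).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)

    # Clamp to zero so disjoint boxes yield an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate zero-area boxes.
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A predicted box is commonly counted as a match for a ground-truth box when their IoU exceeds a chosen threshold; 0.5 is a frequently used value, though the right threshold depends on the task.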

    Learn a Bit More About Object Detection

    For a concise overview of each, I recommend checking out DataMListic’s playlist below (~5 mins per video).