
Pedestrian Detection System using YOLO

The Pedestrian Detection System using YOLO is a project built to assist the Campus Planning Department within Facilities Management at the University of Massachusetts Lowell. The primary objective is a tool that leverages machine learning, specifically the YOLO (You Only Look Once) deep learning algorithm as implemented by Ultralytics, to accurately count the number of individuals traversing specific pathways across the university campus.

The tool includes a user interface through which users upload a video and define the regions along the paths where pedestrian detection is required. The system then detects, tracks, and tallies the individuals traversing these designated pathways.

By incorporating YOLO, renowned for its efficiency in object detection tasks, our system ensures robust and real-time detection capabilities. This translates to a more streamlined and accurate monitoring process, enhancing the overall efficiency of the Campus Planning Department in managing pedestrian flow.

In summary, the Pedestrian Detection System provides a user-friendly interface and leverages the power of YOLO to count pedestrians automatically and precisely, supporting campus planning and management at the University of Massachusetts Lowell.

Table of Contents

- Prerequisites
- Video Description
- Modules
- Results Visualisation
- Working Demo of the Tool
- Developers

Prerequisites

Environment

  1. Python 3 Environment
  2. Python modules required: NumPy, Pandas, PyTorch, OpenCV (cv2), Matplotlib, Ultralytics, Supervision, Tkinter, tqdm (see the install sketch below)
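A minimal install sketch, assuming pip and the usual PyPI package names (PyTorch installs as torch, OpenCV as opencv-python; Tkinter ships with most Python distributions):

```
pip install numpy pandas torch opencv-python matplotlib ultralytics supervision tqdm
```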


Video Description

The tool supports time-lapse videos of any kind and any size, provided the resolution and frame rate are good. Tune the frame rate and quality of the video to how large people appear in the frame: the smaller the people, the clearer the video needs to be. The video used during development and testing of the model is a time-lapse of a location on the UMass Lowell campus containing multiple paths on which the number of passing people has to be counted.

Test Video Information

https://github.com/kysgattu/Pedestrain-Detection-System/assets/42197976/7b929dc9-bdb4-4b71-9ef2-a800f3e86184

Modules

User Interface

The front-end UI is a Tkinter dialog box where the user selects the input video and starts the detection; the per-path regions of interest are then marked in the step described below.

(Screenshot: the user interface dialog box)
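For illustration, a minimal sketch of this kind of Tkinter front end follows. The widget layout, variable names, and the run_detection placeholder are assumptions, not the project's actual code:

```python
# A minimal sketch of a Tkinter front end: pick a video, choose how many
# regions of interest to monitor, and start detection.
import tkinter as tk
from tkinter import filedialog

def browse_video():
    # Ask the user for the input video file
    path = filedialog.askopenfilename(filetypes=[("Video files", "*.mp4 *.avi *.mov")])
    video_path.set(path)

def run_detection():
    # Placeholder: the real tool would launch ROI selection and YOLO detection here
    print(f"Detecting in {video_path.get()} with {num_rois.get()} region(s) of interest")

root = tk.Tk()
root.title("Pedestrian Detection System")

video_path = tk.StringVar()
num_rois = tk.IntVar(value=1)

tk.Label(root, text="Input video:").grid(row=0, column=0, sticky="w")
tk.Entry(root, textvariable=video_path, width=40).grid(row=0, column=1)
tk.Button(root, text="Browse...", command=browse_video).grid(row=0, column=2)

tk.Label(root, text="Number of ROIs:").grid(row=1, column=0, sticky="w")
tk.Spinbox(root, from_=1, to=10, textvariable=num_rois, width=5).grid(row=1, column=1, sticky="w")

tk.Button(root, text="Detect", command=run_detection).grid(row=2, column=1)
root.mainloop()
```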

Region Of Interest Selector

Once the Detect button is clicked, a Matplotlib window showing a frame from the video pops up for each region of interest, and the user clicks four points that enclose the detection region.

(Screenshot: the region-of-interest selector)
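A short sketch of how such four-point selection can be done with Matplotlib's ginput, assuming OpenCV can read the first frame of the chosen video (the file name is hypothetical):

```python
# Collect four clicks on a video frame to define a detection region.
import cv2
import numpy as np
import matplotlib.pyplot as plt

cap = cv2.VideoCapture("campus_timelapse.mp4")  # hypothetical file name
ok, frame = cap.read()
cap.release()
if not ok:
    raise RuntimeError("Could not read a frame from the video")

# Show the frame (OpenCV is BGR, Matplotlib expects RGB) and collect four clicks
plt.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
plt.title("Click four points enclosing the region of interest")
points = plt.ginput(4, timeout=0)   # list of (x, y) tuples
plt.close()

roi_polygon = np.array(points, dtype=np.int32)
print("Selected ROI:", roi_polygon.tolist())
```

The four clicked points form a polygon that can later be passed to cv2.pointPolygonTest to decide whether a detected person's center lies inside the region.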

YOLO Model

YOLO, or You Only Look Once, is a popular computer-vision object detection algorithm. The key idea behind YOLO is speed and efficiency: instead of running a detector repeatedly over region proposals or sliding windows, YOLO performs detection for all objects in the entire image in one forward pass of the neural network. YOLO divides the input image into a grid, and each grid cell is responsible for predicting bounding boxes and class probabilities for the objects contained in that cell. Each grid cell predicts multiple bounding boxes along with confidence scores; these boxes represent the locations of potential objects, and YOLO also predicts the probability of each class within each box. Finally, YOLO applies non-maximum suppression to eliminate duplicate or low-confidence detections, yielding a cleaner and more accurate set of predictions.
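As a concrete illustration, here is a minimal person-detection sketch using the Ultralytics YOLO API, assuming a pretrained COCO checkpoint (in COCO, class 0 is "person"); the model file, image name, and confidence value are assumptions:

```python
# Detect people in a single image with a pretrained Ultralytics YOLO model.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # small pretrained COCO model; any YOLO checkpoint works

# One forward pass; restrict results to the person class (COCO class 0)
results = model("frame.jpg", classes=[0], conf=0.5)

for box in results[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()   # bounding box corners
    score = float(box.conf[0])              # confidence after NMS
    print(f"person at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}), conf={score:.2f}")
```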

Working of the Detection Code

The code starts by configuring parameters such as confidence thresholds, scaling percentages, and tracking thresholds. The video is processed frame by frame using the OpenCV library, with optional downscaling for better performance. Regions of interest (ROIs) are defined within each frame, and the code iterates through these, applying YOLO to identify individuals. A tracking mechanism based on object centers is then employed to trace the movement of detected persons across frames. The count of individuals within each ROI is continuously updated, and the annotated frames, showing bounding boxes, tracking information, and ROI overlays, are compiled into an output video. The final results, including the number of persons detected and tracked in each ROI, are presented upon completion.
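A condensed sketch of this pipeline follows. It mirrors the steps described above (frame-by-frame reading, optional downscaling, YOLO detection, greedy nearest-center tracking, per-ROI counting), but all parameter values, file names, and the ROI polygon are illustrative assumptions rather than the project's actual configuration:

```python
# Frame-by-frame detection, center-based tracking, and ROI counting (sketch).
import cv2
import numpy as np
from ultralytics import YOLO

CONF = 0.5            # detection confidence threshold (assumed)
SCALE = 0.5           # optional downscaling factor for speed (assumed)
TRACK_DIST = 50       # max center distance in pixels to match across frames

model = YOLO("yolov8n.pt")
roi = np.array([(100, 100), (500, 100), (500, 400), (100, 400)], np.int32)  # example ROI

tracks, next_id, counted = {}, 0, set()
cap = cv2.VideoCapture("campus_timelapse.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.resize(frame, None, fx=SCALE, fy=SCALE)
    # Detect people and reduce each box to its center point
    centers = []
    for box in model(frame, classes=[0], conf=CONF, verbose=False)[0].boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        centers.append(((x1 + x2) / 2, (y1 + y2) / 2))
    # Greedy nearest-center matching against existing tracks
    new_tracks = {}
    for c in centers:
        match = min(tracks, default=None,
                    key=lambda t: np.hypot(c[0] - tracks[t][0], c[1] - tracks[t][1]))
        if match is not None and np.hypot(c[0] - tracks[match][0],
                                          c[1] - tracks[match][1]) < TRACK_DIST:
            new_tracks[match] = c
            tracks.pop(match)
        else:
            new_tracks[next_id] = c
            next_id += 1
    tracks = new_tracks
    # Count a track the first time its center falls inside the ROI
    for tid, c in tracks.items():
        if tid not in counted and cv2.pointPolygonTest(roi, c, False) >= 0:
            counted.add(tid)
cap.release()
print(f"Persons counted in ROI: {len(counted)}")
```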

Results Visualisation

The annotated frames with bounding boxes, tracking information, and ROI overlays are stored in an output video and saved for later review. After processing the entire video, the code prints the number of persons detected and tracked in each ROI in the Tkinter Dialog Box.

(Screenshot: per-ROI counts reported in the Tkinter dialog box)
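A minimal sketch of how annotated frames can be written to an output video with OpenCV's VideoWriter; the codec, file names, and example ROI are assumptions:

```python
# Write annotated frames to an output video for later review (sketch).
import cv2
import numpy as np

roi = np.array([(100, 100), (500, 100), (500, 400), (100, 400)], np.int32)  # example ROI

cap = cv2.VideoCapture("campus_timelapse.mp4")
fps = cap.get(cv2.CAP_PROP_FPS) or 30  # fall back if FPS metadata is missing
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("annotated_output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # The real tool also draws bounding boxes and per-track IDs here
    cv2.polylines(frame, [roi], isClosed=True, color=(0, 255, 0), thickness=2)
    out.write(frame)

cap.release()
out.release()
```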

Result - Annotated Video

The annotated video, with the number of persons counted in each ROI, is shown below:

Result Video

Working Demo of the Tool

Step-by-Step Instructions for running the tool

_A detailed demo of how to use the tool is shown in the video below:_

Demo Video

Developers

GitHub: G K Y SHASTRY

University of Massachusetts Lowell

Facilities Information Systems - Facilities Management - UMass Lowell

Contact me: gkyshastry0502@gmail.com, kysgattu0502@gmail.com
