2a) What type of data are you going to use? (Identify main types of information/data)
2b) What procedures will you use to collect data (include all equipment/methods you plan to use)
2c) What methods will you use to analyse this data?
Processing and sparse segmentation will be performed on the data using a spiking neural network. Event-driven representations of the thermal input will be encoded via an MMV-based spiking neural network. Sparse foreground segmentation will be performed with Robust Principal Component Analysis (R-PCA). Lightweight classifiers such as SVM or k-NN will be used for gesture classification. Performance will be assessed using classification accuracy and computational-efficiency indicators.
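As an illustration of the sparse segmentation step, the following is a minimal NumPy sketch of Robust PCA via Principal Component Pursuit (inexact ALM). The `rpca_pcp` helper, its default parameters and the stacking of frames as matrix columns are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rpca_pcp(D, lam=None, n_iter=200, tol=1e-6):
    """Robust PCA via Principal Component Pursuit (inexact ALM).
    Splits D into a low-rank background L and a sparse foreground S,
    so that D ~= L + S. Frames are assumed stacked as columns of D."""
    m, n = D.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))       # standard PCP weight
    norm_D = np.linalg.norm(D)
    mu = 1.25 / max(np.linalg.norm(D, 2), 1e-12)  # from the spectral norm
    rho = 1.5                                     # penalty growth factor
    L = np.zeros_like(D); S = np.zeros_like(D); Y = np.zeros_like(D)
    for _ in range(n_iter):
        # low-rank update: singular value thresholding
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # sparse update: elementwise soft thresholding
        R = D - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        Y += mu * (D - L - S)
        mu = min(mu * rho, 1e7)
        if np.linalg.norm(D - L - S) <= tol * norm_D:
            break
    return L, S
```

For the 24×32 thermal sequences, `D` would be a 768 × T matrix (one vectorised frame per column); the sparse component `S` then isolates gesture-related foreground motion from the quasi-static background in `L`.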
The secondary thermal gesture data required for this project is openly available to researchers for scholarly use. No personal or identifiable data will be acquired or processed. All datasets, intermediate products and research findings will be stored on a password-protected personal computer and used solely for academic purposes. The data will be handled in accordance with university data management guidelines, and no data management processes are required beyond the general module-level ethics requirements.
Gesture recognition is a crucial aspect of human-computer interaction, especially in smart environments, assistive technology and touch-free interaction. Older vision-based gesture recognition systems rely heavily on high-resolution RGB cameras and deep learning algorithms, which are computationally costly, power-intensive, and perform poorly on edge or embedded computers (Tang et al. 2023). Thermal imaging, an alternative sensing modality, preserves user privacy and is largely insensitive to lighting changes, but presents several difficulties because of low spatial resolution and a lack of texture information. Recent studies have investigated energy-efficient gesture recognition with low-resolution thermal sensors using neuromorphic computing concepts. The arXiv article Resource-Efficient Gesture Recognition using Low-Resolution Thermal Camera through Spiking Neural Networks and Sparse Segmentation (2024) proposes a lightweight and interpretable pipeline that integrates spiking neural networks (SNNs) built on Monostable Multivibrator (MMV) neurons with sparse segmentation through Robust Principal Component Analysis (R-PCA). The method's main advantage is that it greatly lowers computational complexity while maintaining recognition rates competitive with deep spiking convolutional networks. This project is in line with existing research on edge AI and neuromorphic computing, targeting low-power, low-resolution sensing and biologically inspired models. It contributes by implementing and testing the proposed pipeline in a reproducible way, and by assessing its performance, efficiency and suitability for embedded, resource-constrained scenarios (Hwang et al. 2024).
Aim
The aim of this project is to design, implement, and evaluate a resource-efficient gesture recognition system using low-resolution thermal imagery and spiking neural networks.
Objectives
● To review existing literature on thermal-based gesture recognition and spiking neural
networks
● To work with low-resolution thermal gesture datasets provided by the authors
● To implement an MMV-based spiking neural network for event-driven feature extraction
● To apply sparse segmentation using Robust Principal Component Analysis (R-PCA)
● To classify gestures using lightweight classifiers such as SVM or k-NN
● To evaluate recognition accuracy and computational efficiency of the proposed system
This is a quantitative, experimental, design-based research project. It aims to develop a gesture recognition pipeline grounded in biological principles with reduced energy usage, and to test it on low-resolution thermal data (Xu et al. 2024).
Procedures
The research will commence with the acquisition of the low-resolution thermal gesture data supplied by the authors of the source article. The dataset is composed of sequences of thermal images of various hand gestures at a spatial resolution of 24×32 pixels. Preprocessing will involve initial steps such as normalisation, background removal and frame alignment where necessary.
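The preprocessing steps above could be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the `preprocess` helper, the per-frame min-max scaling and the column-stacked output layout are illustrative choices, not the source pipeline.

```python
import numpy as np

def preprocess(frames):
    """Normalise raw thermal frames and vectorise them.

    frames: (T, 24, 32) array of raw thermal frames.
    Returns a (768, T) matrix with each frame min-max scaled to [0, 1]
    and flattened into one column, ready for downstream segmentation."""
    T = frames.shape[0]
    X = frames.reshape(T, -1).astype(float)
    lo = X.min(axis=1, keepdims=True)
    hi = X.max(axis=1, keepdims=True)
    X = (X - lo) / np.maximum(hi - lo, 1e-8)  # guard against flat frames
    return X.T  # pixels x frames
```

Background removal and frame alignment would follow on this matrix; per-frame scaling is used here so that global temperature drift between frames does not dominate the gesture signal.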
An MMV-based spiking neural network will be used to convert thermal image frames into spike-based representations. Sparse foreground segmentation will then be performed using Robust Principal Component Analysis to separate gesture-related motion from noise. Lightweight classifiers, either Support Vector Machines (SVM) or k-Nearest Neighbours (k-NN), will then be trained on the segmented feature representations.
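The event-driven conversion can be illustrated with a simplified delta-modulation encoder that emits a spike whenever a pixel changes by more than a threshold between consecutive frames. This is a common stand-in for event-driven coding; the actual MMV neuron dynamics are more involved, and the `delta_encode` helper and its threshold `theta` are hypothetical.

```python
import numpy as np

def delta_encode(frames, theta=0.1):
    """Event-driven encoding: emit a +1 (ON) or -1 (OFF) spike where a
    pixel changes by more than theta between consecutive frames.
    frames: (T, H, W) array; returns an (T-1, H, W) int8 spike tensor."""
    diff = np.diff(frames.astype(float), axis=0)
    spikes = np.zeros_like(diff, dtype=np.int8)
    spikes[diff > theta] = 1    # brightness (temperature) increase
    spikes[diff < -theta] = -1  # brightness (temperature) decrease
    return spikes
```

Static background pixels produce no spikes at all, which is the source of the representation's sparsity and, in turn, of the computational savings on embedded hardware.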
The system will be evaluated on gesture recognition accuracy, computational efficiency and suitability for deployment on low-power or edge computing hardware. The results will be examined and compared against existing methods discussed in the literature (Gupta et al. 2022).
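A simple harness for the two evaluation axes (accuracy and per-sample inference cost) might look like the following. The `evaluate` helper is a hypothetical sketch, not the source code; wall-clock time is used here as a rough proxy for computational efficiency.

```python
import time
import numpy as np

def evaluate(classify, X_test, y_test):
    """Measure classification accuracy and mean per-sample inference
    time of a classifier function over a labelled test set."""
    t0 = time.perf_counter()
    preds = np.array([classify(x) for x in X_test])
    per_sample = (time.perf_counter() - t0) / len(X_test)
    accuracy = float((preds == y_test).mean())
    return accuracy, per_sample
```

On embedded targets, operation counts or measured energy per inference would be more faithful efficiency indicators than host wall-clock time.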
Resources and Equipment
Software: Python
Libraries: NumPy, SciPy, scikit-learn, Matplotlib
Hardware: Standard laptop or PC (no graphics card needed)
Reference
Tang, J., Li, G., Lin, J. and Chen, Z., 2023. Efficient spiking neural networks for resource-limited vision applications. IEEE Transactions on Neural Networks and Learning Systems, 34(2), pp.755–767.
2a) What type of data are you going to use? (Identify main types of information/data)
This research will use secondary thermal infrared image data that is publicly available, specifically the FLIR ADAS thermal dataset. The data consists of infrared images captured in real driving scenarios, with annotated objects such as pedestrians, vehicles and bicycles. Image-level annotations are recorded as bounding boxes with class labels, which are essential for supervised object detection. No personal, identifiable or sensitive human information will be gathered or analysed.
2b) What procedures will you use to collect data (include all equipment/methods you plan to use)
The dataset will be downloaded from publicly available sources via a web browser or dataset hosting sites. Data preparation will be carried out in Python. Images and annotations will be sorted into a systematic folder structure suitable for YOLO training. Preprocessing will involve image resizing, normalisation, label verification, and splitting into training, validation and test sets. Model development and testing will be carried out on a standard personal computer, and a GPU may optionally be used to speed up training.
2c) What methods will you use to analyse this data?
Deep learning-based object detection will be used to analyse the data. Thermal object detection will be carried out by implementing a baseline YOLO detector (e.g. YOLOv4 or YOLOv7). An efficient lightweight self-attention reasoning module will then be incorporated into the feature extraction stage to improve spatial and semantic reasoning over image regions. The standard object detection measures of mean Average Precision (mAP), precision, recall, inference speed (FPS) and false detection rate will be used to measure model performance. The baseline YOLO model and the improved IR Reasoner architecture will be compared in terms of their performance.
This project utilises publicly available secondary thermal infrared image data intended for research use, including the FLIR ADAS dataset. The data, consisting of infrared images and associated object annotations, does not contain personal, sensitive or personally identifying information. All datasets and trained models will be stored on a password-protected personal computer and used solely within the scope of academic research. The data will not be redistributed or used commercially. All derived data, experimental results and trained model files will be stored only for the duration of the project and handled according to university data management policies. This study does not require any data management processes beyond the standard module-level ethics requirements.
Dataset Link: https://www.kaggle.com/datasets/rthwkk/ir-object-detection-dataset
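The detection metrics above all rest on intersection-over-union (IoU) matching between predicted and ground-truth boxes. As a small illustration, a minimal sketch of the IoU calculation (the `iou` helper and its box format are illustrative, not tied to any particular YOLO implementation):

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)
```

A prediction is typically counted as a true positive when its IoU with a ground-truth box of the same class exceeds a threshold (commonly 0.5); precision, recall and mAP follow from those matches.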
Infrared (IR) object detection is an essential task in applications such as autonomous driving, surveillance and night-time monitoring, because visible-light cameras often perform poorly in low light or bad weather. Thermal infrared images offer resilience in such conditions, although object recognition in infrared imagery is hindered by low texture detail, low contrast and blurred object edges. Conventional computer vision methods and standard deep learning detectors usually fail to learn meaningful spatial and semantic associations in thermal scenes (Zhang et al. 2023). Recent developments in deep learning have transferred convolutional-neural-network-based object detectors, in particular the YOLO family of architectures, to infrared images with evident success. Research has shown that fine-tuning YOLO models on thermal data such as FLIR ADAS achieves real-time performance but limited detection accuracy, owing to a lack of contextual reasoning. The IR Reasoner framework overcomes this shortcoming with a lightweight self-attention-based reasoning module that reinforces both spatial and semantic associations among image regions without reducing real-time inference speed (Redmon and Farhadi 2022). The proposed project aligns with these existing sources: it extends the literature on YOLO detectors and incorporates a visual reasoning mechanism to support infrared data. The work will be valuable because it applies and tests the IR Reasoner architecture in a reproducible experimental setting, comparing baseline and improved models and examining the trade-off between detection precision and real-time speed in thermal object detection.
The proposed project is dedicated to implementing and testing a real-time infrared object detection system based on the IR Reasoner architecture, extending a YOLO-based detector with visual reasoning to enhance detection in thermal images.
Objectives
● To review the existing literature on infrared object detection and thermal image analysis with YOLO.
● To develop a baseline YOLO object detector for thermal infrared images.
● To integrate the IR Reasoner visual reasoning module into the YOLO feature extraction stage.
● To train and evaluate the proposed model on a publicly available thermal dataset.
● To assess and compare the baseline and improved models on conventional object detection metrics.
● To examine the trade-off between detection accuracy and real-time inference.
Software: Python
Frameworks/Libraries: PyTorch, OpenCV, NumPy, Matplotlib
Based on the reviewed paper Infrared Maritime Object Detection Network With Feature Enhancement and Adjacent Fusion, an evident and justifiable innovation would be to adapt its fixed feature-enhancement and fusion scheme into an adjustable, condition-sensitive and generally applicable detection approach. In particular, whereas the reviewed paper relies on handcrafted attention improvements (ICA, Dilated CBAM) and a fixed adjacent feature fusion specific to the maritime environment, this project can introduce a dynamic context-adaptive mechanism that increases or reduces the strength of feature enhancement and fusion depending on the scene (sea state, clutter density, target scale distribution or thermal contrast). This may be achieved through lightweight scene-adaptive gating, transformer-based global reasoning, or uncertainty-aware attention that explicitly represents background ambiguity. Moreover, a domain-resilient learning approach (such as cross-domain training, self-supervised pretraining, or physics-informed loss functions) can be adopted to sustain performance across maritime and generic infrared tasks, unlike the reviewed paper, which acknowledges limited generalisation to non-maritime data. This positions the work as not only enhancing the detection accuracy of low-contrast and small targets, but also addressing flexibility and deployability, which remain open limitations in the literature. A Scene-Adaptive Context-Aware Infrared Object Detection Network (SACAIODNet) model will be implemented.
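To make the scene-adaptive gating idea concrete, a minimal NumPy sketch follows. Everything here is hypothetical: the `scene_adaptive_gate` helper, the choice of global mean and standard deviation as the scene descriptor, and the parameters `w` and `b` (which would be learned in the actual network) are illustrative assumptions, not part of the reviewed paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def scene_adaptive_gate(feat, base, w, b):
    """Hypothetical scene-adaptive gate: a scalar scene descriptor
    (global mean and std of the feature map) drives a sigmoid gate g,
    which blends the enhanced branch `feat` with the plain branch
    `base`. w (length-2 vector) and b stand in for learned parameters."""
    desc = np.array([feat.mean(), feat.std()])
    g = sigmoid(float(desc @ w) + b)    # gate strength in (0, 1)
    return g * feat + (1.0 - g) * base  # convex blend of the branches
```

Because the gate is a convex combination, high-clutter scenes can lean on the enhanced branch while clean scenes fall back to the cheaper plain branch, which is the adjustability the proposed SACAIODNet design targets.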
Reference
Redmon, J. and Farhadi, A., 2022. YOLOv3: An Incremental Improvement. arXiv preprint arXiv:1804.02767. Available at: https://arxiv.org/abs/1804.02767
2a) What type of data are you going to use? (Identify main types of information/data)
This study will use secondary infrared image data from publicly available datasets hosted on Kaggle. The data comprises grayscale infrared (thermal) images containing small targets in complicated background scenes, including sky, sea and land. The images are characteristically low-contrast and noisy, which makes them well suited to evaluating small target detection algorithms in infrared imaging. The datasets do not contain any personal, identifiable or sensitive human data.
2b) What procedures will you use to collect data (include all equipment/methods you plan to use)
The infrared image datasets will be obtained by downloading the files via a web browser or the Kaggle API. Data processing will be performed on a standard personal computer in Python. The images will be organised into directories and preprocessed with grayscale normalisation, noise reduction and resizing where needed. Dataset selection will be based on scenes used in previous research on infrared small target detection, to ensure reproducibility and comparability with other current studies.
2c) What methods will you use to analyse this data?
The data will be analysed by means of classical image processing and matrix-based methods. A Dual-Window Local Contrast Method (DW-LCM) will be used to enhance small targets. A Multiscale Window Infrared Patch-Image (MW-IPI) model will be used to exploit nonlocal spatial information and suppress background clutter. Thresholding and morphological operations will be used to extract the targets. Detection performance will be evaluated with standard infrared target detection metrics, including Signal-to-Clutter Ratio Gain (SCRG), Background Suppression Factor (BSF), and detection and false alarm rates.
The presented project uses only publicly available secondary infrared image datasets on Kaggle, intended for scholarly and research purposes. No personal, sensitive or identifiable human information will be gathered, processed or retained at any point in the research. Each dataset will be stored on a password-protected personal computer and used only for this project in an academic context. No information will be distributed to third parties beyond the study. Any processed data, interim products and experimental findings will be stored only while the project is in progress, then deleted or archived according to university data management policies once the project concludes. This study does not require any data management processes other than the standard module-level ethics requirements.
Infrared (IR) small target detection is very important in surveillance, missile warning platforms, maritime applications and remote sensing. Infrared imagery has a low signal-to-clutter ratio, background noise and intricate thermal environments, which make small targets difficult to detect. In traditional statistical and filtering-based methods, background clutter usually remains and weak targets are suppressed. Recent studies have shown that combining local contrast enhancement with nonlocal spatial modelling can lead to a notable improvement in detection performance. The study Infrared Small Target Detection Using Local and Nonlocal Spatial Information (2019) proposes an organised, classical image-processing pipeline that combines the Dual-Window Local Contrast Method (DW-LCM) with the Multiscale Window Infrared Patch-Image (MW-IPI) model. It is not a deep learning model, is not based on big data, and does not depend on a graphics card, yet achieves strong detection results. The proposed project is relevant to the existing literature given its emphasis on interpretable, mathematically grounded methods, and contributes by implementing, testing and analysing the efficacy of local and nonlocal spatial procedures on infrared images through reproducible experiments.
Aim
The aim of this project is to implement and evaluate a classical infrared small target detection framework based on local and nonlocal spatial information.
Objectives
● To study existing infrared small target detection techniques and challenges
● To implement the Dual-Window Local Contrast Method for target enhancement
● To implement the Multiscale Window Infrared Patch-Image model for background suppression
● To integrate local and nonlocal methods into a unified detection pipeline
● To evaluate detection performance using standard infrared target detection metrics
● To analyse robustness under different background and noise conditions
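The dual-window local contrast idea named in the objectives can be sketched in a few lines: for each pixel, compare the mean of a small inner window against the mean of the surrounding outer ring, so that a small bright target scores high while smooth background scores near zero. This is a simplified NumPy stand-in for the full DW-LCM of the source paper; the `local_contrast` helper and the window radii `r_in` and `r_out` are illustrative choices.

```python
import numpy as np

def local_contrast(img, r_in=1, r_out=3):
    """Simplified dual-window local contrast map: inner-window mean
    minus outer-ring mean at every pixel (edges are handled by
    replicate padding). Small bright targets produce strong peaks."""
    H, W = img.shape
    pad = np.pad(img.astype(float), r_out, mode="edge")
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            win = pad[y:y + 2 * r_out + 1, x:x + 2 * r_out + 1]
            inner = win[r_out - r_in:r_out + r_in + 1,
                        r_out - r_in:r_out + r_in + 1]
            ring_mean = (win.sum() - inner.sum()) / (win.size - inner.size)
            out[y, x] = inner.mean() - ring_mean
    return out
```

Thresholding this contrast map, as described in the analysis plan, then yields candidate target locations for the subsequent nonlocal (MW-IPI) suppression stage.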
