
RoadscapesQA: A Multitask, Multimodal Dataset for Visual Question Answering on Indian Roads
Vijayasri Iyer, Maahin Rathinagiriswaran, Jyothikamalesh S
arXiv:2602.12877, 2025
- RoadscapesQA is a novel multitask and multimodal dataset that addresses the scarcity of autonomous-driving benchmarks for unstructured road environments such as Indian roads.
- Collected over 5 hours of driving footage (9,000 final images) and implemented a resource-efficient pipeline using YOLO-World for initial object detection, followed by rule-based heuristics to automatically generate 7 ground-truth QA pairs per image.
- The dataset supports four key tasks: object detection, drivable area segmentation, object counting, and image-level visual question answering (VQA).
- GPT-4o demonstrated the strongest semantic reasoning in Surrounding Description with a similarity score of 0.701.
- Conducted a detailed hallucination analysis, finding that models struggle most with fine-grained Object Description (50.8%–61.6% hallucination rates).
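The rule-based QA generation step described above can be sketched roughly as follows. The paper's exact heuristics and templates are not specified here, so the function name, the question templates, and the detection format are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

def generate_counting_qa(detections):
    """Illustrative rule-based heuristic: turn detector output into
    counting QA pairs via fixed question templates (hypothetical,
    not the paper's actual pipeline)."""
    # Tally detected labels, e.g. from a YOLO-World pass over one image.
    counts = Counter(d["label"] for d in detections)
    qa_pairs = []
    for label, n in sorted(counts.items()):
        # Template-fill a counting question with the ground-truth count.
        qa_pairs.append((f"How many instances of '{label}' are in the image?", str(n)))
    return qa_pairs

# Example: mock detections for a single frame (boxes are [x1, y1, x2, y2]).
detections = [
    {"label": "car", "box": [10, 20, 110, 90]},
    {"label": "car", "box": [150, 30, 260, 100]},
    {"label": "auto-rickshaw", "box": [300, 40, 380, 120]},
]
print(generate_counting_qa(detections))
```

A real pipeline would add further templates (presence, spatial relations, drivable-area queries) to reach the dataset's 7 QA pairs per image.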
Computer Vision · VLM · Autonomous Driving · GenAI · YOLO

