MD Khalequzzaman Chowdhury Sayem
Geometry-Grounded Multimodal Research for Vision-Language-Action Systems
I am a researcher at the Vision & Learning Lab at UNIST, working under the supervision of Prof. Seungryul Baek and Prof. Binod Bhattarai. My research focuses on 3D vision and vision-language models with an emphasis on geometry-grounded reasoning for articulated hands and human–object interaction.
My recent work develops large-scale benchmarks and real-time Transformer architectures that integrate explicit 3D geometric supervision into multimodal models, making fine-grained spatial reasoning more reliable and improving cross-task generalization.
I am currently interested in extending this work toward Vision-Language-Action (VLA) models and geometry-aware world models that unify perception, language, and action for physically grounded embodied intelligence.
I am always open to collaborations and discussions. If you are interested in my research or have any inquiries, feel free to reach out to me at khalequzzamansayem@unist.ac.kr.