Dr. Ghazanfar Ali | Multimodal AI & Digital Humans

InnoCORE Postdoctoral Researcher · KAIST

I craft human-centered AI for digital humans.

Bridging deep learning, embodiment, and interaction design—building multimodal systems where virtual humans listen, speak, and gesture with nuance in real time.

Co-Speech Gesture Generation
Multimodal Representation Learning
Digital Humans for XR
Agentic & RAG-Driven Avatars
LLMs + Motion

Explore Research


View Demos


10+ yrs
XR & AI Research
20+
Peer-Reviewed Pubs

Major Awards


InnoCORE Fellow
KAIST · SNU Vision Lab
Seoul, KR

I design scalable hybrid pipelines for text-to-gesture generation, controllable motion synthesis, and LLM-infused digital humans.


Email
[email protected]


Google Scholar
Publications

Latest News


Joined KAIST as InnoCORE Postdoctoral Researcher.

Started new project on Emotion-Augmented Co-Speech Gestures.

Received Best Paper Award at CASA 2024 for RIDGE framework.

Published work on Multilingual Co-Speech Interaction in IEEE Access.

Research Focus

My work is anchored in multimodal understanding & generation: aligning text, audio, motion, and dense knowledge sources.

01 · Multimodal AI

From text and audio to motion

Architectures for aligning language, prosody, and 3D motion, using contrastive learning and shared latent spaces to generate co-speech gestures that feel intentional rather than random; a toy alignment sketch follows the keyword tags below.

GestureCLR-style encoders
Temporal transformers
Shared latent spaces
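
For readers who want something concrete, here is a minimal sketch of that kind of contrastive alignment in PyTorch. The GRU pooling, the 64-dimensional latent space, and the symmetric InfoNCE loss are illustrative assumptions, not the published GestureCLR or RIDGE implementation.

# Minimal sketch (not the published GestureCLR / RIDGE code): contrastive alignment
# of speech features and gesture clips in one shared latent space, PyTorch-style.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeechEncoder(nn.Module):
    """Hypothetical encoder: pools per-frame speech/text features into one vector."""
    def __init__(self, feat_dim=128, latent_dim=64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, latent_dim, batch_first=True)
        self.proj = nn.Linear(latent_dim, latent_dim)

    def forward(self, x):                              # x: (batch, frames, feat_dim)
        _, h = self.rnn(x)                             # h: (1, batch, latent_dim)
        return F.normalize(self.proj(h[-1]), dim=-1)   # unit-length latent code

class MotionEncoder(nn.Module):
    """Hypothetical encoder: pools joint-rotation sequences into the same space."""
    def __init__(self, pose_dim=75, latent_dim=64):
        super().__init__()
        self.rnn = nn.GRU(pose_dim, latent_dim, batch_first=True)
        self.proj = nn.Linear(latent_dim, latent_dim)

    def forward(self, x):                              # x: (batch, frames, pose_dim)
        _, h = self.rnn(x)
        return F.normalize(self.proj(h[-1]), dim=-1)

def contrastive_loss(speech_z, motion_z, temperature=0.07):
    """Symmetric InfoNCE: matching speech/gesture pairs attract, others repel."""
    logits = speech_z @ motion_z.t() / temperature     # (batch, batch) similarities
    targets = torch.arange(speech_z.size(0))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# Toy usage with random tensors standing in for paired speech features and clips.
speech = torch.randn(8, 120, 128)
motion = torch.randn(8, 120, 75)
loss = contrastive_loss(SpeechEncoder()(speech), MotionEncoder()(motion))

Once trained, either encoder can be frozen and used for retrieval: a new speech segment is embedded and matched against a pre-encoded bank of gesture clips living in the same space.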

02 · Digital Humans

Expressive avatars for XR

Systems like RIDGE and Enhanced Gesture Units enable virtual agents to perform semantically aligned, style-aware gestures in real-time AR/MR environments.

Co-speech gestures
Pre-viz & storyboarding
Medical explainers

03 · Agentic AI

RAG-powered embodied agents

Retrieval-Augmented Generation and agent frameworks that let virtual humans reason over dense corpora while responding with grounded language and matching non-verbal behavior; a minimal retrieval-then-generate sketch follows the tags below.

RAG pipelines
Context-aware avatars
Evaluation frameworks
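
As a rough illustration of the retrieval-then-generate loop behind such agents, the sketch below assumes hypothetical embed and generate callables standing in for the actual embedding model and LLM; the cosine ranking and the placeholder gesture cue are for exposition only, not the deployed pipeline.

# Minimal RAG sketch for an embodied agent; `embed` and `generate` are assumed
# wrappers around whatever embedding model and LLM backend are actually used.
from typing import Callable, List, Tuple
import numpy as np

def retrieve(query: str, corpus: List[str],
             embed: Callable[[str], np.ndarray], k: int = 3) -> List[str]:
    """Rank corpus passages by cosine similarity to the query embedding."""
    q = embed(query)
    scores = []
    for passage in corpus:
        p = embed(passage)
        scores.append(float(q @ p / (np.linalg.norm(q) * np.linalg.norm(p) + 1e-8)))
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]

def answer(query: str, corpus: List[str],
           embed: Callable[[str], np.ndarray],
           generate: Callable[[str], str]) -> Tuple[str, str]:
    """Ground the reply in retrieved passages and emit a placeholder gesture cue."""
    context = "\n".join(retrieve(query, corpus, embed))
    prompt = ("Answer using only the context below.\n"
              f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
    reply = generate(prompt)
    # A dedicated motion module would map the reply (and its prosody) to gestures;
    # a keyword check stands in for that non-verbal channel here.
    gesture_cue = "deictic" if "this" in reply.lower() else "beat"
    return reply, gesture_cue

In a real deployment the passage embeddings would be precomputed and stored in a vector index, and the avatar's gestures would come from the motion-synthesis models described above rather than a keyword check.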

Research Gallery

Visualizing motion synthesis and gesture generation.

* Placeholder videos

RIDGE Framework Demo

Text-to-Gesture Synthesis

ASAP Previz System

Screenplay to 3D Animation

Joseon Dynasty Agent

RAG-Driven Historical Avatar


Career Path

2025 – Ongoing
InnoCORE Postdoctoral Researcher
KAIST
Controllable human motion synthesis and LLM-driven context understanding for virtual human behavior.
2023 – 2025
Postdoctoral Researcher
KIST
Human-to-avatar gesticulation learning and interactive digital human technology for flagship virtual avatar platforms.
2017 – 2023
Research Assistant · Ph.D.
UST – KIST School
Ph.D. in AI-Robotics on scalable hybrid text-to-gesture generation for interactive digital humans.
2014 – 2017
Engineer Instructor
University of Central Punjab
Designed and delivered lab curricula in programming fundamentals, OOP, and databases.

Key Projects

RIDGE · Co-Speech Gestures

Rule-Infused Deep Learning for Realistic Gesture Generation

Computer Animation & Virtual Worlds · Best Paper Award (CASA 2024)

RIDGE fuses specific, LLM-generated gesture rules with generalized, contrastively learned gesture retrieval in a shared latent space; a toy version of this rule-plus-retrieval selection is sketched after the feature list below.

Hybrid rule + deep learning
Shared latent motion space
Industry-ready pipeline
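
To make the rule-plus-retrieval split concrete, here is a toy selection routine; the RULES table, the select_gesture helper, and the cosine fallback are hypothetical stand-ins chosen for illustration, not code from the RIDGE release.

# Toy sketch (not the RIDGE release): fuse specific, rule-triggered gestures with
# nearest-neighbour retrieval in a shared speech-gesture latent space.
import numpy as np

# Hypothetical rule table, e.g. distilled from an LLM: keyword -> gesture clip id.
RULES = {"hello": "wave", "huge": "expand_arms", "me": "point_self"}

def select_gesture(transcript: str,
                   speech_embedding: np.ndarray,
                   gesture_bank: dict) -> str:
    """A rule hit wins; otherwise fall back to retrieval over the gesture bank."""
    for word, clip_id in RULES.items():
        if word in transcript.lower().split():
            return clip_id                             # specific, rule-infused choice
    def cos(a, b):                                     # cosine similarity helper
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    # Generalized choice: nearest gesture-clip embedding in the shared latent space.
    return max(gesture_bank, key=lambda cid: cos(speech_embedding, gesture_bank[cid]))

# Toy usage with random embeddings standing in for real encoder outputs.
bank = {"beat_01": np.random.randn(64), "metaphoric_02": np.random.randn(64)}
print(select_gesture("it was a huge success", np.random.randn(64), bank))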

ASAP · Storyboarding & Pre-Viz

Auto-Generating Storyboard with Virtual Humans

KOCCA Flagship · ISMAR & SIGGRAPH Asia Real-Time Live

A screenplay-to-3D pipeline that converts scripts into shot-level visualizations with virtual actors.

Screenplay understanding
Virtual cinematography

RAG-Driven Historical Agents

AI Agent for the Annals of the Joseon Dynasty

Computer Animation & Virtual Worlds · ISMAR 2025

A Retrieval-Augmented Generation system that navigates dense historical archives to answer nuanced queries.

RAG pipelines
Cultural heritage

Selected Publications

A sample of recent work. Full list available on Google Scholar.

Computer Animation & Virtual Worlds · 2025

RIDGE: Rule-Infused Deep Learning for Realistic Co-Speech Gesture Generation

G. Ali, H.Y. Kim, J.-I. Hwang
Hybrid rule + retrieval framework delivering realistic, deployable co-speech gestures.

IEEE Access · 2025

Expanding Multilingual Co-Speech Interaction: The Impact of Enhanced Gesture Units

G. Ali, W. Kim, M.S. Anwar, J.-I. Hwang, A. Choi
Gesture units for cross-lingual, semantically aligned gesture synthesis.

Computer Animation & Virtual Worlds · 2025

AI-Agent for the Annals of the Joseon Dynasty: RAG System for Contextual Analysis

J.H. Lee, G. Ali, J.-I. Hwang
RAG system over dense historical corpora, powering embodied explainers.

Multimedia Tools & Applications · 2024

ASAP for Multi-Outputs: Auto-Generating Storyboard and Pre-Visualization

G. Ali, H. Kim, B. Han, et al.
Screenplay-driven multi-output pre-visualization for film production.

Let’s Collaborate

Email
[email protected]
Location
Seoul, South Korea
LinkedIn
/in/ghazanfar309
Scholar
Profile ↗

For speaking invitations, research collaborations, or co-development of digital human applications, feel free to reach out.

What I can help you build