Generative AI Research — Stable Diffusion Pipelines
Text-to-sketch and sketch-to-image generation pipelines with a focus on controllability, fidelity to input constraints, and deployment feasibility on modest hardware.
Overview
A research effort exploring Stable Diffusion-based pipelines for two related tasks: generating sketches from text, and generating finished imagery from sketches. The work emphasizes controllability (the output must respect the input constraints), fidelity (outputs that are not just plausible but correct), and practicality (the pipeline runs on a single workstation GPU, not a cluster).
The Problem
Stock diffusion models are strong at producing plausible imagery but weak at following constraints. A sketch-to-image pipeline that ignores the sketch layout is worthless. Controllability mechanisms such as ControlNet conditioning, LoRA fine-tuning, and careful prompt design are what make diffusion useful for real creative and technical workflows. The research question is how best to combine these mechanisms for the sketch-driven tasks above.
My Role & Contribution
- Designed and ran the comparison studies across conditioning strategies
- Fine-tuned LoRA adapters on the domain dataset and evaluated their effect on fidelity
- Built a Gradio demo so reviewers could interact with the pipeline without touching code
Approach
- Hugging Face Diffusers as the base Stable Diffusion runtime (pipeline sketch after this list)
- ControlNet conditioning on sketch / edge / depth maps to enforce layout fidelity
- LoRA fine-tuning on a curated domain dataset to shift the style and concept distribution (second sketch below)
- Prompt engineering and negative-prompt tuning for consistent outputs
- Gradio front-end for interactive exploration and qualitative review (third sketch below)
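A minimal sketch of the base pipeline, assuming Stable Diffusion v1.5 with a scribble-trained ControlNet; the model IDs, file names, and prompt are illustrative placeholders, not the exact checkpoints used in the study:

```python
import torch
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)
from diffusers.utils import load_image

# Scribble-conditioned ControlNet over SD 1.5; both model IDs are illustrative.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()  # keeps peak VRAM low on a single workstation GPU

# Placeholder path; scribble ControlNets expect white-line-on-black inputs.
sketch = load_image("input_sketch.png")
image = pipe(
    "a watercolor landscape, soft morning light",
    image=sketch,
    num_inference_steps=30,
    controlnet_conditioning_scale=1.0,  # raise for stricter layout fidelity
).images[0]
image.save("controlnet_out.png")
```

`controlnet_conditioning_scale` is the main layout-fidelity knob: higher values force the output to track the sketch more strictly, at the cost of prompt freedom.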
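Layering a domain LoRA and a negative prompt on top of the same `pipe`; the adapter path and both prompts are placeholders standing in for the curated domain setup:

```python
# Builds on the `pipe` object from the previous sketch.
pipe.load_lora_weights("path/to/domain_lora")  # placeholder adapter directory

image = pipe(
    prompt="finished illustration in the domain house style",
    negative_prompt="blurry, low quality, distorted anatomy, text, watermark",
    image=sketch,
    num_inference_steps=30,
    guidance_scale=7.5,  # classifier-free guidance strength
).images[0]
image.save("lora_out.png")
```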
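And a minimal Gradio wrapper in the spirit of the reviewer demo, reusing the `pipe` above; the control names and defaults are assumptions:

```python
import gradio as gr

def generate(sketch_img, prompt, negative_prompt, control_scale):
    # One forward pass through the ControlNet pipeline defined above.
    return pipe(
        prompt,
        negative_prompt=negative_prompt,
        image=sketch_img,
        num_inference_steps=30,
        controlnet_conditioning_scale=float(control_scale),
    ).images[0]

demo = gr.Interface(
    fn=generate,
    inputs=[
        gr.Image(type="pil", label="Sketch"),
        gr.Textbox(label="Prompt"),
        gr.Textbox(label="Negative prompt"),
        gr.Slider(0.0, 2.0, value=1.0, label="ControlNet strength"),
    ],
    outputs=gr.Image(label="Generated image"),
)
demo.launch()
```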
Tech Stack
- Python
- PyTorch
- Hugging Face Diffusers
- Stable Diffusion
- ControlNet
- LoRA
- CUDA
- Gradio
Results & Impact
- Qualitative gains in sketch-following fidelity over unconditioned baselines, assessed through side-by-side review in the Gradio demo
- A reusable pipeline and interactive demo that downstream applied work can build on