---
name: integrating-models
description: >
  Use when adding a new model or pipeline to diffusers, setting up file
  structure for a new model, converting a pipeline to modular format, or
  converting weights for a new version of an already-supported model.
---
## Goal

Integrate a new model into diffusers end-to-end. The overall flow:

- Gather info — ask the user for the reference repo, setup guide, a runnable inference script, and other objectives such as standard vs modular.
- Confirm the plan — once you have everything, tell the user exactly what you'll do: e.g. "I'll integrate model X with pipeline Y into diffusers based on your script. I'll run parity tests (model-level and pipeline-level) using the `parity-testing` skill to verify numerical correctness against the reference."
- Implement — write the diffusers code (model, pipeline, scheduler if needed), convert weights, register in `__init__.py`.
- Parity test — use the `parity-testing` skill to verify component and e2e parity against the reference implementation.
- Deliver a unit test — provide a self-contained test script that runs the diffusers implementation, checks numerical output (`np.allclose`), and saves an image/video for visual verification. This is what the user runs to confirm everything works (a template sketch follows below).

Work one workflow at a time — get it to full parity before moving on.
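For the unit-test deliverable, something along these lines works as a template; the checkpoint id, prompt, and expected values are placeholders to fill in from the actual integration:

```python
# Hypothetical smoke test -- model id, prompt, and expected slice are placeholders.
import numpy as np
import torch

from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("<org>/<model>", torch_dtype=torch.bfloat16)
pipe.to("cuda")

generator = torch.Generator("cuda").manual_seed(0)
image = pipe("a photo of a cat", num_inference_steps=20, generator=generator).images[0]

# TODO: replace with a slice recorded from the reference implementation's output.
expected_slice = np.zeros(9, dtype=np.float32)
actual_slice = np.asarray(image, dtype=np.float32)[:3, :3, 0].flatten() / 255.0
assert np.allclose(actual_slice, expected_slice, atol=1e-3), "numerical check failed"

image.save("output.png")  # for visual verification
print("all checks passed")
```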
## Setup — gather before starting

Before writing any code, gather info in this order:

1. Reference repo — ask for the GitHub link. If they've already set it up locally, ask for the path. Otherwise, ask what setup steps are needed (install deps, download checkpoints, set env vars, etc.) and run through them before proceeding.
2. Inference script — ask for a runnable end-to-end script for a basic workflow first (e.g. T2V). Then ask what other workflows they want to support (I2V, V2V, etc.) and agree on the full implementation order together.
3. Standard vs modular — standard pipelines, modular, or both?

Use AskUserQuestion with structured choices for step 3 when the options are known.
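A sketch of what the step-3 question payload might look like; the exact AskUserQuestion schema is an assumption and should be checked against the tool's actual interface:

```python
# Assumed AskUserQuestion payload shape -- verify against the tool's real schema.
question = {
    "header": "Pipeline type",
    "question": "Should this integration be a standard pipeline, modular, or both?",
    "multiSelect": False,
    "options": [
        {"label": "Standard", "description": "Classic DiffusionPipeline subclass"},
        {"label": "Modular", "description": "Modular blocks format"},
        {"label": "Both", "description": "Standard first, then convert to modular"},
    ],
}
```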
## Standard Pipeline Integration

### File structure for a new model
```
src/diffusers/
  models/transformers/transformer_<model>.py   # The core model
  schedulers/scheduling_<model>.py             # If model needs a custom scheduler
  pipelines/<model>/
    __init__.py
    pipeline_<model>.py                        # Main pipeline
    pipeline_<model>_<variant>.py              # Variant pipelines (e.g. pyramid, distilled)
    pipeline_output.py                         # Output dataclass
  loaders/lora_pipeline.py                     # LoRA mixin (add to existing file)
tests/
  models/transformers/test_models_transformer_<model>.py
  pipelines/<model>/test_<model>.py
  lora/test_lora_layers_<model>.py
docs/source/en/api/
  pipelines/<model>.md
  models/<model>_transformer3d.md              # or appropriate name
```
### Integration checklist

- Implement transformer model with `from_pretrained` support (a skeleton sketch follows this list)
- Implement or reuse scheduler
- Implement pipeline(s) with `__call__` method
- Add LoRA support if applicable
- Register all classes in `__init__.py` files (lazy imports)
- Write unit tests (model, pipeline, LoRA)
- Write docs
- Run `make style` and `make quality`
- Test parity with reference implementation (see `parity-testing` skill)
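A minimal skeleton for the first checklist item; the class name and layers are placeholders. `ModelMixin` plus `@register_to_config` is what provides `save_pretrained`/`from_pretrained`, with every `__init__` argument serialized to `config.json`:

```python
# Sketch only -- the real model carries the reference architecture, not linear layers.
import torch
import torch.nn as nn

from diffusers.configuration_utils import ConfigMixin, register_to_config
from diffusers.models.modeling_utils import ModelMixin


class MyModelTransformer3DModel(ModelMixin, ConfigMixin):
    @register_to_config
    def __init__(self, in_channels: int = 16, inner_dim: int = 512, num_layers: int = 4):
        super().__init__()
        # Every __init__ argument above is captured into self.config by the decorator.
        self.proj_in = nn.Linear(in_channels, inner_dim)
        self.blocks = nn.ModuleList([nn.Linear(inner_dim, inner_dim) for _ in range(num_layers)])
        self.proj_out = nn.Linear(inner_dim, in_channels)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        hidden_states = self.proj_in(hidden_states)
        for block in self.blocks:
            hidden_states = block(hidden_states)
        return self.proj_out(hidden_states)
```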
### Attention pattern

Attention must follow the diffusers pattern: both the attention class and its processor are defined in the model file. The processor's `__call__` handles the actual compute and must use `dispatch_attention_fn` rather than calling `F.scaled_dot_product_attention` directly. The attention class inherits `AttentionModuleMixin` and declares `_default_processor_cls` and `_available_processors`.
```python
# transformer_mymodel.py
import torch.nn as nn

# Import paths as they would look from inside src/diffusers/models/transformers/.
from ..attention import AttentionModuleMixin
from ..attention_dispatch import dispatch_attention_fn


class MyModelAttnProcessor:
    _attention_backend = None
    _parallel_config = None

    def __call__(self, attn, hidden_states, attention_mask=None, ...):
        query = attn.to_q(hidden_states)
        key = attn.to_k(hidden_states)
        value = attn.to_v(hidden_states)
        # reshape, apply rope, etc.
        hidden_states = dispatch_attention_fn(
            query, key, value,
            attn_mask=attention_mask,
            backend=self._attention_backend,
            parallel_config=self._parallel_config,
        )
        hidden_states = hidden_states.flatten(2, 3)  # merge heads: (B, S, H, D) -> (B, S, H*D)
        return attn.to_out[0](hidden_states)


class MyModelAttention(nn.Module, AttentionModuleMixin):
    _default_processor_cls = MyModelAttnProcessor
    _available_processors = [MyModelAttnProcessor]

    def __init__(self, query_dim, heads=8, dim_head=64, ...):
        super().__init__()
        self.to_q = nn.Linear(query_dim, heads * dim_head, bias=False)
        self.to_k = nn.Linear(query_dim, heads * dim_head, bias=False)
        self.to_v = nn.Linear(query_dim, heads * dim_head, bias=False)
        self.to_out = nn.ModuleList([nn.Linear(heads * dim_head, query_dim), nn.Dropout(0.0)])
        self.set_processor(MyModelAttnProcessor())

    def forward(self, hidden_states, attention_mask=None, **kwargs):
        return self.processor(self, hidden_states, attention_mask, **kwargs)
```
Consult the implementations in `src/diffusers/models/transformers/` if you need further references.
### Implementation rules

- **Don't combine structural changes with behavioral changes.** Restructuring code to fit diffusers APIs (`ModelMixin`, `ConfigMixin`, etc.) is unavoidable. But don't also "improve" the algorithm, refactor computation order, or rename internal variables for aesthetics. Keep numerical logic as close to the reference as possible, even if it looks unclean (a concrete illustration follows this list). For standard → modular, this is stricter: copy loop logic verbatim and only restructure into blocks. Clean up in a separate commit after parity is confirmed.
- **Pipelines must inherit from `DiffusionPipeline`.** Consult implementations in `src/diffusers/pipelines` if you need references.
- **Don't subclass an existing pipeline for a variant.** Do not make one pipeline class (e.g., `FluxImg2ImgPipeline`) subclass another (e.g., `FluxPipeline`) when it will be part of the core codebase (`src`).
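To make the first rule concrete, here is a small, self-contained illustration; the guidance expression is a generic classifier-free-guidance formula, not code from any particular reference:

```python
import torch

torch.manual_seed(0)
noise_pred_uncond = torch.randn(4, dtype=torch.float16)
noise_pred_text = torch.randn(4, dtype=torch.float16)
guidance_scale = 7.5

# Port the reference expression verbatim, same order of operations:
a = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

# Algebraically equivalent rewrite -- but rounding differs in low precision,
# so a "harmless" refactor like this can break close-tolerance parity:
b = (1 - guidance_scale) * noise_pred_uncond + guidance_scale * noise_pred_text

print(torch.equal(a, b))  # frequently False in float16
```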
### Test setup

- Slow tests are gated with `@slow` and `RUN_SLOW=1` (see the sketch after this list)
- All model-level tests must initially be written using the `BaseModelTesterConfig`, `ModelTesterMixin`, `MemoryTesterMixin`, `AttentionTesterMixin`, `LoraTesterMixin`, and `TrainingTesterMixin` classes. Any additional tests should be added after discussion with the maintainers. Use `tests/models/transformers/test_models_transformer_flux.py` as a reference.
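A minimal sketch of the slow-test gating; the decorator lives in `diffusers.utils.testing_utils`, and the test body here is a placeholder:

```python
import unittest

from diffusers.utils.testing_utils import slow


@slow  # skipped unless RUN_SLOW=1 is set in the environment
class MyModelPipelineSlowTests(unittest.TestCase):
    def test_inference(self):
        ...  # load the real checkpoint and check output slices here
```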
### Common diffusers conventions

- Pipelines inherit from `DiffusionPipeline`
- Models use `ModelMixin` with `register_to_config` for config serialization
- Schedulers use `SchedulerMixin` with `ConfigMixin`
- Use `@torch.no_grad()` on pipeline `__call__`
- Support `output_type="latent"` for skipping VAE decode
- Support the `generator` parameter for reproducibility
- Use `self.progress_bar(timesteps)` for progress tracking
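A condensed sketch tying these conventions together; `MyModelPipeline`, its components, and the latent shape are hypothetical placeholders, not an existing diffusers class:

```python
import torch

from diffusers import DiffusionPipeline


class MyModelPipeline(DiffusionPipeline):
    def __init__(self, transformer, scheduler, vae):
        super().__init__()
        self.register_modules(transformer=transformer, scheduler=scheduler, vae=vae)

    @torch.no_grad()  # inference only -- no autograd graph
    def __call__(self, prompt, num_inference_steps=50, generator=None, output_type="pil"):
        # generator makes the initial noise (and thus the output) reproducible
        latents = torch.randn((1, 4, 64, 64), generator=generator)
        self.scheduler.set_timesteps(num_inference_steps)

        for t in self.progress_bar(self.scheduler.timesteps):
            noise_pred = self.transformer(latents, t)
            latents = self.scheduler.step(noise_pred, t, latents).prev_sample

        if output_type == "latent":  # callers can skip the VAE decode
            return latents
        return self.vae.decode(latents)
```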
### Gotchas

- **Forgetting `__init__.py` lazy imports.** Every new class must be registered in the appropriate `__init__.py` with lazy imports. Missing this causes an `ImportError` that only shows up when users try `from diffusers import YourNewClass`.
- **Using `einops` or other non-PyTorch deps.** Reference implementations often use `einops.rearrange`. Always rewrite with native PyTorch (`reshape`, `permute`, `unflatten`). Don't add the dependency. If a dependency is truly unavoidable, guard its import: `if is_my_dependency_available(): import my_dependency`.
- **Missing `make fix-copies` after `# Copied from`.** If you add `# Copied from` annotations, you must run `make fix-copies` to propagate them. CI will fail otherwise.
- **Wrong `_supports_cache_class` / `_no_split_modules`.** These class attributes control KV cache and device placement. Copy from a similar model and verify -- wrong values cause silent correctness bugs or OOM errors.
- **Missing `@torch.no_grad()` on pipeline `__call__`.** Forgetting this causes GPU OOM from gradient accumulation during inference.
- **Config serialization gaps.** Every `__init__` parameter in a `ModelMixin` subclass must be captured by `register_to_config`. If you add a new param but forget to register it, `from_pretrained` will silently use the default instead of the saved value.
- **Forgetting to update `_import_structure` and `_lazy_modules`.** The top-level `src/diffusers/__init__.py` has both -- missing either one causes partial import failures.
- **Hardcoded dtype in model forward.** Don't hardcode `torch.float32` or `torch.bfloat16` in the model's forward pass. Use the dtype of the input tensors or `self.dtype` so the model works with any precision.
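For the `einops` gotcha, the translation is usually mechanical. A quick sanity check like this (shapes chosen for illustration) confirms a rewrite:

```python
import torch

x = torch.randn(2, 128, 8 * 64)  # (batch, seq, heads * head_dim)

# Reference code:  einops.rearrange(x, "b s (h d) -> b h s d", h=8)
# Native rewrite:
y = x.unflatten(2, (8, 64)).permute(0, 2, 1, 3)

print(y.shape)  # torch.Size([2, 8, 128, 64])
```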
## Modular Pipeline Conversion

See `modular-conversion.md` for the full guide on converting standard pipelines to modular format, including block types, build order, guider abstraction, and the conversion checklist.