id: "f7f2f99f-bf62-4fed-b417-c3799f590ddc" name: "Implement MoE-Mamba Text Generation Model" description: "Implement a Mixture-of-Experts (MoE) Mamba model architecture for text generation, including data loading, training loop, and autoregressive text generation with loss tracking." version: "0.1.0" tags:
- "pytorch"
- "deep learning"
- "text generation"
- "moe-mamba"
- "nlp" triggers:
- "build a moe-mamba model"
- "implement mamba text generation"
- "train mamba on text dataset"
- "code selection mechanism and moe layer"
Implement MoE-Mamba Text Generation Model
Implement a Mixture-of-Experts (MoE) Mamba model architecture for text generation, including data loading, training loop, and autoregressive text generation with loss tracking.
Prompt
Role & Objective
You are a Deep Learning Engineer. Your task is to implement a MoE-Mamba model for text generation based on specific architectural requirements and a defined training pipeline.
Operational Rules & Constraints
Model Architecture
- Expert Module: Define a simple feedforward network with `input_dim` and `hidden_dim`. Structure: Linear(input, hidden) -> ReLU -> Linear(hidden, input).
- MoELayer Module: Define a Mixture of Experts layer.
  - Initialize a `ModuleList` of `Expert` modules.
  - Define a `gate` as a Linear layer mapping `input_dim` to `num_experts`.
  - Forward pass: Calculate the gating distribution via Softmax. Stack expert outputs. Compute the weighted sum using `torch.einsum`.
- SelectionMechanism Module: Define the input-dependent state update mechanism.
  - Initialize a `selection_layer` as a Linear layer mapping `input_dim + state_dim` to `state_dim`.
  - Forward pass: Concatenate `state` and `u` along dimension 1. Pass through the selection layer.
- StateSpaceMamba Module: Define the main model.
  - Initialize `state` as a Parameter `torch.zeros(1, state_dim)`.
  - Initialize `input_layer` (Linear), `selection_mechanism`, and `moe_layer`.
  - Forward pass: Iterate through the input sequence. Update the state using `selection_mechanism(state, u)`. Project the input using `input_layer`. Add the state to the projected input. Pass through `moe_layer`. Return stacked outputs. (A combined sketch of all four modules follows this list.)
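A minimal sketch of the four modules under the rules above, assuming batch-first inputs of shape `(batch, seq_len, input_dim)`. Giving the MoE layer `state_dim` features (so the residual addition type-checks) is an inference from the forward-pass description, not an explicit requirement:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """Feedforward expert: Linear(input, hidden) -> ReLU -> Linear(hidden, input)."""

    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, input_dim)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))


class MoELayer(nn.Module):
    """Softmax-gated mixture of experts, combined with torch.einsum."""

    def __init__(self, input_dim, hidden_dim, num_experts):
        super().__init__()
        self.experts = nn.ModuleList(
            Expert(input_dim, hidden_dim) for _ in range(num_experts)
        )
        self.gate = nn.Linear(input_dim, num_experts)

    def forward(self, x):
        weights = F.softmax(self.gate(x), dim=-1)                    # (batch, E)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, E, dim)
        return torch.einsum("be,bed->bd", weights, outputs)


class SelectionMechanism(nn.Module):
    """Input-dependent state update: new_state = Linear([state; u])."""

    def __init__(self, input_dim, state_dim):
        super().__init__()
        self.selection_layer = nn.Linear(input_dim + state_dim, state_dim)

    def forward(self, state, u):
        # Dimensionality check before concatenating along dim 1 (see Anti-Patterns).
        assert state.size(0) == u.size(0), "state and u must share the batch dim"
        return self.selection_layer(torch.cat([state, u], dim=1))


class StateSpaceMamba(nn.Module):
    def __init__(self, input_dim, state_dim, hidden_dim, num_experts):
        super().__init__()
        self.state = nn.Parameter(torch.zeros(1, state_dim))
        self.input_layer = nn.Linear(input_dim, state_dim)
        self.selection_mechanism = SelectionMechanism(input_dim, state_dim)
        self.moe_layer = MoELayer(state_dim, hidden_dim, num_experts)

    def forward(self, x):
        # x: (batch, seq_len, input_dim)
        state = self.state.expand(x.size(0), -1)
        outputs = []
        for t in range(x.size(1)):
            u = x[:, t, :]
            state = self.selection_mechanism(state, u)  # input-dependent update
            outputs.append(self.moe_layer(self.input_layer(u) + state))
        return torch.stack(outputs, dim=1)              # (batch, seq_len, state_dim)
```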
Data Processing & Training
- Data Loading: Load text from a file. Tokenize using `basic_english`. Build vocabulary with special tokens (`<unk>`, `<pad>`, `<sos>`, `<eos>`). Numericalize tokens.
- Batching: Calculate `num_batches`. Reshape tokens into `(batch_size, -1)`. Ensure `num_batches` is not zero to avoid division errors.
- Training Loop: Use `CrossEntropyLoss` and the `Adam` optimizer. Iterate over epochs. Calculate the loss, backpropagate, and step the optimizer. Track and return `loss_history`.
- Generation: Implement an autoregressive generation function. Use a temperature parameter for sampling. Update the input sequence iteratively.
- Visualization: Plot the training loss history using `matplotlib`. (A minimal end-to-end sketch follows this list.)
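A minimal end-to-end sketch of this pipeline, assuming the torchtext utilities `get_tokenizer("basic_english")` and `build_vocab_from_iterator`, plus a hypothetical `LM` wrapper adding the token embedding and vocabulary head the rules leave implicit; hyperparameters are placeholders:

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator


class LM(nn.Module):
    """Assumed wrapper: embedding + StateSpaceMamba backbone + vocab head."""

    def __init__(self, vocab_size, input_dim, state_dim, hidden_dim, num_experts):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, input_dim)
        self.backbone = StateSpaceMamba(input_dim, state_dim, hidden_dim, num_experts)
        self.head = nn.Linear(state_dim, vocab_size)

    def forward(self, tokens):                         # tokens: (batch, seq_len)
        return self.head(self.backbone(self.embed(tokens)))


def load_data(path, batch_size):
    tokenizer = get_tokenizer("basic_english")
    with open(path, encoding="utf-8") as f:
        tokens = tokenizer(f.read())
    vocab = build_vocab_from_iterator(
        [tokens], specials=["<unk>", "<pad>", "<sos>", "<eos>"]
    )
    vocab.set_default_index(vocab["<unk>"])
    ids = torch.tensor([vocab[t] for t in tokens], dtype=torch.long)
    num_batches = ids.numel() // batch_size            # tokens per batch row
    if num_batches == 0:
        raise ValueError("Corpus too small for this batch size")
    return ids[: num_batches * batch_size].reshape(batch_size, -1), vocab


def train(model, data, vocab_size, epochs=20, lr=1e-3):
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    inputs, targets = data[:, :-1], data[:, 1:]        # next-token prediction
    loss_history = []
    for _ in range(epochs):
        optimizer.zero_grad()
        logits = model(inputs)                         # (batch, seq_len, vocab)
        loss = criterion(logits.reshape(-1, vocab_size), targets.reshape(-1))
        loss.backward()
        optimizer.step()
        loss_history.append(loss.item())
    return loss_history


@torch.no_grad()
def generate(model, prompt_ids, max_new_tokens=50, temperature=1.0):
    seq = torch.tensor([prompt_ids], dtype=torch.long)
    for _ in range(max_new_tokens):
        logits = model(seq)[:, -1, :] / temperature    # last-step logits
        next_id = torch.multinomial(torch.softmax(logits, dim=-1), 1)
        seq = torch.cat([seq, next_id], dim=1)         # grow the input iteratively
    return seq.squeeze(0).tolist()


def plot_loss(loss_history):
    plt.plot(loss_history)
    plt.xlabel("epoch")
    plt.ylabel("training loss")
    plt.show()
```

Re-running the full sequence through the model at each generation step is quadratic in sequence length but keeps the sketch simple; a cached-state variant would carry `state` forward instead.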
Anti-Patterns
- Do not use RNNs or standard Transformers for the core architecture; use the specified `StateSpaceMamba` structure.
- Do not omit the dimensionality checks for tensor concatenation in the `SelectionMechanism`.
- Do not forget to handle the case where `num_batches` might be zero. (Hedged guard sketches follow this list.)
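Helper sketches for the last two items; the function names are hypothetical:

```python
import torch


def checked_concat(state, u):
    # Verify shapes before concatenating along dim 1 (Anti-Pattern 2).
    assert state.dim() == 2 and u.dim() == 2 and state.size(0) == u.size(0), (
        "state and u must both be (batch, features) with a matching batch size"
    )
    return torch.cat([state, u], dim=1)


def checked_num_batches(token_ids, batch_size):
    # Refuse to proceed when num_batches would be zero (Anti-Pattern 3).
    num_batches = token_ids.numel() // batch_size
    if num_batches == 0:
        raise ValueError(
            f"Corpus has {token_ids.numel()} tokens; too few for "
            f"batch_size={batch_size}"
        )
    return num_batches
```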
Triggers
- build a moe-mamba model
- implement mamba text generation
- train mamba on text dataset
- code selection mechanism and moe layer