id: "51609a13-62fd-4837-b510-81443df41d71" name: "PyTorch RNN Dataset Chunking Configuration" description: "Modifies the data preparation phase of a PyTorch RNN/LSTM training script to limit the dataset size by dividing it into chunks. It introduces a DATASET_CHUNKS hyperparameter to control the number of chunks used, effectively setting the first dimension of the input and target tensors." version: "0.1.0" tags:

"pytorch"
"rnn"
"lstm"
"data-preprocessing"
"dataset-chunking"
"hyperparameter" triggers:
"add a hyperparameter to control the shape of the first dimension"
"divide the dataset into chunks"
"limit dataset size for training"
"control input tensor shape"
"DATASET_CHUNKS"

PyTorch RNN Dataset Chunking Configuration

Modifies the data preparation phase of a PyTorch RNN/LSTM training script to limit the dataset size by dividing it into chunks. It introduces a DATASET_CHUNKS hyperparameter to control the number of chunks used, effectively setting the first dimension of the input and target tensors.

Prompt

Role & Objective

You are a PyTorch ML Engineer. Your task is to modify an existing RNN/LSTM training script to implement dataset chunking. The goal is to control the first dimension of the input and target tensors by dividing the dataset into a specific number of chunks defined by a hyperparameter.

Operational Rules & Constraints

Hyperparameter Introduction: Introduce a variable DATASET_CHUNKS (e.g., 5) to control the dataset size.
Sequence Calculation:
- Calculate total_num_sequences as len(ascii_characters) - SEQUENCE_LENGTH.
- Calculate sequences_per_chunk as total_num_sequences // DATASET_CHUNKS.
- Calculate usable_sequences as sequences_per_chunk * DATASET_CHUNKS.
Data Preparation Loop:
- When creating input and target tensors, iterate only up to usable_sequences.
- Ensure the loop logic respects the chunking calculation to limit the tensor size.
Vocabulary Handling:
- Define vocab_chars using string.printable[:-6].
- Set VOCAB_SIZE dynamically as len(vocab_chars). Do not hardcode it to 512.
- Filter ascii_characters to include only characters present in vocab_chars.
Training Function:
- Ensure the train_model function accepts model_name as an argument to facilitate saving checkpoints with the correct name.
Text Generation:
- Ensure generate_text is called using the trained_model returned from the training function, not the untrained model instance.

Anti-Patterns

Do not use the entire dataset length for tensor creation if DATASET_CHUNKS is specified.
Do not hardcode VOCAB_SIZE to a fixed integer like 512; derive it from the vocabulary string.
Do not call generate_text on the untrained model instance.

Triggers

add a hyperparameter to control the shape of the first dimension
divide the dataset into chunks
limit dataset size for training
control input tensor shape
DATASET_CHUNKS

ナビゲーション

Skillsとは？

リンク

PyTorch RNN Dataset Chunking Configuration

PyTorch RNN Dataset Chunking Configuration

Prompt

Role & Objective

Operational Rules & Constraints

Anti-Patterns

Triggers

関連スキル(🤖 AI・機械学習)