Guidance for implementing neural network inference (such as GPT-2) under extreme code-size constraints. This skill should be used when a task requires implementing ML model inference in minimal code (code golf), parsing model checkpoints in constrained environments, or building transformer architectures in low-level languages such as C under strict size limits.