Implements a PPO agent for continuous action spaces using Graph Neural Networks (GNNs). The Actor has a multi-task head that predicts both actions and node stability, while the Critic operates on flattened node features. The loss integrates a dynamic stability term and entropy regularization; actions are scaled with Tanh.
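
As a rough illustration of the last two points, the sketch below shows one common way to squash an unbounded action into a bounded range with Tanh, and one way a clipped PPO surrogate could be combined with an auxiliary stability term and an entropy bonus. It is a minimal scalar sketch and not the agent's actual code: the function names `scale_action` and `ppo_total_loss`, the bounds `low`/`high`, and the coefficients `clip_eps`, `stability_coef`, and `entropy_coef` are all hypothetical, and the real implementation presumably works on batched tensors.

```python
import math


def scale_action(raw, low, high):
    """Squash an unbounded raw action with tanh, then rescale to [low, high].

    Assumption: the action bounds are symmetric or asymmetric scalars;
    the names here are illustrative only.
    """
    squashed = math.tanh(raw)  # lies in [-1, 1]
    return low + 0.5 * (squashed + 1.0) * (high - low)


def ppo_total_loss(ratio, advantage, stability_loss, entropy,
                   clip_eps=0.2, stability_coef=0.5, entropy_coef=0.01):
    """Single-sample total loss: clipped surrogate + stability term - entropy bonus.

    Assumption: stability_coef is passed in by the caller, so a schedule
    that changes it during training (a "dynamic" weight) can live outside
    this function. The default values are placeholders.
    """
    # Standard PPO clipping of the probability ratio.
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Pessimistic (minimum) surrogate objective, to be maximized.
    surrogate = min(ratio * advantage, clipped_ratio * advantage)
    # Minimize: negative surrogate, plus auxiliary loss, minus entropy bonus.
    return -surrogate + stability_coef * stability_loss - entropy_coef * entropy


if __name__ == "__main__":
    print(scale_action(0.0, -2.0, 2.0))
    print(ppo_total_loss(ratio=1.5, advantage=1.0, stability_loss=0.3, entropy=1.2))
```

Passing the stability weight as an argument keeps the loss function itself stateless, which makes it easy to experiment with different weighting schedules without touching the PPO term.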