AIKIT
SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks