GEM, aka General Experience Maker
<aside>
Contributors: Zichen Liu*, Anya Sims*, Keyu Duan*, Changyu Chen*
Advisors: Diyi Yang, Wee Sun Lee, Min Lin
</aside>
<aside>
We're entering the era of experience [1], in which LLM training moves beyond static datasets and towards LLM agents learning from experience gathered in complex, expressive environments. As a step in this direction, we introduce GEM, our open-source effort to build a General Experience Maker.
Inspired by OpenAI Gym's role in traditional RL [2], GEM serves as a dedicated, flexible environment simulator for the age of LLMs. In contrast to existing codebases [3,4], GEM deliberately decouples the environment from the training framework, making it easy to integrate with popular RL training frameworks (Oat, Verl, etc.) through clean, standardized interfaces. In addition, GEM features tool integration, flexible and easy-to-modify wrappers, asynchronous vectorized environment execution to maximize throughput, multi-environment training, and more: everything you need to make RL training for LLM agents simple.
The highlights of GEM's first release are:
</aside>
* Equal contribution, with order decided by a dice roll.
Figure 1. Learning curves of Qwen3-based agents across diverse environments from 5 categories: game (language games); rg (ReasoningGym); code (coding tasks); math (Python-integrated math questions); qa (search-integrated general questions). All agents are trained via a simple yet general multi-turn algorithm based on REINFORCE (Algorithm 1).
Figure 2. The standard agent-environment loop from Sutton & Barto [11]. GEM implements the 'Environment' side, providing a standardized testbed over a wide range of tasks. GEM decouples the environment from the 'Agent' side, allowing researchers to easily plug in, train, and benchmark their own LLM-based agents with maximum flexibility.
OpenAI Gym [2] has been instrumental in RL development for many years, providing a standard API for communicating between agents and environments, as well as a suite of environments compliant with the API for developing and benchmarking new algorithms. GEM brings Gym into the LLM era, with a standardized interface that closely follows Gym's, along with a diverse suite of environments. The main methods for each environment are:
- reset(seed): samples an initial environment state (e.g. a math question or a hidden word in Wordle) and returns the first observation.
- step(action): executes the action, including any tool calls, and returns the next observation, reward, and done flags (terminated and truncated, indicating whether the interaction is finished).

A simple example:
```python
import gem

# List all supported environments
gem.print_envs()

# Initialize the environment
env = gem.make("game:GuessTheNumber-v0")

# Reset the environment to generate the first observation
observation, info = env.reset()

# Start the agent-environment loop
while True:
    action = env.sample_random_action()  # insert policy here, e.g.,
    # (pseudocode) action = llm.generate(observation)

    # Apply the action and receive the next observation, reward,
    # and whether the episode has ended
    next_observation, reward, terminated, truncated, info = env.step(action)
    print("OBS", observation)
    print("ACT", action)

    # Update the policy (online) here, e.g.,
    # policy = learn(policy, observation, action, reward, info)
    observation = next_observation

    # Exit when the episode terminates
    if terminated or truncated:
        break
```
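The loop above yields per-step (observation, action, reward) transitions, which is all a REINFORCE-style update needs. As a minimal sketch of the idea behind such an algorithm (not GEM's actual Algorithm 1; the LLM log-prob plumbing is abstracted into plain floats here), the per-episode objective can be written as:

```python
def discounted_returns(rewards, gamma=1.0):
    """Compute the return-to-go G_t for every step of one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

def reinforce_loss(log_probs, rewards, gamma=1.0):
    """REINFORCE objective for one episode: -sum_t log pi(a_t|s_t) * G_t.

    log_probs: log-probability of each action under the policy
    rewards:   reward received after each action
    """
    returns = discounted_returns(rewards, gamma)
    return -sum(lp * g for lp, g in zip(log_probs, returns))
```

With a sparse terminal reward (common in GEM's games), every action in a successful episode shares credit through the return-to-go, which is what makes the multi-turn setting work with such a simple estimator.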
GEM's core components are Tasks and Tools. Each combination of a task and an optional set of tools constitutes an environment that can be used to challenge (and RL-tune) an LLM's capabilities in reasoning, multi-step planning, tool use, and strategic exploration.
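To illustrate the task-plus-tools composition (the class names and action format below are hypothetical sketches, not GEM's actual API), a tool can be modeled as a wrapper that intercepts tool-call actions before they reach the underlying task and feeds the tool output back as the next observation:

```python
class PythonTool:
    """Hypothetical tool: evaluates expressions the agent tags for execution."""
    TAG = "<python>"

    def applies(self, action):
        return action.strip().startswith(self.TAG)

    def run(self, action):
        code = action.strip()[len(self.TAG):]
        try:
            return str(eval(code))  # sketch only; a real tool would sandbox this
        except Exception as e:
            return f"error: {e}"

class ToolWrapper:
    """Hypothetical wrapper: routes tool-call actions to the tool instead of
    the task, so the episode continues with the tool output as observation."""
    def __init__(self, env, tool):
        self.env, self.tool = env, tool

    def reset(self, seed=None):
        return self.env.reset(seed)

    def step(self, action):
        if self.tool.applies(action):
            observation = self.tool.run(action)
            return observation, 0.0, False, False, {}  # episode continues
        return self.env.step(action)
```

The design point this sketch captures is the one stated above: the task stays unchanged, and tools compose around it, so the same task can be run with or without tool access.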
GEM features five main categories of tasks:
<aside>
Math: Solve math problems with chain-of-thought reasoning.
</aside>
<aside>
Game: Diverse multi-step text-based games adapted from TextArena [5].
</aside>
<aside>
Question-Answering: Perform knowledge-intensive retrieval and answer generation.
</aside>
<aside>
Code: Generate and validate Python code with a live interpreter.
</aside>
<aside>
ReasoningGym: A lightweight wrapper exposing ReasoningGym [6] through the same unified Gym-like interface, to facilitate easy integration with various training frameworks.
</aside>
GEM provides an easy-to-use interface to add more environments! Math, Code, and Question-Answering tasks can be added by simply specifying a new dataset. New game environments are also easy to add: simply inherit from GEM's environment base class, define the state transitions and reward logic, and plug it into the training loop using our examples as a guide.