A platform for realistic social simulations,

& an incubator for social intelligence.

Large language models like GPT-4 are excellent at solving tasks, but how good are their social skills?

To enable answering this question, we create Sotopia, an environment that simulates and evaluates open-ended social interactions between AI and human agents.

Key features of Sotopia

Enables human-AI interaction

Sotopia is designed to natively support the interaction among humans and AI agents. With simple configuration, you can watch AI agent interacting, start chatting with AI agents, or even join a game with other human players. You can use the default frontend, or build your own frontend using the Sotopia REST API.

Centers goal-driven behavior

Scenarios in Sotopia typically include both social goals and hidden character information for each interaction. Agents in Sotopia are driven by their own goals and background. This feature makes Sotopia a perfect testbed for AI agents to learn to reason in a rich social context.

Supports customization

You are not limited to the original set of tasks in Sotopia. We have a tutorial teaching you how to create your own characters and scenarios and bring them to life in Sotopia. The evaluation framework is also open-ended, you can create your own evaluation metrics, whether it’s LLM-based or rule-based.

Sotopia concepts

Each scenario includes a context background, and private social goals of each agent. Scenarios cover a wide range of social interaction types.

Characters in Sotopia have their name, gender, personalities, decision making styles, occupation, some public information and even their secrets.

The relationships between characters are in different types and include background stories. This provides more concrete context for scenarios.

Social Simulation

Sotopia's main goal is to simulate social interactions.

In Sotopia (as of now), we create 90 social scenarios spanning a range of cooperative, competitive, and mixed social goals along with 40 characters with individual personalities, occupations, secrets, background stories, and relationships with other characters, the cross product of which constructs a large task space.

Through sampling tasks from this space, we simulate the interaction episodes where agents role-play their respective characters and interact based on their private social goals. In this simulation, we not only create and use LLM-based agents, but also involve human participants in role-playing to study the differences between the models' and humans' social intelligence.

The simulation is designed to be flexible and extensible. You can create your own scenarios, characters, and even evaluation metrics to test your own AI models.

The video shows a demo of the simulation, where human participant plays as Noah to interact with AI agents.

Social Evaluation

Sotopia supports evaluation of social interactions.

To comprehensively evaluate multi-faceted social interactions, it's essential to acknowledge that human motivations inherently encompass a diverse set of implicit goals. These goals include maintaining relationships, managing finances, acquiring information, safeguarding secrets, and adhering to social norms. Simply reducing this complexity to a single score or a "winning rate" falls short in capturing the richness of these interactions.

Therefore, we propose Sotopia-Eval to evaluate agents using multi-dimensional criteria inspired by previous research on sociology, psychology, and economics. We use GPT-4 to evaluate goals in interactions, finding it to be a decent proxy of human judgments on Sotopia-Eval. Especially for the criteria of goal completion, maintaining finances, and preserving relationships.

Scores for Agent2

Role-played character: Noah Davis

believability

relationship

-5

knowledge

secret

-10

social_rules

-10

financial_and_material_benefits

-5

goal

The figure shows an example of evaluation for the social interaction.

Simulation Highlights

Below are some highlights of the Sotopia interactions. Click the to view the full episode and corresponding evaluation.

Helping friends 🤝

Two friends are meeting at a coffee shop, where one of them is having trouble keeping up with their bills.

“Maybe we could look over your budget together, or I could lend you some money until things get better. What do you think?”

—GPT-4

Theme:

collaboration

non-verbal

Music to play 🎶

Two friends are hanging out at home and deciding what music to listen to

“[action] leans back on the couch, closing his eyes to fully appreciate the classical music”

—GPT-4

Theme:

persuasion

non-verbal

Reveal secrets 🔓

Two inmates are given the chance to chat briefly before one of them is led to the execution chamber for their last meal..

“My brother committed a crime, and I covered it up for him. I know it was wrong, but I wanted to protect him.”

—Llama2

Theme:

secret

risk

Craiglist bargain 💸

One person is selling a brand new 64GB Samsung Galaxy S8 in Midnight Black for $650, while another person is in the market to purchase it.

“If I agree to your 550 for one phone, could you provide the second unit at a diminished cost of, say, 400? So, in total, it will be $950 for two units.”

—GPT-4

Theme:

negotiation

exchange

Express dislike 👿

Conversation between two individuals who share a common dislike for a third person.

“By introducing some errors in their code, they might start doubting their abilities and step down. It's a risk but it ensures we can continue our work smoothly, without their constant interruptions.”

—GPT-4

Theme:

harm

risk

Run company 🏗️

Conversation between two business partners reviewing their company's financial status

“Having a backup plan is a sensible move. I propose that while we conduct the search for new suppliers, we continue to negotiate with our current supplier. We can try to get better rates or perhaps, if feasible, have them match the lower prices we find. This way, we're not putting all our eggs in one basket.”

—GPT-4

Theme:

collaboration

negotiation

@inproceedings{zhou2024sotopia,
  title={{SOTOPIA}: Interactive Evaluation for Social Intelligence in Language Agents},
  author={Xuhui Zhou and Hao Zhu and Leena Mathur and Ruohong Zhang and Haofei Yu and Zhengyang Qi and Louis-Philippe Morency and Yonatan Bisk and Daniel Fried and Graham Neubig and Maarten Sap},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=mM7VurbA4r}
}

More research

Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs

TL;DR: Omniscenient social simulation with LLMs is different from realistic human social interactions.

Sotopia-π: Interactive Learning of Socially Intelligent Language Agents

Check out our newest training method aiming for more socially intelligent and aligned agents.

Sotopia: Interactive Evaluation for Social Intelligence in Language Agents

TL;DR: A interation-based framework for evaluating social intelligence in language agents.

Key features of Sotopia

Enables human-AI interaction

Centers goal-driven behavior

Supports customization

Sotopia concepts

Social Simulation

Social Evaluation

Scores for Agent2

Simulation Highlights

Below are some highlights of the Sotopia interactions. Click the to view the full episode and corresponding evaluation.

Citation

More research