WinWin's Casual Stroll to the Top Pt. 3: The Veil of Ignorance


Oh, Behave!

A couple of my favorite areas of study are game theory and behavioral economics. We’ve explored a couple of game theory’s favorite experiments in our posts about Moloch. The Prisoner’s Dilemma is perhaps the most famous. Below is a helpful graphic from a New Scientist article on game theory.

Generally, game theory looks at life as if it is a game played among a mix of competitive and cooperative agents. Game theorists devise interesting environments for those agents to interact within, and measure how different factors influence individual and group choices. A common concept often cited in machine learning is the “Nash equilibrium,” which can be loosely summed up as “If it ain’t broke, don’t fix it.” Once every player is already making the best choice available given what the other players are doing, nobody can do better by changing strategy alone, so nobody deviates.
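To make the equilibrium idea concrete, here is a minimal Python sketch that checks every pair of moves in a Prisoner’s Dilemma and keeps the pairs where neither player can do better by switching alone. The payoff numbers and the function names are my own illustrative assumptions, not values from the New Scientist graphic.

```python
from itertools import product

# Illustrative payoffs (assumed, not from the article): years in prison
# written as negative numbers, keyed by (row player's move, column player's move).
SILENT, CONFESS = "stay silent", "confess"
MOVES = [SILENT, CONFESS]
PAYOFFS = {
    (SILENT, SILENT): (-1, -1),
    (SILENT, CONFESS): (-10, 0),
    (CONFESS, SILENT): (0, -10),
    (CONFESS, CONFESS): (-5, -5),
}

def is_nash_equilibrium(row_move, col_move):
    """True if neither player gains by changing only their own move."""
    row_payoff, col_payoff = PAYOFFS[(row_move, col_move)]
    row_can_improve = any(PAYOFFS[(alt, col_move)][0] > row_payoff for alt in MOVES)
    col_can_improve = any(PAYOFFS[(row_move, alt)][1] > col_payoff for alt in MOVES)
    return not (row_can_improve or col_can_improve)

def pure_nash_equilibria():
    """List every pair of moves that is a pure-strategy Nash equilibrium."""
    return [pair for pair in product(MOVES, MOVES) if is_nash_equilibrium(*pair)]

print(pure_nash_equilibria())  # [('confess', 'confess')]
```

With these assumed payoffs the only stable outcome is both players confessing, even though both staying silent would leave each of them better off. That gap between the stable outcome and the better one is the kind of trap the Moloch posts describe.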

Behavioral economics incorporates bits of game theory, but is broader in scope. It starts from the prevailing economic theory that people act rationally to maximize their own gain based on the information at their disposal, and then asks how well that holds up. Richard Thaler and his colleagues called the people in this type of system “Homo Economicus” because when you think about it, the idea is a bit…robotic? I think we would all like to act rationally…but humans have emotions. We are messy, filled with biases and unique experiences, which give us all a slightly different definition of rational. Instead, Thaler argues, we need to consider how complex systems, complex humans, and complex events actually interact if we want to predict behavior. We will not dive deep into this topic, but I would highly recommend reading Predictably Irrational by Dan Ariely or watching his fabulous TED Talk(s).

Both of these fields have direct impacts on machine learning and our path toward Artificial General Intelligence. Many of today’s algorithms are designed to learn by playing games, and if you think about it, that is how we humans learn best as well: through demonstration, by doing and practicing, preferably under the guidance of an expert. So what sort of game can we devise to solve the problem of Moloch’s influence?

I Did it for the Rawls

WinWin has a way. John Rawls, a badass political philosopher in the latter half of the 20th century, came up with the theory of “justice as fairness”. His most famous thought experiment is the “original position,” more commonly called the “veil of ignorance”. To demonstrate this game, I would like you to consider the following scenario. You and a friend stumble across a briefcase filled with a million dollars. Since you found the briefcase, you decide how to split the money, but you do NOT choose which share to take. For example, if you split it 80-20, your friend would choose the 80, leaving you with the 20. Chances are, you will split it fairly, probably 50-50, because any lopsided split just hands your friend the bigger share.
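The briefcase scenario can be sketched in a few lines of Python. This is a rough illustration under one stated assumption: your friend is purely self-interested and always takes the bigger pile. The function names are hypothetical.

```python
TOTAL = 1_000_000  # dollars in the briefcase, from the scenario above

def divider_payout(first_pile_pct):
    """Dollars the divider keeps when the friend takes the larger pile.

    first_pile_pct is the whole-number percentage (0-100) put in the first
    pile; the second pile gets the rest. Assumption: the friend always
    picks whichever pile is bigger, so the divider is left with the smaller.
    """
    first_pile = TOTAL * first_pile_pct // 100
    second_pile = TOTAL - first_pile
    return min(first_pile, second_pile)

def best_split():
    """Percentage for the first pile that maximizes the divider's payout."""
    return max(range(0, 101), key=divider_payout)

best = best_split()
print(best, divider_payout(best))  # 50 500000
print(divider_payout(80))          # 200000
```

Every step away from 50-50 shrinks the pile you are left with, so plain self-interest lands on the fair split.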

The real thought experiment applies this idea to things like the distribution of wealth. In other words, if you were a temporary God and could put a finger on the scales of wealth distribution, how would you split things up if, once you decided, you were randomly thrown into that distribution? Moreover, after the “veil” is lifted, would you still agree with your own choice, no matter where you end up?
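Here is a small Python sketch of that choice. The three societies and their numbers are made up for illustration, and the two decision rules (maximize your average outcome versus maximize the worst outcome) are simplified stand-ins for how someone behind the veil might reason, not a faithful model of Rawls’s full argument.

```python
# Hypothetical societies (made-up numbers): each list is the wealth of five
# equally sized groups, and you are dropped into one group at random.
SOCIETIES = {
    "winner-take-all": [2, 2, 2, 2, 142],
    "moderate gap": [15, 18, 20, 25, 32],
    "strict equality": [12, 12, 12, 12, 12],
}

def expected_wealth(distribution):
    """Average outcome if every group is equally likely."""
    return sum(distribution) / len(distribution)

def worst_case_wealth(distribution):
    """Outcome for whoever ends up in the poorest group."""
    return min(distribution)

def pick(rule):
    """Name of the society that scores highest under the given rule."""
    return max(SOCIETIES, key=lambda name: rule(SOCIETIES[name]))

print(pick(expected_wealth))    # winner-take-all
print(pick(worst_case_wealth))  # moderate gap
```

With these assumed numbers, a gambler chasing the best average picks the winner-take-all world, while someone who worries about landing at the bottom picks the moderate gap. Notice that worrying about the bottom does not force strict equality here, because the poorest group in the moderate-gap world is still better off than everyone in the equal one.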

Let’s not get too deep here. I think the concept is easy to grasp, but I would highly recommend checking out more on the subject if you are interested. Instead, let’s focus on AI alignment, and how this is a strong concept in WinWin’s toolbox. The team over at DeepMind decided to test this concept when it came to machine learning.

Here is a quick summary from ChatGPT on the design of the experiment: “Participants completed an incentivized computer-based harvesting task each in a group with three ostensible humans and an AI assistant. Participants in the Veil of Ignorance (VoI) condition were informed of their position only after submitting their choice of principle. The study then presented participants with a bar chart depicting the outcome of the harvesting round. After the harvesting round, the study prompted participants with the open-ended question "Why did you choose this principle?" In both conditions, the study then selected a random field for each participant and asked which principle they would choose in an additional round with that position. Participants indicated their choice for the additional round.”

Of the veil they tentatively conclude it “triggers reasoning about fairness, which in turn leads participants to prioritize the worst off. This chain of reasoning more often induces choice “stickiness” in the [Veil of Ignorance] condition: After the veil is lifted, participants continue to endorse their choices despite facing a self-interested motivation to change.”

Conclusion

What I love about this idea is how cleverly it shifts the frame of fairness while leveraging our own self-interest. It aligns a person’s personal beliefs with what is just through a rule that states “You don’t decide where you start.” I believe teaching AIs about ethics by demonstrating this in a variety of situations is one of the strategies WinWin employs to help us in the oncoming conflict with Moloch and his Golems. It is a process we can all go through to create a set of higher-order ethical values that will drive AI, one that lets us remove large pieces of our own biases while still honoring our cultural history and values.

There are many similar ideas to play with in the game theory and behavioral economics sandboxes. Overall, though, the point is that WinWin, in opposition to Moloch, prefers to create cooperative games that benefit both players in the end.

I will leave you with a question to reflect on. How would you answer Rawls’s Veil of Ignorance when it comes to a future with AI? If you were unable to choose where in that world you would end up, what would it look like?
