Being Stuart Russell – The comeback of Moral Philosophy

Reading time: 9 minutes

Translation by AB – April 15, 2020


Stuart Russell


Stuart Russell is a much-honored Artificial Intelligence (AI) researcher. He is currently a member of the “Future of Life Institute” (FLI) and of the “Center for the Study of Existential Risk” in Cambridge (A future without us). This eminent specialist was interviewed in 2015 by the excellent Quanta Magazine on the following question: how can we ensure that our machines, having become “intelligent” and therefore more or less autonomous, conform to our “human values”? We try here to understand how an AI researcher views certain aspects of this very broad question, to put ourselves in his place, in a way: being Stuart Russell, so to speak…

Stuart Russell is one of the many co-authors of an open letter published on the FLI website in 2015, urging researchers to focus on developing “virtuous” AI systems. This call would guide the 2017 Asilomar conference and lead to the development of the “principles” of the same name. In this open letter, Stuart Russell notably wrote:

Our AI systems must do what they are asked to do.

This seems rather necessary to us, but for an AI researcher it is not obvious. Russell claims that, if we are not careful, increasing the level of intelligence and autonomy of these systems could lead them to do something other than what they were intended for. They should therefore at least be designed with a “benefit for humanity” in mind, but above all a guaranteed benefit. This is the notion of robustness, that is to say the permanent compliance of our systems with the objectives initially set.

This call is largely inspired by a short document written by Stuart Russell, Daniel Dewey and Max Tegmark, the latter being, by the way, a co-founder of FLI and the scientific director of FQXi (the “Foundational Questions Institute”, originally funded by the John Templeton Foundation…). In this document1, we find some now-classic concerns about the job market and the “disruption” of models, those of finance for example, but also some openings concerning the relevance of our economic indicators. There is also mention of research that should be carried out on ethics and regulation concerning, for example, autonomous vehicles, smart weapons or the protection of personal data.

Three years ago, then, we were in one of those particular moments in the history of science and technology when researchers, from within their own field, raise very general and still confused questions about the application of their discoveries in the economic and social sphere, and therefore questions of ethics. But how do they approach these subjects? We believe it is important to understand this insofar as these same researchers now guide all political thinking on the subject.

“Human Values” ?

Quanta Magazine2: You think the goal of your field should be developing artificial intelligence that is “provably aligned” with human values. What does that mean?

Stuart Russell: It’s a deliberately provocative statement, because it’s putting together two things — “provably” and “human values” — that seem incompatible. It might be that human values will forever remain somewhat mysterious. But to the extent that our values are revealed in our behavior, you would hope to be able to prove that the machine will be able to “get” most of it.

These words may seem odd to those unfamiliar with AI, particularly the strange ease with which Russell conjures up “human values” as if they were well-defined characteristics or attributes. So, what does “human values” mean? We will come back to this later.

For Russell, the principle of a machine acquiring these “values” does not seem too difficult: it would be enough for the machine to induce, deduce or imitate them (it is not quite clear which) more or less explicitly from our “behavior”. It should be remembered here that the learning paradigm is the foundation of most AI solutions. The machine gradually learns from examples to produce the best “reaction” to a given situation: recognize a face, make a move in the game of Go, etc. It is therefore not surprising that, according to Russell, since our values are revealed by our behavior, they can be acquired by a machine learning to reproduce this behavior. By analyzing the state of the machine after this learning, it would then be possible to prove that our values have been acquired.
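To fix ideas, here is a minimal sketch of the learning paradigm described above: a machine that, from a handful of (situation, reaction) examples, produces the “best” reaction to a new situation. Everything in it, the toy features, the labelled reactions, the nearest-neighbour rule, is a hypothetical illustration, not any particular researcher’s method.

```python
# Toy illustration of learning from examples: given (situation, reaction)
# pairs, respond to a new situation with the reaction of the most
# similar known situation (a 1-nearest-neighbour rule).

def nearest_neighbour(examples, situation):
    """Return the reaction associated with the closest known situation."""
    def distance(a, b):
        # Squared Euclidean distance between two feature tuples.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best_situation, best_reaction = min(
        examples, key=lambda ex: distance(ex[0], situation)
    )
    return best_reaction

# Hypothetical training data: (situation features, observed reaction).
examples = [
    ((0.9, 0.1), "smile"),      # e.g. a friendly face
    ((0.1, 0.9), "step back"),  # e.g. a threatening one
]

print(nearest_neighbour(examples, (0.8, 0.2)))  # -> smile
```

The point is only the shape of the mechanism: behavior in, behavior out, with no explicit representation of a “value” anywhere.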

We understand the reasoning well, but it rests on two rather questionable premises.

First, that our values can be deduced entirely from our behavior. Suppose we had an almost infinite capacity for learning and could thus analyze everyone’s behavior, read all the books, watch all the films… Could we deduce from this what we must do in all circumstances from the point of view of “values”, and also what we must not do? That is what Stuart Russell seems to think (what about you?). Second, that our values are fairly stable and universal. Yet it is far from obvious that we all share them or that they are immutable.

“Human Values” !

Now let us turn to the technique. We will better understand what Russell means by “human values”:

Stuart Russell: Where does a machine get hold of some approximation of the values that humans would like it to have? I think one answer is a technique called “inverse reinforcement learning.” Ordinary reinforcement learning is a process where you are given rewards and punishments as you behave, and your goal is to figure out the behavior that will get you the most rewards. […] Inverse reinforcement learning is the other way around. You see the behavior, and you’re trying to figure out what score that behavior is trying to maximize. For example, your domestic robot sees you crawl out of bed in the morning and grind up some brown round things in a very noisy machine and do some complicated thing with steam and hot water and milk and so on, and then you seem to be happy. It should learn that part of the human value function in the morning is having some coffee.

If the pleasure of having coffee is a “human value”, then by “human value” we must understand “benefit” in the most general sense. But above all, to speak of “approximating values”, or to use the term “value function”, is to place “human values” within the digital domain.
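The inference Russell describes can be caricatured in a few lines: observe a behavior, then ask which score it is trying to maximize. The candidate “value functions” and the morning routine below are entirely hypothetical; real inverse reinforcement learning works over states, actions and policies, but the direction of the inference is the same.

```python
# Toy version of the inverse-reinforcement-learning idea: given an
# observed behaviour, pick the candidate reward function under which
# that behaviour scores highest -- the goal that best explains it.

morning_routine = ["get up", "grind coffee", "brew coffee", "drink coffee"]

# Hypothetical candidate rewards: each maps actions to a contribution.
candidate_rewards = {
    "wants coffee": {"grind coffee": 1, "brew coffee": 1, "drink coffee": 2},
    "wants sleep":  {"get up": -2, "stay in bed": 3},
}

def score(behaviour, reward):
    """Total reward the behaviour earns under one candidate function."""
    return sum(reward.get(action, 0) for action in behaviour)

inferred = max(
    candidate_rewards,
    key=lambda goal: score(morning_routine, candidate_rewards[goal]),
)
print(inferred)  # -> wants coffee
```

Ordinary reinforcement learning runs the same machinery forward: the reward is given, and the behavior that maximizes it is searched for.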

We inevitably think of another, economic concept: utility3. Utility represents the satisfaction someone experiences in consuming a good or service. Satisfaction and well-being are eminently abstract concepts, so economists measure utility from observed preferences (samples of consumers are asked to rank items by “preference”, for example). Russell’s “value” and the economists’ “utility” work in exactly the same way: they digitize and mathematize abstract concepts to make them manipulable by a system or an algorithm. The utility of a good and the value of a behavior are both deduced by sampling and learning.
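The economists’ move, estimating an unobservable utility from observed preferences, can be sketched in miniature. The choice data below are hypothetical, and the estimator (counting how often an item is chosen over another) is the crudest possible one, standing in for the more careful models economists actually fit.

```python
# Estimating utilities from observed preferences: each pair
# (chosen, rejected) records that a consumer picked the first item
# over the second. The "utility" of an item is approximated by how
# often it wins a comparison.

observed_choices = [
    ("coffee", "tea"), ("coffee", "water"),
    ("tea", "water"), ("coffee", "tea"),
]

items = ["coffee", "tea", "water"]

# Crude utility estimate: number of comparisons each item wins.
utility = {item: 0 for item in items}
for chosen, rejected in observed_choices:
    utility[chosen] += 1

# Rank items by estimated utility.
ranking = sorted(items, key=lambda i: utility[i], reverse=True)
print(ranking)  # -> ['coffee', 'tea', 'water']
```

The abstract notion (satisfaction) has become a number attached to each item, which is exactly the operation the article is pointing at.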

But this analogy also has limits, because utility is a positively valued concept: in economics there is no zero utility, let alone negative utility. Consuming a good or a service is always better than refraining from it. Indeed, utility measures what consumption does to us, regardless of what it does to others. “Human value”, however, if it is to be measured, quantifies an action in terms of what it produces in us but also in others. It cannot be the intrinsic and absolute measure of a given action but must take the context into account. The “value function” evoked by Russell is, from this point of view, ineffective. If we want to extend the analogy with economics while preserving a more consequentialist approach, we should perhaps look instead to the notion of “externality”…

For the moment, let us remember this: when Russell and his peers evoke “human values”, they conceive them, voluntarily or not, as constitutive of an “economy of values”, that is to say of a mathematical field.

Invisible Hand

Russell is optimistic. Our machines are now able to learn, and they need only observe our behavior to acquire our values. These values would be nothing other than measures of situations, and therefore accessible to algorithms. But there is at least one other reason to be confident:

Stuart Russell: If you want to have a domestic robot in your house, it has to share a pretty good cross-section of human values; otherwise it’s going to do pretty stupid things, like put the cat in the oven for dinner because there’s no food in the fridge and the kids are hungry… […] There’s a huge economic incentive to get it right. It only takes one or two things like a domestic robot putting the cat in the oven for dinner for people to lose confidence and not buy them.

Russell’s vocabulary is still a bit confusing, but “being Stuart Russell” is already getting easier. By “pretty good cross-section of human values”, we must understand that we have instilled in our domestic robot, by a learning technique, the ability to measure a large number of situations, but not all of them. It misses, for example, “putting the cat in the oven”. But in this case the invisible hand intervenes: the robot will simply never find a buyer.

Let us remember, then, that the designer of an AI has this last safeguard: if his creation does not respect human values, it will naturally be rejected. This will most likely be the case in general, but society is not free of strange individuals for whom “putting the cat in the oven” remains an acceptable possibility.

What about “true” AI?

Stuart Russell traces a path leading to techniques for respecting human values. He preempts any concerns we may have before we can express them in our own way. But basically, even if these questions are far from being resolved, we share Stuart Russell’s feeling that there is no essential difficulty when we speak of “weak” AI. On the other hand, all AI researchers share the same problematic horizon (and no one knows, apart from a few gurus, whether it can be reached): a machine capable of reflexivity, improving on its own, faster and much better than we could have done it, to the point of overtaking us.

Stuart Russell: Could you prove that your systems can’t ever, no matter how smart they are, overwrite their original goals as set by the humans? […] that’s a serious problem if the machine has a scope of action in the real world. […] They will rewrite themselves to a new program based on the existing program plus the experience they have in the world. What’s the possible scope of effect of interaction with the real world on how the next program gets designed? That’s where we don’t have much knowledge as yet.

The AI researcher here reaches a limit that he can never overcome alone. Let us remember that when we hear Elon Musk raising alarms every two weeks, beyond the fact that he does so in his own interest, it is this limit he is evoking: this horizon of the machine that reprograms itself in the real world, like a living being.

The comeback of Moral Philosophy

This is how an AI researcher talks to us today about “human values” and how to make “intelligent” machines respect them. Articles on this subject will proliferate as AI techniques penetrate our daily lives, and all the Stuart Russells will be summoned to speak. We might as well try to understand them.

It is clear that some moral injunctions, injunction “zero” being “thou shalt not kill”, remain to be interpreted on a case-by-case basis (“thou shalt not kill”, except in self-defense…). Morality is not fixed: it adjusts to our progress and is sedimented by jurisprudence. The world modified by machines will inevitably pose new moral questions to us and invite us to propose new “human values”, ones not yet present in existing films and books. But already, the push of AI forces us to question our values: what they are, how we acquire them, what we do when they are violated, etc. There is no doubt about it; as Joshua Greene, a psychologist at Harvard University, writes:

Before we can program our moral values into machines, we must strive to clarify them and make them coherent, which could be the moment of truth for Moral Philosophy in the 21st century.

Even “Being Stuart Russell”, we fully agree.


1. Stuart Russell, Daniel Dewey, Max Tegmark – 2015 – Research Priorities for Robust and Beneficial Artificial Intelligence
2. Natalie Wolchover in Quanta Magazine – April 21, 2015 – Concerns of an Artificial Intelligence Pioneer
3. Wikipedia – Utility
