Does Ai Benefit the World – Chelsea Troy

Does AI benefit the world? – Chelsea Troy #

Excerpt #
Photo Credit: Tyler Pasciak LaRiviere for the Chicago Sun-Times “Let’s start with cars.” My students break out into groups of four, each quartet gathered around a collection of pi…

Reading Time: 20 minutes

Photo Credit: Tyler Pasciak LaRiviere for the Chicago Sun-Times

“Let’s start with cars.” #

My students break out into groups of four, each quartet gathered around a collection of pink and green post-it notes.

For the first few minutes, the room remains quiet while each student jots down a personal story about one way that cars have impacted their lives. As the students finish and begin to read their stories to each other, the air fills with the gentle hum of conversation.

The first quartet to finish their discussion approaches the whiteboard. Two of their post-its share a theme: stories about a car getting a family member to the hospital quickly. Those post-its get placed on the whiteboard together. The other two stories—one about working at a car wash as a teenager, the other about getting hit by a car turning right on red while cycling through an intersection—go to different respective whiteboard locations.

More groups approach the board and attach their post-its. Stories under the same theme go together, creating several little collections. Each collection tends to have one color of post-it (green for perceived positive impact, pink for perceived negative impact). A few cases have mixed colors: for example, a few additional post-its join the one about working at a car wash. In the next step, when I ask students to students label each collection, this grouping will receive the label “car-related jobs.” Students either loved or hated the jobs.

For homework, I ask the students to do two things. First, they add digital versions of their post-its to an enormous Google slide, grouped together and labeled just like our whiteboard. Second, they collect more car-related stories from three sources:

Their own experiences: 2 more posit-its
Two additional friends or family members: 2 more post-its
A Twitter hashtag—a different one selected by each student from a list that I predicted would broaden the range of perspectives beyond those available inside the classroom. The hashtags bring in the anecdotes of mechanics, cyclists, pedestrians, traffic engineers, pulmonologists, and a variety of other groups. Again, 2 more post-its.

By the next class period, the Google slide’s septupled post-it population includes several more groups, which we finish labeling together. Our original list includes car-related jobs, emergency medical transport, cycling accidents, parking costs, noise. Now we add several more: pollution, storefront visibility, status symbols, child safety, travel time, carbon footprint, and others.

Next, I offer each student an unlimited number of three types of emojis:

A small fire for “consequences”
A human silhouette for “prevalence”
A snake for “sneakiness”

I ask each student to place the fire emoji next to each label for which the outcomes sound dire to them: most students put these, for example, on the labels with obvious and immediate life or death outcomes. Similarly, they apply the silhouette sticker next to labels that they think affect a lot of people, and the snake next to labels that they think people might not notice or think about relative to the size of the outcome.

We’re doing an impact assessment not unlike the risk assessments I ask of students in software engineering classes. This class, though, focuses not on code, but on what code does to Us—and on how we, its authors, can evaluate and influence that.

The next homework assignment lets each student select a different category label from the slide. They’ll each do some research and come back with a short report comparing the class’s emojified impact assessments to actual literature and findings. Where are we accurate? Where do our estimates not match the information students find?

Over the course of the nine week quarter, we do this three-session activity three times. #

Each time we start from our own anecdotes, then broaden our search to additional perspectives, and finally compare our own assessments to population data and available studies. First, we discuss the impact of cars. Then, we discuss the impact of the consumer internet. Finally, we do generative text and image models—in other words, the thing colloquially termed “AI.”

Three times sounds like overkill, but I have found that the prior practice on less nascent, hotly contested developments like cars and the internet helps defuse conflict on the third topic. After we do the internet, I ask students to reflect on the patterns of impact they’ve seen across the car and internet discussions. Their comparisons draw out some common themes:

Both cars and the internet boast some excellent anecdotal outcomes in isolated cases.
Both cars and the internet boast some positive outcomes for millions.
Both cars and the internet have produced negative outcomes that affect a lot of the same people who experience the positive outcomes, plus some collection more.
Many of the negative outcomes emerge from failure to predict and implement against emergent properties of the technologies becoming popular and widely accessible.
The direst consequences often affect different people than the most glittery benefits, and those divisions tend to break along class lines.

I also ask students to reflect on the patterns they witness in the difference between their impact assessments before and after their research projects. Again, common themes arise:

Students tend to overestimate the proportion of the population that experiences the positive outcomes relative to the negative outcomes (worth noting here that the opportunity to participate in a graduate program tends to select for class).
Students tend to drastically overindex on their own experience when estimating the impact (positive or negative) of a technology on any given life globally.
When faced with an explicit menu of positive and negative impacts, students tend to rapidly identify new-to-them opportunities to retain positive outcomes while reducing or eliminating negative outcomes.

When I then tell students we’re about to do this exercise a third time for “AI,” they approach with curiosity. They wonder whether the patterns they’ve observed will appear again. They hope to beat their prior collective accuracy at assessing impact in each category label.

A student displayed this map—found in this article from this data source—while comparing their findings to class estimates about the internet’s impact on job networking opportunities. They concluded that the pattern of WHO could access these networking opportunities might look less evenly distributed than they initially assumed, and they also noted the correlation between darkness of hue on this map and population in our classroom from various countries.

A few weeks ago, my day job hosted a training about ChatGPT. #

Members of the company expressed both wanton excitement and deep concern about the normalization of its use in the workplace.

Afterwards, a colleague of mine asked me about the discussion. He’d been using the products since GPT-2 in 2019, and he really enjoys them. He wanted to know: “outside of speculating net benefit, why so many concerns with Gen AI? And what are they?”

Then he shared an anecdote: the chatbot had helped his wife uncover the root of some health issues she was having that were otherwise terminal (sic). He also described really enjoying using ChatGPT, personally. From his perspective, sure, there are environmental concerns and social problems with oppression, but these don’t overwhelm the positives.

We talked about the topic for a while.

Full disclosure: I don’t think “net impact” makes sense to estimate at scale. #

I understand why businesses like to calculate this. They want to compare cost to revenue to figure out whether they’ll have more money or less money as a result of building their product. But besides a particular, well-circumscribed set of individuals deciding whether a project lives or dies, I don’t have much use for the calculation beyond philosophical navel-gazing.

How could we possibly approach an accurate estimate? We’re not just talking about money here: we’re talking about general impact. That amount doesn’t come in a convenient, single numerical unit. It’s instead delivered in a million different currencies, from health to ecstasy to money to misery. You can’t convert these into like metrics.

And we don’t get to scope our analysis to a relatively homogeneous customer base like companies get to do. Global impact, by definition, includes everyone, and it taunts us with the question “impact to whom?” It ain’t as small a world as the specious aphorism claims. Most of us live within ten miles of neighborhoods in which no one thinks like us, and we’ve never had a single person from there over for board game night, and we never will. By traveling twelve thousand miles east or west, most of us can find ourselves in a land whose language we can’t even begin to comprehend, whose cultural customs would tie us in knots, and whose celebrities we’ve never heard of. The mini-worlds into which we isolate by building networks of people like ourselves—those worlds are small. THE world? Enormous, and capable of absorbing a kaleidoscope of impact patterns whose news will never, ever reach us.

I don’t ask my students to decide whether the arrival of cars, or the internet, or generative models were good or bad for society. In fact, I want them to reach the conclusion that they’re wholly unqualified to do so. I watch them discover the patterns of inaccuracy in their assumptions about impact. Then I watch them look at their post-it maps and brainstorm ways to improve the net impact regardless of where the navel-gazing lands on its current value. For me, this is enough.

But for folks who have spent years in industry, hounded by executives to consider the bottom line first and foremost, I understand the motivation to peg a theoretical impact number relative to zero. My conversation with my colleague starts there.

We begin, like the students, 135 years before ChatGPT: with cars. #

A car permits someone, my colleague notes, to be driven to the hospital when found in critical condition an unreachable by an ambulance. It’s true that cars allow individuals to drive other individuals to hospitals. I’d place this among the “excellent anecdotal outcomes” students notice in their assessments. Along with that, the annual death toll of car crashes in the US alone—that “same people who experience the positive outcomes, plus some collection more” group that my students identify—is in the tens of thousands. This does not count pedestrian deaths or cyclist deaths: just people in cars. Then there are deaths caused by comorbidities with asthma, whose rates have risen due to pollution from vehicle exhaust. A dire outcome, and one that affects many who live in polluted areas and without the financial resources to access a car themselves. “Okay,” my colleague concedes, “but what about the distribution of food and access to information?” A fair point. As it happens, most commercial freight travels by train or ship or semi-truck, not cars. But then, of course, what about the drive to the grocery store? That last leg requires a car, doesn’t it? For some, yes. For others, no. For still others living in food deserts, the answer might be yes if they had the resources to access a car, which again, many don’t.

Reluctant to get too deep in the weeds and lose the original thread, we move on to discussing the impact of the internet. I like this example because often folks discuss the value of generative models, not in terms of what they can do right now, but in terms of what they theoretically should be able to do 30 years from now. The consumer internet is a little over 30 years old and aptly demonstrates the ambiguity of attempted impact calculations.

I ask my colleague if the internet has improved the world. He replies “You and I are talking right now using it, so yes.” I point out “you and me. The top 2% of privilege. How many have the opportunity to interact like this?” The official number, as of this writing: 5.35 billion. That’s about two-thirds of the population, officially; I couldn’t find definitive details on what amount of activity constitutes using the internet, nor what the network strength distribution looks like over those people.

It’s worth noting that we (myself included), people reading this, generally consider the existence of the internet to be a well-received development. It’s also worth noting that our impression comes from personal reference and survivorship bias. For us to hear someone’s opinion on this, they have to either have access to the internet, or they have to have a relationship to us (which means, given that we have access to the internet and given that we network extremely homogeneously, they almost certainly have access to the internet).

I’m not saying kill the internet and live on subsistence farms, which is what my colleague accused me of next. It’s just that, for a software engineer to estimate a global 30 year positive impact for a nascent technology because that person has had a positive individual experience with the technology, doesn’t mean a lot. This is a key takeaway for my students from the estimation exercise. During both the hashtag part and also the research project part, they tend to learn about impacts of cars, the internet, and generative models that they had no idea existed. They tend to conclude—this is my _favorite part—_that they might not be in the best position, based on their current knowledge and perspective, to estimate the technology’s global impact. They also tend to conclude that they are more likely to see and experience many of the benefits while avoiding, and even remaining oblivious to, some of the costs.

Mercy Mutemi and fellow counsel during a virtual pre-trial consultation in April 2023. Mutemi represents a collection of content moderators bringing a class action lawsuit against Meta and outsource company Sama for mistreatment of workers hired to filter gruesome content off of Facebook. Facebook users rarely have to confront the existence of these roles because they never see the gruesome content. Image from this article.

When we reached the topic of generative models, my colleague put it this way: #

“If I distribute the environmental impact to every human being, and then measure the net positive of Gen AI for that individual, given that so much of the world doesn’t even have access to that technology let alone cars, I can see where it’s definitely a net negative.”

Again, setting aside attempts to estimate global net impact, what impacts do we know about from generative models? We know that they bring joy to many who can afford a license and chat to them regularly. We know that thousands of content creators applaud them as a tool for brainstorming (and many, it must be said, outright print and distribute designs generated entirely by the models). We know that the models offer assistance to folks who need to write documents in English for work, who speak English as a second language, or who struggle with writing.

We also know that the model creators’ unfettered use of copyright material without creators’ consent constitutes outright thievery. The environmental impact of these models, climate scientists agree, drives us toward catastrophe. OpenAI’s exploitation of global south labor at a wage of two cents per hour, only to turn around and take ten billion from Microsoft, clearly consolidates global wealth to, basically, 1-3 dudes.

Our ethical struggle with generative models derives in part from the fact that we…sort of can’t have them ethically, right now, to be honest. We have known how to build models like this for a long time, but we did not have the necessary volume of parseable data available until recently—and even then, to get it, companies have to plunder the internet. Sitting around and waiting for consent from all the parties that wrote on the internet over the past thirty years probably didn’t even cross Sam Altman’s mind. But also, I can tell you, based on numbers I have seen as an engineer at a company that asks for user consent before changing almost anything, it wouldn’t work. Too many people would say no. The project would not obtain sufficient consenters for companies to have enough data to build generative models.

On the environmental front, fans of generative model technology insist that eventually we’ll possess sufficiently efficient compute power to train and run these models without the massive carbon footprint. That is not the case at the moment, and we don’t have a concrete timeline for it. Again, wait around for a thing we don’t have yet doesn’t appeal to investors or executives.

The exploitation part is unfortunately a time-honored tradition in global enterprise. The tech industry specifically gets away with new and more atrocious versions all the time by dint of creating things that regulatory bodies have never seen, let alone regulated. Tech companies move quickly; regulatory bodies move slowly. It took the Sherman Anti-Trust Act almost fifteen years to catch up to Google, and the sentence from that ruling is expected to take another 3-5 years. I understand, and even agree with, calls to regulate what folks have named the AI industry. But gurl, that will not be a quick process.

The invention of the automobile offers prescient precedent here. Though it’s hard to argue against the utility of cars to those that can access them, emergent developments after their introduction dragged the “net impact” meter down. The United States built cities around the assumption that everyone had a car. This forced residents to depend on them—a situation we compare disfavorably to life in cities with reliable transit, bicycle infrastructure, and pedestrian options. As cars got bigger, survival in car crashes favored the larger car, resulting in a car size arms race. Could we have done all this in a better way, that better leveraged cars for positive impact while avoiding some of the negative emergent properties? I’d like to believe we could have.

A busy intersection in Amsterdam received a redesign in 2023 (you can drag the white vertical separator left and right to compare the before and after). The new version dedicates space for bicycles and pedestrians, while converting the car section to a roundabout structure with crossings for the transit trolleys. The redesign achieves higher throughput of people at rush hour and dispenses with the need for a lot of traffic signals. Images from this article: https://bicycledutch.wordpress.com/2024/01/31/cycling-in-amsterdam-watch-the-traffic-flow-at-a-transformed-busy-intersection/

Could we theoretically improve the net benefit of any given technical development today if we make efforts to maximize its positive outcomes and mitigate its negative ones? I believe so, and I believe that’s basically the option available to us.

Fine, Chelsea; you want us all to abandon modernity, live in huts, and gather berries? #

Jesus, guys, we’ve been over this. Crying luddism every time somebody critiques your pet thing is a super weird look.

I do want practitioners to do three things.

1. I want people to recognize and acknowledge the massive difference between “what this thing has done for me” and “what this thing is doing for everyone.”

I think the internet has made life better for me. I am immensely grateful for that. I work at Mozilla because I believe in the internet and I want to maximize its positive impact. I think it’s okay to acknowledge that just because something was good for me doesn’t mean it saved the world. GenAI saved my colleague’s wife’s life (or, I’d argue, his wife used a tool in a positive way to save her own life). It’s reasonable for him to believe in it. I am grateful that they had it available to do that. That’s different from calling it a global net positive, and that’s okay.

2. I want people to think about “net impact” in a more nuanced, complex way than product net revenue.

The emergent outcomes of a technical shift on the entirety of global society cannot be reduced to the same mathematics we use to decide if a widget made a profit. I don’t think we have to start huge; I have my students start with personal anecdotes and then broaden from there. The approach I use for them is abbreviated, a little hamfisted, and certainly imperfect. But it captures more nuance than we often do when we make a gut estimate about whether a development was good for the world. I’d like to see engineering practitioners more regularly engage in a practice of organized, collective investigation and reflection.

3. I want people to arrive, not at an estimate of net impact, but at a set of ideas for improving that net impact.

This is what I think is within our purview to execute against anyway. The last time I did this exercise I watched students arrive, unprompted, at a long and varied list of ideas for how they might shape a future with generative models in it. “Can we build tools for content creators to proactively thwart their materials’ digestion by online scrapers?” “Can technical innovation permit us to minimize energy use in retraining somehow?” “What is universal basic data income, and is there any path to viability for that?” “How…how might someone like me chart a course to becoming a subject matter expert for regulation development?”

I hope these questions inform students’ choices as practitioners in the field. And I think arriving at similar questions can inform the choices of current practitioners. It’s tempting to sit around and speculate about whether a parallel universe without might be better off. But we don’t live in that universe; we live in this one. This universe is the one where we get to answer that question: not with navel-gazing, but with empirical evidence derived from attempts to drive change.

If you liked this piece, you might also like: #

Unless I am wrong (which I could be), I expect this post to conclude my writing on this topic for some time. I have other subjects I’d like to discuss here, including some interesting books I’ve read lately. Stay tuned for more, and if you can’t get enough, join me on Patreon: I put out audio recordings of blog posts every Monday and maintain a few regular written columns.

📓 Cabinet of Ideas