Mon, Nov 25 2024
This week, Google suffered another embarrassing AI gaffe: an image-generating model that injected diversity into pictures with a farcical disregard for historical context. Google has since apologized, or come very close to it. The underlying problem is perfectly understandable, yet Google blames the model for "becoming" over-sensitive. But the model didn't make itself.
Gemini, the company's conversational AI platform, generates images on demand by calling on a version of the Imagen 2 model when asked.
But recently, people discovered that asking it to conjure up images of specific historical events or individuals led to absurd outcomes. The Founding Fathers, for example—whom we now know to have been white slave owners—were portrayed as a multiracial group that included individuals of color.
Online commentators were quick to mock this embarrassing and easily replicated problem. It was also, inevitably, dragged into the ongoing debate over diversity, equity, and inclusion (currently at a reputational local minimum), and seized on by pundits as evidence that the woke mind virus was further infecting the already liberal tech industry.
"DEI gone mad," cried conspicuously concerned citizens. This is America under Biden! Google is an "ideological echo chamber," a stalking horse for the left! (It must be said that the left was also suitably perturbed by this strange phenomenon.)
However, as Google notes in its rather abject little apology-adjacent post today, and as anyone with any familiarity with the technology could tell you, this issue was the result of a perfectly reasonable workaround for systemic bias in training data.
Say you want to use Gemini to make ten images of "a person walking a dog in a park" for a marketing campaign. Because you don't specify the type of person, dog, or park, it's dealer's choice: the generative model will output what it is most familiar with. And in many cases, that is a product not of reality but of the training data, which can have all kinds of biases baked in.
From the thousands of relevant photographs the model has consumed, which types of people—and dogs and parks, for that matter—are most prevalent? Because white people are disproportionately represented in many of these image collections (stock images, rights-free photography, etc.), the model will frequently default to showing white people if you don't specify.
That default is an artifact of the training data, and as Google notes, "because our users come from all over the world, we want it to work well for everyone." If you ask for a picture of a football player or a person walking a dog, you may well want to receive a range of people. You probably don't want images only of people of one particular ethnicity (or any other characteristic).
There's nothing wrong with getting a picture of a white man walking a golden retriever in a suburban park. But what if, when you ask for ten, they're all white men walking golden retrievers in suburban parks? And you live in Morocco, where the people, dogs, and parks all look different? That's simply not an outcome you want. When a characteristic is left unspecified, the model should opt for variety over homogeneity, whatever its training data might push it toward.
This is an issue common to all kinds of generative media, and there's no easy fix. But in cases that are especially common, sensitive, or both, companies such as Google, OpenAI, Anthropic, and so on invisibly include extra instructions for the model.
I can't emphasize enough how commonplace this kind of implicit instruction is. Implicit instructions, or system prompts as they are often called, are the foundation of the entire LLM ecosystem. Before every conversation, the model is given guidance like "be concise," "don't swear," and other standing rules. Ask for a racist joke and you won't get one, because although the model has ingested thousands of jokes, it has also been trained, like most of us, not to tell them. This is infrastructure, not a covert agenda, though it could certainly stand to be more transparent.
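To make that concrete, here is a minimal sketch of how an application layer typically prepends a hidden system prompt to whatever the user types. The message format loosely mirrors common chat-completion APIs, and the names (SYSTEM_PROMPT, build_messages) are mine for illustration, not any vendor's actual code.

```python
# Minimal sketch: the user only ever types their own text, but the
# application silently prepends a system prompt before anything
# reaches the model.

SYSTEM_PROMPT = (
    "You are a helpful assistant. Be concise. Do not use profanity. "
    "Decline requests for hateful or discriminatory content."
)

def build_messages(user_text: str) -> list[dict]:
    """Return the message list actually sent to the model."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},  # invisible to the user
        {"role": "user", "content": user_text},
    ]

print(build_messages("Tell me a racist joke"))
# The model sees the refusal instructions alongside the request, which is
# why it declines even though its training data contains plenty of jokes.
```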
Google's model was flawed in that it had no implicit guidance for situations where historical context mattered. So while a prompt like "a person walking a dog in a park" is improved by the silent addition of "the person is of a random gender and ethnicity" (or whatever wording they actually use), "the U.S. Founding Fathers signing the Constitution" is decidedly not improved by the same treatment.
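A hypothetical sketch of that kind of naive prompt augmentation, and the missing guard that produces exactly this failure, might look like the following. This is not Google's actual pipeline; the hint list and keyword check are invented purely for illustration.

```python
# Hypothetical sketch of silent diversity augmentation for image prompts.
# Drop the guard clause and you get Gemini's failure mode on historical prompts.

import random

DIVERSITY_HINTS = [
    "the people shown are of diverse genders and ethnicities",
    "depict a range of ages and skin tones",
]

# A real system would need something far smarter than a keyword list,
# which is part of why this is hard to get right.
HISTORICAL_MARKERS = ["founding fathers", "1800s", "viking", "roman legion"]

def augment_prompt(user_prompt: str) -> str:
    """Append a diversity hint unless the prompt is clearly historical."""
    lowered = user_prompt.lower()
    if any(marker in lowered for marker in HISTORICAL_MARKERS):
        return user_prompt  # leave historically specific prompts alone
    return f"{user_prompt}, {random.choice(DIVERSITY_HINTS)}"

print(augment_prompt("a person walking a dog in a park"))
# -> "a person walking a dog in a park, depict a range of ages and skin tones"
print(augment_prompt("the U.S. Founding Fathers signing the Constitution"))
# -> unchanged, because the guard caught it
```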
In the words of Google SVP Prabhakar Raghavan:
First, our tuning to ensure that Gemini showed a range of people failed to account for cases that should clearly not show a range. And second, over time, the model became way more cautious than we intended, refusing to answer some prompts entirely and wrongly interpreting some very anodyne ones as sensitive.
These two things led the model to overcompensate in some cases and be over-conservative in others, producing images that were embarrassing and wrong.
I get that it's hard to say "sorry" sometimes, so I forgive Raghavan for stopping just short of it. More noteworthy is this interesting line: "The model became way more cautious than we intended."
But how would a model "become" anything? It's software. Thousands of Google engineers built it, tested it, and iterated on it. Somebody wrote the implicit instructions that improved some responses and caused others to fail hilariously. When this one failed, if anyone had been able to inspect the full prompt, they likely would have found what Google's team got wrong.
Google frames it as the model "becoming" something it wasn't "intended" to be. But they made the model! It's like breaking a glass and saying "it fell" rather than "we dropped it." (I've done this.)
These models will inevitably make mistakes. They hallucinate, they reflect biases, they behave in unexpected ways. But responsibility for those mistakes lies not with the models but with the people who made them. Today that's Google. Tomorrow it will be OpenAI. The day after, and probably for a few months straight, it will be X.AI.
These companies have a strong interest in convincing you that AI is making its own mistakes. Don't let them.