Search engines are undergoing significant changes of late, with many users particularly worried that generative AI could introduce new biases into the results page.
But a new study by researchers at Northeastern and Stanford suggests search was far from perfect in the first place—at least, when it comes to image results.
Avijit Ghosh and his colleagues collected 54,070 unique image search queries conducted by more than 640 people on both Google and Bing. They then looked at the top 15 results for each query—finding that the average skin tone of the images returned in response to those queries was the second-whitest possible on the 10-point Monk skin tone scale. (The Monk skin tone scale was developed by a researcher called Ellis Monk in conjunction with Google in 2022.)
Ghosh and his colleagues also found that 11 of the top 15 categories of queries analyzed returned images depicting people whose average age was lower than the median age in the United States.
“There’s been existing work in this area by Professor Safiya Umoja Noble looking at Google search results,” says Ghosh. “And there’s been a lot of follow-up work.” For that reason, Ghosh was surprised that many of the biases previously identified still persisted in search results. However, Ghosh points out that much of the prior research looked at biases in searches for terms like job titles, which his data suggests people rarely search for: fewer than 5% of the searches in the study were open-ended queries about people. “We wanted to actually measure real-world uses: what type of terms are people searching for, and what are the biases in them.”
Looking at gender, the researchers found marginally more feminine faces than masculine ones in search results. There were also minor differences between what Google and Bing presented: on the 10-point skin tone scale, Google’s average was 3.19, compared with 2.99 for Bing, meaning Google presented slightly darker faces than Bing, but not significantly so. When it came to age, both depicted people who on average looked between 20 and 30.
“It’s very easy for somebody to look for, like, a summer dress and only see white models,” says Ghosh. “Somebody who is not a young white woman might feel bad and that they could not wear that dress.” Ghosh says he’s experienced something similar when looking for shirts on search engines, and being presented with only white models—a world away from his childhood growing up in India. “I have no clue if that outfit would look good on my skin or not,” he says.
Ghosh suggests search engines could try to solve the problem of representation in a number of ways. One option is adopting the approach Pinterest takes, which allows users to search by skin tone, in the same way you can filter by clothing color on fashion websites. He also believes that search engines could use the data they collect on individuals to provide context-specific results that better match what an individual user is likely looking for. He points to data within the study showing that Black people are more likely to include “Black” in their search query in order to better direct the search engine. They shouldn’t need to do that, he suggests, and it could be done automatically.