this post was submitted on 01 Aug 2023
Technology
Shouldn't they be able to code them out?
You can't "code them out" because AI doesn't run on a hand-written script the way traditional software does. These systems are giant nested statistical models that learn from data. A model learns to read the text it was trained on, to interpret the images it was trained on, and how the two relate. You can't tell it "in this situation, don't consider race" because the situation itself isn't coded anywhere. It's all learned behavior from the training data.
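To make that concrete, here's a toy sketch (nowhere near a real image model, and the data and group names are made up). The "model" is just word/label counts learned from examples. Notice there is no `if` statement about groups anywhere that you could delete; the skew lives entirely in the learned counts.

```python
from collections import Counter, defaultdict

def train(examples):
    """Count how often each label co-occurs with each prompt word."""
    counts = defaultdict(Counter)
    for prompt, label in examples:
        for word in prompt.split():
            counts[word][label] += 1
    return counts

def predict(counts, prompt):
    """Return the label with the most votes across the prompt's words."""
    votes = Counter()
    for word in prompt.split():
        votes.update(counts.get(word, {}))
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical, deliberately skewed training set: 9 examples vs 1.
examples = [("professional headshot", "group_a")] * 9
examples += [("professional headshot", "group_b")] * 1

counts = train(examples)
prediction = predict(counts, "professional headshot")
print(prediction)  # the majority label in the data, not a coded rule
```

Changing what this toy predicts means changing the data it was trained on, which is the same situation the real models are in, just at a vastly larger scale.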
Shouldn't they be able to lessen them?
For this one the answer is YES. And they DO lessen them as much as they can. But they're training on data scraped from many sources. You can try to curate the data to remove racism/sexism, but there's no easy way to remove bias from data that is so open-ended. The only way to automate it is with another AI model, and for that you'd need a model that already understands race/gender/etc. bias, which doesn't really exist. You can have humans go through the data to try to remove bias, but that introduces a ton of problems as well. Humans often disagree on what counts as biased, and human labelers also have a shockingly high error rate. People are flat-out bad at repetitive tasks.
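One common way to put a number on how much labelers disagree is Cohen's kappa, which compares the observed agreement between two annotators with the agreement you'd expect by chance. A minimal sketch, with two made-up annotators and made-up labels:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (1.0 = perfect)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if each annotator labeled at random
    # with their own label frequencies.
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)

# Hypothetical labels from two people reviewing the same 8 items.
annotator_1 = ["biased", "ok", "ok", "biased", "ok", "ok", "biased", "ok"]
annotator_2 = ["biased", "ok", "biased", "ok", "ok", "ok", "biased", "biased"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(kappa)  # 0.25
```

These two agree on 5 of 8 items, which sounds okay until you subtract the 50% they'd agree on by chance. A kappa of 0.25 is usually read as weak agreement.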
And even that only covers data that actively contains bigotry. In most of these generative AI cases, the real issue is just missing or imbalanced data from the internet. In this specific article, the user asked to make a photo look professional. Training photos that were clearly taken in a professional setting probably came from sites like LinkedIn, which had a disproportionate number of white users. These models also understand English better than other languages because there is so much more English training data available. So Asian professional sites may exist in the training data, but the model didn't understand the language as well, so it's less confident about professional images of Asians.
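As a loose analogy for why less data means less confidence (this is basic sampling statistics, not how an image model computes anything): the uncertainty of an estimate shrinks with the square root of the number of examples. The group names and counts below are invented for illustration.

```python
import math

def standard_error(p, n):
    """Standard error of a proportion p estimated from n independent samples."""
    if n <= 0 or not 0.0 <= p <= 1.0:
        raise ValueError("need n > 0 and 0 <= p <= 1")
    return math.sqrt(p * (1.0 - p) / n)

# Hypothetical example counts for two groups in a training set.
samples = {"well_represented": 10_000, "under_represented": 100}

errors = {group: standard_error(0.5, n) for group, n in samples.items()}
ratio = errors["under_represented"] / errors["well_represented"]

for group, err in errors.items():
    print(f"{group}: +/- {err:.3f}")
print(f"uncertainty ratio: {ratio:.1f}x")
```

With 100 times fewer examples, the estimate is 10 times noisier. Real models are far more complicated, but the direction of the effect is the same.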
So you can address this by curating the training data. But this is just ONE of THOUSANDS and THOUSANDS of biases, and it's not possible to control all of them in the data. Often, trying to correct one bias accidentally makes the model worse on others.
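Here's a small sketch of how fixing one imbalance can worsen another. The records are made up: each has a hypothetical region and gender. Reweighting so both regions count equally (simple inverse-frequency weights, one technique among many) balances region but makes the gender split more lopsided, because the under-represented region happens to be all one gender.

```python
from collections import Counter

# Hypothetical records: (region, gender).
records = [("A", "m")] * 4 + [("A", "f")] * 4 + [("B", "m")] * 2

def weighted_share(records, weights, index):
    """Share of total weight held by each value of the chosen attribute."""
    totals = Counter()
    for record, weight in zip(records, weights):
        totals[record[index]] += weight
    total = sum(totals.values())
    return {key: value / total for key, value in totals.items()}

# Every record counts the same.
uniform = [1.0] * len(records)

# Inverse-frequency weights so each region carries equal total weight.
region_counts = Counter(region for region, _ in records)
balanced = [1.0 / region_counts[region] for region, _ in records]

before_region = weighted_share(records, uniform, 0)
after_region = weighted_share(records, balanced, 0)
before_gender = weighted_share(records, uniform, 1)
after_gender = weighted_share(records, balanced, 1)

print("region:", before_region, "->", after_region)
print("gender:", before_gender, "->", after_gender)
```

Region goes from 80/20 to 50/50, but gender goes from 60/40 to 75/25. Now multiply that by every attribute you can and can't think of.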
They do their best. But ultimately these are statistical models that reflect the existing data on the internet. As long as the internet contains bias, so will AI.
Thank you so much for taking the time to answer!