To End Online Hate, Big Tech Must Let Those Who Are Targeted Lead the Way

The 3D printed Facebook and Twitter logos are seen in this illustration taken in Zenica, Bosnia and Herzegovina, January 26, 2016. Photo: REUTERS/Dado Ruvic

Billions of people log on to social media platforms every day. As we spend an increasing portion of our lives online, exposure to hateful content has become routine. The Anti-Defamation League's 2021 Social Media Hate and Harassment Survey found that 41% of Americans have experienced online harassment, and that 27% have experienced serious harassment, which includes sexual harassment, stalking, physical threats, swatting, doxing, and ongoing harassment. We are inundated with conspiracy theories, scams, misinformation, and racist rhetoric that frustrates users or, worse, threatens our safety.

One of the ways tech companies can create safer and fairer online spaces is to moderate content more consistently and comprehensively. Tech companies have often been criticized for applying their stated policies inconsistently across billions of users, causing enormous damage. It is unclear how content moderation teams are trained to recognize and address particular forms of hate, such as antisemitism: neither their training materials nor their operational definitions have been made public or shared privately with civil society. Additionally, as tech companies increasingly rely on artificial intelligence to remove offensive posts from social media platforms, we do not know whether the perspectives of the targets of online hate are informing the development of these technologies.

For example, leaked Facebook documents submitted to the SEC in 2021 by whistleblower Frances Haugen suggest that automated content moderation technologies are being developed haphazardly, with an understanding of hate that is at once too broad and ineffective. The documents state that current automated methods remove “less than 5% of all hate speech posted on Facebook.” Studies have also shown that algorithms built to detect hate speech online can themselves exhibit racial bias.

It is crucial to find ways for communities that are frequent targets of hate to contribute to the creation of the technological tools that automate and scale content moderation. To model what this process might look like, the ADL Center for Technology and Society is building the Online Hate Index (OHI), a set of machine learning classifiers that detect hate targeting marginalized groups on online platforms. The first in this set, the OHI Antisemitism Classifier, draws on the insights of ADL antisemitism experts and Jewish community volunteers who may have experienced antisemitism themselves. Together, these groups are best placed to understand and operationalize a definition of antisemitism.

To better understand how machine learning classifiers work, imagine a child gathering information that, through practice, helps them discern and understand their world. Machine learning works the same way. In the case of the OHI, our machine learning antisemitism classifier takes in pieces of information (here, text) that ADL experts and volunteers have labeled as antisemitic or not.

Through practice, as it receives many examples of antisemitic and non-antisemitic content, the algorithm learns to recognize antisemitic content and to generalize the linguistic patterns behind it. Similarly, a child may take in specific information about a situation (“This cup is orange”) and begin to generalize to their larger experience of the world (“This is what the color orange looks like”). Over time, the model gets better at predicting the likelihood that a piece of content it has never seen before, whether a tweet, comment, or post, is or is not antisemitic.
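To make the analogy concrete, here is a minimal sketch of supervised text classification, the general technique described above. It is not the OHI itself: the tiny labeled dataset, the TF-IDF features, and the logistic regression model are illustrative assumptions standing in for ADL's expert- and volunteer-labeled data and production models.

```python
# Minimal sketch of supervised text classification. NOT the OHI: the labeled
# examples and model choice (TF-IDF + logistic regression) are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled examples: 1 = hateful, 0 = not hateful.
# In the OHI, labels come from ADL experts and Jewish community volunteers.
texts = [
    "example of a hateful post",
    "example of an ordinary, benign post",
    "another hateful remark",
    "another everyday post about the weather",
]
labels = [1, 0, 1, 0]

# Convert text into word-frequency features (TF-IDF) and fit a classifier
# that learns which linguistic patterns are associated with each label.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

# For content the model has never seen, it outputs a probability that the
# text belongs to the "hateful" class, analogous to the OHI scoring a tweet,
# comment, or post.
new_post = "a post the model has never seen before"
print(model.predict_proba([new_post])[0][1])
```

In practice, such a model only generalizes well when trained on a large, carefully labeled dataset, which is why the labeling perspective of targeted communities matters so much.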

In August 2021, ADL conducted what we believe to be the first independent, AI-assisted, community-grounded measurement of identity-based hate on Reddit and Twitter. We found that the rate of antisemitic content on Twitter during the week we investigated was 25% higher than on Reddit, and that the potential reach of the antisemitic content we found on Twitter in that week alone was 130 million people. If this is the case on some of the most responsible tech platforms, it goes without saying that these problems are far more serious on platforms run by other, less forward-thinking tech companies, such as Facebook.
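As a hypothetical illustration of how such a prevalence comparison could be computed, the sketch below assumes a trained classifier like the one above and samples of posts collected from each platform over the same week; the function, variable names, and threshold are ours, not ADL's published methodology.

```python
# Hypothetical prevalence comparison. Assumes `model` is a trained classifier
# (as in the sketch above) and that twitter_sample / reddit_sample are lists
# of post texts collected from each platform during the same week.

def estimate_hate_rate(model, posts, threshold=0.5):
    """Return the fraction of posts the classifier scores above the threshold."""
    probabilities = model.predict_proba(posts)[:, 1]
    return float((probabilities >= threshold).mean())

# Example usage (platform samples omitted here):
# twitter_rate = estimate_hate_rate(model, twitter_sample)
# reddit_rate = estimate_hate_rate(model, reddit_sample)
# print(f"Twitter rate / Reddit rate: {twitter_rate / reddit_rate:.2f}")
```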

If all platforms were as open to sharing data as Twitter and Reddit, the future might be brighter. Groups like ADL could use tools like the OHI, grounded in the perspective of targeted communities, to audit every social platform and determine the prevalence of hate against those groups. We could then assess whether a tech company's efforts to reduce hate on its platform have been sufficient, compare rates across platforms using the same metrics, and determine which hate mitigation methods are most effective.

Unfortunately, as we described in our data accessibility dashboard, platforms other than Reddit and Twitter do not provide the data necessary to make this a reality. They should, and if they don't, governments should find thoughtful ways to require it.

ADL hopes that the way the OHI combines machine learning with human expertise, and centers targeted communities in technology development, offers a practical path forward for platforms. Other civil society organizations could develop similar tools, working with volunteers to label homophobia, transphobia, misogyny, and racism.

We need more technology that detects identity-based hate. If social media platforms are to effectively combat hate, they must empower those most affected by it to lead the way.

Daniel Kelley is Director of Strategy and Operations for the ADL Center for Technology and Society.
