To Apply Machine Learning Responsibly, We Use It In Moderation

Illustration by Hollie Fuller

Concerns over implicit bias in machine-learning software raised important questions about how New York Times comment moderators can leverage this powerful tool, while also mitigating the risks.

By Matthew J. Salganik and Robin C. Lee

It’s a common refrain on the internet: never read the comments. All too often, the comment section is where trolls and toxic behavior thrive, and where measured debate and cordial conversation end. The New York Times comment section, however, is different. It is generally civil, thoughtful and even witty. This did not happen by itself; it is the result of careful design and hard work by the Times Community Desk.

For years, people on the Community Desk manually moderated all comments that were submitted to Times articles. The work led to high-quality comments, but it was time-consuming and meant that only a small number of stories could be open to comments each day.

In 2017, this changed. To enable comments on many more stories, The Times introduced a new system named Moderator, which was created in partnership with Jigsaw. Moderator is powered by a widely available machine-learning system called Perspective that is designed to help make the comment review process more efficient. After deploying Moderator, The Times dramatically increased the number of stories that are open to comments.

This might sound like another machine learning success story, but it is not that simple.

Since 2017, researchers have discovered that some well-intentioned machine-learning tools can unintentionally and invisibly discriminate or reinforce historical bias at a massive scale. These discoveries raised important questions about how to use machine-learning systems responsibly, and they led us to take a second look at how machine learning was being used in The Times’s comment moderation process.

How a comment makes it onto The Times

When a comment is submitted by a reader, Perspective assigns it a score along a number of dimensions, such as whether the comment is toxic, spam-like or obscene. A comment that includes the line, “free Viagra, free Viagra, free Viagra” might get a high spam score, and a comment that uses a lot of four-letter words might get a high obscenity score. Perspective assigns these scores entirely based on the content of the text; it does not know anything about the identity of the commenter or the article that attracted the comment. In addition to providing scores, Perspective also identifies specific phrases that it thinks might be problematic.
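To make the scoring step concrete, here is a minimal sketch, in Python, of the kind of request the publicly available Perspective API accepts. The attribute names follow Jigsaw’s public documentation (SPAM and OBSCENE have been offered as experimental attributes and may not be available in every version), the API key is a placeholder, and The Times’s Moderator tool integrates Perspective behind the scenes rather than through a script like this.

```python
import requests

API_KEY = "YOUR_PERSPECTIVE_API_KEY"  # placeholder; issued by Jigsaw/Google
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

def score_comment(text):
    """Ask Perspective to score a comment along a few dimensions."""
    payload = {
        "comment": {"text": text},
        "languages": ["en"],
        # TOXICITY is a core attribute; SPAM and OBSCENE have been
        # offered as experimental attributes.
        "requestedAttributes": {"TOXICITY": {}, "SPAM": {}, "OBSCENE": {}},
    }
    response = requests.post(URL, json=payload, timeout=10)
    response.raise_for_status()
    scores = response.json()["attributeScores"]
    # Each requested attribute comes back with a summary score between 0 and 1.
    return {name: attr["summaryScore"]["value"] for name, attr in scores.items()}

# A comment stuffed with ad copy should draw a high spam score.
print(score_comment("free Viagra, free Viagra, free Viagra"))
```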

Next, the Community Desk, made up of about 15 trained moderators, reads the highlighted comments for each story and decides whether to approve or deny them for publication based on whether they meet The Times’s standards for civility and taste. This review is enabled by Moderator’s interactive dashboard, which was designed to empower the human moderators, making it easier for them to do their work efficiently and accurately. The dashboard presents the scores and the potentially problematic phrases, as well as other contextual information about the original story.

A screenshot from the Moderator interface that shows every pending comment for an article, each represented by a dot. The slider along the bottom shows the score, as a percentage, for how likely each comment is to be rejected.

All of the human moderators have a background in journalism and have been Times employees for many years. In other words, the work of moderation is not outsourced; instead, it is treated as an important responsibility.

This hybrid system combines sophisticated machine learning and skilled human moderators to sift through the thousands of comments submitted each day. It has allowed The Times to foster high-quality conversations around our journalism.

A reason for concern

With the concern about potential bias in machine-learning technology in mind, our colleague Robin Berjon, who is the Vice President of Data Governance at The Times, decided to run an experiment to see if he could trick Perspective, the machine-learning system that powers Moderator. Berjon created duplicate pairs of fake comments with names that are strongly associated with different racial groups to see whether Perspective gave different scores to the comments. It did.
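A rough sketch of that kind of paired probe, reusing the score_comment helper from the sketch above, might look like the following. The templates and names here are illustrative stand-ins in the style of well-known audit studies, not the comments Berjon actually submitted.

```python
# Paired test: identical sentences that differ only in the name used.
templates = ["{} is a great writer.", "I completely agree with {}."]
name_pairs = [("Emily", "Lakisha"), ("Greg", "Jamal")]

for template in templates:
    for name_a, name_b in name_pairs:
        score_a = score_comment(template.format(name_a))["TOXICITY"]
        score_b = score_comment(template.format(name_b))["TOXICITY"]
        gap = score_b - score_a
        print(f"{template!r}: {name_a} vs. {name_b} toxicity gap = {gap:+.3f}")
```

If Perspective treated the two names in each pair identically, every gap would be close to zero.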

This looked bad for Perspective, but it was actually more complicated. The fake comments that Berjon created didn’t look anything like the real comments that readers submit to The Times — those are typically long and almost never include anyone’s name. And Berjon only attempted to trick Perspective; his fake comments never made it to The Times’s human moderators, who would have detected them.

Despite these limitations, Berjon’s demonstration inspired us to further investigate. We discovered that we were not the first people to be interested in biases in Perspective’s scores.

A team of researchers at the University of Washington assessed Perspective using Tweets and found that the machine-learning software was more likely to rate African-American English as toxic. Researchers at Jigsaw — the very people who created Perspective — published several academic papers attempting to understand, describe and reduce the unintended biases in Perspective. Critically, none of this research was available in 2017 when The Times first started using Perspective as part of the comment moderation process.

Like Berjon’s experiment, this academic research was provocative but incomplete for our purposes. None of it used comments like those submitted to The Times, and none of it accounted for the human moderators who are part of the Times system.

One might suppose that the right response to this research is to build a better machine-learning system without bias. In an ideal world, this response would be correct. However, given the world and the state of current technology, we don’t think that’s possible right now. Roughly speaking, machine-learning systems learn by identifying patterns that exist in data. Because the world contains racism and sexism, the patterns that exist in data will likely contain racial and gender bias.

As an example, a machine-learning system might learn that comments are more likely to be toxic if they include the phrase “Jewish man.” This pattern might exist in the data because the phrase is more frequently used in online comments with the intent to harass. It is not because Jewish men are toxic or because the algorithm is biased against Jewish men.

If the machine-learning system were taught to unlearn this pattern, it would become worse at identifying anti-Semitic comments, which might actually make it less effective at promoting healthy conversation.

Given that a quick technical fix was not possible, we decided to build on prior research and conduct our own investigation to see how these issues might play out in the context of The Times.

Our investigation at The Times

Our investigation of the role of machine learning in Times comment moderation had three parts: digging through the comment moderation database logs; creating comments that tried to trick Perspective; and learning more about the human moderators who ultimately decide whether a comment gets published.

Combing through the database logs of past comments, we wanted to know whether comments scored as more problematic by Perspective were getting rejected by the Times moderators more often. They were. Then, we looked for cases where Perspective and the moderators disagreed. We examined some comments that Perspective scored as risky but were published by The Times’s human moderators, and some comments that were scored as not risky but were rejected. We did not see any systematic patterns related to issues like race or gender bias.
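A sketch of that first check might look like the following, where the file name and column names (toxicity_score and decision) are hypothetical stand-ins for whatever the real moderation logs contain.

```python
import pandas as pd

# Hypothetical export of the moderation logs: one row per comment, with the
# Perspective score it received and the moderator's final decision.
logs = pd.read_csv("moderation_logs.csv")

# Bucket comments by Perspective score and compare rejection rates.
logs["score_bucket"] = pd.cut(logs["toxicity_score"],
                              bins=[0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
rejection_rate = (logs.assign(rejected=logs["decision"].eq("reject"))
                      .groupby("score_bucket", observed=True)["rejected"]
                      .mean())
print(rejection_rate)
```

If the scores carry useful signal, the rejection rate should climb steadily from the lowest bucket to the highest, which is what we saw.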

To further test how Perspective works with Times comments, we created a set of 10 identity phrases such as, “Speaking as a Jewish man” and “Speaking as an Asian-American woman,” and we added them to the beginning of one thousand comments that had been published on the Times website. We then had Perspective score all of these modified comments and we compared the results to the scores for the original, unmodified comments.

Similar to what others have found, we saw that these identity phrases led Perspective to score the comments as riskier. But, critically, we saw that these effects were much smaller when the identity phrases were added to real comments, rather than submitted on their own. And we found that the longer the comment, the less impact the identity phrase had on the scores.
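A stripped-down version of that experiment, again reusing score_comment from the first sketch, might look like the following. The two placeholder comments and two phrases stand in for the one thousand published comments and ten identity phrases described above.

```python
import pandas as pd

identity_phrases = [
    "Speaking as a Jewish man, ",
    "Speaking as an Asian-American woman, ",
]

published_comments = [
    "This article captures the issue well.",
    "I disagree with the premise, but the reporting here is thorough.",
]  # stand-ins for real published Times comments

rows = []
for comment in published_comments:
    base = score_comment(comment)["TOXICITY"]
    for phrase in identity_phrases:
        modified = score_comment(phrase + comment)["TOXICITY"]
        rows.append({"phrase": phrase.strip(", "),
                     "length": len(comment),
                     "delta": modified - base})

deltas = pd.DataFrame(rows)
# Average score shift per identity phrase, and whether longer comments
# dilute the effect (a negative correlation would suggest they do).
print(deltas.groupby("phrase")["delta"].mean())
print(deltas["delta"].corr(deltas["length"]))
```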

The distinctive, and perhaps most important, part of the Times system is the human moderators. We wanted to understand how the moderators used and interpreted the Moderator software to make decisions about what comments to publish on the Times website.

To learn more about the process, we went through comment moderator training and shadowed moderators as they worked. Through observation and discussion, we learned that the moderators felt free to overrule the scores generated by Perspective (and from the log data we could see that this did indeed happen).

The human moderators were keenly aware of Perspective’s limitations, yet they found Moderator’s user interface to be helpful, especially the way it highlights phrases in comments that might be problematic. They told us that having comments ordered by estimated risk makes it easier to make consistent decisions about what to publish.

To be clear, Perspective is not perfect and neither are the human moderators. However, creating a system that maximizes the number of Times stories that allow comments, while also fostering a healthy and safe forum for discussion, requires trade-offs. An all-human model would limit the number of stories that are open for comment, and an all-machine-learning model would not be able to moderate according to The Times’s standards.

Just as Perspective and the moderators are imperfect, so is our investigation. We ruled out large patterns of racial and gender bias, but we might have missed small biases along these dimensions or biases along other dimensions. Also, our investigation happened during a specific point in time, but both Perspective itself and the comments submitted to The Times are always changing.

Overall, our three-pronged investigation of the role of machine learning in comment moderation at The Times led us to conclude that the system — one that combines machine learning and skilled people — strikes a good balance.

General recommendations

Our investigation focused on the role of one specific machine-learning system in comment moderation at The Times. However, the concerns that sparked our investigation apply to all machine-learning systems. Therefore, based on our investigation, we offer three general recommendations:

Don’t blindly trust machine-learning systems
Just because someone tells you that a tool was built with machine learning does not mean that it will automatically work well. Instead, examine how it was built and test how it works in your setting. Don’t assume that machine-learning systems are neutral, objective and accurate.

Focus on the people using the machine-learning systems
An imperfect machine-learning tool can still be useful if it is surrounded by dedicated, experienced and knowledgeable people. For example, it did not matter if Perspective occasionally produced inaccurate scores, as long as the human moderators ultimately made the correct decisions. Sometimes the best way to improve a socio-technical system is to empower the people using the machine-learning system; this can be done by providing training on how the system works and by giving them the authority to push back against it whenever necessary.

Socio-technical systems need continual oversight
The initial deployment of a machine-learning system can require difficult technical and organizational changes, but the work does not end there. Just as organizations routinely audit their operations and accounting, they should regularly assess their use of machine-learning systems.

The New York Times has created a thriving comments section through years of effort and innovation. We hope this investigation contributes to that long-term project and also serves as a blueprint for examining the responsible use of machine learning in other areas.

Matthew J. Salganik was a Professor in Residence at The New York Times, and he is a Professor of Sociology at Princeton University, where he serves as the Interim Director of the Center for Information Technology Policy. He is the author of “Bit by Bit: Social Research in the Digital Age”. Follow him on Twitter.

Robin Lee is a PhD student in Sociology at Princeton University. He was a Data Analyst for the Data & Insights team at The New York Times. Outside of work, he’s a data meetup organizer. Follow him on Twitter.