Warning Twitter users that they could face consequences for continuing to use hateful language can make a difference, according to new research.

The change in behavior is even greater if the warnings are delivered respectfully.

Researchers at New York University’s Center for Social Media and Politics crafted warnings telling Twitter users that an account they followed had been suspended for violating the platform’s hate speech policies.

Users who received these warnings generally decreased their use of racist, sexist or otherwise prohibited language by at least 10 percent after being given the prompt.

If the warning was couched politely (‘I understand that you have every right to express yourself, but please keep in mind that using hate speech can get you suspended’), the use of offensive language declined by up to 20 percent.

In a paper published in Perspectives on Politics, the authors reported that warning messages that appear legitimate in the eyes of the target user are the most effective.

Researchers sent 'suspension candidates' personalized tweets warning them that they might face repercussions for using hateful language

Twitter has become increasingly polarized, and the company has tried various strategies to fight hate speech and disinformation through the pandemic, the 2020 US presidential campaign and the January 6 attack on the Capitol Building.

‘Debates over the effectiveness of social media account suspensions and bans on abusive users abound,’ lead author Mustafa Mikdat Yildirim, an NYU doctoral candidate, said in a statement.  

‘But we know little about the impact of either warning a user of a possible suspension or of outright suspensions in order to reduce hate speech,’ Yildirim added.

Yildirim and his colleagues reasoned that followers of a suspended account might alter their own tweeting behavior if warned that they could face the same fate.

The more polite and respectful messages led to a 20 percent decrease in hate speech, compared to just 10 percent for general warnings

They wrote that for a warning to be effective, it must make the target aware of the consequences of their behavior and make them believe those consequences will actually be enforced.

To test their hunch, they looked at the followers of users who had been suspended for violating Twitter’s policy on hate speech—downloading more than 600,000 tweets posted the week of July 12, 2020 that had at least one term from a previously determined ‘hateful language dictionary.’
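The filtering step described here, matching each tweet against a predefined word list, can be illustrated with a short sketch. The Python below is purely illustrative: the term list, sample tweets and function name are hypothetical stand-ins, not the study’s actual dictionary or pipeline.

```python
# Minimal sketch of dictionary-based tweet filtering, as described above.
# The terms and tweets are hypothetical placeholders; the researchers'
# actual 'hateful language dictionary' is not reproduced here.

HATEFUL_TERMS = {"slur_a", "slur_b"}  # stand-ins for real dictionary entries

def contains_hateful_term(text: str) -> bool:
    """Return True if the tweet contains at least one dictionary term."""
    tokens = {word.strip(".,!?#@\"'").lower() for word in text.split()}
    return not tokens.isdisjoint(HATEFUL_TERMS)

# Keep only tweets that match at least one term, as the study's filter did
tweets = ["just a normal tweet", "an ugly tweet containing slur_a"]
flagged = [t for t in tweets if contains_hateful_term(t)]
print(flagged)  # -> ['an ugly tweet containing slur_a']
```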

According to the release, Twitter was flooded with hateful messages directed at Asian and Black communities during that period, amid the ongoing coronavirus pandemic and the Black Lives Matter protests that followed George Floyd’s death.

From that flurry, the researchers were able to identify 4,300 possible ‘suspension candidates.’

Six different messages were tested on the subjects, each beginning with the preamble: ‘The user [@account] you follow was suspended, and I suspect this was because of hateful language.’

The preamble was followed by one of several warnings, ranging from ‘If you keep using hate speech, your account might be suspended temporarily’ to ‘If you keep using hate speech, you could lose your friends, followers and posts, and possibly not get your account back.’

These warnings weren’t issued by official Twitter accounts: some came from dummy accounts with handles like ‘hate speech warner,’ while others identified themselves as professional researchers.

Yildirim told Engadget that the team tried to make the warnings as authentic and convincing as possible.

Users who received a warning reduced their hateful tweets by up to 10 percent, while the politely worded warnings proved twice as effective.

The authors said they drew on the deterrence literature in designing their messages, and also tested versions that stressed the authenticity of the sender and the credibility of the message.

The effect proved temporary, however, with users drifting back to their old language within a month.

“Even though these warnings only have temporary effects, this research nevertheless provides potential paths forward for platforms trying to reduce hateful language use by users,” the authors said.  

The authors also recommended that Twitter take a more assertive approach to warning users about possible suspensions of their accounts in order to curb hate speech online

Yildirim acknowledged that a warning coming from Twitter itself might be even more effective.

The most important lesson of the experiment, he told Engadget, may have been that the researchers made these users aware that someone or something was following and monitoring them: the fact that someone else had seen their hate speech could have been the key factor in getting them to reduce it.

Twitter has ramped up its account sanctions significantly of late: according to a Bloomberg report, it took action against 77 percent more hate speech-related accounts than in the first half of 2020, with penalties ranging from removing individual tweets to banning accounts outright.

According to the authors, banning users outright ‘can have unintended consequences,’ such as driving them to more extreme platforms like Parler, Gab, or Rumble.

Twitter has also introduced a number of new features designed to discourage hate speech, including a prompt that lets iOS users edit or delete a potentially harmful or offensive reply before posting it.

The feature was first introduced in May 2020, then quietly vanished, only to reappear a few months later in August 2020.

It resurfaced again in February 2021, with Twitter sharing that it had ‘relaunched the experiment on iOS that asks you for feedback about a response that could be harmful or offensive.’

Users can ignore the warning message and still post their reply.

The platform has also been testing an alert system that warns users before they wade into a heated Twitter exchange.

Twitter is testing a Safety Mode that automatically filters out messages its AI flags as likely containing hate speech

Depending on the topic, users may receive a message from Twitter Support noting that ‘conversations like these can be intense.’

In September 2021, Twitter said it was testing a ‘Safety Mode’ that would automatically block hateful messages.

After a user activates the mode, their ‘mentions’ are filtered for a week so they don’t see any messages flagged as likely to contain hate speech, insults or other offensive language.

Twitter said in a blog post that it was testing the feature with a very small group of users, giving priority to ‘marginalized’ communities.

Other new features include the ability to untag yourself from a conversation and to remove a follower from your account without blocking or notifying them.