from the Darcy archives:
## What works:
- Deplatforming problematic speech (hate, bullying…)
  - Deplatforming is proving to be an effective deterrent against hate speech, according to a [study](http://comp.social.gatech.edu/papers/cscw18-chand-hate.pdf) that asked whether banning specific hateful subreddits would successfully diminish hateful speech or just relocate it (Chandrasekharan et al., 2017).
  - While the above-mentioned study does have technical limitations, various news reports come to a similar conclusion, including an in-depth [New York Times report on Alex Jones’s Infowars](https://www.nytimes.com/2018/09/04/technology/alex-jones-infowars-bans-traffic.html).
- Context sensitivity when interpreting content, according to a [survey](https://datasociety.net/wp-content/uploads/2018/11/DS_Content_or_Context_Moderation.pdf) of 10 major platforms (Caplan, 2018).
- Automated flagging (for human review)
  - Caplan (2018) notes that larger companies practicing what she calls industrial moderation use automated detection to flag spam, child pornography, or pro-terrorism content, which then usually goes to human review.
  - This is consistent with YouTube’s [transparency reports](https://transparencyreport.google.com/youtube-policy/removals), which highlight automated flagging (built on pattern recognition) as the primary source of detection.
- Individual trusted flaggers
  - Individual trusted flaggers are individuals who have access to “priority flags”. They are often professionals in a field relevant to the content they flag (anti-terrorism, child safety, anti-racism…). According to a YouTube Transparency Report from October 2018, these individuals represent only 6.2% of overall flaggers but are responsible for more than 3 times as much accurate flagging.
- Taking more time to review each report, which is only possible with low volumes (Caplan, 2018).
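
The flag-then-review flow described above (automated detection feeding a human review queue, with trusted flaggers holding "priority flags") can be sketched roughly as follows. This is a minimal illustration, not any platform's actual system: the `ReviewQueue` class, the priority constants, and the toy `keyword_detector` are all hypothetical names, and the ordering of priorities is an assumption.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Hypothetical priority levels (assumption): lower number is reviewed first.
PRIORITY_TRUSTED = 0    # "priority flags" from trusted flaggers
PRIORITY_AUTOMATED = 1  # flags raised by automated detection
PRIORITY_USER = 2       # ordinary user reports


@dataclass(order=True)
class Flag:
    priority: int
    seq: int  # insertion order, breaks ties within a priority level
    content_id: str = field(compare=False)
    source: str = field(compare=False)


class ReviewQueue:
    """Queue of flagged items waiting for a human reviewer."""

    def __init__(self):
        self._heap = []
        self._seq = count()

    def flag(self, content_id, source, priority):
        heapq.heappush(
            self._heap, Flag(priority, next(self._seq), content_id, source)
        )

    def next_for_review(self):
        # Nothing is removed automatically: a human decides on each item.
        return heapq.heappop(self._heap) if self._heap else None


def keyword_detector(text, blocklist):
    """Toy stand-in for a real pattern-recognition classifier."""
    return bool(set(text.lower().split()) & blocklist)


def triage(queue, posts, blocklist):
    """Run automated detection and send hits to human review."""
    for content_id, text in posts:
        if keyword_detector(text, blocklist):
            queue.flag(content_id, "automated", PRIORITY_AUTOMATED)
```

The design choice worth noting is that the detector only enqueues items; the decision itself stays with the human reviewer who pulls from the queue.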
## What doesn’t work:
- Over-reliance on the community / social norms
  - Allowing users to contact moderators directly, which leads to retaliation and harassment, or relying on volunteers left to fend for themselves (e.g., [Reddit](https://www.engadget.com/2018/08/31/reddit-moderators-speak-out/)).
- Unsupervised AI
  - Image-recognition classifiers did not work for [Tumblr trying to take down porn](https://edition.cnn.com/2019/01/02/tech/ai-porn-moderation/index.html).
  - Nor is it working for speech moderation (Young, Swamy & Danks, 2017).
  - Also see [this](https://www.theverge.com/2019/2/27/18242724/facebook-moderation-ai-artificial-intelligence-platforms) article (Vincent, 2019).
- Adapting community standards to different cultural contexts
  - [Research](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8554954) on hate speech indicates that while interpretation does vary by country, it also differs significantly from one individual to another (Salminen et al., 2018).
  - In addition, because many people live transnational lives and online communities are increasingly the product of intersecting offline contexts, matching online context to offline context is not feasible beyond tracking users’ geographic distribution.