Paper:
A Method for Detecting Harmful Entries on Informal School Websites Using Morphosemantic Patterns
Michal Ptaszynski*1,†, Fumito Masui*1, Yoko Nakajima*2, Yasutomo Kimura*3, Rafal Rzepka*4, and Kenji Araki*4
*1Department of Computer Science, Kitami Institute of Technology
165 Kouen-cho, Kitami, Hokkaido 090-8507, Japan
*2Department of Information Engineering, National Institute of Technology, Kushiro College
2-32-1 Otanoshike-Nishi, Kushiro-shi, Hokkaido 084-0916, Japan
*3Department of Information and Management Science, Otaru University of Commerce
3-5-21 Midori, Otaru 047-8501, Japan
*4Graduate School of Information Science and Technology, Hokkaido University
Kita 14, Nishi 9, Kita-ku, Sapporo, Hokkaido 060-0814, Japan
†Corresponding author
This paper presents a novel method of analyzing morphosemantic patterns in language to detect cyberbullying, that is, frequently appearing harmful messages and entries that aim to humiliate other users. Morphosemantic patterns are a novel concept based on the assumption that the analyzed elements can be perceived as a combination of morphological information, such as parts of speech, and semantic information, such as semantic roles and categories. The patterns are extracted automatically from data containing harmful entries (cyberbullying) and non-harmful entries found on the informal websites of Japanese high schools. These website data were prepared and standardized by the Human Rights Center in Mie Prefecture, Japan. The extracted patterns are then applied to a document classification task on the provided data, evaluated with 10-fold cross-validation. The results indicate that morphosemantic sentence representation is useful in the task of detecting the deceptive and provocative language used in cyberbullying.
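The pipeline summarized in the abstract (pattern extraction from labeled entries, followed by classification under k-fold cross-validation) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each entry has already been converted into a sequence of morphosemantic tokens (the token strings below, such as `Agent[NOUN]`, are hypothetical), that patterns are ordered combinations of those tokens, and that each pattern is weighted by the normalized difference of its occurrences in harmful and non-harmful entries. The function names and the toy data are likewise assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def extract_patterns(tokens, max_len=3):
    """Return the set of ordered token combinations of length 1..max_len.

    itertools.combinations keeps the input order, so each pattern is an
    ordered (possibly non-contiguous) sub-sequence of the entry.
    """
    patterns = set()
    for n in range(1, min(max_len, len(tokens)) + 1):
        patterns.update(combinations(tokens, n))
    return patterns


def train(samples, max_len=3):
    """Weight each pattern in [-1, 1]; samples are (tokens, label) pairs.

    Label 1 = harmful, 0 = non-harmful. The weighting scheme is an
    assumption of this sketch, not taken from the paper.
    """
    harmful, harmless = Counter(), Counter()
    for tokens, label in samples:
        target = harmful if label == 1 else harmless
        target.update(extract_patterns(tokens, max_len))
    weights = {}
    for pattern in set(harmful) | set(harmless):
        h, n = harmful[pattern], harmless[pattern]
        weights[pattern] = (h - n) / (h + n)
    return weights


def classify(tokens, weights, max_len=3, threshold=0.0):
    """Sum the weights of all known patterns found in the entry."""
    score = sum(weights.get(p, 0.0) for p in extract_patterns(tokens, max_len))
    return 1 if score > threshold else 0


def cross_validate(samples, k=10, max_len=3):
    """Return overall accuracy under k-fold cross-validation."""
    folds = [samples[i::k] for i in range(k)]
    correct = 0
    for i, test_fold in enumerate(folds):
        train_set = [s for j, fold in enumerate(folds) if j != i for s in fold]
        weights = train(train_set, max_len)
        correct += sum(classify(t, weights, max_len) == y for t, y in test_fold)
    return correct / len(samples)


# Hypothetical toy data: morphosemantic token sequences with labels.
toy_samples = [
    (["Agent[NOUN]", "Insult[ADJ]", "State[VERB]"], 1),
    (["Agent[NOUN]", "Place[NOUN]", "Action[VERB]"], 0),
] * 10

if __name__ == "__main__":
    print(cross_validate(toy_samples, k=10))
```

In a realistic setting, the token sequences would come from a morphological analyzer combined with a semantic role labeler, and the threshold would be tuned on held-out data rather than fixed at zero.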
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.