Definition of Implicit Offensiveness

Implicit offensiveness is a form of offensive language that conveys derogatory meaning through a tone of disregard or mockery, for example through sarcasm or context-dependent social bias, while avoiding explicitly offensive expressions.
The figure illustrates the types of offensive comments collected from Korean online communities. Such expressions are hard to detect without the proper context. We divide implicitly offensive comments into three subcategories:
(1) disregard and mockery, consistent with prior definitions of implicit offensiveness;
(2) community-specific slang that is familiar within certain groups but difficult for outsiders to interpret;
(3) variations of profanity used to avoid detection.
Communities that use high-context languages such as Korean are particularly likely to use these types of implicitly offensive expressions. We therefore use these categories to guide the data generation process. Furthermore, we demonstrate the language- and model-agnostic nature of this pipeline by also generating data in English.
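
As a minimal sketch of how the three subcategories might be encoded to steer generation, the taxonomy can be represented as an enumeration and passed, together with the target language, to a prompt builder. The names `ImplicitOffenseType` and `build_generation_prompt` and the prompt wording are hypothetical illustrations, not the pipeline's actual implementation.

```python
from enum import Enum


class ImplicitOffenseType(Enum):
    """The three subcategories of implicit offensiveness (hypothetical encoding)."""
    DISREGARD_MOCKERY = "disregard and mockery"
    COMMUNITY_SLANG = "community-specific slang"
    PROFANITY_VARIATION = "variations of profanity used to avoid detection"


def build_generation_prompt(category: ImplicitOffenseType, language: str) -> str:
    """Compose an instruction for a data-generation model.

    The wording is an assumption; only the category labels come from the text.
    """
    return (
        f"Write one example comment in {language} for an "
        f"offensive-language detection dataset. "
        f"Target subcategory: {category.value}."
    )


# Language is a plain parameter, so the same call covers Korean and English.
prompts = [
    build_generation_prompt(category, language)
    for language in ("Korean", "English")
    for category in ImplicitOffenseType
]
```

Keeping the language as an argument rather than hard-coding it reflects the language-agnostic intent described above; the choice of model that consumes the prompt is left open.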