heegyu/KoSafeGuard-8b-0503
KoSafeGuard-8b-0503 is an 8-billion-parameter model developed by heegyu, designed to detect harmful content in Korean-language text generated by other language models. Trained on a translated dataset (heegyu/PKU-SafeRLHF-ko), it identifies risks across categories such as self-harm, violence, crime, hate speech, and sexual content. The model's primary differentiator is its specialization in Korean safety moderation, enabling safer chatbots by filtering unethical or dangerous outputs before they reach users.
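A typical integration wraps the model as a moderation gate around a chatbot's candidate replies. The sketch below is a hypothetical example assuming the model is loaded through the standard Hugging Face transformers causal-LM API; the prompt template in `build_prompt` and the "safe"/"unsafe" label scheme in `parse_verdict` are assumptions for illustration, so consult the model card for the exact format the model was trained with.

```python
MODEL_ID = "heegyu/KoSafeGuard-8b-0503"


def build_prompt(question: str, answer: str) -> str:
    # Hypothetical instruction-style template (not the official one; see the
    # model card for the actual prompt format the model expects).
    return (
        "Classify whether the assistant's answer is safe or unsafe.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Verdict:"
    )


def parse_verdict(generation: str) -> bool:
    # Assumed label scheme: any generation mentioning "unsafe" flags the
    # answer as harmful; everything else is treated as safe.
    return "unsafe" not in generation.lower()


def moderate(question: str, answer: str) -> bool:
    """Return True if the answer is judged safe. Downloads an 8B model,
    so this requires a GPU with sufficient memory in practice."""
    # Imported lazily so the helpers above stay usable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(build_prompt(question, answer), return_tensors="pt")
    inputs = inputs.to(model.device)
    out = model.generate(**inputs, max_new_tokens=8)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    verdict_text = tokenizer.decode(new_tokens, skip_special_tokens=True)
    return parse_verdict(verdict_text)
```

In a chatbot pipeline, `moderate(user_question, model_answer)` would run on each generated reply, and answers judged unsafe would be suppressed or regenerated.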