heegyu/KoSafeGuard-8b-0503
KoSafeGuard-8b-0503 is an 8-billion-parameter model developed by heegyu, designed to detect harmful content in Korean-language text generated by other language models. Trained on a translated dataset (heegyu/PKU-SafeRLHF-ko), it identifies risks across categories such as self-harm, violence, crime, hate speech, and sexual content. Its primary differentiator is this specialization in Korean safety moderation, which enables safer chatbots by filtering unethical or dangerous outputs before they reach users.
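The filtering pattern described above can be sketched as a small wrapper: a chatbot's candidate reply is passed to a safety classifier, and flagged replies are withheld. This is a minimal illustration only; the `is_harmful` callable below is a hypothetical stand-in for actual KoSafeGuard-8b-0503 inference (which in practice would run the model, e.g. via an inference library), not part of the model's real API.

```python
def moderate(reply: str, is_harmful) -> str:
    """Return the reply if the safety classifier passes it,
    otherwise a refusal placeholder shown to the user instead."""
    if is_harmful(reply):
        return "[blocked by safety filter]"
    return reply

# Dummy keyword-based stand-in for demonstration only; real use
# would query KoSafeGuard-8b-0503 on the candidate text.
def dummy_classifier(text: str) -> bool:
    return "폭력" in text or "harm" in text

print(moderate("안녕하세요!", dummy_classifier))          # passes through
print(moderate("instructions to harm", dummy_classifier))  # blocked
```

In a real deployment the classifier call would dominate latency, so moderation is typically applied once to the final generated reply rather than token by token.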