In Browser
	StumbleUpon
	del.icio.us
	Google
	Google Buzz
	reddit
	LinkedIn

	Facebook
	Twitter
	Linkedin
	E-Mail

Generative AI > OpenAI API > OpenAI Moderation API - Content Safety Filtering

OpenAI Moderation API - Content Safety Filtering

Author: Venkata Sudhakar

The OpenAI Moderation API detects harmful content in text across categories such as hate speech, harassment, self-harm, sexual content, and violence. ShopMax India uses the Moderation API to automatically screen customer-submitted product reviews and Q&A posts before they appear on the website, preventing abusive or off-topic content from reaching other shoppers.

The API call is client.moderations.create(), accepting input (a string or list of strings) and an optional model (omni-moderation-latest for the most accurate results, or text-moderation-latest for faster, lower-cost screening). The response includes a flagged boolean and a categories object showing which specific categories triggered, plus category_scores showing confidence values between 0 and 1 for each category.

The below example shows ShopMax India screening a batch of customer-submitted product reviews, flagging harmful ones, and logging the results for a content moderation queue.

import openai

client = openai.OpenAI(api_key="your-openai-api-key")

# Sample customer reviews submitted on ShopMax India
reviews = [
    {"id": "REV-1001", "text": "Great laptop, fast delivery to Bangalore. Highly recommend!"},
    {"id": "REV-1002", "text": "This product is absolute garbage. The seller is a fraud and a cheat."},
    {"id": "REV-1003", "text": "Good value for money. The Samsung TV picture quality is excellent."},
    {"id": "REV-1004", "text": "Terrible experience. I will destroy this company if they do not refund me."},
]

print("Screening", len(reviews), "reviews...\n")

for review in reviews:
    result = client.moderations.create(
        input=review["text"],
        model="omni-moderation-latest"
    )
    outcome = result.results[0]
    status = "FLAGGED" if outcome.flagged else "APPROVED"
    print(f"{review['id']}: {status}")
    if outcome.flagged:
        triggered = [k for k, v in outcome.categories.__dict__.items() if v]
        print(f"  Categories: {triggered}")

It gives the following output,

Screening 4 reviews...

REV-1001: APPROVED
REV-1002: FLAGGED
  Categories: ['harassment']
REV-1003: APPROVED
REV-1004: FLAGGED
  Categories: ['harassment', 'harassment/threatening']

Use omni-moderation-latest for user-generated content where accuracy matters, and text-moderation-latest for high-volume pre-screening to reduce latency and cost. Never block content based on moderation scores alone - route flagged items to a human review queue for final decisions to avoid false positives. Store category_scores alongside flagged content so moderators can see confidence levels. For multilingual platforms, pass content in the original language as the API handles multiple languages. Set score thresholds per category based on your platform policy rather than relying solely on the flagged boolean.

Send your comments, suggestions or queries regarding this site to [email protected].