Adversarial Input Sanitization for AI Chatbots
Author: Venkata Sudhakar
AI chatbots face a constant stream of adversarial inputs - messages deliberately crafted to bypass content policies, extract hidden system instructions, or cause the model to behave in ways that violate business rules. For ShopMax India's customer support chatbot, adversarial inputs might include jailbreak attempts to make the bot say competitors are better, Unicode tricks that visually mimic safe text but confuse the model, or excessive repetition designed to stress-test response generation. Sanitizing inputs before they reach the LLM is a critical first line of defense.
Input sanitization for AI chatbots involves several steps: length enforcement to reject excessively long inputs that waste tokens or attempt to flood the context window; Unicode normalization (for example NFKC) to fold compatibility variants such as fullwidth letters into their plain forms, paired with a confusables mapping for cross-script lookalikes (e.g., Cyrillic 'а' vs ASCII 'a'), which normalization alone leaves unchanged and which can bypass keyword filters; HTML and script tag stripping to prevent cross-site scripting if responses are rendered in a browser; repetition detection to block inputs that repeat the same phrase dozens of times; and keyword-based content policy filtering for language that violates the platform's terms of service.
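The lookalike handling is worth a small illustration, because standard Unicode normalization on its own does not convert a Cyrillic letter into its Latin twin. The sketch below is an assumption-laden example, not part of the ShopMax India code: `fold_lookalikes` and the five-entry `CONFUSABLES` table are hypothetical, and a real deployment would more likely draw on the full Unicode confusables data.

```python
import unicodedata

# Tiny illustrative table (an assumption, not a complete list):
# Cyrillic letters that render like Latin ones.
CONFUSABLES = str.maketrans({
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
})


def fold_lookalikes(text: str) -> str:
    """Return a copy of text suitable for keyword matching.

    NFKC folds compatibility forms (fullwidth letters, ligatures);
    the translate() call then maps the listed cross-script lookalikes.
    """
    return unicodedata.normalize("NFKC", text).translate(CONFUSABLES)


fullwidth = "\uff41\uff4d\uff41\uff5a\uff4f\uff4e"  # fullwidth "amazon"
cyrillic_a = "\u0430mazon"                           # first letter is Cyrillic

print(fold_lookalikes(fullwidth) == "amazon")                      # True
print(unicodedata.normalize("NFKC", cyrillic_a) == "amazon")      # False
print(fold_lookalikes(cyrillic_a) == "amazon")                     # True
```

Because the mapping would also rewrite legitimate Cyrillic text, it is probably safest to run keyword checks against the folded copy and pass the user's original wording on to the model.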
The following sanitization pipeline for ShopMax India's chatbot applies all these checks in sequence. Each step either cleans the input or raises a rejection with an informative reason so the user can correct their query.
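A minimal sketch of such a pipeline is given here under stated assumptions. The names `sanitize_input` and `SanitizationError`, the repetition thresholds, and the contents of `BLOCKED_TERMS` are all hypothetical; only the 500-character limit and the rejection messages are taken from the sample output in this article.

```python
import re
import string
import unicodedata
from collections import Counter

MAX_LENGTH = 500                 # limit quoted in the sample output
MIN_WORDS_FOR_REPEAT_CHECK = 8   # assumed: skip the check for short messages
MAX_REPEAT_RATIO = 0.6           # assumed: one word may fill at most 60% of a message
BLOCKED_TERMS = {"amazon", "flipkart"}  # illustrative only

# Remove <script>/<style> elements with their contents, then any remaining tags.
TAG_BLOCK_RE = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")


class SanitizationError(ValueError):
    """Raised when an input is rejected; the message is meant for the user."""


def sanitize_input(text: str) -> str:
    # 1. Length enforcement
    if len(text) > MAX_LENGTH:
        raise SanitizationError(
            f"Input too long. Please keep your message under {MAX_LENGTH} characters."
        )

    # 2. Unicode normalization (folds compatibility forms such as fullwidth letters)
    text = unicodedata.normalize("NFKC", text)

    # 3. HTML and script tag stripping, then whitespace collapse
    text = TAG_BLOCK_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = " ".join(text.split())

    # Naive whitespace tokenization with surrounding punctuation removed
    tokens = [t.strip(string.punctuation) for t in text.casefold().split()]
    tokens = [t for t in tokens if t]

    # 4. Repetition detection: reject if one word dominates a longer message
    if len(tokens) >= MIN_WORDS_FOR_REPEAT_CHECK:
        _, top_count = Counter(tokens).most_common(1)[0]
        if top_count / len(tokens) > MAX_REPEAT_RATIO:
            raise SanitizationError(
                "Your message appears to contain excessive repetition. Please rephrase."
            )

    # 5. Keyword-based content policy filter
    if any(token in BLOCKED_TERMS for token in tokens):
        raise SanitizationError(
            "That topic is outside the scope of ShopMax India support."
        )

    return text
```

A caller would catch `SanitizationError` and show `f"Sorry: {error}"` to the user, forwarding the returned text to the LLM otherwise. The sketch only cleans or rejects input; any chatbot answer is produced downstream by the model.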
Placed in front of the chatbot, the pipeline gives the following output for five sample inputs:
Input: What is the warranty on LG OLED TVs? ...
Response: LG OLED TVs come with a 1-year manufacturer warranty at ShopMax India...
Input: buy buy buy buy buy buy buy buy buy buy buy buy ...
Response: Sorry: Your message appears to contain excessive repetition. Please rephrase.
Input: <script>alert(1)</script> What is the return policy? ...
Response: ShopMax India offers a 10-day return policy on all electronics...
Input: Is Amazon better than ShopMax? ...
Response: Sorry: That topic is outside the scope of ShopMax India support.
Input: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA ...
Response: Sorry: Input too long. Please keep your message under 500 characters.
In production, run sanitization as a FastAPI middleware layer so it applies to every route automatically without cluttering business logic. Keep the blocked terms list in a database so the support team can update it without a code deployment. For ShopMax India's multilingual user base spanning Hindi, Tamil, and Telugu speakers, apply Unicode normalization before keyword matching to ensure Devanagari or Tamil script inputs are correctly evaluated. Log all sanitization rejections with the original input (hashed for privacy) to track attack patterns over time.
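The logging recommendation can be sketched with the standard library alone. Everything named here is an assumption made for illustration: the logger name, the `rejection_record` and `log_rejection` helpers, and the record fields. A keyed HMAC is used instead of a bare hash because short chat messages are easy to guess and re-hash; the key would come from a secret store.

```python
import hashlib
import hmac
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("shopmax.sanitizer")  # hypothetical logger name


def rejection_record(raw_input: str, reason: str, key: bytes) -> dict:
    """Build a log record that identifies the input without storing it."""
    digest = hmac.new(key, raw_input.encode("utf-8"), hashlib.sha256).hexdigest()
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "reason": reason,
        "input_hmac_sha256": digest,     # same input + same key -> same digest
        "input_length": len(raw_input),  # useful for spotting flooding attempts
    }


def log_rejection(raw_input: str, reason: str, key: bytes) -> None:
    # One JSON object per line keeps the log easy to aggregate later.
    logger.warning(json.dumps(rejection_record(raw_input, reason, key)))
```

Identical attack strings then share a digest, so repeated attempts can be counted over time without the raw text ever appearing in the logs.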