tattle-made/llamaguard-uli

Content Moderation API

Version: 0.0.1


Overview

This API flags abusive/harmful text content using a 3-step flow:

  • Check against Tattle's slur list (moderate if matched)
  • Check with Llama Guard model (moderate if unsafe)
  • Check against a use-case-specific flagged list (flag for human review if matched)

Responses are shaped consistently with a meta object, a should_moderate flag, a reason enum, and an HTTP status_code echoed in the body.
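A minimal, self-contained sketch of this flow is shown below. The in-memory word lists and the llama_guard_is_unsafe() stub are placeholders for the real assets and model call, not code from this repository:

# Sketch of the 3-step moderation flow; the word lists and the Llama Guard stub
# are placeholders, not the repository's actual implementation.
SLUR_LIST = {"badword"}      # stand-in for assets/slur-list.txt
FLAG_LIST = {"whitelist"}    # stand-in for assets/flagged-list.txt

def llama_guard_is_unsafe(text: str) -> bool:
    # Placeholder for the Llama Guard model call.
    return False

def moderate(text: str) -> dict:
    words = text.lower().split()

    slur_hits = [w for w in words if w in SLUR_LIST]      # step 1: slur list
    if slur_hits:
        return {"should_moderate": True, "reason": "tattle_slur_list",
                "flagged_words": slur_hits}

    if llama_guard_is_unsafe(text):                       # step 2: Llama Guard
        return {"should_moderate": True, "reason": "llama_guard",
                "flagged_words": []}

    flag_hits = [w for w in words if w in FLAG_LIST]      # step 3: flag list
    if flag_hits:
        return {"should_moderate": False, "reason": "flag_list",
                "flagged_words": flag_hits}               # flag for human review

    return {"should_moderate": False, "reason": "safe", "flagged_words": []}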

Flowchart

(Flowchart of the 3-step moderation flow described above.)
Endpoints

1) POST /moderate

  • Description: Runs the content moderation flow on the provided text.
  • Request Body:
{
  "text": "string"
}
  • Response Body (Success/Handled Error):
{
  "meta": {
    "response_time": 12.34,
    "flagged_words": ["string"]
  },
  "should_moderate": true || false,
  "reason": "safe" | "tattle_slur_list" | "llama_guard" | "flag_list",
  "status_code": 200
}
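
For reference, a minimal Python client call against this endpoint could look like the sketch below; https://api-endpoint is the same placeholder host used in the curl examples, and the requests library is an assumption, not a dependency of the API:

import requests

# POST the text to /moderate; "https://api-endpoint" is a placeholder host.
resp = requests.post(
    "https://api-endpoint/moderate",
    json={"text": "I think this message is fine."},
    timeout=10,
)
body = resp.json()
if body["should_moderate"]:
    print("Moderate:", body["reason"], body["meta"]["flagged_words"])
else:
    print("Allow:", body["reason"])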

The request and response structures are explained in detail below.


Input/Output Doc

Request Schema

  • text (string, required): Raw text to be evaluated. Empty or whitespace-only strings are treated as invalid input.

Example:

{
  "text": "I think this message is fine."
}
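
Clients can mirror the server's empty/whitespace-only check before making a request; the helper below is illustrative, not part of the API:

def is_valid_request_text(text: str) -> bool:
    # Empty or whitespace-only text is rejected by the API with HTTP 400.
    return isinstance(text, str) and bool(text.strip())

assert is_valid_request_text("I think this message is fine.")
assert not is_valid_request_text("   ")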

Response Schema

  • meta (object, always present)

    • response_time (number): Time in milliseconds taken to process the request.
    • flagged_words (array of strings): Words that matched either the slur list or the flagged list (depending on the branch taken). Empty if none.
  • should_moderate (boolean):

    • true → Content should be moderated/blocked.
    • false → Content is safe to allow.
  • reason (enum string): One of the following values, describing why the decision was made (a sketch mapping each reason to a client-side action follows this list).

    • tattle_slur_list → Matched Tattle's slur list. Text should be moderated.
    • llama_guard → Llama Guard classified the text as unsafe. Text should be moderated.
    • flag_list → Matched the use-case-specific flagged list of terms/words. Indicates that human review is needed.
    • safe → Content assessed as safe.
  • status_code (integer): Mirrors the HTTP status code. Common values:

    • 200: Successful evaluation (including safe or moderate outcomes).
    • 400: Bad input (e.g., empty/whitespace-only text).
    • 500: Internal server error.
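
A client might translate the reason enum into a downstream action along these lines; the action names are illustrative, and flag_list is routed to human review as described above:

# Illustrative mapping from the reason enum to a client-side action.
REASON_TO_ACTION = {
    "tattle_slur_list": "block",    # matched Tattle's slur list
    "llama_guard": "block",         # classified as unsafe by Llama Guard
    "flag_list": "human_review",    # matched the use-case specific flag list
    "safe": "allow",
}

def action_for(response_body: dict) -> str:
    # 400/500 responses carry no moderation decision (reason may be null/None).
    if response_body["status_code"] != 200:
        return "reject_input_or_retry"
    return REASON_TO_ACTION[response_body["reason"]]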

Status Codes

  • 200 OK: Request processed successfully. status_code in body is 200.
  • 400 Bad Request: Input was invalid (e.g., empty text). status_code in body is 400.
  • 500 Internal Server Error: An unexpected error occurred. status_code in body is 500.

Examples

A) Slur matched (HTTP 200, moderate)

Request:

curl -X POST \
  https://api-endpoint/moderate \
  -H 'Content-Type: application/json' \
  -d '{"text": "contains badword"}'

Response 200:

{
  "meta": {
    "response_time": 3.21,
    "flagged_words": ["badword"]
  },
  "should_moderate": true,
  "reason": "tattle_slur_list",
  "status_code": 200
}

B) Llama Guard unsafe (HTTP 200, moderate)

Request:

curl -X POST \
  https://api-endpoint/moderate \
  -H 'Content-Type: application/json' \
  -d '{"text": "some unsafe content"}'

Response 200:

{
  "meta": {
    "response_time": 45.67,
    "flagged_words": []
  },
  "should_moderate": true,
  "reason": "llama_guard",
  "status_code": 200
}

C) Use-case-specific flagged-list match (HTTP 200, not moderated; flagged for human review)

Request:

curl -X POST \
  https://api-endpoint/moderate \
  -H 'Content-Type: application/json' \
  -d '{"text": "contains whitelist term"}'

Response 200:

{
  "meta": {
    "response_time": 7.89,
    "flagged_words": ["whitelist"]
  },
  "should_moderate": false,
  "reason": "flag_list",
  "status_code": 200
}

D) Safe content (HTTP 200, safe)

Request:

curl -X POST \
  https://api-endpoint/moderate \
  -H 'Content-Type: application/json' \
  -d '{"text": "Hello there!"}'

Response 200:

{
  "meta": {
    "response_time": 2.34,
    "flagged_words": []
  },
  "should_moderate": false,
  "reason": "safe",
  "status_code": 200
}

E) Empty input (HTTP 400)

Request:

curl -X POST \
  https://api-endpoint/moderate \
  -H 'Content-Type: application/json' \
  -d '{"text": "   "}'

Response 400:

{
  "meta": {
    "response_time": 0.12,
    "flagged_words": []
  },
  "should_moderate": false,
  "reason": null,
  "status_code": 400
}

Notes

  • Word list files are read from assets/slur-list.txt and assets/flagged-list.txt if present.
  • When Llama Guard is unavailable, the model check is skipped and processing continues based on lists.
  • response_time is returned in milliseconds as a floating-point number.
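
A sketch of how the word lists could be loaded with the "if present" behaviour described above; the one-term-per-line file format and the function name are assumptions, not taken from the repository:

from pathlib import Path

def load_word_list(path: str) -> set:
    # Missing file -> empty set, so the corresponding list check is skipped.
    p = Path(path)
    if not p.exists():
        return set()
    return {line.strip().lower()
            for line in p.read_text(encoding="utf-8").splitlines()
            if line.strip()}

slur_list = load_word_list("assets/slur-list.txt")
flagged_list = load_word_list("assets/flagged-list.txt")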
