Python package

pangram.text_classifier module

class pangram.text_classifier.PangramText(api_key: str | None = None)

Bases: object

__init__(api_key: str | None = None) → None

A classifier for text inputs using the Pangram Labs API.

Parameters: api_key (str, optional) – Your Pangram Labs API key. If not provided, the PANGRAM_API_KEY environment variable is used.
Raises: ValueError – If no API key is provided and PANGRAM_API_KEY is not set in the environment.
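
The key lookup described above can be sketched as follows. This is a minimal illustration of the documented fallback, not the library's actual implementation; the helper name resolve_api_key is hypothetical.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Return the explicit key if given, else fall back to PANGRAM_API_KEY.

    Hypothetical helper that mirrors the documented behavior only.
    """
    key = api_key or os.environ.get("PANGRAM_API_KEY")
    if not key:
        raise ValueError(
            "No API key provided and PANGRAM_API_KEY is not set in the environment."
        )
    return key
```

With the real client, PangramText() relies on the same environment variable, while PangramText(api_key="...") passes the key explicitly.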

submit_bulk(text: List[str] | None = None, items: List[Dict[str, str]] | None = None) → Dict

Submit a Bulk API job for asynchronous AI detection.

Provide either text as a list of input strings or items as a list of dictionaries with text and an optional customer-defined id. The response includes a bulk_id for polling and immediate per-item validation failures, if any.

Parameters:

text (List[str], optional) – A list of input texts to analyze.
items (List[Dict[str, str]], optional) – A list of item dictionaries. Each item must include text and may include id.

Returns:

Bulk submission response containing bulk_id, status, total_items, accepted_items, and failed_items.

Return type:

Dict

Raises:

ValueError – If both text and items are provided, if neither is provided, or if the API returns an error.
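
As a rough sketch of the items payload shape, the helper below pairs each text with an optional customer-defined id. The name build_bulk_items and the length check are assumptions made for illustration; only the text and id keys come from the description above.

```python
from typing import Dict, List, Optional


def build_bulk_items(
    texts: List[str], ids: Optional[List[str]] = None
) -> List[Dict[str, str]]:
    """Build a list of item dicts, each with `text` and an optional `id`."""
    if ids is not None and len(ids) != len(texts):
        raise ValueError("ids must have the same length as texts")
    items = []
    for index, text in enumerate(texts):
        item = {"text": text}
        if ids is not None:
            item["id"] = ids[index]
        items.append(item)
    return items


# Usage sketch (needs a configured PangramText instance named `client`):
#   response = client.submit_bulk(items=build_bulk_items(texts, ids))
#   bulk_id = response["bulk_id"]
```

Passing text=[...] instead is the simpler choice when no ids need to be carried through to the results.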

get_bulk_status(bulk_id: str) → Dict

Fetch the current status for a Bulk API job.

Parameters: bulk_id (str) – The bulk job ID returned by submit_bulk().
Returns: Bulk status response containing counters and timestamps.
Return type: Dict
Raises: ValueError – If the API returns an error or an invalid response.

get_bulk_items(bulk_id: str, offset: int = 0, limit: int = 100) → Dict

Fetch paginated item metadata for a Bulk API job.

Parameters:

bulk_id (str) – The bulk job ID returned by submit_bulk().
offset (int) – Zero-based item offset. Defaults to 0.
limit (int) – Maximum number of items to return. The API allows up to 1000. Defaults to 100.

Returns:

Paginated bulk item metadata.

Return type:

Dict

Raises:

ValueError – If the API returns an error or an invalid response.

get_bulk_results_page(bulk_id: str, offset: int = 0, limit: int = 100) → Dict

Fetch one page of results for a Bulk API job.

Completed successful items include a result field with the same response shape returned by predict(). Items that are still running have result set to None. Failed items are returned separately in failed_items.

Parameters:

bulk_id (str) – The bulk job ID returned by submit_bulk().
offset (int) – Zero-based item offset. Defaults to 0.
limit (int) – Maximum number of items to return. The API allows up to 1000. Defaults to 100.

Returns:

Paginated bulk result response.

Return type:

Dict

Raises:

ValueError – If the API returns an error or an invalid response.
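
Manual paging with offset and limit might look like the sketch below. It assumes total_items is taken from the submit_bulk() response and that stepping offset by limit covers every submitted item slot; the helper name iter_result_pages is hypothetical, and client is any object exposing get_bulk_results_page().

```python
from typing import Any, Dict, Iterator


def iter_result_pages(
    client: Any, bulk_id: str, total_items: int, limit: int = 100
) -> Iterator[Dict]:
    """Yield one results page per offset until every item slot is covered."""
    if not 1 <= limit <= 1000:
        # The documented API maximum is 1000 items per page.
        raise ValueError("limit must be between 1 and 1000")
    for offset in range(0, total_items, limit):
        yield client.get_bulk_results_page(bulk_id, offset=offset, limit=limit)
```

For most callers get_bulk_results() is the simpler option, since it performs this loop internally.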

get_bulk_results(bulk_id: str, page_size: int = 1000) → Dict

Fetch all available results for a Bulk API job.

This helper follows the paginated /bulk/{bulk_id}/results endpoint until every submitted item index has been covered. Failed items are returned separately in failed_items. If the job is still running, unfinished accepted items are included in items with result set to None.

Parameters:

bulk_id (str) – The bulk job ID returned by submit_bulk().
page_size (int) – Number of submitted item slots to request per API call. The API allows up to 1000. Defaults to 1000.

Returns:

Aggregated bulk result response containing bulk_id, total_items, items, and failed_items.

Return type:

Dict

Raises:

ValueError – If page_size is invalid, or if the API returns an error or invalid response.
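
Because unfinished items arrive with result set to None, callers usually need to separate them from completed ones. The sketch below does that for the aggregated response; the helper name split_bulk_results is hypothetical, and it relies only on the items, failed_items, and result keys described above.

```python
from typing import Dict, List, Tuple


def split_bulk_results(results: Dict) -> Tuple[List[Dict], List[Dict], List[Dict]]:
    """Split an aggregated response into (finished, pending, failed) items."""
    items = results.get("items", [])
    finished = [item for item in items if item.get("result") is not None]
    pending = [item for item in items if item.get("result") is None]
    failed = list(results.get("failed_items", []))
    return finished, pending, failed
```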

wait_for_bulk(bulk_id: str, timeout: float = 3600, poll_interval: float = 0.5) → Dict

Poll a Bulk API job until it reaches a terminal status.

Terminal statuses are succeeded, failed, and partial. Completion time depends on the number and length of submitted items and current system load.

Parameters:

bulk_id (str) – The bulk job ID returned by submit_bulk().
timeout (float) – Maximum seconds to wait for terminal completion. Defaults to 3600.
poll_interval (float) – Seconds to wait between polling attempts. Values below 0.1 are clamped to 0.1. Defaults to 0.5.

Returns:

Terminal bulk status response.

Return type:

Dict

Raises:

ValueError – If timeout or poll interval values are invalid, or if the API returns an error.
TimeoutError – If the bulk job does not complete before timeout.
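
Putting the bulk methods together, one possible end-to-end flow is sketched below. The helper name run_bulk_job and the 2-second poll interval are illustrative choices, and swallowing TimeoutError to return partial results is a design decision of this sketch, not documented library behavior.

```python
from typing import Any, Dict, List


def run_bulk_job(client: Any, texts: List[str], timeout: float = 3600) -> Dict:
    """Submit texts, wait for a terminal status, then fetch available results."""
    submission = client.submit_bulk(text=texts)
    bulk_id = submission["bulk_id"]
    try:
        client.wait_for_bulk(bulk_id, timeout=timeout, poll_interval=2.0)
    except TimeoutError:
        # The job is still running; unfinished items will have result=None.
        pass
    return client.get_bulk_results(bulk_id)
```

A caller that needs complete results should let the TimeoutError propagate instead and retry later with the same bulk_id.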

predict(text: str, public_dashboard_link: bool = False, timeout: float = 300, poll_interval: float = 0.5) → Dict

Classify text as AI-, AI-assisted, or human-written.

Submits the text to Pangram’s async inference endpoint, waits for completion, and returns analysis with windowed results.

Parameters:

text (str) – The text to be classified.
public_dashboard_link (bool) – Whether to include a public dashboard link in the completed response. Defaults to False.
timeout (float) – Maximum seconds to wait for the async task to complete. Defaults to 300.
poll_interval (float) – Seconds to wait between polling attempts. Values below 0.1 are clamped to 0.1. Defaults to 0.5.

Returns:

Pangram analysis with AI-assistance detection as a dict with the following fields:

stage (str): The terminal async task stage, normally “STAGE_SUCCESS”.
text (str): The input text.
version (str): The API version identifier (e.g., “3.0”).
headline (str): Classification headline summarizing the result.
prediction (str): Long-form prediction string describing the classification.
prediction_short (str): Short-form prediction string (“AI”, “AI-Assisted”, “Human”, “Mixed”).
fraction_ai (float): Fraction of text classified as AI-written (0.0-1.0).
fraction_ai_assisted (float): Fraction of text classified as AI-assisted (0.0-1.0).
fraction_human (float): Fraction of text classified as human-written (0.0-1.0).
num_ai_segments (int): Number of text segments classified as AI.
num_ai_assisted_segments (int): Number of text segments classified as AI-assisted.
num_human_segments (int): Number of text segments classified as human.
dashboard_link (str): A link to the dashboard page containing the full classification result, if requested.
windows (list): List of text windows and their classifications. Each window contains:
- text (str): The window text.
- label (str): Descriptive classification label (e.g., “AI-Generated”, “Moderately AI-Assisted”).
- ai_assistance_score (float): Score detailing the level of AI assistance within the window (0.0-1.0), where 0 means no AI assistance and 1.0 means AI-generated.
- confidence (str): Confidence level for the classification (“High”, “Medium”, “Low”).
- start_index (int): Starting character index in the original text.
- end_index (int): Ending character index in the original text.
- word_count (int): Number of words in the window.
- token_length (int): Token length of the window.

Return type:

Dict

Raises:

ValueError – If the API returns an error or if the response is invalid.
TimeoutError – If the async task does not complete before timeout.
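
To show how the returned fields fit together, the sketch below turns a predict() result into a few readable lines. The helper name describe_prediction is hypothetical, and it uses only fields listed above.

```python
from typing import Dict, List


def describe_prediction(result: Dict) -> List[str]:
    """Summarize the overall fractions, then one line per window."""
    lines = [
        f"{result['prediction_short']}: "
        f"AI {result['fraction_ai']:.0%}, "
        f"AI-assisted {result['fraction_ai_assisted']:.0%}, "
        f"human {result['fraction_human']:.0%}"
    ]
    for window in result.get("windows", []):
        lines.append(
            f"[{window['start_index']}-{window['end_index']}] "
            f"{window['label']} ({window['confidence']} confidence)"
        )
    return lines


# Usage sketch (needs a configured PangramText instance named `client`):
#   for line in describe_prediction(client.predict("Some text to check.")):
#       print(line)
```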

predict_files(file_paths: List[str | PathLike], public_dashboard_link: bool = False, timeout: float = 300) → List[Dict]

Upload one or more files for AI detection.

Files are submitted to Pangram’s file upload endpoint as multipart form data with one files field per uploaded .docx, .pdf, or .rtf file. Each returned result includes the extracted text, prediction fields, window-level analysis, and the uploaded filename. When public_dashboard_link is true, each result also includes a dashboard_link URL.

Parameters:

file_paths (List[Union[str, os.PathLike]]) – Paths to files to upload and analyze.
public_dashboard_link (bool) – Whether to create public dashboard links for the uploaded files. Defaults to False.
timeout (float) – Maximum seconds to wait for the upload request to complete. Defaults to 300.

Returns:

A list of per-file result dictionaries returned by the API.

Return type:

List[Dict]

Raises:

ValueError – If no files are provided, if timeout is invalid, if the API returns an error, or if the response is invalid.
requests.RequestException – If the upload request fails. File open errors are raised by Python before the request is sent.
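
Since only .docx, .pdf, and .rtf files are accepted, it can help to filter paths before uploading. The sketch below is one way to do that; the helper name partition_uploads is hypothetical, and matching extensions case-insensitively is an assumption rather than documented API behavior.

```python
import os
from pathlib import Path
from typing import Iterable, List, Tuple, Union

# File types named in the predict_files() description.
SUPPORTED_SUFFIXES = {".docx", ".pdf", ".rtf"}


def partition_uploads(
    paths: Iterable[Union[str, os.PathLike]],
) -> Tuple[List[Path], List[Path]]:
    """Split paths into (supported, skipped) based on file extension."""
    supported: List[Path] = []
    skipped: List[Path] = []
    for raw in paths:
        path = Path(raw)
        if path.suffix.lower() in SUPPORTED_SUFFIXES:
            supported.append(path)
        else:
            skipped.append(path)
    return supported, skipped


# Usage sketch (needs a configured PangramText instance named `client`):
#   supported, skipped = partition_uploads(candidate_paths)
#   results = client.predict_files(supported) if supported else []
```

Guarding the call on a non-empty list avoids the ValueError raised when no files are provided.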

predict_file(file_path: str | PathLike, public_dashboard_link: bool = False, timeout: float = 300) → Dict

Upload a single file for AI detection.

This convenience method calls predict_files() with one path and returns the first per-file result.

Parameters:

file_path (Union[str, os.PathLike]) – Path to the file to upload and analyze.
public_dashboard_link (bool) – Whether to create a public dashboard link for the uploaded file. Defaults to False.
timeout (float) – Maximum seconds to wait for the upload request to complete. Defaults to 300.

Returns:

The per-file result dictionary returned by the API.

Return type:

Dict

Raises:

ValueError – If the API returns an error or an invalid response.

predict_short(text: str) → Dict

Classify text using the main async prediction endpoint.

Deprecated: This compatibility alias forwards to predict(). Use predict() directly for Pangram’s current response schema. This method may be removed on August 1, 2026.

Parameters: text (str) – The text to be classified.
Returns: The same classification result returned by predict().
Return type: Dict

batch_predict(text_batch: List[str]) → List[Dict]

Classify a batch of text as AI-, AI-assisted, or human-written.

This method iterates through the batch and calls predict() for each text.

Deprecated: This compatibility method forwards to predict() once per input text. Use submit_bulk() for asynchronous bulk jobs or predict() for one-off calls. This method may be removed on August 1, 2026.

Parameters: text_batch (List[str]) – A list of strings to be classified.
Returns: A list of classification results from the API for each text in the batch. Each result is a dict with the same fields as returned by predict().
Return type: List[Dict]
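
Code that still depends on batch_predict() can reproduce its documented behavior with a plain loop over predict(), as in this sketch. The helper name predict_each is hypothetical, and client stands for any object exposing predict().

```python
from typing import Any, Dict, Iterable, List


def predict_each(client: Any, texts: Iterable[str]) -> List[Dict]:
    """Call predict() once per text, preserving input order."""
    return [client.predict(text) for text in texts]
```

Each call waits for its own async task, so total time grows with the number of texts; for larger batches submit_bulk() is the documented alternative.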

predict_with_dashboard_link(text: str, timeout: float = 300, poll_interval: float = 0.5) → Dict

Classify text as AI-, AI-assisted, or human-written.

Submits the text to Pangram’s async inference endpoint, waits for completion, and returns analysis with a public dashboard link.

Parameters:

text (str) – The text to be classified.
timeout (float) – Maximum seconds to wait for the async task to complete. Defaults to 300.
poll_interval (float) – Seconds to wait between polling attempts. Values below 0.1 are clamped to 0.1. Defaults to 0.5.

Returns:

The classification result from the API, as a dict with the following fields:

text (str): The classified text.
dashboard_link (str): A link to a dashboard page containing the classification result.
stage (str): The terminal async task stage, normally “STAGE_SUCCESS”.
prediction (str): Long-form prediction string describing the classification.
prediction_short (str): Short-form prediction string.
fraction_ai (float): Fraction of text classified as AI-written (0.0-1.0).
fraction_ai_assisted (float): Fraction of text classified as AI-assisted (0.0-1.0).
fraction_human (float): Fraction of text classified as human-written (0.0-1.0).
windows (list): List of text windows and their classifications.

Return type:

Dict

Raises:

ValueError – If the API returns an error or if the response is invalid.
TimeoutError – If the async task does not complete before timeout.

check_plagiarism(text: str) → Dict

Check text for potential plagiarism by comparing it against a vast database of online content.

Parameters:

text (str) – The text to check for plagiarism.

Returns:

A dictionary containing the plagiarism check results, including:

text (str): The input text.
plagiarism_detected (bool): Whether plagiarism was detected.
plagiarized_content (List): List of detected plagiarized content with sources.
total_sentences (int): Total number of sentences checked.
plagiarized_sentences (List): List of sentences detected as plagiarized.
percent_plagiarized (float): Percentage of text detected as plagiarized.

Return type:

Dict

Raises:

ValueError – If the API returns an error or if the response is invalid.
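
The plagiarism result fields can be condensed into a one-line report, as in the sketch below. The helper name plagiarism_summary is hypothetical; percent_plagiarized is echoed unchanged because its scale is not specified above.

```python
from typing import Dict


def plagiarism_summary(result: Dict) -> str:
    """Return a short, human-readable summary of a check_plagiarism() result."""
    if not result["plagiarism_detected"]:
        return "No plagiarism detected."
    flagged = len(result["plagiarized_sentences"])
    total = result["total_sentences"]
    return (
        f"{flagged} of {total} sentences flagged "
        f"(percent_plagiarized={result['percent_plagiarized']})"
    )


# Usage sketch (needs a configured PangramText instance named `client`):
#   print(plagiarism_summary(client.check_plagiarism("Text to check.")))
```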