
Introducing MuleSoft’s AI Gateway LLM Capabilities

Reading Time: 8 minutes

If you’ve run AI experiments in the last 18 months, you’ve dealt with architecture sprawl: one service makes direct HTTPS calls to the OpenAI completions endpoint with a hardcoded API key in an environment variable, another calls Vertex AI with a service-account JSON file, and yet another calls Bedrock via the AWS SDK.

Each has its own retry logic, its own prompt-to-token accounting, and its own way of handling rate limits. This creates several problems. Credential rotation requires touching multiple applications, with no central inventory of which services hold which keys.

Token spend is aggregated at the billing level, not the application or team level; you can’t tell which team is burning your budget. Provider migration from gpt-4o to claude-3-5-sonnet means rewriting request serialization, response parsing, and error handling in every consumer. Prompt injection has no consistent mitigation layer; each team implements or skips its own input validation.

AI Gateway LLM capabilities

To solve this, MuleSoft announced the general availability of AI Gateway LLM capabilities: a centralized control plane designed to bring order, security, and visibility to enterprise AI adoption.

Think of the AI Gateway as a universal translator and a security checkpoint for all your AI traffic. Built on the Anypoint Flex Gateway, it sits between your internal applications and external LLM providers. Instead of your apps talking directly to ChatGPT or Claude, they talk to the Gateway, which securely handles the rest.

Unified access and OpenAI-compatible interface

The MuleSoft LLM Gateway simplifies development by adopting the OpenAI API pattern as the standard consumer contract. Developers can invoke any supported LLM (such as Gemini, OpenAI, Azure OpenAI, or Claude models on Amazon Bedrock) via a single, secure endpoint without learning a provider-specific interface.

  • Seamless authentication: Developers use familiar Anypoint authentication models and never have to manage or expose provider-specific API keys
  • Payload normalization: The gateway automatically detects the model specified in the request payload, translates the OpenAI format to the underlying provider’s format, and normalizes the response back to the standard OpenAI format
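As a sketch of what that consumer contract looks like in practice: the request below uses the standard OpenAI chat-completions shape, and the gateway reads the `model` field to pick and translate to the underlying provider. The gateway URL and model names here are hypothetical placeholders, not values from the MuleSoft documentation.

```python
import json

# Hypothetical gateway endpoint -- substitute your Anypoint Flex Gateway URL.
GATEWAY_URL = "https://ai-gateway.example.com/v1/chat/completions"

def build_chat_request(model: str, user_prompt: str) -> dict:
    """Build a standard OpenAI-format chat request. The gateway inspects the
    `model` field to select the provider and normalizes the payload for it."""
    return {
        "model": model,  # e.g. "gpt-4o", "gemini-1.5-pro", "claude-3-5-sonnet"
        "messages": [{"role": "user", "content": user_prompt}],
    }

# The same request shape works regardless of the target provider:
req = build_chat_request("gemini-1.5-pro", "Summarize our refund policy.")
body = json.dumps(req)  # POST this body to GATEWAY_URL with Anypoint auth
```

Because every consumer emits this one shape, swapping providers is a change to the `model` string (or a gateway route), not a client rewrite.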

Intelligent routing strategies

To optimize performance and reliability, we introduce robust request routing and orchestration features:

  • Semantic routing: The gateway routes prompts based on contextual meaning. By evaluating a user’s prompt against predefined utterances, the gateway can route simple queries to faster, cost-effective models while directing complex domain tasks (like coding) to specialized models. For instance, an API admin can define a “finance” topic routed to an OpenAI GPT model and a “code” topic routed to a Gemini Pro model
  • Model-based routing: Administrators can configure routes to specific models and even set target models that override the consumer’s request to enforce enterprise standards
  • Fallback routing: If a request doesn’t meet the criteria of any route, the gateway can fall back to a designated model (e.g., from gpt-3.5-turbo to gpt-4o)
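To make the routing idea concrete, here is a minimal sketch of semantic routing with a fallback, using simple string similarity against predefined utterances. This is an illustration of the concept only; the topic names, utterances, models, and threshold are hypothetical, and the real gateway’s semantic matching is configured declaratively, not hand-coded like this.

```python
from difflib import SequenceMatcher

# Hypothetical topic -> (utterances, model) table, mirroring an admin's routes.
ROUTES = {
    "finance": (["quarterly revenue report", "invoice total", "budget forecast"],
                "gpt-4o"),
    "code":    (["write a function", "fix this bug", "refactor this module"],
                "gemini-1.5-pro"),
}
FALLBACK_MODEL = "gpt-3.5-turbo"  # chosen when no route matches well enough

def route(prompt: str, threshold: float = 0.5) -> str:
    """Return the model whose utterances best match the prompt, else fallback."""
    best_model, best_score = FALLBACK_MODEL, threshold
    for utterances, model in ROUTES.values():
        for utterance in utterances:
            score = SequenceMatcher(None, prompt.lower(), utterance).ratio()
            if score > best_score:
                best_model, best_score = model, score
    return best_model
```

A coding prompt like `route("fix this bug")` lands on the code route’s model, while an unrelated prompt falls through to the fallback model.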

Governance and cost control

Managing AI consumption is critical. AI Gateway provides comprehensive visibility and control over token usage.

  • Usage reporting: Administrators can track prompt and output token usage across different client applications and business groups via monthly or daily reports
  • Rate limiting: AI platform admins can set granular token rate limiting policies (e.g., limiting token requests per minute)
  • Semantic prompt guard: To protect against prompt injection attacks and enforce compliance, admins can configure deny lists using regular expressions (Regex) or utilize AI semantic matching to block malicious or non-compliant prompts before they reach the LLM
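The regex variant of the prompt guard can be pictured as a deny-list check that runs before any request leaves the network. The patterns below are hypothetical examples of rules an admin might configure, not defaults shipped with the gateway, and the production feature can also use AI semantic matching instead of regexes.

```python
import re

# Hypothetical deny-list rules an admin might configure for a prompt guard.
DENY_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"reveal (the )?system prompt", re.IGNORECASE),
]

def guard(prompt: str) -> bool:
    """Return True if the prompt may pass, False if it matches a deny rule."""
    return not any(pattern.search(prompt) for pattern in DENY_PATTERNS)
```

A benign prompt passes through untouched, while a classic injection attempt is blocked centrally, so no individual team has to remember to implement (or can skip) this check.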

Benefits for your company

  • Write once, route anywhere: The gateway uses a single, standard API contract (based on the OpenAI specification). Your developers only need to learn one way to send prompts; the gateway automatically translates each prompt into the specific format required by Google Gemini, Anthropic Claude, or any other supported model
  • Cost control and visibility: AI models charge by the token. The gateway tracks exactly how many tokens each application or business group is consuming
  • Built-in AI security: Sending sensitive company data to a public LLM is a major compliance risk. The gateway includes features like Prompt Guard, which acts as a firewall for AI. It scans incoming prompts for malicious inputs (like prompt injection attacks) and blocks non-compliant requests before they ever leave your network
  • Intelligent routing: Not all AI tasks require the most expensive, powerful model. The gateway can automatically analyze a prompt and route simple questions to a cheaper, faster model while sending complex reasoning tasks to a premium model

Built on the enterprise-grade Anypoint Flex Gateway, the AI Gateway provides a unified access layer to multiple LLM providers, offering intelligent routing, governance, and advanced cost management. Our new AI Gateway LLM capabilities transform AI from an unmanaged experiment into a governed, scalable, and secure enterprise resource. It gives developers the speed they need to build innovative apps, while giving IT the control required to protect your business.
