Using DeepInfra models with Codex

I was using Codex with DeepInfra-hosted models for my benchmarking test and ran into a small compatibility problem.

Codex sends requests using OpenAI’s Responses API. DeepInfra provides an OpenAI-compatible API, but the endpoint I needed was chat completions. The two formats are close, but not close enough for Codex to use the chat completions endpoint directly.

So I had Codex build DeepInfra-Codex-Shim, a small local Node.js shim that sits between Codex and DeepInfra. Codex sends a POST /v1/responses request to the shim. The shim converts that request to POST /chat/completions for DeepInfra, then converts the result back into a Responses-shaped object Codex can understand.
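Here is a minimal sketch of the request side of that conversion in plain Node.js. It is not the shim's source. The function names (responsesToChatRequest, contentToText) are hypothetical. The sketch assumes the public shapes of the two OpenAI formats: Responses requests carry instructions, input items, and flat function tools, while chat completions requests carry messages and nest tools under a function key.

```javascript
// Hypothetical sketch: convert a Responses-style request body into a chat
// completions request body. Not the shim's actual code.

// Flatten Responses content (a string or an array of text parts) into plain text.
function contentToText(content) {
  if (typeof content === "string") return content;
  if (!Array.isArray(content)) return "";
  return content
    .filter((part) => part.type === "input_text" || part.type === "output_text")
    .map((part) => part.text)
    .join("");
}

function responsesToChatRequest(body) {
  const messages = [];
  // System instructions become a leading system message.
  if (body.instructions) {
    messages.push({ role: "system", content: body.instructions });
  }

  // "input" may be a bare string or a list of items.
  const items = typeof body.input === "string"
    ? [{ type: "message", role: "user", content: body.input }]
    : body.input || [];

  for (const item of items) {
    const type = item.type || "message";
    if (type === "message") {
      // Assumption: map the Responses "developer" role onto "system".
      const role = item.role === "developer" ? "system" : item.role;
      messages.push({ role, content: contentToText(item.content) });
    } else if (type === "function_call") {
      // A tool call the assistant made earlier in the conversation.
      messages.push({
        role: "assistant",
        content: null,
        tool_calls: [{
          id: item.call_id,
          type: "function",
          function: { name: item.name, arguments: item.arguments },
        }],
      });
    } else if (type === "function_call_output") {
      // The result Codex got back from running that tool.
      messages.push({ role: "tool", tool_call_id: item.call_id, content: String(item.output) });
    }
  }

  // Responses function tools are flat; chat completions nests them under "function".
  const tools = (body.tools || [])
    .filter((tool) => tool.type === "function")
    .map((tool) => ({
      type: "function",
      function: { name: tool.name, description: tool.description, parameters: tool.parameters },
    }));

  const request = { model: body.model, messages, stream: Boolean(body.stream) };
  if (tools.length > 0) request.tools = tools;
  return request;
}
```

This sketch simplifies one case: each function_call item becomes its own assistant message. A fuller bridge would probably group parallel calls from the same turn into a single assistant message with several tool_calls.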

This is not a full Responses API implementation. It is a practical bridge for using DeepInfra-hosted coding models with Codex.

The main thing a human has to do is provide a DeepInfra API token. The shim reads DEEPINFRA_TOKEN, and Codex uses that same environment variable in its provider config. Once that token is available, an AI coding harness can handle the rest of the setup. Just point it to this blog post.

What the shim does

The shim runs locally:

http://127.0.0.1:8797/v1

Codex points to that local URL. The request path looks like this:

Codex
  -> local shim /v1/responses
  -> DeepInfra /v1/openai/chat/completions
  -> local shim
  -> Responses-shaped output for Codex

The shim handles the pieces I needed for Codex coding workflows:

System instructions
User and assistant messages
Function tools
Function calls
Function call outputs
Basic token usage
Model listing
DeepInfra streaming responses converted into Responses-style events for Codex
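For a non-streaming reply, the reverse direction might look like the sketch below. It assumes a standard chat completions response: choices[0].message with optional tool_calls, plus usage. It maps that onto a Responses-shaped object with message and function_call output items and basic token usage. The id prefixes (resp_, msg_, fc_) are placeholders chosen for this example, not something Codex or DeepInfra requires, and chatToResponse is a hypothetical name.

```javascript
// Hypothetical sketch: map a non-streaming chat completions response onto a
// Responses-shaped object. Not the shim's source; id prefixes are placeholders.
function chatToResponse(completion) {
  const message = completion.choices?.[0]?.message ?? {};
  const output = [];

  // Assistant text becomes a "message" output item with one output_text part.
  if (message.content) {
    output.push({
      type: "message",
      id: `msg_${completion.id}`,
      role: "assistant",
      status: "completed",
      content: [{ type: "output_text", text: message.content, annotations: [] }],
    });
  }

  // Each chat tool call becomes a flat "function_call" output item.
  for (const call of message.tool_calls || []) {
    output.push({
      type: "function_call",
      id: `fc_${call.id}`,
      call_id: call.id,
      name: call.function.name,
      arguments: call.function.arguments,
      status: "completed",
    });
  }

  // Chat completions and Responses use different names for token usage.
  const usage = completion.usage || {};
  return {
    id: `resp_${completion.id}`,
    object: "response",
    created_at: completion.created,
    model: completion.model,
    status: "completed",
    output,
    usage: {
      input_tokens: usage.prompt_tokens || 0,
      output_tokens: usage.completion_tokens || 0,
      total_tokens: usage.total_tokens || 0,
    },
  };
}
```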

If Codex asks for streaming, the shim sends stream: true to DeepInfra and converts DeepInfra’s streamed chat-completion chunks into Responses-style server-sent events for Codex.
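The text part of that streaming conversion can be sketched as follows. Parse each upstream "data:" line, then re-emit any text delta as a Responses-style response.output_text.delta event. This is a hypothetical sketch. It assumes a single assistant message at output_index 0. It also omits the lifecycle events a complete stream would need, such as response.created and response.completed.

```javascript
// Hypothetical sketch of streaming text conversion. Not the shim's source.

// Parse one upstream chat-completion SSE line ("data: {...}" or "data: [DONE]").
// Returns null for anything else, such as comments or blank keep-alive lines.
function parseUpstreamLine(line) {
  if (!line.startsWith("data:")) return null;
  const data = line.slice(5).trim();
  if (data === "[DONE]") return { done: true };
  return { done: false, chunk: JSON.parse(data) };
}

// Format one outgoing server-sent event: an "event:" line, a "data:" line, and a blank line.
function sseEvent(type, payload) {
  return `event: ${type}\ndata: ${JSON.stringify({ type, ...payload })}\n\n`;
}

// Turn a parsed chunk into zero or one Responses-style text delta events.
// Assumes a single assistant message at output_index 0; itemId is chosen by the caller.
function chunkToTextEvents(chunk, itemId) {
  const delta = chunk.choices?.[0]?.delta ?? {};
  if (!delta.content) return [];
  return [
    sseEvent("response.output_text.delta", {
      item_id: itemId,
      output_index: 0,
      content_index: 0,
      delta: delta.content,
    }),
  ];
}
```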

Text deltas can stream through as they arrive. Tool calls are a little different because chat completions providers send function names and arguments in fragments, so the shim accumulates those fragments and emits a complete Responses function_call item when the upstream stream finishes.
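The fragment handling can be sketched as an accumulator keyed by each tool call's index field. That field is how chat completions streams mark which call a fragment belongs to. This is an illustrative class, not the shim's code. It concatenates both name and argument fragments and only builds function_call items once the upstream stream has ended.

```javascript
// Illustrative accumulator for streamed chat-completion tool call fragments.
class ToolCallAccumulator {
  constructor() {
    // index -> { id, name, arguments } built up from fragments
    this.calls = new Map();
  }

  // Feed one streamed "delta" object; deltas without tool_calls are ignored.
  add(delta) {
    for (const fragment of delta.tool_calls || []) {
      const current = this.calls.get(fragment.index) || { id: "", name: "", arguments: "" };
      if (fragment.id) current.id = fragment.id;
      if (fragment.function?.name) current.name += fragment.function.name;
      if (fragment.function?.arguments) current.arguments += fragment.function.arguments;
      this.calls.set(fragment.index, current);
    }
  }

  // Call after the upstream stream finishes to get complete Responses items.
  toFunctionCallItems() {
    return [...this.calls.entries()]
      .sort(([a], [b]) => a - b)
      .map(([, call]) => ({
        type: "function_call",
        id: `fc_${call.id}`,
        call_id: call.id,
        name: call.name,
        arguments: call.arguments,
        status: "completed",
      }));
  }
}
```

Buffering until the end trades a little latency for correctness. A partial arguments string is usually not valid JSON, so emitting it early would give Codex a tool call it cannot parse.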

The shim also includes POST /v1/chat/completions as a passthrough endpoint. Codex does not need that endpoint, but it makes smoke testing easier because you can check your DeepInfra token and model before debugging the Responses conversion.

Setup

First, put your DeepInfra API token in your shell environment. This is the one part you should do yourself because it is your credential.

export DEEPINFRA_TOKEN="your_deepinfra_api_token"

For persistent CLI use, add that line to ~/.zshrc or your shell profile.

Clone the project:

git clone https://github.com/billerickson/DeepInfra-Codex-Shim.git
cd DeepInfra-Codex-Shim

Install and run the tests:

npm install
npm test

Start the shim:

npm start -- --log-requests

Or run the binary directly:

node bin/deepinfra-codex-shim.js --log-requests

By default, the shim listens on:

http://127.0.0.1:8797/v1

You can confirm it is running with:

curl http://127.0.0.1:8797/health

You should see a small JSON response showing the shim is up and pointing at DeepInfra.

Codex configuration

Add a DeepInfra provider to your Codex config. The important line is env_key = "DEEPINFRA_TOKEN", which tells Codex to use the same token you exported above. Codex config formats may change, so treat this as a template:

[model_providers.deepinfra]
name = "DeepInfra via local shim"
base_url = "http://127.0.0.1:8797/v1"
env_key = "DEEPINFRA_TOKEN"
wire_api = "responses"

Then create or update a profile that uses that provider:

[profiles.deepinfra]
model_provider = "deepinfra"
model = "deepseek-ai/DeepSeek-V4-Flash"

Now you can run Codex with a DeepInfra-hosted model:

codex --profile deepinfra --model deepseek-ai/DeepSeek-V4-Flash

You can swap in any DeepInfra model ID you want to test:

codex --profile deepinfra --model stepfun-ai/Step-3.5-Flash
codex --profile deepinfra --model MiniMaxAI/MiniMax-M2.5

Model behavior varies. The shim can translate function-call requests and responses, but it cannot make a model reliable at tool use if the model itself does not emit OpenAI-compatible tool calls consistently.
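When a model struggles with tools, a quick check on the converted output can show whether the problem is the model's tool calls rather than the translation. The helper below is hypothetical and not part of the shim as described here. It flags function_call items with a missing name, or with arguments that do not parse as a JSON object.

```javascript
// Hypothetical diagnostic: list function_call items that look malformed.
function findMalformedToolCalls(outputItems) {
  const problems = [];
  for (const item of outputItems) {
    if (item.type !== "function_call") continue;
    if (!item.name) {
      problems.push({ call_id: item.call_id, reason: "missing function name" });
      continue;
    }
    try {
      const args = JSON.parse(item.arguments);
      // Tool arguments are expected to be a JSON object, not an array or scalar.
      if (args === null || typeof args !== "object" || Array.isArray(args)) {
        problems.push({ call_id: item.call_id, reason: "arguments are not a JSON object" });
      }
    } catch {
      problems.push({ call_id: item.call_id, reason: "arguments are not valid JSON" });
    }
  }
  return problems;
}
```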

Smoke testing DeepInfra

If Codex fails, test the chat passthrough first:

curl http://127.0.0.1:8797/v1/chat/completions \
  -H "content-type: application/json" \
  -H "authorization: Bearer $DEEPINFRA_TOKEN" \
  -d '{"model":"deepseek-ai/DeepSeek-V4-Flash","messages":[{"role":"user","content":"Reply with exactly OK."}]}'

If that works, your token, upstream URL, and model are probably fine. Any remaining issue is more likely in the Responses conversion or in how a specific model handles tools.

The repo also includes an optional integration test:

DEEPINFRA_TOKEN=... npm run test:integration

It sends one harmless request, "Reply with exactly OK.", and checks that the shim returns a Codex-compatible response.
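As a rough illustration of what such a check could look like on a non-streaming Responses-shaped object, the hypothetical helpers below pull the output_text parts out of message items and confirm the reply is OK. The repo's actual test may check more, or check it differently.

```javascript
// Hypothetical helpers, not the repo's integration test.

// Concatenate all output_text parts from message items in a Responses-shaped object.
function outputText(response) {
  return (response.output || [])
    .filter((item) => item.type === "message")
    .flatMap((item) => item.content || [])
    .filter((part) => part.type === "output_text")
    .map((part) => part.text)
    .join("");
}

// True if the object has the expected envelope and the model replied "OK".
function passesOkSmokeTest(response) {
  return (
    response.object === "response" &&
    response.status === "completed" &&
    Array.isArray(response.output) &&
    outputText(response).trim() === "OK"
  );
}
```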

Configuration

These are the main environment variables:

DEEPINFRA_CODEX_SHIM_HOST=127.0.0.1
DEEPINFRA_CODEX_SHIM_PORT=8797
DEEPINFRA_CODEX_SHIM_UPSTREAM=https://api.deepinfra.com/v1/openai
DEEPINFRA_CODEX_SHIM_API_KEY_ENV=DEEPINFRA_TOKEN
DEEPINFRA_CODEX_SHIM_LOG_LEVEL=info
DEEPINFRA_CODEX_SHIM_LOG_REQUESTS=false
DEEPINFRA_CODEX_SHIM_LOG_CONTENT=false
DEEPINFRA_CODEX_SHIM_TIMEOUT_MS=120000
DEEPINFRA_CODEX_SHIM_MAX_BODY_BYTES=10485760
DEEPINFRA_CODEX_SHIM_COMPAT_DROP_TOOL_CALL_CONTENT=false

You can set the same options with CLI flags:

node bin/deepinfra-codex-shim.js \
  --host 127.0.0.1 \
  --port 8797 \
  --upstream https://api.deepinfra.com/v1/openai \
  --api-key-env DEEPINFRA_TOKEN \
  --log-requests
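Here is a sketch of how those settings could be resolved, using the defaults listed above. It assumes CLI flags take precedence over environment variables; that precedence is not documented here. It covers only a subset of the options, and resolveConfig is a hypothetical name.

```javascript
// Defaults copied from the environment variable list above.
const DEFAULTS = {
  host: "127.0.0.1",
  port: 8797,
  upstream: "https://api.deepinfra.com/v1/openai",
  apiKeyEnv: "DEEPINFRA_TOKEN",
  logRequests: false,
};

// Hypothetical resolver: env vars override defaults, then CLI flags override env vars.
function resolveConfig(env, argv) {
  const config = {
    host: env.DEEPINFRA_CODEX_SHIM_HOST || DEFAULTS.host,
    port: Number(env.DEEPINFRA_CODEX_SHIM_PORT || DEFAULTS.port),
    upstream: env.DEEPINFRA_CODEX_SHIM_UPSTREAM || DEFAULTS.upstream,
    apiKeyEnv: env.DEEPINFRA_CODEX_SHIM_API_KEY_ENV || DEFAULTS.apiKeyEnv,
    logRequests: env.DEEPINFRA_CODEX_SHIM_LOG_REQUESTS === "true",
  };

  // Walk the flags; value-taking flags consume the next argument.
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    if (flag === "--host") config.host = argv[++i];
    else if (flag === "--port") config.port = Number(argv[++i]);
    else if (flag === "--upstream") config.upstream = argv[++i];
    else if (flag === "--api-key-env") config.apiKeyEnv = argv[++i];
    else if (flag === "--log-requests") config.logRequests = true;
  }
  return config;
}
```

In a real entry point, you would call it as resolveConfig(process.env, process.argv.slice(2)). You would then read the token itself from process.env[config.apiKeyEnv], which keeps the credential out of the config object.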

The default host is 127.0.0.1, and I would keep it that way unless you have a specific reason to expose the shim elsewhere. The shim does not add its own authentication layer. It just forwards provider credentials upstream.
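If you do change the host, a startup check along these lines can make the exposure obvious. It is a hypothetical helper, not a documented shim feature. It treats localhost, ::1, and any 127.x.x.x address as loopback.

```javascript
// Hypothetical startup guard for non-loopback bind addresses.
function isLoopbackHost(host) {
  return host === "localhost" || host === "::1" || /^127\./.test(host);
}

// Returns a warning string for exposed hosts, or null for loopback hosts.
function bindWarning(host) {
  return isLoopbackHost(host)
    ? null
    : `Warning: listening on ${host}. The shim has no authentication layer of its own.`;
}
```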

A compatibility note

During my benchmark tests, I dropped assistant text whenever the assistant also returned tool calls. That helped with some Codex tool-call turns, but it is a surprising default for a public project.

The packaged shim preserves assistant text by default. If you want the old behavior, you can opt in with an explicit flag:

node bin/deepinfra-codex-shim.js --compat-drop-tool-call-content
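Conceptually, that option is a small filter. This sketch assumes it applies to the converted Responses output, removing message items whenever the same turn also contains function_call items. The function name is hypothetical. The real flag may apply the rule at a different stage, for example to assistant messages in the upstream history.

```javascript
// Hypothetical sketch of the drop-tool-call-content compatibility behavior.
function dropToolCallContent(outputItems, enabled) {
  if (!enabled) return outputItems; // default: keep assistant text
  const hasToolCall = outputItems.some((item) => item.type === "function_call");
  // Only drop text when the same turn also produced tool calls.
  return hasToolCall ? outputItems.filter((item) => item.type !== "message") : outputItems;
}
```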

What it does not support

The shim does not support images, audio, built-in OpenAI tools, web search, file search, reasoning item semantics, persistent response IDs, server-side conversation state, or advanced structured output behavior.

If your workflow is normal Codex coding with text and function/tool calls, the shim should be useful for actual work. If you need full Responses API compatibility, this is not that.

Privacy and logging

Codex may send task prompts, repository context, file contents, diffs, commands, command output, and tool results to the model provider you choose.

The shim forwards your DeepInfra token upstream but does not store it. Normal request logging is off. When you run with --log-requests, it logs request shape, model names, tool counts, and request IDs. It does not log full prompts or tool outputs.

There is a --log-content flag for deeper debugging, but it can print code and prompt content. I would only use it locally, briefly, and intentionally.

Tokens and authorization headers are redacted from structured logs and upstream error snippets.
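Here is a minimal sketch of that kind of redaction, with some assumptions. It replaces the values of a few sensitive keys (the key list is illustrative). It also scrubs Bearer tokens from free text and replaces any known secret strings, such as the value of DEEPINFRA_TOKEN, wherever they appear. It is not the shim's implementation.

```javascript
// Illustrative key list; a real implementation may use a different one.
const SENSITIVE_KEYS = new Set(["authorization", "api_key", "apikey", "token"]);

// Return a redacted deep copy of value. secrets is a list of known secret
// strings (for example, the DeepInfra token) to scrub wherever they appear.
function redact(value, secrets = []) {
  if (Array.isArray(value)) return value.map((item) => redact(item, secrets));
  if (value && typeof value === "object") {
    const out = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redact(inner, secrets);
    }
    return out;
  }
  if (typeof value === "string") {
    // Scrub "Bearer <token>" anywhere in free text, such as upstream error bodies.
    let text = value.replace(/Bearer\s+[^\s"']+/gi, "Bearer [REDACTED]");
    for (const secret of secrets) {
      if (secret) text = text.split(secret).join("[REDACTED]");
    }
    return text;
  }
  return value;
}
```

For example, you could call redact(errorSnippet, [process.env.DEEPINFRA_TOKEN].filter(Boolean)) before writing an upstream error to the log.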

Bill Erickson