Databricks

LiteLLM supports all models on Databricks

tip

We support ALL Databricks models. Just set model=databricks/<any-model-on-databricks> as a prefix when sending LiteLLM requests.

Authentication

LiteLLM supports multiple authentication methods for Databricks, listed in order of preference:

OAuth Machine-to-Machine (M2M) - Recommended

OAuth Machine-to-Machine authentication using Service Principal credentials is the recommended method for production deployments, per Databricks Partner requirements.

import os
from litellm import completion

# Set OAuth credentials (Service Principal)
os.environ["DATABRICKS_CLIENT_ID"] = "your-service-principal-application-id"
os.environ["DATABRICKS_CLIENT_SECRET"] = "your-service-principal-secret"
os.environ["DATABRICKS_API_BASE"] = "https://adb-xxx.azuredatabricks.net/serving-endpoints"

response = completion(
    model="databricks/databricks-dbrx-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)

Personal Access Token (PAT)

PAT authentication is supported for development and testing scenarios.

import os
from litellm import completion

os.environ["DATABRICKS_API_KEY"] = "dapi..." # Your Personal Access Token
os.environ["DATABRICKS_API_BASE"] = "https://adb-xxx.azuredatabricks.net/serving-endpoints"

response = completion(
    model="databricks/databricks-dbrx-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)

Databricks SDK Authentication (Automatic)

If no credentials are provided, LiteLLM will use the Databricks SDK for automatic authentication. This supports OAuth, Azure AD, and other unified auth methods configured in your environment.

from litellm import completion

# No environment variables needed - uses Databricks SDK unified auth
# Requires: pip install databricks-sdk
response = completion(
    model="databricks/databricks-dbrx-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)

Custom User-Agent for Partner Attribution

If you're building a product on top of LiteLLM that integrates with Databricks, you can pass your own partner identifier for proper attribution in Databricks telemetry.

The partner name will be prefixed to the LiteLLM user agent:

import os
from litellm import completion

# Via parameter
response = completion(
    model="databricks/databricks-dbrx-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    user_agent="mycompany/1.0.0",
)
# Resulting User-Agent: mycompany_litellm/1.79.1

# Via environment variable
os.environ["DATABRICKS_USER_AGENT"] = "mycompany/1.0.0"
# Resulting User-Agent: mycompany_litellm/1.79.1
| Input                  | Resulting User-Agent           |
|------------------------|--------------------------------|
| (none)                 | litellm/1.79.1                 |
| mycompany/1.0.0        | mycompany_litellm/1.79.1       |
| partner_product/2.5.0  | partner_product_litellm/1.79.1 |
| acme                   | acme_litellm/1.79.1            |

Note: The version from your custom user agent is ignored; LiteLLM's version is always used.

Security

LiteLLM automatically redacts sensitive information (tokens, secrets, API keys) from all debug logs to prevent credential leakage. This includes:

  • Authorization headers
  • API keys and tokens
  • Client secrets
  • Personal access tokens (PATs)
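For example, with debug logging enabled, request headers are printed with credential values masked. A minimal sketch (litellm._turn_on_debug() enables LiteLLM's verbose debug output):

import os
import litellm
from litellm import completion

litellm._turn_on_debug()  # verbose debug logs; secrets are redacted

os.environ["DATABRICKS_API_KEY"] = "dapi..."  # appears masked in log output
os.environ["DATABRICKS_API_BASE"] = "https://adb-xxx.azuredatabricks.net/serving-endpoints"

response = completion(
    model="databricks/databricks-dbrx-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)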

Usage

ENV VAR

import os
os.environ["DATABRICKS_API_KEY"] = ""
os.environ["DATABRICKS_API_BASE"] = ""

Example Call

from litellm import completion
import os
## set ENV variables
os.environ["DATABRICKS_API_KEY"] = "databricks key"
os.environ["DATABRICKS_API_BASE"] = "databricks base url" # e.g.: https://adb-3064715882934586.6.azuredatabricks.net/serving-endpoints

# Databricks dbrx-instruct call
response = completion(
    model="databricks/databricks-dbrx-instruct",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)

Passing additional params - max_tokens, temperature

See all litellm.completion supported params here

# !pip install litellm
from litellm import completion
import os
## set ENV variables
os.environ["DATABRICKS_API_KEY"] = "databricks key"
os.environ["DATABRICKS_API_BASE"] = "databricks api base"

# databricks dbrx call
response = completion(
    model="databricks/databricks-dbrx-instruct",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    max_tokens=20,
    temperature=0.5,
)

proxy

model_list:
  - model_name: llama-3
    litellm_params:
      model: databricks/databricks-meta-llama-3-70b-instruct
      api_key: os.environ/DATABRICKS_API_KEY
      max_tokens: 20
      temperature: 0.5
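Once the proxy is running (litellm --config config.yaml), the model can be called through any OpenAI-compatible client. A minimal sketch, assuming the proxy listens on the default http://0.0.0.0:4000 with an example virtual key:

from openai import OpenAI

client = OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="llama-3",  # the model_name from the config above
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response.choices[0].message.content)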

Usage - Thinking / reasoning_content

LiteLLM translates OpenAI's reasoning_effort to Anthropic's thinking parameter:

| reasoning_effort | thinking               |
|------------------|------------------------|
| "low"            | "budget_tokens": 1024  |
| "medium"         | "budget_tokens": 2048  |
| "high"           | "budget_tokens": 4096  |

Known Limitations:

  • Passing thinking blocks back to Claude is not yet supported (tracked in a GitHub issue).

from litellm import completion
import os

# set ENV variables (can also be passed in to .completion() - e.g. `api_base`, `api_key`)
os.environ["DATABRICKS_API_KEY"] = "databricks key"
os.environ["DATABRICKS_API_BASE"] = "databricks base url"

resp = completion(
    model="databricks/databricks-claude-3-7-sonnet",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    reasoning_effort="low",
)

Expected Response

ModelResponse(
    id='chatcmpl-c542d76d-f675-4e87-8e5f-05855f5d0f5e',
    created=1740470510,
    model='claude-3-7-sonnet-20250219',
    object='chat.completion',
    system_fingerprint=None,
    choices=[
        Choices(
            finish_reason='stop',
            index=0,
            message=Message(
                content="The capital of France is Paris.",
                role='assistant',
                tool_calls=None,
                function_call=None,
                provider_specific_fields={
                    'citations': None,
                    'thinking_blocks': [
                        {
                            'type': 'thinking',
                            'thinking': 'The capital of France is Paris. This is a very straightforward factual question.',
                            'signature': 'EuYBCkQYAiJAy6...'
                        }
                    ]
                }
            ),
            thinking_blocks=[
                {
                    'type': 'thinking',
                    'thinking': 'The capital of France is Paris. This is a very straightforward factual question.',
                    'signature': 'EuYBCkQYAiJAy6AGB...'
                }
            ],
            reasoning_content='The capital of France is Paris. This is a very straightforward factual question.'
        )
    ],
    usage=Usage(
        completion_tokens=68,
        prompt_tokens=42,
        total_tokens=110,
        completion_tokens_details=None,
        prompt_tokens_details=PromptTokensDetailsWrapper(
            audio_tokens=None,
            cached_tokens=0,
            text_tokens=None,
            image_tokens=None
        ),
        cache_creation_input_tokens=0,
        cache_read_input_tokens=0
    )
)

Citations

Anthropic models served through Databricks can return citation metadata. LiteLLM exposes these via response.choices[0].message.provider_specific_fields["citations"].
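A minimal sketch of requesting and reading citations, assuming Anthropic-style document content blocks (with citations enabled) are passed through to the Databricks-served Claude endpoint:

from litellm import completion

resp = completion(
    model="databricks/databricks-claude-3-7-sonnet",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    # Document block the model can cite from
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": "The grass is green. The sky is blue.",
                    },
                    "citations": {"enabled": True},
                },
                {"type": "text", "text": "What color is the grass?"},
            ],
        }
    ],
)

# Citation metadata, if returned by the endpoint
citations = resp.choices[0].message.provider_specific_fields["citations"]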

Pass thinking to Anthropic models

You can also pass the thinking parameter to Anthropic models.

from litellm import completion
import os

# set ENV variables (can also be passed in to .completion() - e.g. `api_base`, `api_key`)
os.environ["DATABRICKS_API_KEY"] = "databricks key"
os.environ["DATABRICKS_API_BASE"] = "databricks base url"

response = completion(
    model="databricks/databricks-claude-3-7-sonnet",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    thinking={"type": "enabled", "budget_tokens": 1024},
)

Supported Databricks Chat Completion Models

tip

We support ALL Databricks models. Just set model=databricks/<any-model-on-databricks> as a prefix when sending LiteLLM requests.

| Model Name                               | Command                                                                                   |
|------------------------------------------|-------------------------------------------------------------------------------------------|
| databricks/databricks-claude-3-7-sonnet  | completion(model='databricks/databricks-claude-3-7-sonnet', messages=messages)            |
| databricks-meta-llama-3-1-70b-instruct   | completion(model='databricks/databricks-meta-llama-3-1-70b-instruct', messages=messages)  |
| databricks-meta-llama-3-1-405b-instruct  | completion(model='databricks/databricks-meta-llama-3-1-405b-instruct', messages=messages) |
| databricks-dbrx-instruct                 | completion(model='databricks/databricks-dbrx-instruct', messages=messages)                |
| databricks-meta-llama-3-70b-instruct     | completion(model='databricks/databricks-meta-llama-3-70b-instruct', messages=messages)    |
| databricks-llama-2-70b-chat              | completion(model='databricks/databricks-llama-2-70b-chat', messages=messages)             |
| databricks-mixtral-8x7b-instruct         | completion(model='databricks/databricks-mixtral-8x7b-instruct', messages=messages)        |
| databricks-mpt-30b-instruct              | completion(model='databricks/databricks-mpt-30b-instruct', messages=messages)             |
| databricks-mpt-7b-instruct               | completion(model='databricks/databricks-mpt-7b-instruct', messages=messages)              |

Embedding Models

Passing Databricks specific params - 'instruction'

For embedding models, Databricks lets you pass in an additional param 'instruction'. Full Spec

# !pip install litellm
from litellm import embedding
import os
## set ENV variables
os.environ["DATABRICKS_API_KEY"] = "databricks key"
os.environ["DATABRICKS_API_BASE"] = "databricks url"

# Databricks bge-large-en call
response = embedding(
    model="databricks/databricks-bge-large-en",
    input=["good morning from litellm"],
    instruction="Represent this sentence for searching relevant passages:",
)

proxy

model_list:
  - model_name: bge-large
    litellm_params:
      model: databricks/databricks-bge-large-en
      api_key: os.environ/DATABRICKS_API_KEY
      api_base: os.environ/DATABRICKS_API_BASE
      instruction: "Represent this sentence for searching relevant passages:"
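As with the chat config above, once the proxy is running you can hit the embeddings route with an OpenAI-compatible client. A minimal sketch, assuming the default proxy address and an example virtual key:

from openai import OpenAI

client = OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.embeddings.create(
    model="bge-large",  # the model_name from the config above
    input=["good morning from litellm"],
)
print(len(response.data[0].embedding))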

Supported Databricks Embedding Models

tip

We support ALL Databricks models. Just set model=databricks/<any-model-on-databricks> as a prefix when sending LiteLLM requests.

| Model Name              | Command                                                             |
|-------------------------|---------------------------------------------------------------------|
| databricks-bge-large-en | embedding(model='databricks/databricks-bge-large-en', input=input)  |
| databricks-gte-large-en | embedding(model='databricks/databricks-gte-large-en', input=input)  |