LITELLM - GETTING STARTED

https://github.com/BerriAI/litellm


CALL 100+ LLMS USING THE SAME INPUT/OUTPUT FORMAT

 * Translate inputs to the provider's completion, embedding, and
   image_generation endpoints
 * Consistent output - text responses are always available at
   ['choices'][0]['message']['content'] (see the sketch after this list)
 * Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - Router
 * Track spend & set budgets per project - OpenAI Proxy Server
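
A minimal sketch of the consistent output format (assuming OPENAI_API_KEY is set) - the same access pattern works for every provider listed below:

from litellm import completion

response = completion(
  model="gpt-3.5-turbo",
  messages=[{ "content": "Hello, how are you?","role": "user"}]
)

# the text content is always at the same location, regardless of provider
print(response['choices'][0]['message']['content'])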


BASIC USAGE

pip install litellm



 * OpenAI
 * Anthropic
 * VertexAI
 * HuggingFace
 * Azure OpenAI
 * Ollama
 * Openrouter

from litellm import completion
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-api-key"

response = completion(
  model="gpt-3.5-turbo", 
  messages=[{ "content": "Hello, how are you?","role": "user"}]
)




from litellm import completion
import os

## set ENV variables
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"

response = completion(
  model="claude-2", 
  messages=[{ "content": "Hello, how are you?","role": "user"}]
)




from litellm import completion
import os

# auth: run 'gcloud auth application-default login'
os.environ["VERTEX_PROJECT"] = "hardy-device-386718"
os.environ["VERTEX_LOCATION"] = "us-central1"

response = completion(
  model="chat-bison", 
  messages=[{ "content": "Hello, how are you?","role": "user"}]
)




from litellm import completion 
import os

os.environ["HUGGINGFACE_API_KEY"] = "huggingface_api_key" 

# e.g. Call 'WizardLM/WizardCoder-Python-34B-V1.0' hosted on HF Inference endpoints
response = completion(
  model="huggingface/WizardLM/WizardCoder-Python-34B-V1.0",
  messages=[{ "content": "Hello, how are you?","role": "user"}], 
  api_base="https://my-endpoint.huggingface.cloud"
)

print(response)




from litellm import completion
import os

## set ENV variables
os.environ["AZURE_API_KEY"] = ""
os.environ["AZURE_API_BASE"] = ""
os.environ["AZURE_API_VERSION"] = ""

# azure call
response = completion(
  "azure/<your_deployment_name>", 
  messages = [{ "content": "Hello, how are you?","role": "user"}]
)




from litellm import completion

response = completion(
            model="ollama/llama2", 
            messages = [{ "content": "Hello, how are you?","role": "user"}], 
            api_base="http://localhost:11434"
)




from litellm import completion
import os

## set ENV variables
os.environ["OPENROUTER_API_KEY"] = "openrouter_api_key" 

response = completion(
  model="openrouter/google/palm-2-chat-bison", 
  messages = [{ "content": "Hello, how are you?","role": "user"}],
)





STREAMING

Set stream=True in the completion args.

 * OpenAI
 * Anthropic
 * VertexAI
 * HuggingFace
 * Azure OpenAI
 * Ollama
 * Openrouter

from litellm import completion
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-api-key"

response = completion(
  model="gpt-3.5-turbo", 
  messages=[{ "content": "Hello, how are you?","role": "user"}],
  stream=True,
)




from litellm import completion
import os

## set ENV variables
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"

response = completion(
  model="claude-2", 
  messages=[{ "content": "Hello, how are you?","role": "user"}],
  stream=True,
)




from litellm import completion
import os

# auth: run 'gcloud auth application-default login'
os.environ["VERTEX_PROJECT"] = "hardy-device-386718"
os.environ["VERTEX_LOCATION"] = "us-central1"

response = completion(
  model="chat-bison", 
  messages=[{ "content": "Hello, how are you?","role": "user"}],
  stream=True,
)




from litellm import completion 
import os

os.environ["HUGGINGFACE_API_KEY"] = "huggingface_api_key" 

# e.g. Call 'WizardLM/WizardCoder-Python-34B-V1.0' hosted on HF Inference endpoints
response = completion(
  model="huggingface/WizardLM/WizardCoder-Python-34B-V1.0",
  messages=[{ "content": "Hello, how are you?","role": "user"}], 
  api_base="https://my-endpoint.huggingface.cloud",
  stream=True,
)

print(response)




from litellm import completion
import os

## set ENV variables
os.environ["AZURE_API_KEY"] = ""
os.environ["AZURE_API_BASE"] = ""
os.environ["AZURE_API_VERSION"] = ""

# azure call
response = completion(
  "azure/<your_deployment_name>", 
  messages = [{ "content": "Hello, how are you?","role": "user"}],
  stream=True,
)




from litellm import completion

response = completion(
            model="ollama/llama2", 
            messages = [{ "content": "Hello, how are you?","role": "user"}], 
            api_base="http://localhost:11434",
            stream=True,
)




from litellm import completion
import os

## set ENV variables
os.environ["OPENROUTER_API_KEY"] = "openrouter_api_key" 

response = completion(
  model="openrouter/google/palm-2-chat-bison", 
  messages = [{ "content": "Hello, how are you?","role": "user"}],
  stream=True,
)
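
With stream=True, completion() returns an iterator of chunks rather than a single response object. A minimal sketch of consuming it, assuming any of the streaming examples above (chunks follow the OpenAI streaming format, and content can be None on the final chunk):

for chunk in response:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end="")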





EXCEPTION HANDLING

LiteLLM maps exceptions across all supported providers to the OpenAI exceptions.
All our exceptions inherit from OpenAI's exception types, so any error handling
you already have for OpenAI should work out of the box with LiteLLM.

import os

from openai import OpenAIError # openai v1.0.0+
from litellm import completion

os.environ["ANTHROPIC_API_KEY"] = "bad-key"
try:
    completion(model="claude-instant-1", messages=[{"role": "user", "content": "Hey, how's it going?"}])
except OpenAIError as e:
    print(e)
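
Since the mapped exceptions inherit from OpenAI's exception types, you can also catch specific classes. A minimal sketch, assuming openai v1.0.0+ (AuthenticationError, RateLimitError and APIConnectionError are openai's own classes):

import os

from openai import AuthenticationError, RateLimitError, APIConnectionError
from litellm import completion

os.environ["ANTHROPIC_API_KEY"] = "bad-key"
try:
    completion(model="claude-instant-1", messages=[{"role": "user", "content": "Hey, how's it going?"}])
except AuthenticationError as e:
    print("invalid or missing API key:", e)
except RateLimitError as e:
    print("rate limited - consider retrying with backoff:", e)
except APIConnectionError as e:
    print("could not reach the provider:", e)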





LOGGING OBSERVABILITY - LOG LLM INPUT/OUTPUT (DOCS)

LiteLLM exposes pre-defined callbacks to send data to Langfuse, LLMonitor,
Helicone, Promptlayer, Traceloop, and Slack.

import os

import litellm
from litellm import completion

## set env variables for logging tools
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["LLMONITOR_APP_ID"] = "your-llmonitor-app-id"

os.environ["OPENAI_API_KEY"] = "your-api-key"

# set callbacks
litellm.success_callback = ["langfuse", "llmonitor"] # log input/output to langfuse, llmonitor

# openai call
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])





TRACK COSTS, USAGE, LATENCY FOR STREAMING

Use a callback function for this - more info on custom callbacks:
https://docs.litellm.ai/docs/observability/custom_callback

import litellm
from litellm import completion

# track_cost_callback
def track_cost_callback(
    kwargs,                 # kwargs to completion
    completion_response,    # response from completion
    start_time, end_time    # start/end time
):
    try:
        response_cost = kwargs.get("response_cost", 0)
        print("streaming response_cost", response_cost)
    except Exception:
        pass

# set callback
litellm.success_callback = [track_cost_callback] # set custom callback function

# litellm.completion() call
response = completion(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "Hi 👋 - i'm openai"
        }
    ],
    stream=True
)

# consume the stream so the callback fires with the final cost
for chunk in response:
    pass





OPENAI PROXY

Track spend across multiple projects/people



The proxy provides:

 1. Hooks for auth
 2. Hooks for logging
 3. Cost tracking
 4. Rate Limiting


📖 PROXY ENDPOINTS - SWAGGER DOCS


QUICK START PROXY - CLI

pip install 'litellm[proxy]'




STEP 1: START LITELLM PROXY

$ litellm --model huggingface/bigcode/starcoder

#INFO: Proxy running on http://0.0.0.0:8000




STEP 2: MAKE A CHAT COMPLETIONS REQUEST TO THE PROXY

import openai # openai v1.0.0+

client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:8000") # set proxy to base_url

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages=[
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])

print(response)
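
The proxy also serves the OpenAI-compatible routes over plain HTTP, so any HTTP client works. A minimal sketch using requests (an illustration only - it assumes the proxy from Step 1 is running on http://0.0.0.0:8000 with no virtual keys configured):

import requests

response = requests.post(
  "http://0.0.0.0:8000/chat/completions",
  headers={"Authorization": "Bearer anything"}, # any value works when no virtual keys are configured
  json={
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "this is a test request, write a short poem"}],
  },
)

print(response.json()["choices"][0]["message"]["content"])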





MORE DETAILS

 * exception mapping
 * retries + model fallbacks for completion()
 * proxy virtual keys & spend management
