# LiteLLM - Getting Started

https://github.com/BerriAI/litellm

## Call 100+ LLMs using the same Input/Output Format

- Translate inputs to the provider's `completion`, `embedding`, and `image_generation` endpoints
- Consistent output: text responses will always be available at `['choices'][0]['message']['content']`
- Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - Router
- Track spend & set budgets per project - OpenAI Proxy Server

## Basic Usage

```shell
pip install litellm
```

**OpenAI**

```python
from litellm import completion
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-api-key"

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"content": "Hello, how are you?", "role": "user"}]
)
```

**Anthropic**

```python
from litellm import completion
import os

## set ENV variables
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"

response = completion(
    model="claude-2",
    messages=[{"content": "Hello, how are you?", "role": "user"}]
)
```

**VertexAI**

```python
from litellm import completion
import os

# auth: run 'gcloud auth application-default login'
os.environ["VERTEX_PROJECT"] = "hardy-device-386718"
os.environ["VERTEX_LOCATION"] = "us-central1"

response = completion(
    model="chat-bison",
    messages=[{"content": "Hello, how are you?", "role": "user"}]
)
```

**HuggingFace**

```python
from litellm import completion
import os

os.environ["HUGGINGFACE_API_KEY"] = "huggingface_api_key"

# e.g. call 'WizardLM/WizardCoder-Python-34B-V1.0' hosted on HF Inference endpoints
response = completion(
    model="huggingface/WizardLM/WizardCoder-Python-34B-V1.0",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    api_base="https://my-endpoint.huggingface.cloud"
)
print(response)
```

**Azure OpenAI**

```python
from litellm import completion
import os

## set ENV variables
os.environ["AZURE_API_KEY"] = ""
os.environ["AZURE_API_BASE"] = ""
os.environ["AZURE_API_VERSION"] = ""

# azure call
response = completion(
    model="azure/<your_deployment_name>",
    messages=[{"content": "Hello, how are you?", "role": "user"}]
)
```

**Ollama**

```python
from litellm import completion

response = completion(
    model="ollama/llama2",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    api_base="http://localhost:11434"
)
```

**Openrouter**

```python
from litellm import completion
import os

## set ENV variables
os.environ["OPENROUTER_API_KEY"] = "openrouter_api_key"

response = completion(
    model="openrouter/google/palm-2-chat-bison",
    messages=[{"content": "Hello, how are you?", "role": "user"}]
)
```
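Because the output is normalized across providers, the same accessor works for every example above. Below is a minimal sketch of reading the result via the documented `['choices'][0]['message']['content']` path; the attribute-style access at the end is an assumption about LiteLLM's OpenAI-style response object and may vary by version:

```python
from litellm import completion
import os

os.environ["OPENAI_API_KEY"] = "your-api-key"

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"content": "Hello, how are you?", "role": "user"}]
)

# documented path: the text response is always available here, whatever the provider
print(response["choices"][0]["message"]["content"])

# attribute-style access (assumption: the response mirrors the OpenAI object shape)
print(response.choices[0].message.content)
```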
## Streaming

Set `stream=True` in the completion args.

**OpenAI**

```python
from litellm import completion
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-api-key"

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    stream=True,
)
```

**Anthropic**

```python
from litellm import completion
import os

## set ENV variables
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"

response = completion(
    model="claude-2",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    stream=True,
)
```

**VertexAI**

```python
from litellm import completion
import os

# auth: run 'gcloud auth application-default login'
os.environ["VERTEX_PROJECT"] = "hardy-device-386718"
os.environ["VERTEX_LOCATION"] = "us-central1"

response = completion(
    model="chat-bison",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    stream=True,
)
```

**HuggingFace**

```python
from litellm import completion
import os

os.environ["HUGGINGFACE_API_KEY"] = "huggingface_api_key"

# e.g. call 'WizardLM/WizardCoder-Python-34B-V1.0' hosted on HF Inference endpoints
response = completion(
    model="huggingface/WizardLM/WizardCoder-Python-34B-V1.0",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    api_base="https://my-endpoint.huggingface.cloud",
    stream=True,
)
print(response)
```

**Azure OpenAI**

```python
from litellm import completion
import os

## set ENV variables
os.environ["AZURE_API_KEY"] = ""
os.environ["AZURE_API_BASE"] = ""
os.environ["AZURE_API_VERSION"] = ""

# azure call
response = completion(
    model="azure/<your_deployment_name>",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    stream=True,
)
```

**Ollama**

```python
from litellm import completion

response = completion(
    model="ollama/llama2",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    api_base="http://localhost:11434",
    stream=True,
)
```

**Openrouter**

```python
from litellm import completion
import os

## set ENV variables
os.environ["OPENROUTER_API_KEY"] = "openrouter_api_key"

response = completion(
    model="openrouter/google/palm-2-chat-bison",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    stream=True,
)
```

## Exception Handling

LiteLLM maps exceptions across all supported providers to the OpenAI exceptions. All our exceptions inherit from OpenAI's exception types, so any error handling you already have for those should work out of the box with LiteLLM.

```python
import os
from openai.error import OpenAIError
from litellm import completion

os.environ["ANTHROPIC_API_KEY"] = "bad-key"
try:
    # some code
    completion(model="claude-instant-1", messages=[{"role": "user", "content": "Hey, how's it going?"}])
except OpenAIError as e:
    print(e)
```

## Logging Observability - Log LLM Input/Output (Docs)

LiteLLM exposes pre-defined callbacks to send data to Langfuse, LLMonitor, Helicone, Promptlayer, Traceloop, and Slack.

```python
import os
import litellm
from litellm import completion

## set env variables for logging tools
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["LLMONITOR_APP_ID"] = "your-llmonitor-app-id"
os.environ["OPENAI_API_KEY"] = "your-api-key"

# set callbacks
litellm.success_callback = ["langfuse", "llmonitor"]  # log input/output to langfuse, llmonitor

# openai call
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
```

## Track Costs, Usage, Latency for streaming

Use a callback function for this - more info on custom callbacks: https://docs.litellm.ai/docs/observability/custom_callback

```python
import litellm
from litellm import completion

# track_cost_callback
def track_cost_callback(
    kwargs,               # kwargs to completion
    completion_response,  # response from completion
    start_time, end_time  # start/end time
):
    try:
        response_cost = kwargs.get("response_cost", 0)
        print("streaming response_cost", response_cost)
    except Exception:
        pass

# set callback
litellm.success_callback = [track_cost_callback]  # set custom callback function

# litellm.completion() call
response = completion(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "Hi 👋 - i'm openai"
        }
    ],
    stream=True
)
```
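With `stream=True` the call above returns an iterator of chunks rather than a finished message, and the custom success callback should fire once the stream has been consumed. A minimal sketch of draining it, assuming the OpenAI-style delta shape (`chunk.choices[0].delta.content`, which can be `None` on some chunks) that LiteLLM mirrors:

```python
# consume the streamed response from the snippet above
full_text = ""
for chunk in response:
    delta = chunk.choices[0].delta.content  # incremental text; may be None (assumption about chunk shape)
    if delta:
        full_text += delta
        print(delta, end="", flush=True)

print()  # newline after the streamed output
print("assembled response:", full_text)
```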
## OpenAI Proxy

Track spend across multiple projects/people.

The proxy provides:

1. Hooks for auth
2. Hooks for logging
3. Cost tracking
4. Rate limiting

📖 Proxy Endpoints - Swagger Docs

### Quick Start Proxy - CLI

```shell
pip install 'litellm[proxy]'
```

**Step 1: Start litellm proxy**

```shell
$ litellm --model huggingface/bigcode/starcoder

#INFO: Proxy running on http://0.0.0.0:8000
```

**Step 2: Make ChatCompletions request to proxy**

```python
import openai  # openai v1.0.0+

client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:8000")  # set proxy to base_url

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages=[
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])

print(response)
```

A streaming variant of this request is sketched after the "More details" list below.

### More details

- exception mapping
- retries + model fallbacks for completion()
- proxy virtual keys & spend management
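The Step 2 request can also be streamed through the proxy with the standard openai v1 streaming interface; a minimal sketch, where the chunk handling assumes the usual OpenAI delta shape forwarded by the proxy:

```python
import openai  # openai v1.0.0+

client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:8000")

# same request as Step 2, but streamed
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "this is a test request, write a short poem"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content  # may be None on some chunks
    if delta:
        print(delta, end="", flush=True)
print()
```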