docs.argilla.io
104.17.33.82

Submitted URL:
http://docs.argilla.io/
Effective URL:
https://docs.argilla.io/en/latest/
Submission: On March 28 via api (March 28th 2024, 12:06:38 am UTC) from US — Scanned from DE

Form analysis
3 forms found in the DOM

GET search.html

<form class="header__search__container" method="get" action="search.html" role="search">
  <input class="header__search__input" placeholder="Search" name="q" aria-label=" Search">
  <input type="hidden" name="check_keywords" value="yes">
  <input type="hidden" name="area" value="default">
  <label class="close-icon" for="__search">
    <div class="visually-hidden">Hide search</div>
  </label>
</form>

GET search.html

<form class="sidebar-search-container" method="get" action="search.html" role="search">
  <input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
  <input type="hidden" name="check_keywords" value="yes">
  <input type="hidden" name="area" value="default">
</form>

GET //readthedocs.org/projects/argilla-docs/search/

<form id="flyout-search-form" class="wy-form" target="_blank" action="//readthedocs.org/projects/argilla-docs/search/" method="get">
  <input type="text" name="q" aria-label="Dokumente durchsuchen" placeholder="Dokumente durchsuchen">
</form>

Text Content



WHAT IS ARGILLA?#

Argilla is an open-source data curation platform for LLMs. Using Argilla,
everyone can build robust language models through faster data curation using
both human and machine feedback. We provide support for each step in the MLOps
cycle, from data labeling to model monitoring.




📄 ABOUT THE DOCS#

Each section of the docs and its goal:

 * 🚀 Quickstart: Install Argilla and run end-to-end toy examples
 * 🎼 Cheatsheet: Brief code snippets for our main functionalities
 * 🔧 Installation: Everything deployment: Docker, Kubernetes, Cloud and way more
 * ⚙️ Configuration: User management and deployment tweaking
 * 💥 Concepts about LLMs: Generative AI, ChatGPT and friends
 * 🦮 Practical Guides: Conceptual overview of our main functionalities
 * 🧗‍♀️ Tutorials: Specific applied end-to-end examples
 * 🏷️ References: Itemized information and API docs
 * 🏘️ Community: Everything for developers and contributors
 * 🗺️ Roadmap: Our future plans


🛠️ PROJECT ARCHITECTURE#

Argilla is built on 5 core components:

 * Python SDK: A Python SDK, installable with pip install argilla, for
   interacting with the Argilla Server and the Argilla UI. It provides an API to
   manage the data, configuration, and annotation workflows.

 * FastAPI Server: The core of Argilla is a Python FastAPI server that manages
   the data by pre-processing it and storing it in the vector database. It also
   stores application information in the relational database. It provides a
   REST API to interact with the data from the Python SDK and the Argilla UI, as
   well as a web interface to visualize the data.

 * Relational Database: A relational database to store the metadata of the
   records and the annotations. SQLite is the default built-in option and is
   deployed together with the Argilla Server, but a separate PostgreSQL
   instance can be used too.

 * Vector Database: A vector database to store the record data and perform
   scalable vector similarity searches and basic document searches. We currently
   support Elasticsearch and AWS OpenSearch, both of which can be deployed as
   separate Docker images.

 * Vue.js UI: A web application to visualize and annotate your data, users, and
   teams. It is built with Vue.js and is directly deployed alongside the Argilla
   Server within our Argilla Docker image.
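
The storage split described in the list above can be illustrated with a toy
sketch. This is not Argilla's implementation: an in-memory sqlite3 database
stands in for the relational database, a plain dict with brute-force cosine
similarity stands in for the vector database, and all function names, record
ids, and embeddings are hypothetical.

```python
import math
import sqlite3

# Stand-in for the relational database: annotations and application info.
relational = sqlite3.connect(":memory:")
relational.execute(
    "CREATE TABLE annotations (record_id TEXT, annotator TEXT, label TEXT)"
)

# Stand-in for the vector database: record text plus one embedding per record.
vector_store = {}


def add_record(record_id, text, embedding):
    """Store the record data and its vector (the 'vector database' side)."""
    vector_store[record_id] = {"text": text, "embedding": embedding}


def annotate(record_id, annotator, label):
    """Store an annotation (the 'relational database' side)."""
    relational.execute(
        "INSERT INTO annotations VALUES (?, ?, ?)", (record_id, annotator, label)
    )


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query_embedding, k=1):
    """Brute-force similarity search; a real vector database uses an index."""
    ranked = sorted(
        vector_store,
        key=lambda rid: cosine(query_embedding, vector_store[rid]["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical records with made-up two-dimensional embeddings.
add_record("r1", "The delivery was fast", [0.9, 0.1])
add_record("r2", "The package arrived broken", [0.1, 0.9])
annotate("r1", "alice", "positive")

# Find the nearest record, then look up its annotations in the other store.
nearest = most_similar([0.8, 0.2])
labels = relational.execute(
    "SELECT label FROM annotations WHERE record_id = ?", (nearest[0],)
).fetchall()
print(nearest, labels)  # ['r1'] [('positive',)]
```

In this sketch the similarity search only touches the vector store, while
annotations are looked up by record id in the relational store, which mirrors
the division of responsibilities between the two databases.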


📏 PRINCIPLES#

 * Open: Argilla is free, open-source, and 100% compatible with major NLP
   libraries (Hugging Face transformers, spaCy, Stanford Stanza, Flair, etc.).
   In fact, you can use and combine your preferred libraries without
   implementing any specific interface.

 * End-to-end: Most annotation tools treat data collection as a one-off activity
   at the beginning of each project. In real-world projects, data collection is
   a key activity of the iterative process of ML model development. Once a model
   goes into production, you want to monitor and analyze its predictions and
   collect more data to improve your model over time. Argilla is designed to
   close this gap, enabling you to iterate as much as you need.

 * User and Developer Experience: The key to sustainable NLP solutions is to
   make it easier for everyone to contribute to projects. Domain experts should
   feel comfortable interpreting and annotating data. Data scientists should
   feel free to experiment and iterate. Engineers should feel in control of data
   pipelines. Argilla optimizes the experience for these core users to make your
   teams more productive.

 * Beyond hand-labeling: Classical hand-labeling workflows are costly and
   inefficient, but having humans in the loop is essential. Easily combine
   hand-labeling with active learning, bulk-labeling, zero-shot models, and weak
   supervision in novel data annotation workflows.
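
As a rough illustration of one technique named in the list above, the sketch
below shows least-confidence sampling, a common active learning strategy: the
records a model is least sure about are sent to human annotators first. It
does not use Argilla's API. The function name, record ids, and probabilities
are hypothetical.

```python
def least_confident(predictions, k):
    """Return the ids of the k records whose top predicted probability is lowest."""
    confidence = {
        record_id: max(probs.values()) for record_id, probs in predictions.items()
    }
    return sorted(confidence, key=confidence.get)[:k]


# Hypothetical model outputs: record id -> label probabilities.
predictions = {
    "r1": {"positive": 0.98, "negative": 0.02},
    "r2": {"positive": 0.55, "negative": 0.45},
    "r3": {"positive": 0.10, "negative": 0.90},
    "r4": {"positive": 0.51, "negative": 0.49},
}

to_annotate = least_confident(predictions, k=2)
print(to_annotate)  # ['r4', 'r2']
```

The confident predictions (r1 and r3) could then be bulk-labeled or kept as
suggestions, while human effort goes to the ambiguous records.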


❔ FAQ#

What is Argilla?



Argilla is an open-source data curation platform, designed to enhance the
development of both small and large language models (LLMs). Using Argilla,
everyone can build robust language models through faster data curation using
both human and machine feedback. We provide support for each step in the MLOps
cycle, from data labeling to model monitoring. The name “Argilla” comes from the
word for “clay” in Latin, Italian, and Catalan. Just as clay has been a
fundamental medium for human creativity and tool-making throughout history, we
view data as the essential material for sculpting and refining models.



Does Argilla train models?



Argilla does not train models but offers tools and integrations to help you do
so. With Argilla, you can easily load data and train models using a feature we
call the ArgillaTrainer. The ArgillaTrainer acts as a bridge to various popular
NLP libraries. It simplifies the training process by offering an
easy-to-understand interface for many NLP tasks with default settings, without
the need to convert data from Argilla’s format. You can find more information
about training models with Argilla here.



What is the difference between old datasets and the FeedbackDataset?



The FeedbackDataset stands out for its versatility and adaptability, designed to
support a wider range of NLP tasks, including those centered on large language
models. In contrast, older datasets, while more feature-rich in specific areas,
are each tailored to a single NLP task. In Argilla 2.0, the intention is to
phase out the older datasets in favor of the FeedbackDataset. For a more
detailed explanation, please refer to this guide.



Can Argilla only be used for LLMs?



No, Argilla is a versatile tool suitable for a wide range of NLP tasks. However,
we emphasize the integration with small and large language models (LLMs),
reflecting our confidence in the significant role that they will play in the
future of NLP. On this page, you can find a list of supported tasks.



Does Argilla provide annotation workforces?



We currently have partnerships with annotation providers that ensure ethical
practices and secure work environments. Feel free to schedule a meeting here or
contact us via email.



Does Argilla cost money?



No, Argilla is an open-source platform, and we plan to keep it free forever.
However, we do offer a commercial version of Argilla called Argilla Cloud.



What is the difference between Argilla open source and Argilla Cloud?



Argilla Cloud is the Software as a Service (SaaS) counterpart to our
open-source platform. It doesn’t add features beyond what is available in the
open-source version, apart from those specifically related to cloud hosting.
The main difference is this cloud hosting, which caters especially to large
teams requiring features that aren’t typically necessary for individual
practitioners or small businesses. In short, Argilla Cloud is SaaS plus virtual
private cloud deployment. For those interested in the different plans available
under Argilla Cloud, you can find detailed information on our website.



How does Argilla differ from competitors like Snorkel, Prodigy and Scale?



Argilla distinguishes itself by its focus on specific use cases and
human-in-the-loop approaches. While it does offer programmatic features,
Argilla’s core value lies in actively involving human experts in the
tool-building process, setting it apart from other competitors.

Furthermore, Argilla places particular emphasis on smooth integration with other
tools in the community, particularly within the realms of MLOps and NLP. Its
compatibility with popular frameworks like spaCy and Hugging Face makes it
exceptionally user-friendly and accessible.

Finally, platforms like Snorkel, Prodigy or Scale, while more comprehensive,
often require a significant commitment. Argilla, on the other hand, works more
as a component within the MLOps ecosystem, allowing users to begin with specific
use cases and then scale up as needed. This flexibility is particularly
beneficial for users and customers who prefer to start small and expand their
applications over time, as opposed to committing to an all-encompassing platform
from the outset.



What is Argilla currently working on?



We are continuously working on improving Argilla’s features and usability, and
are now concentrating on a three-pronged vision: the development of Argilla
Core (open-source), Distilabel, and Argilla JS/TS. You can find a list of our
current projects here.




🤝 CONTRIBUTE#

We love contributors and have launched a collaboration with JustDiggit to hand
out our very own bunds and help the re-greening of sub-Saharan Africa. To help
our community contribute, we have created our developer and contributor docs.
Additionally, you can always schedule a meeting with our Developer Advocacy
team so they can get you up to speed.


🥇 CONTRIBUTORS#


🏘️ COMMUNITY#

🙋‍♀️ Join the Argilla community on Slack and get direct support from the
community.

⭐ Star the Argilla GitHub repo to stay updated about new releases and tutorials.

🎁 We’ve just printed stickers! Would you like some? Order stickers for free.


🗺️ ROADMAP#

We continuously work on updating our plans and our roadmap and we love to
discuss those with our community. Feel encouraged to participate.







Copyright © 2024, Argilla.io
Made with Sphinx and @pradyunsg's Furo

