docs.argilla.io
Submitted URL: http://docs.argilla.io/
Effective URL: https://docs.argilla.io/en/latest/
Submission: On March 28 via api from US — Scanned from DE
WHAT IS ARGILLA?

Argilla is an open-source data curation platform for LLMs. Using Argilla, everyone can build robust language models through faster data curation using both human and machine feedback. We provide support for each step in the MLOps cycle, from data labeling to model monitoring.

ABOUT THE DOCS

* Quickstart: Install Argilla and run end-to-end toy examples.
* Cheatsheet: Brief code snippets for our main functionalities.
* Installation: Everything about deployment (Docker, Kubernetes, the cloud, and more).
* Configuration: User management and deployment tweaking.
* Concepts about LLMs: Generative AI, ChatGPT, and friends.
* Practical Guides: Conceptual overview of our main functionalities.
* Tutorials: Specific applied end-to-end examples.
* References: Itemized information and API docs.
* Community: Everything for developers and contributors.
* Roadmap: Our future plans.

PROJECT ARCHITECTURE

Argilla is built on 5 core components:

* Python SDK: A Python SDK, installable with pip install argilla, for interacting with the Argilla Server and the Argilla UI. It provides an API to manage the data, configuration, and annotation workflows.
* FastAPI Server: The core of Argilla is a Python FastAPI server that manages the data by pre-processing it and storing it in the vector database. It also stores application information in the relational database. It provides a REST API to interact with the data from the Python SDK and the Argilla UI, and it also serves a web interface to visualize the data.
* Relational Database: A relational database that stores the records' metadata and the annotations. SQLite is the default built-in option and is bundled with the Argilla Server, but a separate PostgreSQL database can be used instead.
* Vector Database: A vector database that stores the records' data and performs scalable vector similarity searches and basic document searches. We currently support Elasticsearch and AWS OpenSearch, which can be deployed as separate Docker images.
* Vue.js UI: A web application to visualize and annotate your data, users, and teams. It is built with Vue.js and is deployed directly alongside the Argilla Server within our Argilla Docker image.

PRINCIPLES

* Open: Argilla is free, open-source, and 100% compatible with major NLP libraries (Hugging Face transformers, spaCy, Stanford Stanza, Flair, etc.). You can use and combine your preferred libraries without implementing any specific interface.
* End-to-end: Most annotation tools treat data collection as a one-off activity at the beginning of each project. In real-world projects, data collection is a key activity in the iterative process of ML model development. Once a model goes into production, you want to monitor and analyze its predictions and collect more data to improve your model over time. Argilla is designed to close this gap, enabling you to iterate as much as you need.
* User and Developer Experience: The key to sustainable NLP solutions is making it easier for everyone to contribute to projects. Domain experts should feel comfortable interpreting and annotating data.
Data scientists should feel free to experiment and iterate. Engineers should feel in control of data pipelines. Argilla optimizes the experience for these core users to make your teams more productive.
* Beyond hand-labeling: Classical hand-labeling workflows are costly and inefficient, but having humans in the loop is essential. Easily combine hand-labeling with active learning, bulk-labeling, zero-shot models, and weak supervision in novel data annotation workflows.

FAQ

What is Argilla?
Argilla is an open-source data curation platform designed to enhance the development of both small and large language models (LLMs). Using Argilla, everyone can build robust language models through faster data curation using both human and machine feedback. We provide support for each step in the MLOps cycle, from data labeling to model monitoring. The name “Argilla” comes from the word for “clay” in Latin, Italian, and even Catalan. Just as clay has been a fundamental medium for human creativity and tool-making throughout history, we view data as the essential material for sculpting and refining models.

Does Argilla train models?
Argilla does not train models, but it offers tools and integrations to help you do so. With Argilla, you can easily load data and train models using a feature we call the ArgillaTrainer. The ArgillaTrainer acts as a bridge to various popular NLP libraries. It simplifies the training process by offering an easy-to-understand interface for many NLP tasks with preset default settings, without the need to convert data from Argilla's format. You can find more information about training models with Argilla here.

What is the difference between old datasets and the FeedbackDataset?
The FeedbackDataset stands out for its versatility and adaptability; it is designed to support a wider range of NLP tasks, including those centered on large language models.
In contrast, the older datasets, while more feature-rich in specific areas, are each tailored to a single NLP task. In Argilla 2.0, the intention is to phase out the older datasets in favor of the FeedbackDataset. For a more detailed explanation, please refer to this guide.

Can Argilla only be used for LLMs?
No, Argilla is a versatile tool suitable for a wide range of NLP tasks. However, we emphasize integration with small and large language models (LLMs), reflecting our confidence in the significant role they will play in the future of NLP. On this page, you can find a list of supported tasks.

Does Argilla provide annotation workforces?
We currently have partnerships with annotation providers that ensure ethical practices and secure work environments. Feel free to schedule a meeting here or contact us via email.

Does Argilla cost money?
No, Argilla is an open-source platform, and we plan to keep Argilla free forever. However, we do offer a commercial version of Argilla called Argilla Cloud.

What is the difference between Argilla open source and Argilla Cloud?
Argilla Cloud is the counterpart to our open-source platform, offered as Software as a Service (SaaS). Apart from features related to cloud hosting, it doesn't add extra features beyond what is available in the open-source version. The main difference is the cloud hosting itself, which caters especially to large teams with requirements that aren't typically necessary for individual practitioners or small businesses. In short, Argilla Cloud is a SaaS plus virtual private cloud deployment. For those interested in the different plans available under Argilla Cloud, you can find detailed information on our website.

How does Argilla differ from competitors like Snorkel, Prodigy, and Scale?
Argilla distinguishes itself through its focus on specific use cases and human-in-the-loop approaches.
While it does offer programmatic features, Argilla's core value lies in actively involving human experts in the tool-building process, which sets it apart from other competitors. Furthermore, Argilla places particular emphasis on smooth integration with other tools in the community, particularly within MLOps and NLP. Its compatibility with popular frameworks like spaCy and Hugging Face makes it exceptionally user-friendly and accessible.

Finally, platforms like Snorkel, Prodigy, or Scale, while more comprehensive, often require a significant commitment. Argilla, on the other hand, works more as a component within the MLOps ecosystem, allowing users to begin with specific use cases and then scale up as needed. This flexibility is particularly beneficial for users and customers who prefer to start small and expand their applications over time, as opposed to committing to an all-encompassing platform from the outset.

What is Argilla currently working on?
We are continuously working on improving Argilla's features and usability, currently concentrating on a three-pronged vision: the development of Argilla Core (open-source), Distilabel, and Argilla JS/TS. You can find a list of our current projects here.

CONTRIBUTE

We love contributors and have launched a collaboration with JustDiggit to hand out our very own bunds and help re-green sub-Saharan Africa. To help our community create contributions, we have created our developer and contributor docs. Additionally, you can always schedule a meeting with our Developer Advocacy team so they can get you up to speed.

CONTRIBUTORS

COMMUNITY

* Join the Argilla community on Slack and get direct support from the community.
* Star the Argilla GitHub repo to stay updated about new releases and tutorials.
* We've just printed stickers! Would you like some? Order stickers for free.
ROADMAP

We continuously work on updating our plans and our roadmap, and we love to discuss them with our community. Feel encouraged to participate.

Copyright © 2024, Argilla.io. Made with Sphinx and @pradyunsg's Furo.