docs.anote.ai
URL:
https://docs.anote.ai/privategpt/privategpt.html
Submission: On September 25 via api from US — Scanned from DE
Table of contents

* Key Features
* How Does Anote PrivateGPT Work?
* Large Language Model and Our Backend

INTRODUCTION TO ANOTE PRIVATEGPT

Anote PrivateGPT enables enterprises to leverage generative AI and privacy-preserving LLMs to chat with their documents while keeping their private data secure. Anote PrivateGPT provides each enterprise with its own AI assistant, acting as a chief artificial intelligence officer for that organization. Members of the organization can ask any question about the organization, and PrivateGPT answers each query based on the organization's own data, while keeping that data local, on-premise, private, and secure. Members can chat with their documents to obtain relevant insights, and are shown citations pinpointing where within the enterprise's documents each answer comes from.

For enterprises, this can be viewed as an on-premise GPT-for-your-business: each enterprise has its own GPT, catered specifically to its needs, with no risk of sharing confidential or private data, since the data is kept on-premise, local, private, and secure.

KEY FEATURES

* Local Environment: Anote PrivateGPT operates entirely within the user's local environment, providing a secure and private space for document interactions.
* Document Storage: User documents are stored locally in a Chroma vector store, ensuring that they remain securely stored on the user's device or local storage infrastructure.
* Privacy-Preserving Retrieval: Anote incorporates a privacy-preserving retrieval component that efficiently searches and retrieves relevant documents based on user queries. The retrieval process takes place locally, without transmitting sensitive information to external servers.
* Privacy-Aware Language Models: Anote employs privacy-aware language models such as LlamaCpp and GPT4All, which run locally on the user's device or local infrastructure. These models preserve user privacy by never transmitting user queries or documents to external servers.
* Query-Response Privacy: Anote ensures that user queries and responses are kept private. User queries are processed locally, and the system reveals only relevant answers, without disclosing the underlying content or details of the user's documents.
* Secure Execution: Anote implements security measures to protect the integrity and confidentiality of user data during execution, including secure execution environments, encryption of sensitive data, secure APIs and interfaces, and adherence to best practices for securing local infrastructure.
* User Control and Consent: Anote prioritizes user control and consent. Users have full control over their documents and data, including which documents to include, which queries to initiate, and which answers to receive. All interactions with the system are based on explicit user consent and preferences.

By combining these principles, Anote PrivateGPT empowers users to chat with their documents in a privacy-preserving way using the capabilities of generative AI models. The focus on local execution, local document storage, privacy-aware models, secure infrastructure, and user control ensures that your data remains confidential, secure, and on-premise.

HOW DOES ANOTE PRIVATEGPT WORK?
At Anote, you can easily upload your documents in various formats, including PDF, email, HTML, DOCX, and CSV, giving you flexibility and convenience when working with a wide range of document types. Once you've uploaded your documents, you can efficiently manage your document base and add more documents as needed; this streamlined document management keeps working with multiple documents simple.

You can then initiate a chat session in our user-friendly interface to interact with your uploaded documents. Just like texting, you simply type your prompts, and the system promptly provides accurate and relevant responses. The interface not only presents answers to your prompts but also includes references to the specific sections of the source documents where the information was found. By clicking the eye button, you can navigate to the relevant document chunks and see the exact location of the retrieved information.

Once you've finished your chat session, you can access informative dashboards that display document-related analytics for your organization: the number of analyzed documents, common keywords found within the documents, and simple graphs that visualize the data.

LARGE LANGUAGE MODEL AND OUR BACKEND

The previous sections show how Anote PrivateGPT works from the user's perspective. The flow chart below illustrates how it works in the backend. In general, we utilize two important pipelines: data ingestion and query processing. Data ingestion involves downloading user input documents, dividing them into chunks, and securely storing the embeddings of those chunks in a local vector database using Chroma. Anote PrivateGPT supports various formats, including .csv, .doc, .docx, .enex, .eml, .epub, .html, .md, .odt, .pdf, .ppt, .pptx, and .txt.
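The ingestion flow just described (split each document into chunks, embed each chunk, store the embedding alongside its text) can be sketched without external dependencies. In the real pipeline these roles are played by document loaders, HuggingFaceEmbeddings, and a persistent Chroma store; the chunk size, overlap, and toy character-count embedding below are purely illustrative assumptions.

```python
# Minimal, dependency-free sketch of the data-ingestion pipeline.
# Real pipeline: document loaders -> HuggingFaceEmbeddings -> Chroma.

def split_into_chunks(text, chunk_size=500, overlap=50):
    """Divide a document into overlapping character chunks."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def embed(chunk):
    """Toy stand-in for a sentence-embedding model: a bag-of-letters vector."""
    vec = [0.0] * 26
    for ch in chunk.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def ingest(documents):
    """Chunk each document, embed each chunk, store (embedding, chunk) pairs."""
    store = []
    for doc in documents:
        for chunk in split_into_chunks(doc):
            store.append((embed(chunk), chunk))
    return store
```

Keeping the original chunk text next to its embedding is what later allows the system to show citations: the retrieved vectors map directly back to the document passages they came from.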
The query-processing pipeline processes user queries and leverages the local Chroma vector database to retrieve relevant document chunks. Retrieval is followed by answer generation using local language models such as LlamaCpp and GPT4All. The step-by-step setup of Anote PrivateGPT's pipeline is as follows:

```python
def main():
    # Parse the command line arguments
    args = parse_arguments()
    embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)
    db = Chroma(
        persist_directory=persist_directory,
        embedding_function=embeddings,
        client_settings=CHROMA_SETTINGS,
    )
    retriever = db.as_retriever(search_kwargs={"k": target_source_chunks})
    # Activate/deactivate the streaming StdOut callback for LLMs
    callbacks = [] if args.mute_stream else [StreamingStdOutCallbackHandler()]
    # Prepare the LLM
    match model_type:
        case "LlamaCpp":
            llm = LlamaCpp(
                model_path=model_path,
                n_ctx=model_n_ctx,
                n_batch=model_n_batch,
                callbacks=callbacks,
                verbose=False,
            )
        case "GPT4All":
            llm = GPT4All(
                model=model_path,
                n_ctx=model_n_ctx,
                backend='gptj',
                n_batch=model_n_batch,
                callbacks=callbacks,
                verbose=False,
            )
        case _:
            # Guard against an unbound llm if the config names an unknown model
            raise ValueError(f"Unsupported model type: {model_type}")
    qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=retriever,
        return_source_documents=not args.hide_source,
    )
```

1. Embeddings Initialization: An instance of the HuggingFaceEmbeddings class is initialized to generate embedding vectors for the documents and capture their semantic representations.
2. Vector Store Creation: An instance of the Chroma class is created to represent the vector store where document embeddings are securely stored. Chroma provides efficient retrieval capabilities for the subsequent steps.
3. Retrieval Setup: The code calls the as_retriever() method on the Chroma vector store object (db), which creates and returns a retriever object. Additional arguments can be passed via the search_kwargs parameter; here, {"k": target_source_chunks} specifies the number of top-ranked document chunks to retrieve.
4. Language Model Selection: The code checks model_type to determine the appropriate language model (LLM) for answering questions. Depending on model_type, an instance of either LlamaCpp or GPT4All is created. These are distinct local language-model implementations that generate answers based on the retrieved documents.
5. RetrievalQA Instance: Finally, an instance of the RetrievalQA class is created, combining the retriever object, the selected LLM instance, and other necessary parameters. This class orchestrates the question-answering process, pairing the retriever's documents with the LLM's language understanding and generation capabilities.
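For illustration, the retrieval step at the heart of the query pipeline can be sketched without external dependencies: embed the query, rank the stored chunks by similarity, and keep the top k. In the actual pipeline, Chroma's as_retriever() performs this ranking over real sentence embeddings, and RetrievalQA then stuffs the retrieved chunks into the prompt for LlamaCpp or GPT4All; the toy embedding and helper names below are illustrative assumptions.

```python
import math

def embed(text):
    """Toy bag-of-letters embedding standing in for a real sentence encoder."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(store, query, k=4):
    """Rank stored (embedding, chunk) pairs by similarity to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda pair: cosine(pair[0], q), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

# The retrieved chunks would then be inserted verbatim into the LLM prompt
# (the "stuff" chain type) for the local model to generate the final answer.
```

Because the top-k chunks are carried through to answer generation, the system can cite exactly which document passages each answer was grounded in.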