Table of contents
 * Key Features
 * How Does Anote PrivateGPT Work?
 * Large Language Model and Our Backend


INTRODUCTION TO ANOTE PRIVATEGPT

Anote PrivateGPT enables enterprises to leverage generative AI and
privacy-preserving LLMs to chat with their documents while keeping their
private data secure. Anote PrivateGPT provides enterprises with their own AI
assistant, acting as a chief artificial intelligence officer for a specific
organization. Members of the organization can ask any question about the
organization, and PrivateGPT answers each query based on the organization's
data while keeping that data local, on premise, private, and secure. Members
can chat with their documents to obtain relevant insights and are shown
citations pointing to exactly where within their enterprise documents each
answer comes from. For enterprises, this can be viewed as an on-premise
GPT-for-your-business: each enterprise gets its own GPT, catered specifically
to its needs, with no risk of exposing confidential or private data, since the
data never leaves the premises.




KEY FEATURES

 * Local Environment: Anote PrivateGPT operates entirely within the user's local
   environment, providing a secure and private space for document interactions.
 * Document Storage: User documents are stored locally in a Chroma vector store,
   ensuring that they remain securely stored on the user's device or local
   storage infrastructure.
 * Privacy-Preserving Retrieval: Anote incorporates a privacy-preserving
   retrieval component that efficiently searches and retrieves relevant
   documents based on user queries. The retrieval process takes place locally,
   without transmitting sensitive information to external servers.
 * Privacy-Aware Language Models: Anote employs privacy-aware language models
   like LlamaCpp and GPT4All, which operate locally on the user's device or
   local infrastructure. These models preserve user privacy by avoiding the
   transmission of user queries or documents to external servers.
 * Query-Response Privacy: Anote ensures that user queries and responses are
   kept private. User queries are processed locally, and the system only reveals
   relevant answers without disclosing the underlying content or details of the
   user's documents.
 * Secure Execution: Anote implements security measures to protect the integrity
   and confidentiality of user data during execution. This includes secure
   execution environments, encryption of sensitive data, secure APIs and
   interfaces, and adherence to best practices for securing local
   infrastructure.
 * User Control and Consent: Anote prioritizes user control and consent. Users
   have full control over their documents and data, including the ability to
   choose which documents to include, initiate queries, and receive answers.
   Interactions with the system are based on explicit user consent and
   preferences.

By combining these principles, Anote PrivateGPT empowers users to chat with
their documents in a privacy-preserving way using the capabilities of
generative AI models. The focus on local execution, local document storage,
privacy-aware models, secure infrastructure, and user control ensures that your
data remains confidential, secure, and on premise.


HOW DOES ANOTE PRIVATEGPT WORK?

With Anote PrivateGPT, you can easily upload your documents in a variety of
formats, including PDF, email, HTML, DOCX, and CSV, giving you flexibility and
convenience when working with a wide range of document types.





Once you've uploaded your documents, you can efficiently manage your document
base and add more documents as needed. This streamlined document management
system ensures that working with multiple documents becomes a breeze.

You can initiate a chat session in our user-friendly interface to interact with
your uploaded documents. Just like texting, you can simply type your prompts,
and our system will promptly provide you with accurate and relevant responses.





Our user interface not only presents answers to your main prompts but also
includes references to the specific sections in the source documents where the
information was found. By clicking on the eye button, you can easily navigate to
the relevant document chunks and view the exact location of the retrieved
information.





Once you've finished your chat session, you can access informative dashboards
that display document-related analytics for your organization. These dashboards
provide insights into the number of analyzed documents, common keywords found
within the documents, and simple graphs that offer a visual representation of
the data.






LARGE LANGUAGE MODEL AND OUR BACKEND

The previous sections show how Anote PrivateGPT works from the user's
perspective. The flow chart below illustrates how Anote PrivateGPT works in the
backend. In general, we utilize two important pipelines: data ingestion and
query processing.

Data ingestion involves loading the user's uploaded documents, dividing them
into chunks, and securely storing the embeddings of those chunks in a local
vector database using Chroma. Anote PrivateGPT supports various formats,
including .csv, .doc, .docx, .enex, .eml, .epub, .html, .md, .odt, .pdf, .ppt,
.pptx, and .txt.

The query processing pipeline involves processing user queries and leveraging
the local Chroma vector database to retrieve relevant document chunks. The
retrieval process is followed by answer generation using powerful language
models like LlamaCpp and GPT4All.



The step-by-step process involved in Anote PrivateGPT's pipeline is as follows:

# LangChain components used throughout the query pipeline
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import GPT4All, LlamaCpp
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Configuration values such as embeddings_model_name, persist_directory,
# CHROMA_SETTINGS, model_type, model_path, model_n_ctx, model_n_batch, and
# target_source_chunks are defined elsewhere in the script (e.g. loaded from
# environment variables), as is the parse_arguments() helper.

def main():
    # Parse the command line arguments
    args = parse_arguments()

    # Recreate the embedding function and open the local Chroma vector store
    embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)
    db = Chroma(
        persist_directory=persist_directory,
        embedding_function=embeddings,
        client_settings=CHROMA_SETTINGS
    )

    # Expose the vector store as a retriever that returns the top-k chunks
    retriever = db.as_retriever(search_kwargs={"k": target_source_chunks})

    # Activate/deactivate the streaming StdOut callback for LLMs
    callbacks = [] if args.mute_stream else [StreamingStdOutCallbackHandler()]

    # Prepare the local LLM
    match model_type:
        case "LlamaCpp":
            llm = LlamaCpp(
                model_path=model_path,
                n_ctx=model_n_ctx,
                n_batch=model_n_batch,
                callbacks=callbacks,
                verbose=False
            )
        case "GPT4All":
            llm = GPT4All(
                model=model_path,
                n_ctx=model_n_ctx,
                backend='gptj',
                n_batch=model_n_batch,
                callbacks=callbacks,
                verbose=False
            )
        case _:
            raise ValueError(f"Model type {model_type} is not supported.")

    # Combine the retriever and the LLM into a question-answering chain
    qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=retriever,
        return_source_documents=not args.hide_source
    )


 1. Embeddings Initialization: An instance of the HuggingFaceEmbeddings class is
    initialized to generate embedding vectors for the documents and capture
    their semantic representations.
 2. Vector Store Creation: An instance of the Chroma class is created to
    represent the vector store where document embeddings are securely stored.
    Chroma provides efficient retrieval capabilities for subsequent steps.
 3. Retrieval Setup: The code calls the as_retriever() method on the Chroma
    vector store object (db). This method creates and returns a retriever
    object. Additional arguments can be provided using the search_kwargs
    parameter. In this case, {"k": target_source_chunks} specifies the number of
    top-ranked documents to retrieve. The value of target_source_chunks
    determines the desired number of retrieved documents.
 4. Language Model Selection: The code checks the model_type to determine the
    appropriate language model (LLM) for answering the questions. Depending on
    the model_type, either an instance of LlamaCpp or GPT4All is created. These
    LLMs are distinct implementations of language models that will be utilized
    to generate answers based on the retrieved documents.
 5. RetrievalQA Instance: Finally, an instance of the RetrievalQA class is
    created, combining the retriever object, the selected LLM instance, and
    other necessary parameters. This class orchestrates the question-answering
    process, leveraging the retriever's retrieved documents and the LLM's
    language understanding and generation capabilities, as sketched in the
    usage example below.
