paperswithcode.com Open in urlscan Pro
2606:4700:20::681a:d9b  Public Scan

URL: https://paperswithcode.com/dataset/conceptual-captions
Submission: On August 24 via manual from DE — Scanned from DE

Form analysis 10 forms found in the DOM

GET /search

<form action="/search" method="get" id="id_global_search_form" autocomplete="off">
  <input type="text" name="q_meta" style="display:none" id="q_meta">
  <input type="hidden" name="q_type" id="q_type">
  <input id="id_global_search_input" autocomplete="off" value="" name="q" class="global-search" type="search" placeholder="Search">
  <button type="submit" class="icon"><span class=" icon-wrapper icon-fa icon-fa-light" data-name="search"><svg viewBox="0 0 512.025 520.146" xmlns="http://www.w3.org/2000/svg">
        <path
          d="M508.5 482.6c4.7 4.7 4.7 12.3 0 17l-9.9 9.9c-4.7 4.7-12.3 4.7-17 0l-129-129c-2.2-2.3-3.5-5.3-3.5-8.5v-10.2C312 396 262.5 417 208 417 93.1 417 0 323.9 0 209S93.1 1 208 1s208 93.1 208 208c0 54.5-21 104-55.3 141.1H371c3.2 0 6.2 1.2 8.5 3.5zM208 385c97.3 0 176-78.7 176-176S305.3 33 208 33 32 111.7 32 209s78.7 176 176 176z">
        </path>
      </svg></span></button>
  <ul id="result-box"></ul>
</form>

POST

<form action="" method="post">
  <div class="modal-body" style="opacity: 0;">
    <div class="modal-body-info-text"> Stay informed on the latest trending ML papers with code, research developments, libraries, methods, and datasets.<br><br>
      <a href="/newsletter">Read previous issues</a>
    </div>
    <input type="hidden" name="csrfmiddlewaretoken" value="wMtsUJEDKxfjyuykniX25hXe16cdV7oeXduzEZ4nulOp8A2fwA8nFAu9xU83RBM7">
    <input placeholder="Enter your email" type="email" class="form-control pwc-email" name="address" id="id_address" max_length="100" required="">
  </div>
  <div class="modal-footer">
    <button type="submit" class="btn btn-primary">Subscribe</button>
  </div>
</form>

POST

<form class="dataset-form" method="post" enctype="multipart/form-data">
  <input type="hidden" name="csrfmiddlewaretoken" value="wMtsUJEDKxfjyuykniX25hXe16cdV7oeXduzEZ4nulOp8A2fwA8nFAu9xU83RBM7">
  <div id="div_id_name" class="form-group"> <label for="id_name" class=" requiredField"> Name:<span class="asteriskField">*</span> </label>
    <div class=""> <input type="text" name="name" value="Conceptual Captions" maxlength="128" class="textinput textInput form-control" required="" id="id_name"> </div>
  </div>
  <div id="div_id_full_name" class="form-group"> <label for="id_full_name" class=""> Full name (optional): </label>
    <div class=""> <input type="text" name="full_name" value="Conceptual Captions" class="textinput textInput form-control" id="id_full_name"> </div>
  </div>
  <div id="div_id_description" class="form-group"> <label for="id_description" class=" requiredField"> Description (Markdown and <mjx-container class="MathJax CtxtMenu_Attached_0" jax="CHTML" role="presentation" tabindex="0" ctxtmenu_counter="0"
        style="font-size: 113.1%; position: relative;"><mjx-math class="MJX-TEX" aria-hidden="true" style="width: 2.781em;"><mjx-mi class="mjx-i"><mjx-c class="mjx-c1D43F TEX-I"></mjx-c></mjx-mi><mjx-mspace
            style="margin-right: -0.325em;"></mjx-mspace><mjx-mpadded><mjx-block style="margin: 0.21em 0px 0px; position: relative;"><mjx-rbox style="left: 0px; top: -0.21em;"><mjx-texatom texclass="ORD"><mjx-mstyle size="s"><mjx-texatom
                      texclass="ORD"><mjx-mi class="mjx-i"><mjx-c class="mjx-c1D434 TEX-I"></mjx-c></mjx-mi></mjx-texatom></mjx-mstyle></mjx-texatom></mjx-rbox></mjx-block></mjx-mpadded><mjx-mspace style="margin-right: -0.17em;"></mjx-mspace><mjx-mi
            class="mjx-i"><mjx-c class="mjx-c1D447 TEX-I"></mjx-c></mjx-mi><mjx-mspace style="margin-right: -0.14em;"></mjx-mspace><mjx-mpadded><mjx-block style="margin: -0.216em 0px 0.215em; position: relative;"><mjx-rbox
                style="left: 0px; top: 0.215em;"><mjx-texatom texclass="ORD"><mjx-mi class="mjx-i"><mjx-c class="mjx-c1D438 TEX-I"></mjx-c></mjx-mi></mjx-texatom></mjx-rbox></mjx-block></mjx-mpadded><mjx-mspace
            style="margin-right: -0.115em;"></mjx-mspace><mjx-mi class="mjx-i"><mjx-c class="mjx-c1D44B TEX-I"></mjx-c></mjx-mi></mjx-math><mjx-assistive-mml role="presentation" unselectable="on" display="inline"><math
            xmlns="http://www.w3.org/1998/Math/MathML">
            <mi>L</mi>
            <mspace width="-.325em"></mspace>
            <mpadded height="+.21em" depth="-.21em" voffset="+.21em">
              <mrow>
                <mstyle displaystyle="false" scriptlevel="1">
                  <mrow>
                    <mi>A</mi>
                  </mrow>
                </mstyle>
              </mrow>
            </mpadded>
            <mspace width="-.17em"></mspace>
            <mi>T</mi>
            <mspace width="-.14em"></mspace>
            <mpadded height="-.5ex" depth="+.5ex" voffset="-.5ex">
              <mrow>
                <mi>E</mi>
              </mrow>
            </mpadded>
            <mspace width="-.115em"></mspace>
            <mi>X</mi>
          </math></mjx-assistive-mml></mjx-container> enabled):<span class="asteriskField">*</span> </label>
    <div class=""> <textarea name="description" cols="40" rows="6" placeholder="Briefly describe the dataset. Provide:

* a high-level explanation of the dataset characteristics
* explain motivations and summary of its content
* potential use cases of the dataset

If the description or image is from a different paper, please refer to it as follows:
Source: [title](url)
Image Source: [title](url)
" class="md-sources-autocomplete textarea form-control" required="" id="id_description">Automatic image captioning is the task of producing a natural-language utterance (usually a sentence) that correctly reflects the visual content of an image. Up to this point, the resource most used for this task was the MS-COCO dataset, containing around 120,000 images and 5-way image-caption annotations (produced by paid annotators).

Google's Conceptual Captions dataset has more than 3 million images, paired with natural-language captions. In contrast with the curated style of the MS-COCO images, Conceptual Captions images and their raw descriptions are harvested from the web, and therefore represent a wider variety of styles. The raw descriptions are harvested from the Alt-text HTML attribute associated with web images. The authors developed an automatic pipeline that extracts, filters, and transforms candidate image/caption pairs, with the goal of achieving a balance of cleanliness, informativeness, fluency, and learnability of the resulting captions.

Source: [Conceptual Captions](https://github.com/google-research-datasets/conceptual-captions)
Image Source: [Sharma et al](https://www.aclweb.org/anthology/P18-1238)</textarea> </div>
  </div>
  <div id="div_id_url" class="form-group"> <label for="id_url" class=""> Homepage URL (optional): </label>
    <div class=""> <input type="text" name="url" value="https://github.com/google-research-datasets/conceptual-captions" class="textinput textInput form-control" id="id_url"> </div>
  </div>
  <div id="div_id_paper" class="form-group"> <label for="id_paper" class=""> Paper where the dataset was introduced: </label>
    <div class=""> <select name="paper" style="width: 350px" class="modelselect2 form-control custom-select" id="id_paper" data-autocomplete-light-language="en" data-autocomplete-light-url="/sota/autocomplete/paper"
        data-autocomplete-light-function="select2">
        <option value="">---------</option>
        <option value="66977" selected="">Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning</option>
      </select>
      <div style="display:none" class="dal-forward-conf" id="dal-forward-conf-for_id_paper">
        <script type="text/dal-forward-conf">[{"type": "const", "val": true, "dst": "all_portals"}]</script>
      </div>
    </div>
  </div>
  <div id="div_id_introduced_date" class="form-group"> <label for="id_introduced_date" class=""> Introduction date: </label>
    <div class=""> <input type="text" name="introduced_date" value="2018-07-01" autocomplete="off" class="dateinput form-control" id="id_introduced_date"> </div>
  </div>
  <div id="div_id_license_name" class="form-group"> <label for="id_license_name" class=""> Dataset license: </label>
    <div class=""> <input type="text" name="license_name" value="Custom" maxlength="500" class="textinput textInput form-control" id="id_license_name"> </div>
  </div>
  <div id="div_id_license_url" class="form-group"> <label for="id_license_url" class=""> URL to full license terms: </label>
    <div class=""> <input type="url" name="license_url" value="https://github.com/google-research-datasets/conceptual-captions/blob/master/LICENSE" maxlength="200" class="urlinput form-control" id="id_license_url"> </div>
  </div>
  <div id="div_id_image" class="form-group"> <label for="id_image" class=""> Image </label>
    <div class=" mb-2">
      <div class="input-group mb-2">
        <div class="input-group-prepend"> <span class="input-group-text">Currently</span> </div>
        <div class="form-control d-flex h-auto"> <span class="text-break" style="flex-grow:1;min-width:0">
            <a href="https://production-media.paperswithcode.com/datasets/Screenshot_2021-01-07_at_17.03.39.png">datasets/Screenshot_2021-01-07_at_17.03.39.png</a> </span> <span class="align-self-center ml-2"> <span
              class="custom-control custom-checkbox"> <input type="checkbox" name="image-clear" id="image-clear_id" class="custom-control-input"> <label class="custom-control-label mb-0" for="image-clear_id">Clear</label> </span> </span> </div>
      </div>
      <div class="input-group mb-0">
        <div class="input-group-prepend"> <span class="input-group-text">Change</span> </div>
        <div class="form-control custom-file" style="border:0"> <input type="file" name="image" class="custom-file-input" accept="image/*" id="id_image"> <label class="custom-file-label text-truncate" for="id_image">---</label>
          <script type="text/javascript" id="script-id_image">
            document.getElementById("script-id_image").parentNode.querySelector('.custom-file-input').onchange = function(e) {
              var filenames = "";
              for (let i = 0; i < e.target.files.length; i++) {
                filenames += (i > 0 ? ", " : "") + e.target.files[i].name;
              }
              e.target.parentNode.querySelector('.custom-file-label').textContent = filenames;
            }
          </script>
        </div>
      </div>
      <div class="input-group mb-0"> </div>
    </div>
  </div> <input type="hidden" name="prediction_id" id="id_prediction_id">
  <div class="modal-footer">
    <button type="submit" name="edit-dataset" class="btn btn-primary">Save</button>
  </div>
</form>

POST

<form action="" method="post">
  <input type="hidden" name="csrfmiddlewaretoken" value="wMtsUJEDKxfjyuykniX25hXe16cdV7oeXduzEZ4nulOp8A2fwA8nFAu9xU83RBM7">
  <div id="div_id_tasks" class="form-group">
    <label for="id_tasks" class=""> Add or remove tasks: </label>
    <div class="">
      <select name="tasks" data-container-css-class="select2-lockable-tags" data-allow-clear="false" style="width: 100%" class="modelselect2multiple form-control" id="id_tasks" data-autocomplete-light-language="en"
        data-autocomplete-light-url="/task-autocomplete/" data-autocomplete-light-function="select2" multiple="">
        <option value="540" selected="">Image Captioning</option>
        <option value="9" selected="">Question Answering</option>
        <option value="168" selected="">Visual Question Answering (VQA)</option>
      </select>
      <div style="display:none" class="dal-forward-conf" id="dal-forward-conf-for_id_tasks">
        <script type="text/dal-forward-conf">[{"type": "const", "val": true, "dst": "disable_create_option"}]</script>
      </div>
    </div>
  </div>
  <p> Some tasks are inferred based on the benchmarks list. </p>
  <div class="modal-footer">
    <button type="submit" class="btn btn-primary" name="edit-tasks"> Save </button>
  </div>
</form>

POST

<form action="" method="post">
  <input type="hidden" name="csrfmiddlewaretoken" value="wMtsUJEDKxfjyuykniX25hXe16cdV7oeXduzEZ4nulOp8A2fwA8nFAu9xU83RBM7">
  <div id="div_id_library_url" class="form-group">
    <label for="id_library_url" class=" requiredField"> Code Repository URL:<span class="asteriskField">*</span>
    </label>
    <div class="">
      <input type="text" name="library_url" class="textinput textInput form-control" required="" id="id_library_url">
    </div>
  </div>
  <div id="div_id_loader_url" class="form-group">
    <label for="id_loader_url" class=""> [Optional] URL to documentation for this dataset: </label>
    <div class="">
      <input type="url" name="loader_url" class="urlinput form-control" id="id_loader_url">
    </div>
  </div>
  <div id="div_id_frameworks" class="form-group">
    <label class=""> Supported frameworks: </label>
    <div>
      <div class="form-check">
        <input type="checkbox" class="form-check-input" name="frameworks" value="tf" id="id_frameworks_0">
        <label class="form-check-label" for="id_frameworks_0"> TensorFlow </label>
      </div>
      <div class="form-check">
        <input type="checkbox" class="form-check-input" name="frameworks" value="pytorch" id="id_frameworks_1">
        <label class="form-check-label" for="id_frameworks_1"> PyTorch </label>
      </div>
      <div class="form-check">
        <input type="checkbox" class="form-check-input" name="frameworks" value="jax" id="id_frameworks_2">
        <label class="form-check-label" for="id_frameworks_2"> JAX </label>
      </div>
    </div>
  </div>
  <div class="modal-footer">
    <button type="submit" class="btn btn-primary" name="add-loader"> Submit </button>
  </div>
</form>

POST

<form action="" method="post">
  <div class="modal-body" style="opacity: 0;">
    <ul class="list-unstyled">
      <li>
        <a href="https://huggingface.co/datasets/conceptual_captions">

                                                <span class=" icon-wrapper icon-ion" data-name="document-text-outline"><svg xmlns="http://www.w3.org/2000/svg" width="512" height="512" viewBox="0 0 512 512"><path d="M416 221.25V416a48 48 0 0 1-48 48H144a48 48 0 0 1-48-48V96a48 48 0 0 1 48-48h98.75a32 32 0 0 1 22.62 9.37l141.26 141.26a32 32 0 0 1 9.37 22.62z" fill="none" stroke="#000" stroke-linejoin="round" stroke-width="32"></path><path d="M256 56v120a32 32 0 0 0 32 32h120m-232 80h160m-160 80h160" fill="none" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="32"></path></svg></span>
                                            
                                            huggingface/datasets
                                        </a>
        <input type="hidden" name="csrfmiddlewaretoken" value="wMtsUJEDKxfjyuykniX25hXe16cdV7oeXduzEZ4nulOp8A2fwA8nFAu9xU83RBM7">
        <input type="hidden" name="remove_loader_pk" value="2990">
        <button type="submit" class="btn btn-danger" style="width:2.5em">- </button>
      </li>
    </ul>
    <ul class="list-unstyled">
      <li>
        <a href="https://github.com/google-research-datasets/conceptual-captions">

                                                <span class=" icon-wrapper icon-ion" data-name="logo-github"><svg xmlns="http://www.w3.org/2000/svg" width="512" height="512" viewBox="0 0 512 512"><path d="M256 32C132.3 32 32 134.9 32 261.7c0 101.5 64.2 187.5 153.2 217.9a17.56 17.56 0 0 0 3.8.4c8.3 0 11.5-6.1 11.5-11.4 0-5.5-.2-19.9-.3-39.1a102.4 102.4 0 0 1-22.6 2.7c-43.1 0-52.9-33.5-52.9-33.5-10.2-26.5-24.9-33.6-24.9-33.6-19.5-13.7-.1-14.1 1.4-14.1h.1c22.5 2 34.3 23.8 34.3 23.8 11.2 19.6 26.2 25.1 39.6 25.1a63 63 0 0 0 25.6-6c2-14.8 7.8-24.9 14.2-30.7-49.7-5.8-102-25.5-102-113.5 0-25.1 8.7-45.6 23-61.6-2.3-5.8-10-29.2 2.2-60.8a18.64 18.64 0 0 1 5-.5c8.1 0 26.4 3.1 56.6 24.1a208.21 208.21 0 0 1 112.2 0c30.2-21 48.5-24.1 56.6-24.1a18.64 18.64 0 0 1 5 .5c12.2 31.6 4.5 55 2.2 60.8 14.3 16.1 23 36.6 23 61.6 0 88.2-52.4 107.6-102.3 113.3 8 7.1 15.2 21.1 15.2 42.5 0 30.7-.3 55.5-.3 63 0 5.4 3.1 11.5 11.4 11.5a19.35 19.35 0 0 0 4-.4C415.9 449.2 480 363.1 480 261.7 480 134.9 379.7 32 256 32z"></path></svg></span>
                                            
                                            google-research-datasets/conceptual-captions
                                        </a>
        <input type="hidden" name="csrfmiddlewaretoken" value="wMtsUJEDKxfjyuykniX25hXe16cdV7oeXduzEZ4nulOp8A2fwA8nFAu9xU83RBM7">
        <input type="hidden" name="remove_loader_pk" value="1870">
        <button type="submit" class="btn btn-danger" style="width:2.5em">- </button>
      </li>
    </ul>
  </div>
</form>

POST

<form action="" method="post">
  <input type="hidden" name="csrfmiddlewaretoken" value="wMtsUJEDKxfjyuykniX25hXe16cdV7oeXduzEZ4nulOp8A2fwA8nFAu9xU83RBM7">
  <div id="div_id_modalities" class="form-group">
    <label for="id_modalities" class=""> Add or remove modalities: </label>
    <div class="">
      <select name="modalities" data-container-css-class="" data-allow-clear="false" style="width: 100%" class="modelselect2multiple form-control" id="id_modalities" data-autocomplete-light-language="en"
        data-autocomplete-light-url="/dataset-collection-autocomplete/" data-autocomplete-light-function="select2" multiple="">
        <option value="4" selected="">Images</option>
        <option value="6" selected="">Texts</option>
      </select>
      <div style="display:none" class="dal-forward-conf" id="dal-forward-conf-for_id_modalities">
        <script type="text/dal-forward-conf">[{"type": "const", "val": "Modalities", "dst": "area_name"}]</script>
      </div>
    </div>
  </div>
  <div class="modal-footer">
    <button type="submit" class="btn btn-primary" name="edit-modalities"> Save </button>
  </div>
</form>

POST

<form action="" method="post">
  <input type="hidden" name="csrfmiddlewaretoken" value="wMtsUJEDKxfjyuykniX25hXe16cdV7oeXduzEZ4nulOp8A2fwA8nFAu9xU83RBM7">
  <div id="div_id_languages" class="form-group">
    <label for="id_languages" class=""> Add or remove languages: </label>
    <div class="">
      <select name="languages" data-container-css-class="" data-allow-clear="false" style="width: 100%" class="modelselect2multiple form-control" id="id_languages" data-autocomplete-light-language="en"
        data-autocomplete-light-url="/dataset-collection-autocomplete/" data-autocomplete-light-function="select2" multiple="">
        <option value="7" selected="">English</option>
      </select>
      <div style="display:none" class="dal-forward-conf" id="dal-forward-conf-for_id_languages">
        <script type="text/dal-forward-conf">[{"type": "const", "val": "Languages", "dst": "area_name"}]</script>
      </div>
    </div>
  </div>
  <div class="modal-footer">
    <button type="submit" class="btn btn-primary" name="edit-languages"> Save </button>
  </div>
</form>

POST

<form action="" method="post">
  <input type="hidden" name="csrfmiddlewaretoken" value="wMtsUJEDKxfjyuykniX25hXe16cdV7oeXduzEZ4nulOp8A2fwA8nFAu9xU83RBM7">
  <div id="div_id_variants" class="form-group">
    <label for="id_variants" class=""> Add or remove variants: </label>
    <div class="">
      <select name="variants" data-container-css-class="" data-allow-clear="false" style="width: 100%" class="modelselect2multiple form-control" id="id_variants" data-autocomplete-light-language="en"
        data-autocomplete-light-url="/dataset-autocomplete/" data-autocomplete-light-function="select2" multiple="">
        <option value="3824" selected="">Conceptual Captions</option>
      </select>
      <div style="display:none" class="dal-forward-conf" id="dal-forward-conf-for_id_variants">
        <script type="text/dal-forward-conf">[{"type": "const", "val": true, "dst": "disable_create_option"}]</script>
      </div>
    </div>
  </div>
  <p> The benchmarks section lists all benchmarks using a given dataset or any of its variants. We use variants to distinguish between results evaluated on slightly different versions of the same dataset. For example, ImageNet 32⨉32 and ImageNet
    64⨉64 are variants of the ImageNet dataset. </p>
  <div class="modal-footer">
    <button type="submit" class="btn btn-primary" name="edit-variants"> Save </button>
  </div>
</form>

POST

<form action="" method="post">
  <div class="modal-body" style="opacity: 0;">
    <input type="hidden" name="csrfmiddlewaretoken" value="wMtsUJEDKxfjyuykniX25hXe16cdV7oeXduzEZ4nulOp8A2fwA8nFAu9xU83RBM7">
    <div id="div_id_paper" class="form-group">
      <label for="id_paper_add_row" class=" requiredField"> Paper title:<span class="asteriskField">*</span>
      </label>
      <div class="">
        <select name="paper" id="id_paper_add_row" class="modelselect2 form-control" required="" data-autocomplete-light-language="en" data-autocomplete-light-url="/paper-autocomplete/" data-autocomplete-light-function="select2">
          <option value="" selected="">---------</option>
        </select>
      </div>
    </div>
    <div id="div_id_dataset" class="form-group">
      <label for="id_dataset" class=" requiredField"> Dataset or its variant:<span class="asteriskField">*</span>
      </label>
      <div class="">
        <select name="dataset" class="modelselect2 form-control" required="" id="id_dataset" data-autocomplete-light-language="en" data-autocomplete-light-url="/dataset-autocomplete/" data-autocomplete-light-function="select2">
          <option value="">---------</option>
          <option value="3824" selected="">Conceptual Captions</option>
        </select>
      </div>
    </div>
    <div id="div_id_task" class="form-group">
      <label for="id_task" class=" requiredField"> Task:<span class="asteriskField">*</span>
      </label>
      <div class="">
        <select name="task" class="modelselect2 form-control" required="" id="id_task" data-autocomplete-light-language="en" data-autocomplete-light-url="/task-autocomplete/" data-autocomplete-light-function="select2">
          <option value="" selected="">---------</option>
        </select>
      </div>
    </div>
    <div id="div_id_model_name" class="form-group">
      <label for="id_model_name" class=" requiredField"> Model name:<span class="asteriskField">*</span>
      </label>
      <div class="">
        <input type="text" name="model_name" class="textinput textInput form-control" required="" id="id_model_name">
      </div>
    </div>
    <div id="div_id_metric" class="form-group">
      <label for="id_metric" class=" requiredField"> Metric name:<span class="asteriskField">*</span>
      </label>
      <div class="">
        <select name="metric" class="modelselect2 form-control" required="" id="id_metric" data-autocomplete-light-language="en" data-autocomplete-light-url="/metric-autocomplete/" data-autocomplete-light-function="select2">
          <option value="" selected="">---------</option>
        </select>
      </div>
    </div>
    <div id="sota-metric-names">
    </div>
    <div class="form-group">
      <div id="div_id_metric_higher_is_better" class="form-check">
        <input type="checkbox" name="metric_higher_is_better" class="checkboxinput form-check-input" id="id_metric_higher_is_better">
        <label for="id_metric_higher_is_better" class="form-check-label"> Higher is better (for the metric) </label>
      </div>
    </div>
    <div id="div_id_metric_value" class="form-group">
      <label for="id_metric_value" class=" requiredField"> Metric value:<span class="asteriskField">*</span>
      </label>
      <div class="">
        <input type="text" name="metric_value" class="textinput textInput form-control" required="" id="id_metric_value">
      </div>
    </div>
    <div id="sota-metric-values">
    </div>
    <div class="form-group">
      <div id="div_id_uses_additional_data" class="form-check">
        <input type="checkbox" name="uses_additional_data" class="checkboxinput form-check-input" id="id_uses_additional_data">
        <label for="id_uses_additional_data" class="form-check-label"> Uses extra training data </label>
      </div>
    </div>
    <div id="div_id_evaluated_on" class="form-group">
      <label for="id_evaluated_on" class=""> Data evaluated on </label>
      <div class="">
        <input type="text" name="evaluated_on" autocomplete="off" class="dateinput form-control" id="id_evaluated_on">
      </div>
    </div>
  </div>
  <div class="modal-footer">
    <button type="submit" class="btn btn-primary" name="add-row">Submit </button>
  </div>
</form>

Text Content




CONCEPTUAL CAPTIONS

Introduced by Sharma et al. in Conceptual Captions: A Cleaned, Hypernymed, Image
Alt-text Dataset For Automatic Image Captioning


Automatic image captioning is the task of producing a natural-language utterance
(usually a sentence) that correctly reflects the visual content of an image. Up
to this point, the resource most used for this task was the MS-COCO dataset,
containing around 120,000 images and 5-way image-caption annotations (produced
by paid annotators).

Google's Conceptual Captions dataset has more than 3 million images, paired with
natural-language captions. In contrast with the curated style of the MS-COCO
images, Conceptual Captions images and their raw descriptions are harvested from
the web, and therefore represent a wider variety of styles. The raw descriptions
are harvested from the Alt-text HTML attribute associated with web images. The
authors developed an automatic pipeline that extracts, filters, and transforms
candidate image/caption pairs, with the goal of achieving a balance of
cleanliness, informativeness, fluency, and learnability of the resulting
captions.

Source: Conceptual Captions
Homepage: https://github.com/google-research-datasets/conceptual-captions
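
The description above summarizes the collection pipeline only at a high level: raw captions are harvested from the Alt-text attribute of web images, then candidate image/caption pairs are extracted, filtered, and transformed. As a rough illustration of the harvesting step only, here is a minimal sketch using just the Python standard library; the length and character checks are placeholder heuristics, not the filters actually used by Sharma et al.

from html.parser import HTMLParser

class AltTextHarvester(HTMLParser):
    """Collect (image URL, alt text) candidate pairs from raw HTML.

    Illustrative only: the real pipeline applies far more image-, text-,
    and image/text-level filtering, plus transformations such as
    hypernymization, before a pair is accepted.
    """

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        url = attrs.get("src")
        alt = (attrs.get("alt") or "").strip()
        # Placeholder heuristic: keep mid-length, mostly textual alt strings.
        if url and 3 <= len(alt.split()) <= 25 and any(c.isalpha() for c in alt):
            self.candidates.append((url, alt))

harvester = AltTextHarvester()
harvester.feed('<img src="https://example.com/dog.jpg" alt="a dog runs along the beach">')
print(harvester.candidates)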

BENCHMARKS

--------------------------------------------------------------------------------

Task | Dataset Variant | Best Model
Image Captioning | Conceptual Captions | ClipCap



PAPERS

--------------------------------------------------------------------------------

Paper | Authors | Date | Stars

 * GIT: A Generative Image-to-text Transformer for Vision and Language
   Kevin Lin, Ce Liu, Linjie Li, Xiaowei Hu, JianFeng Wang, Zhengyuan Yang, Lijuan Wang, Zhe Gan, Zicheng Liu
   27 May 2022 | 110,602

 * TVLT: Textless Vision-Language Transformer
   Mohit Bansal, Jaemin Cho, Yixin Nie, Zineng Tang
   28 Sep 2022 | 110,602

 * Language Is Not All You Need: Aligning Perception with Language Models
   Shuming Ma, Qiang Liu, Zewen Chi, Saksham Singhal, Wenhui Wang, Lei Cui, Tengchao Lv, Shaohan Huang, Furu Wei, Li Dong, Yaru Hao, Xia Song, Barun Patra, Johan Bjorck, Vishrav Chaudhary, Owais Khan Mohammed, Kriti Aggarwal, Subhojit Som
   27 Feb 2023 | 15,075

 * Zero-Shot Text-to-Image Generation
   Mikhail Pavlov, Chelsea Voss, Mark Chen, Gabriel Goh, Aditya Ramesh, Alec Radford, Scott Gray, Ilya Sutskever
   24 Feb 2021 | 10,417

 * ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation
   Haifeng Wang, Weichong Yin, Zhihua Wu, Hao Tian, Han Zhang, Hua Wu, Yu Sun, Yewei Fang, Lanxin Li, Boqiang Duan
   31 Dec 2021 | 9,984

 * LiT: Zero-Shot Transfer with Locked-image text Tuning
   Xiaohua Zhai, Lucas Beyer, Xiao Wang, Daniel Keysers, Basil Mustafa, Alexander Kolesnikov, Andreas Steiner
   15 Nov 2021 | 7,793

 * LAVIS: A Library for Language-Vision Intelligence
   Hung Le, Steven C. H. Hoi, Silvio Savarese, Dongxu Li, Junnan Li, Guangsen Wang
   15 Sep 2022 | 6,410

 * mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
   Songfang Huang, Luo Si, Guohai Xu, Ming Yan, Bin Bi, Fei Huang, Zheng Cao, Ji Zhang, Jingren Zhou, Wei Wang, Chenliang Li, Junfeng Tian, Hehong Chen, Haiyang Xu, Jiabo Ye
   24 May 2022 | 1,812

 * Detecting Twenty-thousand Classes using Image-level Supervision
   Xingyi Zhou, Ishan Misra, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl
   7 Jan 2022 | 1,557

 * Fusion of Detected Objects in Text for Visual Question Answering
   David Reitter, Jeffrey Ling, Chris Alberti, Michael Collins
   14 Aug 2019 | 1,486


Showing 1 to 10 of 272 papers

DATASET LOADERS

--------------------------------------------------------------------------------


 * huggingface/datasets (16,991)
 * google-research-datasets/conceptual-captions (445)
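
The huggingface/datasets loader listed above can be used to read the caption/URL pairs programmatically. A minimal sketch, assuming the Hub dataset id conceptual_captions (taken from the loader link in the form analysis above) and assuming each record exposes caption and image_url fields; the dataset distributes captions and image URLs only, so the images themselves must be fetched separately.

from datasets import load_dataset  # pip install datasets

# Stream the training split so the multi-million-row index is not
# materialized locally all at once.
ds = load_dataset("conceptual_captions", split="train", streaming=True)

for example in ds.take(3):
    # Assumed field names: caption text plus the source image URL.
    print(example["caption"], "->", example["image_url"])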


TASKS

--------------------------------------------------------------------------------

 * Question Answering
 * Visual Question Answering (VQA)
 * Image Captioning

SIMILAR DATASETS

--------------------------------------------------------------------------------


VCR


VISUAL GENOME


FLICKR30K


COCO CAPTIONS

Source: Sharma et al.

USAGE

--------------------------------------------------------------------------------

Usage chart (number of papers per year, 2019-2023): Conceptual Captions, VCR, Visual Genome, Flickr30k.

LICENSE

--------------------------------------------------------------------------------

 * Custom

MODALITIES

--------------------------------------------------------------------------------

 * Images
 * Texts

LANGUAGES

--------------------------------------------------------------------------------

 * English


Contact us on: hello@paperswithcode.com. Papers With Code is a free resource
with all data licensed under CC-BY-SA.