paperswithcode.com
2606:4700:20::681a:d9b
Public Scan
URL:
https://paperswithcode.com/dataset/conceptual-captions
Submission: On August 24 via manual from DE — Scanned from DE
Form analysis
10 forms found in the DOM

GET /search
<form action="/search" method="get" id="id_global_search_form" autocomplete="off">
<input type="text" name="q_meta" style="display:none" id="q_meta">
<input type="hidden" name="q_type" id="q_type">
<input id="id_global_search_input" autocomplete="off" value="" name="q" class="global-search" type="search" placeholder="Search">
<button type="submit" class="icon"><span class=" icon-wrapper icon-fa icon-fa-light" data-name="search"><svg viewBox="0 0 512.025 520.146" xmlns="http://www.w3.org/2000/svg">
<path
d="M508.5 482.6c4.7 4.7 4.7 12.3 0 17l-9.9 9.9c-4.7 4.7-12.3 4.7-17 0l-129-129c-2.2-2.3-3.5-5.3-3.5-8.5v-10.2C312 396 262.5 417 208 417 93.1 417 0 323.9 0 209S93.1 1 208 1s208 93.1 208 208c0 54.5-21 104-55.3 141.1H371c3.2 0 6.2 1.2 8.5 3.5zM208 385c97.3 0 176-78.7 176-176S305.3 33 208 33 32 111.7 32 209s78.7 176 176 176z">
</path>
</svg></span></button>
<ul id="result-box"></ul>
</form>
POST
<form action="" method="post">
<div class="modal-body" style="opacity: 0;">
<div class="modal-body-info-text"> Stay informed on the latest trending ML papers with code, research developments, libraries, methods, and datasets.<br><br>
<a href="/newsletter">Read previous issues</a>
</div>
<input type="hidden" name="csrfmiddlewaretoken" value="wMtsUJEDKxfjyuykniX25hXe16cdV7oeXduzEZ4nulOp8A2fwA8nFAu9xU83RBM7">
<input placeholder="Enter your email" type="email" class="form-control pwc-email" name="address" id="id_address" max_length="100" required="">
</div>
<div class="modal-footer">
<button type="submit" class="btn btn-primary">Subscribe</button>
</div>
</form>
POST
<form class="dataset-form" method="post" enctype="multipart/form-data">
<input type="hidden" name="csrfmiddlewaretoken" value="wMtsUJEDKxfjyuykniX25hXe16cdV7oeXduzEZ4nulOp8A2fwA8nFAu9xU83RBM7">
<div id="div_id_name" class="form-group"> <label for="id_name" class=" requiredField"> Name:<span class="asteriskField">*</span> </label>
<div class=""> <input type="text" name="name" value="Conceptual Captions" maxlength="128" class="textinput textInput form-control" required="" id="id_name"> </div>
</div>
<div id="div_id_full_name" class="form-group"> <label for="id_full_name" class=""> Full name (optional): </label>
<div class=""> <input type="text" name="full_name" value="Conceptual Captions" class="textinput textInput form-control" id="id_full_name"> </div>
</div>
<div id="div_id_description" class="form-group"> <label for="id_description" class=" requiredField"> Description (Markdown and <mjx-container class="MathJax CtxtMenu_Attached_0" jax="CHTML" role="presentation" tabindex="0" ctxtmenu_counter="0"
style="font-size: 113.1%; position: relative;"><mjx-math class="MJX-TEX" aria-hidden="true" style="width: 2.781em;"><mjx-mi class="mjx-i"><mjx-c class="mjx-c1D43F TEX-I"></mjx-c></mjx-mi><mjx-mspace
style="margin-right: -0.325em;"></mjx-mspace><mjx-mpadded><mjx-block style="margin: 0.21em 0px 0px; position: relative;"><mjx-rbox style="left: 0px; top: -0.21em;"><mjx-texatom texclass="ORD"><mjx-mstyle size="s"><mjx-texatom
texclass="ORD"><mjx-mi class="mjx-i"><mjx-c class="mjx-c1D434 TEX-I"></mjx-c></mjx-mi></mjx-texatom></mjx-mstyle></mjx-texatom></mjx-rbox></mjx-block></mjx-mpadded><mjx-mspace style="margin-right: -0.17em;"></mjx-mspace><mjx-mi
class="mjx-i"><mjx-c class="mjx-c1D447 TEX-I"></mjx-c></mjx-mi><mjx-mspace style="margin-right: -0.14em;"></mjx-mspace><mjx-mpadded><mjx-block style="margin: -0.216em 0px 0.215em; position: relative;"><mjx-rbox
style="left: 0px; top: 0.215em;"><mjx-texatom texclass="ORD"><mjx-mi class="mjx-i"><mjx-c class="mjx-c1D438 TEX-I"></mjx-c></mjx-mi></mjx-texatom></mjx-rbox></mjx-block></mjx-mpadded><mjx-mspace
style="margin-right: -0.115em;"></mjx-mspace><mjx-mi class="mjx-i"><mjx-c class="mjx-c1D44B TEX-I"></mjx-c></mjx-mi></mjx-math><mjx-assistive-mml role="presentation" unselectable="on" display="inline"><math
xmlns="http://www.w3.org/1998/Math/MathML">
<mi>L</mi>
<mspace width="-.325em"></mspace>
<mpadded height="+.21em" depth="-.21em" voffset="+.21em">
<mrow>
<mstyle displaystyle="false" scriptlevel="1">
<mrow>
<mi>A</mi>
</mrow>
</mstyle>
</mrow>
</mpadded>
<mspace width="-.17em"></mspace>
<mi>T</mi>
<mspace width="-.14em"></mspace>
<mpadded height="-.5ex" depth="+.5ex" voffset="-.5ex">
<mrow>
<mi>E</mi>
</mrow>
</mpadded>
<mspace width="-.115em"></mspace>
<mi>X</mi>
</math></mjx-assistive-mml></mjx-container> enabled):<span class="asteriskField">*</span> </label>
<div class=""> <textarea name="description" cols="40" rows="6" placeholder="Briefly describe the dataset. Provide:
* a high-level explanation of the dataset characteristics
* explain motivations and summary of its content
* potential use cases of the dataset
If the description or image is from a different paper, please refer to it as follows:
Source: [title](url)
Image Source: [title](url)
" class="md-sources-autocomplete textarea form-control" required="" id="id_description">Automatic image captioning is the task of producing a natural-language utterance (usually a sentence) that correctly reflects the visual content of an image. Up to this point, the resource most used for this task was the MS-COCO dataset, containing around 120,000 images and 5-way image-caption annotations (produced by paid annotators).
Google's Conceptual Captions dataset has more than 3 million images, paired with natural-language captions. In contrast with the curated style of the MS-COCO images, Conceptual Captions images and their raw descriptions are harvested from the web, and therefore represent a wider variety of styles. The raw descriptions are harvested from the Alt-text HTML attribute associated with web images. The authors developed an automatic pipeline that extracts, filters, and transforms candidate image/caption pairs, with the goal of achieving a balance of cleanliness, informativeness, fluency, and learnability of the resulting captions.
Source: [Conceptual Captions](https://github.com/google-research-datasets/conceptual-captions)
Image Source: [Sharma et al](https://www.aclweb.org/anthology/P18-1238)</textarea> </div>
</div>
<div id="div_id_url" class="form-group"> <label for="id_url" class=""> Homepage URL (optional): </label>
<div class=""> <input type="text" name="url" value="https://github.com/google-research-datasets/conceptual-captions" class="textinput textInput form-control" id="id_url"> </div>
</div>
<div id="div_id_paper" class="form-group"> <label for="id_paper" class=""> Paper where the dataset was introduced: </label>
<div class=""> <select name="paper" style="width: 350px" class="modelselect2 form-control custom-select" id="id_paper" data-autocomplete-light-language="en" data-autocomplete-light-url="/sota/autocomplete/paper"
data-autocomplete-light-function="select2">
<option value="">---------</option>
<option value="66977" selected="">Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning</option>
</select>
<div style="display:none" class="dal-forward-conf" id="dal-forward-conf-for_id_paper">
<script type="text/dal-forward-conf">[{"type": "const", "val": true, "dst": "all_portals"}]</script>
</div>
</div>
</div>
<div id="div_id_introduced_date" class="form-group"> <label for="id_introduced_date" class=""> Introduction date: </label>
<div class=""> <input type="text" name="introduced_date" value="2018-07-01" autocomplete="off" class="dateinput form-control" id="id_introduced_date"> </div>
</div>
<div id="div_id_license_name" class="form-group"> <label for="id_license_name" class=""> Dataset license: </label>
<div class=""> <input type="text" name="license_name" value="Custom" maxlength="500" class="textinput textInput form-control" id="id_license_name"> </div>
</div>
<div id="div_id_license_url" class="form-group"> <label for="id_license_url" class=""> URL to full license terms: </label>
<div class=""> <input type="url" name="license_url" value="https://github.com/google-research-datasets/conceptual-captions/blob/master/LICENSE" maxlength="200" class="urlinput form-control" id="id_license_url"> </div>
</div>
<div id="div_id_image" class="form-group"> <label for="id_image" class=""> Image </label>
<div class=" mb-2">
<div class="input-group mb-2">
<div class="input-group-prepend"> <span class="input-group-text">Currently</span> </div>
<div class="form-control d-flex h-auto"> <span class="text-break" style="flex-grow:1;min-width:0">
<a href="https://production-media.paperswithcode.com/datasets/Screenshot_2021-01-07_at_17.03.39.png">datasets/Screenshot_2021-01-07_at_17.03.39.png</a> </span> <span class="align-self-center ml-2"> <span
class="custom-control custom-checkbox"> <input type="checkbox" name="image-clear" id="image-clear_id" class="custom-control-input"> <label class="custom-control-label mb-0" for="image-clear_id">Clear</label> </span> </span> </div>
</div>
<div class="input-group mb-0">
<div class="input-group-prepend"> <span class="input-group-text">Change</span> </div>
<div class="form-control custom-file" style="border:0"> <input type="file" name="image" class="custom-file-input" accept="image/*" id="id_image"> <label class="custom-file-label text-truncate" for="id_image">---</label>
<script type="text/javascript" id="script-id_image">
document.getElementById("script-id_image").parentNode.querySelector('.custom-file-input').onchange = function(e) {
var filenames = "";
for (let i = 0; i < e.target.files.length; i++) {
filenames += (i > 0 ? ", " : "") + e.target.files[i].name;
}
e.target.parentNode.querySelector('.custom-file-label').textContent = filenames;
}
</script>
</div>
</div>
<div class="input-group mb-0"> </div>
</div>
</div> <input type="hidden" name="prediction_id" id="id_prediction_id">
<div class="modal-footer">
<button type="submit" name="edit-dataset" class="btn btn-primary">Save</button>
</div>
</form>
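The description field in the form above notes that the raw captions are harvested from the Alt-text HTML attribute of web images. As a rough illustration of that source only (not of the authors' actual extraction, filtering, and transformation pipeline), a minimal sketch using Python's standard-library HTML parser could look like the following; the class name and sample HTML are made up for the example.

# Illustrative sketch: where "raw descriptions" come from (the alt attribute of
# <img> tags). This is NOT the dataset authors' pipeline; names and the sample
# HTML below are invented for demonstration.
from html.parser import HTMLParser

class AltTextCollector(HTMLParser):
    """Collect non-empty alt-text strings from <img> tags in an HTML document."""
    def __init__(self):
        super().__init__()
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:  # keep only non-empty alt attributes
                self.alt_texts.append(alt.strip())

sample_html = '<img src="beach.jpg" alt="a person walking a dog on the beach">'
collector = AltTextCollector()
collector.feed(sample_html)
print(collector.alt_texts)  # ['a person walking a dog on the beach']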
POST
<form action="" method="post">
<input type="hidden" name="csrfmiddlewaretoken" value="wMtsUJEDKxfjyuykniX25hXe16cdV7oeXduzEZ4nulOp8A2fwA8nFAu9xU83RBM7">
<div id="div_id_tasks" class="form-group">
<label for="id_tasks" class=""> Add or remove tasks: </label>
<div class="">
<select name="tasks" data-container-css-class="select2-lockable-tags" data-allow-clear="false" style="width: 100%" class="modelselect2multiple form-control" id="id_tasks" data-autocomplete-light-language="en"
data-autocomplete-light-url="/task-autocomplete/" data-autocomplete-light-function="select2" multiple="">
<option value="540" selected="">Image Captioning</option>
<option value="9" selected="">Question Answering</option>
<option value="168" selected="">Visual Question Answering (VQA)</option>
</select>
<div style="display:none" class="dal-forward-conf" id="dal-forward-conf-for_id_tasks">
<script type="text/dal-forward-conf">[{"type": "const", "val": true, "dst": "disable_create_option"}]</script>
</div>
</div>
</div>
<p> Some tasks are inferred based on the benchmarks list. </p>
<div class="modal-footer">
<button type="submit" class="btn btn-primary" name="edit-tasks"> Save </button>
</div>
</form>
POST
<form action="" method="post">
<input type="hidden" name="csrfmiddlewaretoken" value="wMtsUJEDKxfjyuykniX25hXe16cdV7oeXduzEZ4nulOp8A2fwA8nFAu9xU83RBM7">
<div id="div_id_library_url" class="form-group">
<label for="id_library_url" class=" requiredField"> Code Repository URL:<span class="asteriskField">*</span>
</label>
<div class="">
<input type="text" name="library_url" class="textinput textInput form-control" required="" id="id_library_url">
</div>
</div>
<div id="div_id_loader_url" class="form-group">
<label for="id_loader_url" class=""> [Optional] URL to documentation for this dataset: </label>
<div class="">
<input type="url" name="loader_url" class="urlinput form-control" id="id_loader_url">
</div>
</div>
<div id="div_id_frameworks" class="form-group">
<label class=""> Supported frameworks: </label>
<div>
<div class="form-check">
<input type="checkbox" class="form-check-input" name="frameworks" value="tf" id="id_frameworks_0">
<label class="form-check-label" for="id_frameworks_0"> TensorFlow </label>
</div>
<div class="form-check">
<input type="checkbox" class="form-check-input" name="frameworks" value="pytorch" id="id_frameworks_1">
<label class="form-check-label" for="id_frameworks_1"> PyTorch </label>
</div>
<div class="form-check">
<input type="checkbox" class="form-check-input" name="frameworks" value="jax" id="id_frameworks_2">
<label class="form-check-label" for="id_frameworks_2"> JAX </label>
</div>
</div>
</div>
<div class="modal-footer">
<button type="submit" class="btn btn-primary" name="add-loader"> Submit </button>
</div>
</form>
POST
<form action="" method="post">
<div class="modal-body" style="opacity: 0;">
<ul class="list-unstyled">
<li>
<a href="https://huggingface.co/datasets/conceptual_captions">
<span class=" icon-wrapper icon-ion" data-name="document-text-outline"><svg xmlns="http://www.w3.org/2000/svg" width="512" height="512" viewBox="0 0 512 512"><path d="M416 221.25V416a48 48 0 0 1-48 48H144a48 48 0 0 1-48-48V96a48 48 0 0 1 48-48h98.75a32 32 0 0 1 22.62 9.37l141.26 141.26a32 32 0 0 1 9.37 22.62z" fill="none" stroke="#000" stroke-linejoin="round" stroke-width="32"></path><path d="M256 56v120a32 32 0 0 0 32 32h120m-232 80h160m-160 80h160" fill="none" stroke="#000" stroke-linecap="round" stroke-linejoin="round" stroke-width="32"></path></svg></span>
huggingface/datasets
</a>
<input type="hidden" name="csrfmiddlewaretoken" value="wMtsUJEDKxfjyuykniX25hXe16cdV7oeXduzEZ4nulOp8A2fwA8nFAu9xU83RBM7">
<input type="hidden" name="remove_loader_pk" value="2990">
<button type="submit" class="btn btn-danger" style="width:2.5em">- </button>
</li>
</ul>
<ul class="list-unstyled">
<li>
<a href="https://github.com/google-research-datasets/conceptual-captions">
<span class=" icon-wrapper icon-ion" data-name="logo-github"><svg xmlns="http://www.w3.org/2000/svg" width="512" height="512" viewBox="0 0 512 512"><path d="M256 32C132.3 32 32 134.9 32 261.7c0 101.5 64.2 187.5 153.2 217.9a17.56 17.56 0 0 0 3.8.4c8.3 0 11.5-6.1 11.5-11.4 0-5.5-.2-19.9-.3-39.1a102.4 102.4 0 0 1-22.6 2.7c-43.1 0-52.9-33.5-52.9-33.5-10.2-26.5-24.9-33.6-24.9-33.6-19.5-13.7-.1-14.1 1.4-14.1h.1c22.5 2 34.3 23.8 34.3 23.8 11.2 19.6 26.2 25.1 39.6 25.1a63 63 0 0 0 25.6-6c2-14.8 7.8-24.9 14.2-30.7-49.7-5.8-102-25.5-102-113.5 0-25.1 8.7-45.6 23-61.6-2.3-5.8-10-29.2 2.2-60.8a18.64 18.64 0 0 1 5-.5c8.1 0 26.4 3.1 56.6 24.1a208.21 208.21 0 0 1 112.2 0c30.2-21 48.5-24.1 56.6-24.1a18.64 18.64 0 0 1 5 .5c12.2 31.6 4.5 55 2.2 60.8 14.3 16.1 23 36.6 23 61.6 0 88.2-52.4 107.6-102.3 113.3 8 7.1 15.2 21.1 15.2 42.5 0 30.7-.3 55.5-.3 63 0 5.4 3.1 11.5 11.4 11.5a19.35 19.35 0 0 0 4-.4C415.9 449.2 480 363.1 480 261.7 480 134.9 379.7 32 256 32z"></path></svg></span>
google-research-datasets/conceptual-captions
</a>
<input type="hidden" name="csrfmiddlewaretoken" value="wMtsUJEDKxfjyuykniX25hXe16cdV7oeXduzEZ4nulOp8A2fwA8nFAu9xU83RBM7">
<input type="hidden" name="remove_loader_pk" value="1870">
<button type="submit" class="btn btn-danger" style="width:2.5em">- </button>
</li>
</ul>
</div>
</form>
POST
<form action="" method="post">
<input type="hidden" name="csrfmiddlewaretoken" value="wMtsUJEDKxfjyuykniX25hXe16cdV7oeXduzEZ4nulOp8A2fwA8nFAu9xU83RBM7">
<div id="div_id_modalities" class="form-group">
<label for="id_modalities" class=""> Add or remove modalities: </label>
<div class="">
<select name="modalities" data-container-css-class="" data-allow-clear="false" style="width: 100%" class="modelselect2multiple form-control" id="id_modalities" data-autocomplete-light-language="en"
data-autocomplete-light-url="/dataset-collection-autocomplete/" data-autocomplete-light-function="select2" multiple="">
<option value="4" selected="">Images</option>
<option value="6" selected="">Texts</option>
</select>
<div style="display:none" class="dal-forward-conf" id="dal-forward-conf-for_id_modalities">
<script type="text/dal-forward-conf">[{"type": "const", "val": "Modalities", "dst": "area_name"}]</script>
</div>
</div>
</div>
<div class="modal-footer">
<button type="submit" class="btn btn-primary" name="edit-modalities"> Save </button>
</div>
</form>
POST
<form action="" method="post">
<input type="hidden" name="csrfmiddlewaretoken" value="wMtsUJEDKxfjyuykniX25hXe16cdV7oeXduzEZ4nulOp8A2fwA8nFAu9xU83RBM7">
<div id="div_id_languages" class="form-group">
<label for="id_languages" class=""> Add or remove languages: </label>
<div class="">
<select name="languages" data-container-css-class="" data-allow-clear="false" style="width: 100%" class="modelselect2multiple form-control" id="id_languages" data-autocomplete-light-language="en"
data-autocomplete-light-url="/dataset-collection-autocomplete/" data-autocomplete-light-function="select2" multiple="">
<option value="7" selected="">English</option>
</select>
<div style="display:none" class="dal-forward-conf" id="dal-forward-conf-for_id_languages">
<script type="text/dal-forward-conf">[{"type": "const", "val": "Languages", "dst": "area_name"}]</script>
</div>
</div>
</div>
<div class="modal-footer">
<button type="submit" class="btn btn-primary" name="edit-languages"> Save </button>
</div>
</form>
POST
<form action="" method="post">
<input type="hidden" name="csrfmiddlewaretoken" value="wMtsUJEDKxfjyuykniX25hXe16cdV7oeXduzEZ4nulOp8A2fwA8nFAu9xU83RBM7">
<div id="div_id_variants" class="form-group">
<label for="id_variants" class=""> Add or remove variants: </label>
<div class="">
<select name="variants" data-container-css-class="" data-allow-clear="false" style="width: 100%" class="modelselect2multiple form-control" id="id_variants" data-autocomplete-light-language="en"
data-autocomplete-light-url="/dataset-autocomplete/" data-autocomplete-light-function="select2" multiple="">
<option value="3824" selected="">Conceptual Captions</option>
</select>
<div style="display:none" class="dal-forward-conf" id="dal-forward-conf-for_id_variants">
<script type="text/dal-forward-conf">[{"type": "const", "val": true, "dst": "disable_create_option"}]</script>
</div>
</div>
</div>
<p> The benchmarks section lists all benchmarks using a given dataset or any of its variants. We use variants to distinguish between results evaluated on slightly different versions of the same dataset. For example, ImageNet 32⨉32 and ImageNet
64⨉64 are variants of the ImageNet dataset. </p>
<div class="modal-footer">
<button type="submit" class="btn btn-primary" name="edit-variants"> Save </button>
</div>
</form>
POST
<form action="" method="post">
<div class="modal-body" style="opacity: 0;">
<input type="hidden" name="csrfmiddlewaretoken" value="wMtsUJEDKxfjyuykniX25hXe16cdV7oeXduzEZ4nulOp8A2fwA8nFAu9xU83RBM7">
<div id="div_id_paper" class="form-group">
<label for="id_paper_add_row" class=" requiredField"> Paper title:<span class="asteriskField">*</span>
</label>
<div class="">
<select name="paper" id="id_paper_add_row" class="modelselect2 form-control" required="" data-autocomplete-light-language="en" data-autocomplete-light-url="/paper-autocomplete/" data-autocomplete-light-function="select2">
<option value="" selected="">---------</option>
</select>
</div>
</div>
<div id="div_id_dataset" class="form-group">
<label for="id_dataset" class=" requiredField"> Dataset or its variant:<span class="asteriskField">*</span>
</label>
<div class="">
<select name="dataset" class="modelselect2 form-control" required="" id="id_dataset" data-autocomplete-light-language="en" data-autocomplete-light-url="/dataset-autocomplete/" data-autocomplete-light-function="select2">
<option value="">---------</option>
<option value="3824" selected="">Conceptual Captions</option>
</select>
</div>
</div>
<div id="div_id_task" class="form-group">
<label for="id_task" class=" requiredField"> Task:<span class="asteriskField">*</span>
</label>
<div class="">
<select name="task" class="modelselect2 form-control" required="" id="id_task" data-autocomplete-light-language="en" data-autocomplete-light-url="/task-autocomplete/" data-autocomplete-light-function="select2">
<option value="" selected="">---------</option>
</select>
</div>
</div>
<div id="div_id_model_name" class="form-group">
<label for="id_model_name" class=" requiredField"> Model name:<span class="asteriskField">*</span>
</label>
<div class="">
<input type="text" name="model_name" class="textinput textInput form-control" required="" id="id_model_name">
</div>
</div>
<div id="div_id_metric" class="form-group">
<label for="id_metric" class=" requiredField"> Metric name:<span class="asteriskField">*</span>
</label>
<div class="">
<select name="metric" class="modelselect2 form-control" required="" id="id_metric" data-autocomplete-light-language="en" data-autocomplete-light-url="/metric-autocomplete/" data-autocomplete-light-function="select2">
<option value="" selected="">---------</option>
</select>
</div>
</div>
<div id="sota-metric-names">
</div>
<div class="form-group">
<div id="div_id_metric_higher_is_better" class="form-check">
<input type="checkbox" name="metric_higher_is_better" class="checkboxinput form-check-input" id="id_metric_higher_is_better">
<label for="id_metric_higher_is_better" class="form-check-label"> Higher is better (for the metric) </label>
</div>
</div>
<div id="div_id_metric_value" class="form-group">
<label for="id_metric_value" class=" requiredField"> Metric value:<span class="asteriskField">*</span>
</label>
<div class="">
<input type="text" name="metric_value" class="textinput textInput form-control" required="" id="id_metric_value">
</div>
</div>
<div id="sota-metric-values">
</div>
<div class="form-group">
<div id="div_id_uses_additional_data" class="form-check">
<input type="checkbox" name="uses_additional_data" class="checkboxinput form-check-input" id="id_uses_additional_data">
<label for="id_uses_additional_data" class="form-check-label"> Uses extra training data </label>
</div>
</div>
<div id="div_id_evaluated_on" class="form-group">
<label for="id_evaluated_on" class=""> Data evaluated on </label>
<div class="">
<input type="text" name="evaluated_on" autocomplete="off" class="dateinput form-control" id="id_evaluated_on">
</div>
</div>
</div>
<div class="modal-footer">
<button type="submit" class="btn btn-primary" name="add-row">Submit </button>
</div>
</form>
Text Content
CONCEPTUAL CAPTIONS

Introduced by Sharma et al. in "Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning".
Image source: Sharma et al.

Automatic image captioning is the task of producing a natural-language utterance (usually a sentence) that correctly reflects the visual content of an image. Up to this point, the resource most used for this task was the MS-COCO dataset, containing around 120,000 images and 5-way image-caption annotations (produced by paid annotators).

Google's Conceptual Captions dataset has more than 3 million images, paired with natural-language captions. In contrast with the curated style of the MS-COCO images, Conceptual Captions images and their raw descriptions are harvested from the web, and therefore represent a wider variety of styles. The raw descriptions are harvested from the Alt-text HTML attribute associated with web images. The authors developed an automatic pipeline that extracts, filters, and transforms candidate image/caption pairs, with the goal of achieving a balance of cleanliness, informativeness, fluency, and learnability of the resulting captions.

Source: Conceptual Captions
Homepage

BENCHMARKS
--------------------------------------------------------------------------------
Task              Dataset Variant       Best Model
Image Captioning  Conceptual Captions   ClipCap

PAPERS (showing 1 to 10 of 272)
--------------------------------------------------------------------------------
* GIT: A Generative Image-to-text Transformer for Vision and Language. Kevin Lin, Ce Liu, Linjie Li, Xiaowei Hu, JianFeng Wang, Zhengyuan Yang, Lijuan Wang, Zhe Gan, Zicheng Liu. 27 May 2022. 110,602 stars.
* TVLT: Textless Vision-Language Transformer. Mohit Bansal, Jaemin Cho, Yixin Nie, Zineng Tang. 28 Sep 2022. 110,602 stars.
* Language Is Not All You Need: Aligning Perception with Language Models. Shuming Ma, Qiang Liu, Zewen Chi, Saksham Singhal, Wenhui Wang, Lei Cui, Tengchao Lv, Shaohan Huang, Furu Wei, Li Dong, Yaru Hao, Xia Song, Barun Patra, Johan Bjorck, Vishrav Chaudhary, Owais Khan Mohammed, Kriti Aggarwal, Subhojit Som. 27 Feb 2023. 15,075 stars.
* Zero-Shot Text-to-Image Generation. Mikhail Pavlov, Chelsea Voss, Mark Chen, Gabriel Goh, Aditya Ramesh, Alec Radford, Scott Gray, Ilya Sutskever. 24 Feb 2021. 10,417 stars.
* ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation. Haifeng Wang, Weichong Yin, Zhihua Wu, Hao Tian, Han Zhang, Hua Wu, Yu Sun, Yewei Fang, Lanxin Li, Boqiang Duan. 31 Dec 2021. 9,984 stars.
* LiT: Zero-Shot Transfer with Locked-image text Tuning. Xiaohua Zhai, Lucas Beyer, Xiao Wang, Daniel Keysers, Basil Mustafa, Alexander Kolesnikov, Andreas Steiner. 15 Nov 2021. 7,793 stars.
* LAVIS: A Library for Language-Vision Intelligence. Hung Le, Steven C. H. Hoi, Silvio Savarese, Dongxu Li, Junnan Li, Guangsen Wang. 15 Sep 2022. 6,410 stars.
* mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. Songfang Huang, Luo Si, Guohai Xu, Ming Yan, Bin Bi, Fei Huang, Zheng Cao, Ji Zhang, Jingren Zhou, Wei Wang, Chenliang Li, Junfeng Tian, Hehong Chen, Haiyang Xu, Jiabo Ye. 24 May 2022. 1,812 stars.
* Detecting Twenty-thousand Classes using Image-level Supervision. Xingyi Zhou, Ishan Misra, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl. 7 Jan 2022. 1,557 stars.
* Fusion of Detected Objects in Text for Visual Question Answering. David Reitter, Jeffrey Ling, Chris Alberti, Michael Collins. 14 Aug 2019. 1,486 stars.

DATASET LOADERS
--------------------------------------------------------------------------------
* huggingface/datasets (16,991 stars)
* google-research-datasets/conceptual-captions (445 stars)

TASKS
--------------------------------------------------------------------------------
* Question Answering
* Visual Question Answering (VQA)
* Image Captioning

SIMILAR DATASETS
--------------------------------------------------------------------------------
* VCR
* Visual Genome
* Flickr30k
* COCO Captions

USAGE
--------------------------------------------------------------------------------
[Usage chart: number of papers per year (2019-2023) for Conceptual Captions, VCR, Visual Genome, and Flickr30k]

LICENSE
--------------------------------------------------------------------------------
* Custom

MODALITIES
--------------------------------------------------------------------------------
* Images
* Texts

LANGUAGES
--------------------------------------------------------------------------------
* English

Contact us on: hello@paperswithcode.com. Papers With Code is a free resource with all data licensed under CC-BY-SA.
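For reference, the huggingface/datasets loader listed under DATASET LOADERS above can read the caption/image-URL pairs directly. The sketch below is a minimal example, assuming the field names "image_url" and "caption" from the dataset card at https://huggingface.co/datasets/conceptual_captions; the dataset distributes captions with image URLs only, so downloading the images themselves is left to the user.

# Minimal sketch (not from this page): reading Conceptual Captions via the
# Hugging Face datasets library. Field names are taken from the dataset card
# and should be verified against the current card.
from datasets import load_dataset

# streaming=True iterates over the 3M+ caption/URL pairs without a full download
ds = load_dataset("conceptual_captions", split="train", streaming=True)

for example in ds.take(3):
    print(example["caption"], "->", example["image_url"])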