
sparklemotion/nokogiri v1.12.3

Table of contents
 * Guiding Principles
 * Features Overview
 * Status
 * Support, Getting Help, and Reporting Issues
    * Reading
    * Ask For Help
    * Report A Bug
    * Security and Vulnerability Reporting
    * Semantic Versioning Policy

 * Installation
    * Native Gems: Faster, more reliable installation
    * Supported Platforms
    * Other Installation Options

 * How To Use Nokogiri
    * Parsing and Querying
    * Encoding

 * Technical Overview
    * Guiding Principles
    * CRuby
    * JRuby

 * Contributing
 * Code of Conduct
 * License
    * Dependencies

 * Authors

NOKOGIRI¶

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby. It
provides a sensible, easy-to-understand API for reading, writing, modifying, and
querying documents. It is fast and standards-compliant because it relies on native
parsers like libxml2 (C) and Xerces (Java).


GUIDING PRINCIPLES¶

Some guiding principles Nokogiri tries to follow:

 * be secure by default, treating all documents as untrusted
 * be a thin-as-reasonable layer on top of the underlying parsers, and don't
   attempt to fix behavioral differences between the parsers


FEATURES OVERVIEW¶

 * DOM Parser for XML, HTML4, and HTML5
 * SAX Parser for XML and HTML4
 * Push Parser for XML and HTML4
 * Document search via XPath 1.0
 * Document search via CSS3 selectors, with some jQuery-like extensions
 * XSD Schema validation
 * XSLT transformation
 * "Builder" DSL for XML and HTML documents


STATUS¶






SUPPORT, GETTING HELP, AND REPORTING ISSUES¶

All official documentation is posted at https://nokogiri.org (the source for
which is at https://github.com/sparklemotion/nokogiri.org/, and we welcome
contributions).

Consider subscribing to Tidelift, which provides license assurances and timely
security notifications for your open source dependencies, including Nokogiri.
Tidelift subscriptions also help the Nokogiri maintainers fund our automated
testing, which in turn allows us to ship releases, bugfixes, and security updates
more often.


READING¶

Your first stops for learning more about Nokogiri should be:

 * API Documentation
 * Tutorials
 * An excellent community-maintained Cheat Sheet


ASK FOR HELP¶

There are a few ways to ask exploratory questions:

 * The Ruby Discord chat server is active at https://discord.gg/UyQnKrT
 * The Nokogiri mailing list is active at
   https://groups.google.com/group/nokogiri-talk
 * Open an issue using the "Help Request" template at
   https://github.com/sparklemotion/nokogiri/issues

Please do not mail the maintainers at their personal addresses.


REPORT A BUG¶

The Nokogiri bug tracker is at https://github.com/sparklemotion/nokogiri/issues

Please use the "Bug Report" or "Installation Difficulties" templates.


SECURITY AND VULNERABILITY REPORTING¶

Please report vulnerabilities at https://hackerone.com/nokogiri

Full details of our security policy are in SECURITY.md.


SEMANTIC VERSIONING POLICY¶

Nokogiri follows Semantic Versioning (since 2017 or so).

We bump Major.Minor.Patch versions following this guidance:

Major: (we've never done this)

 * Significant backwards-incompatible changes to the public API that would
   require rewriting existing application code.
 * Some examples of backwards-incompatible changes we might someday consider for
   a Major release are at ROADMAP.md.

Minor:

 * Features and bugfixes.
 * Updating packaged libraries for non-security-related reasons.
 * Dropping support for EOLed Ruby versions. Some folks find this objectionable,
   but SemVer says this is OK if the public API hasn't changed.
 * Backwards-incompatible changes to internal or private methods and constants.
   These are detailed in the "Changes" section of each changelog entry.

Patch:

 * Bugfixes.
 * Security updates.
 * Updating packaged libraries for security-related reasons.
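Because Minor releases may drop EOL Rubies and Patch releases are reserved for bugfixes and security updates, you can pick a RubyGems version constraint that matches how much change you're willing to accept. This sketch uses RubyGems' own `Gem::Requirement` and `Gem::Version` classes to show what two common pessimistic constraints would allow; the specific constraints are examples, not a project recommendation:

```ruby
require "rubygems"

# In a Gemfile these would read, e.g.:
#   gem "nokogiri", "~> 1.12"    # Minor and Patch updates within 1.x
#   gem "nokogiri", "~> 1.12.3"  # Patch updates within 1.12 only
minor_and_patch = Gem::Requirement.new("~> 1.12")
patch_only = Gem::Requirement.new("~> 1.12.3")

%w[1.12.4 1.13.0 2.0.0].each do |candidate|
  version = Gem::Version.new(candidate)
  puts format("%-7s  ~> 1.12: %-5s  ~> 1.12.3: %s",
              candidate,
              minor_and_patch.satisfied_by?(version),
              patch_only.satisfied_by?(version))
end
```

Under these rules, 1.12.4 satisfies both constraints, 1.13.0 satisfies only `~> 1.12`, and a hypothetical 2.0.0 satisfies neither.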


INSTALLATION¶

Requirements:

 * Ruby >= 2.5
 * JRuby >= 9.2.0.0
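A quick way to check an interpreter against these minimums is to compare version strings with `Gem::Version`. This is a minimal sketch: the minimums are hard-coded from the list above and apply to the release this page describes, and `meets_nokogiri_requirements?` is a hypothetical helper name:

```ruby
require "rubygems"

# Hypothetical helper: true if the running interpreter meets the minimums listed above.
def meets_nokogiri_requirements?
  if RUBY_ENGINE == "jruby"
    # JRUBY_VERSION is only defined on JRuby, so it is referenced only in this branch.
    Gem::Version.new(JRUBY_VERSION) >= Gem::Version.new("9.2.0.0")
  else
    Gem::Version.new(RUBY_VERSION) >= Gem::Version.new("2.5")
  end
end

puts "#{RUBY_ENGINE} #{RUBY_VERSION}: #{meets_nokogiri_requirements? ? 'OK' : 'too old'}"
```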


NATIVE GEMS: FASTER, MORE RELIABLE INSTALLATION¶

"Native gems" contain pre-compiled libraries for a specific machine
architecture. On supported platforms, this removes the need to compile the C
extension and the packaged libraries, and the need for system dependencies to be
installed. The result is much faster and more reliable installation; as you
probably know, slow and unreliable installation has been the biggest headache for
Nokogiri users.


SUPPORTED PLATFORMS¶

As of v1.11.0, Nokogiri ships pre-compiled, "native" gems for the following
platforms:

 * Linux: x86-linux and x86_64-linux (req: glibc >= 2.17), including musl
   platforms like Alpine
 * Darwin/macOS: x86_64-darwin and arm64-darwin
 * Windows: x86-mingw32 and x64-mingw32
 * Java: any platform running JRuby 9.2 or higher

To determine whether your system supports one of these gems, look at the output
of bundle platform or ruby -e 'puts Gem::Platform.local.to_s'.
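The same check can be done from Ruby. This sketch compares `Gem::Platform.local` against the platform names listed above using RubyGems' own platform-matching rules (`Gem::Platform#===`). The `NATIVE_GEM_PLATFORMS` list is copied from this page and `native_gem_platform?` is a hypothetical helper, so the result is only an approximation; whether `gem install` actually fetches a platform-specific gem is the final word:

```ruby
require "rubygems"

# Platform names listed above for the native gems (as of v1.11.0).
NATIVE_GEM_PLATFORMS = %w[
  x86-linux x86_64-linux
  x86_64-darwin arm64-darwin
  x86-mingw32 x64-mingw32
  java
].map { |name| Gem::Platform.new(name) }

# Hypothetical helper: does RubyGems consider `platform` a match for any native gem?
def native_gem_platform?(platform = Gem::Platform.local)
  NATIVE_GEM_PLATFORMS.any? { |native| native === platform }
end

puts "local platform:    #{Gem::Platform.local}"
puts "native gem likely: #{native_gem_platform?}"
```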

If you're on a supported platform, either gem install or bundle install should
install a native gem without any additional action on your part. This
installation should only take a few seconds, and your output should look
something like:

$ gem install nokogiri
Fetching nokogiri-1.11.0-x86_64-linux.gem
Successfully installed nokogiri-1.11.0-x86_64-linux
1 gem installed



OTHER INSTALLATION OPTIONS¶

When a native gem isn't available for your platform, Nokogiri's C extension is
compiled at installation time, which requires a C compiler toolchain, Ruby
development header files, and some system dependencies to be installed.

The following may work for you if you have an appropriately-configured system:

gem install nokogiri


If you have any issues, please visit Installing Nokogiri for more complete
instructions and troubleshooting.
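Before falling back to a source install, a rough preflight check can tell you whether the Ruby headers and the C compiler Ruby was configured with are present. This is a hypothetical sketch built on `RbConfig`; it does not check the other system dependencies mentioned above, and on Windows the compiler name may need an `.exe` suffix to be found on `PATH`:

```ruby
require "rbconfig"

# Hypothetical helper: a rough look at what a source build of a C extension needs.
def source_build_preflight
  header_dir = RbConfig::CONFIG["rubyhdrdir"]
  compiler = RbConfig::CONFIG["CC"].to_s.split.first # CC may include flags

  compiler_found =
    if compiler.nil?
      false
    elsif compiler.include?(File::SEPARATOR)
      File.executable?(compiler) # CC given as a path
    else
      ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).any? do |dir|
        File.executable?(File.join(dir, compiler))
      end
    end

  {
    ruby_headers: !header_dir.nil? && File.exist?(File.join(header_dir, "ruby.h")),
    c_compiler: compiler,
    c_compiler_found: compiler_found
  }
end

p source_build_preflight
```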


HOW TO USE NOKOGIRI¶

Nokogiri is a large library, and so it's challenging to briefly summarize it.
We've tried to provide long, real-world examples at Tutorials.


PARSING AND QUERYING¶

Here is an example of parsing and querying a document:

#! /usr/bin/env ruby

require 'nokogiri'
require 'open-uri'

# Fetch and parse HTML document
doc = Nokogiri::HTML(URI.open('https://nokogiri.org/tutorials/installing_nokogiri.html'))

# Search for nodes by css
doc.css('nav ul.menu li a', 'article h2').each do |link|
  puts link.content
end

# Search for nodes by xpath
doc.xpath('//nav//ul//li/a', '//article//h2').each do |link|
  puts link.content
end

# Or mix and match
doc.search('nav ul.menu li a', '//article//h2').each do |link|
  puts link.content
end



ENCODING¶

Strings are always stored as UTF-8 internally. Methods that return text values
will always return UTF-8 encoded strings. Methods that return a string
containing markup (like to_xml, to_html and inner_html) will return a string
encoded like the source document.

WARNING

Some documents declare one encoding, but actually use a different one. In these
cases, which encoding should the parser choose?

Data is just a stream of bytes. Humans add meaning to that stream. Any
particular set of bytes could be valid characters in multiple encodings, so
detecting encoding with 100% accuracy is not possible. libxml2 does its best,
but it can't be right all the time.

If you want Nokogiri to handle the document encoding properly, your best bet is
to explicitly set the encoding. Here is an example of explicitly setting the
encoding to EUC-JP on the parser:

  doc = Nokogiri.XML('<foo><bar /></foo>', nil, 'EUC-JP')



TECHNICAL OVERVIEW¶


GUIDING PRINCIPLES¶

As noted above, two guiding principles of the software are:

 * be secure by default, treating all documents as untrusted
 * be a thin-as-reasonable layer on top of the underlying parsers, and don't
   attempt to fix behavioral differences between the parsers

Notably, although all the parsers are standards-compliant, there are behavioral
inconsistencies between the parsers used in the CRuby and JRuby implementations,
and Nokogiri does not and should not attempt to remove these inconsistencies.
Instead, we surface these differences in the test suite when they are
semantically important, or we intentionally write tests that depend only on the
semantically important parts of the output (for example, by omitting whitespace
from regex matchers on results).


CRUBY¶

The Ruby (a.k.a., CRuby, MRI, YARV) implementation is a C extension that depends
on libxml2 and libxslt (which in turn depend on zlib and possibly libiconv).

By default, these dependencies are met by Nokogiri's packaged copies of the
libxml2 and libxslt source code. The configuration option --use-system-libraries
is provided so you can specify alternative library locations instead. See
Installing Nokogiri for full documentation.

We provide native gems by pre-compiling libxml2 and libxslt (and potentially
zlib and libiconv) and packaging them into the gem file. In this case, no
compilation is necessary at installation time, which leads to faster and more
reliable installation.

See LICENSE-DEPENDENCIES.md for more information on which dependencies are
provided in which native and source gems.


JRUBY¶

The Java (a.k.a. JRuby) implementation is a Java extension that depends
primarily on Xerces and NekoHTML for parsing, with additional dependencies on
isorelax, nekodtd, jing, serializer, xalan-j, and xml-apis.

These dependencies are provided as pre-compiled jar files packaged in the Java
platform gem.

See LICENSE-DEPENDENCIES.md for more information on which dependencies are
provided in which native and source gems.


CONTRIBUTING¶

See CONTRIBUTING.md for an intro guide to developing Nokogiri.


CODE OF CONDUCT¶

We've adopted the Contributor Covenant code of conduct, which you can read in
full in CODE_OF_CONDUCT.md.


LICENSE¶

This project is licensed under the terms of the MIT license.

See this license at LICENSE.md.


DEPENDENCIES¶

Some additional libraries may be distributed with your version of Nokogiri.
Please see LICENSE-DEPENDENCIES.md for a discussion of the variations as well as
the licenses thereof.


AUTHORS¶

 * Mike Dalessio
 * Aaron Patterson
 * Yoko Harada
 * Akinori MUSHA
 * John Shahid
 * Karol Bucek
 * Sam Ruby
 * Craig Barnes
 * Stephen Checkoway
 * Lars Kanis
 * Sergio Arbeo
 * Timothy Elliott
 * Nobuyoshi Nakada


Copyright Ⓒ 2009-2021 Mike Dalessio


