
URL: https://thehftguy.com/2023/11/14/the-linux-kernel-has-been-accidentally-hardcoded-to-a-maximum-of-8-cores-for-nearly-2...






THE HFT GUY


A DEVELOPER IN LONDON

tech


THE LINUX KERNEL SCHEDULER HAS BEEN ACCIDENTALLY HARDCODED TO A MAXIMUM OF 8
CORES FOR THE PAST 15 YEARS AND NOBODY NOTICED

14 November 2023 | 15 November 2023 | thehftguy | 9 Comments




TL;DR This doesn’t mean that the scheduler can’t use more than 8 cores. The
scheduler decides how to allocate tasks to the available cores, and scheduling
particular workloads efficiently on the available hardware is a complex problem.
There are settings and hardcoded timings that control the behavior of the
scheduler. Some vary with the number of cores and some don’t. Unfortunately, the
ones that scale don’t work as intended, because the scaling was capped at 8
cores. This had a rationale around 2005-2010, when the latest CPUs were the Core
2 Duo and Core 2 Quad on interactive desktops and “nobody will ever get more
than 8 cores”. It doesn’t hold as well in 2023, when the baseline is 128 cores
per CPU on non-interactive servers.

TL;DR Yes, this has performance implications, especially if you run compute
clusters. No, your computer won’t get 20 times faster. Sorry. Either way, the
comments in the kernel, the man pages and the sysctl documentation are wrong and
should be corrected: for the last 15 years they have genuinely missed that a
scaling factor capped at 8 cores was at play.


A BIT OF HISTORY

I’ve been diving into the Linux kernel scheduler recently.

To give a brief introduction to scheduling, imagine a system with a single CPU
and a single core. The operating system allocates time slices of a few
milliseconds to run applications. If every application gets a few milliseconds
now and then, the system feels interactive: to the user it feels like the
computer is running hundreds of tasks, while it is only ever running one single
task at any time.
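
To make the time-slicing idea concrete, here is a toy round-robin simulation of a single core. It is only a sketch of the general principle described above, not how the Linux scheduler actually picks tasks; the task names, durations and the 4 ms slice are made up for illustration.

```python
from collections import deque


def round_robin(tasks, time_slice_ms):
    """Simulate one core sharing its time among several tasks.

    `tasks` maps a task name to the CPU time (in ms) it still needs.
    Returns the list of (name, ran_ms) slices in execution order.
    """
    queue = deque(tasks.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        ran = min(time_slice_ms, remaining)
        timeline.append((name, ran))
        if remaining > ran:
            # Not finished yet: back of the queue, wait for the next turn.
            queue.append((name, remaining - ran))
    return timeline


# Hypothetical workload: three tasks competing for one core.
timeline = round_robin({"editor": 4, "browser": 10, "music": 2}, time_slice_ms=4)
for name, ran in timeline:
    print(f"{name:8s} runs for {ran} ms")
```

Only one task runs at any instant, but because the turns are short, every task makes progress within a few milliseconds.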

Then multi-core systems became affordable for consumers in the early 2000s. Most
computers got 2 or 4 cores and the operating system could run multiple tasks in
parallel for real. That changed things a bit, but not too much: servers could
already have 2 or 4 physical CPUs.

The number of cores increased over time and we’ve reached a state in the early
2020s where the baseline is AMD servers with 128 cores per CPU.

As an anecdote, historically on Windows the period was around 16 milliseconds.
It had a funny side effect: an application doing a sleep(1ms) would resume
around 16 milliseconds later.
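
The sleep anecdote can be modelled in a few lines, assuming a sleeping task can only be woken on a timer tick. This is a simplified model, not Windows code; the 16 ms tick is the approximate figure quoted above and the function name is made up.

```python
TICK_MS = 16  # approximate timer period quoted in the text


def wake_time_ms(now_ms, sleep_ms, tick_ms=TICK_MS):
    """Return the first timer tick at or after the requested wake-up time."""
    requested = now_ms + sleep_ms
    # Round up to the next multiple of the tick period (integer arithmetic).
    return ((requested + tick_ms - 1) // tick_ms) * tick_ms


print(wake_time_ms(0, 1))   # 16: a 1 ms sleep started at t=0 ends at the 16 ms tick
print(wake_time_ms(20, 1))  # 32: started at t=20, it ends at the next tick
```

In this model a 1 ms sleep lasts anywhere up to one full tick, depending on where in the period it starts.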




LINUX SCHEDULER

Back to Linux, short and simplified version.

On Linux the kernel scheduler works in periods of 5 milliseconds (or 20
milliseconds on multicore systems) to ensure that every application has a chance
to run in the next 5 milliseconds. A thread that takes more than 5 milliseconds
is supposed to be pre-empted.

It’s important for end user latency. In layman’s terms, when you move your mouse
around and the pointer moves on the screen at least every few milliseconds, the
computer feels responsive and it’s great. That’s why the Linux kernel was
hardcoded to 5 milliseconds forever ago, to give a good experience to
interactive desktop users.

It’s intuitive enough that the behavior of the scheduler had to change when CPUs
got multiple cores. There is no need to aggressively interrupt tasks all the
time to run something else when there are multiple CPUs to work with.

It’s a balancing act. You don’t want to reschedule tasks onto another core every
few milliseconds, because that stops the task, causes context switches and
breaks caches, making all tasks slower. But you have to keep things responsive,
especially if it’s a desktop with an end user. Is it a desktop, though?

The scheduler can be tuned with sysctl settings. There are many settings
available. See this article for an intro, but know that all its scaling numbers
are wrong, since the kernel was accidentally hardcoded to 8 cores:
https://dev.to/satorutakeuchi/the-linux-s-sysctl-parameters-about-process-scheduler-1dh5

 * sched_latency_ns
 * sched_min_granularity_ns (renamed to base_slice in kernel v6)
 * wakeup_granularity_ns




COMMITS

This commit added automated scaling of scheduler settings with the number of
cores in 2009.


It was accidentally hardcoded to a maximum of 8 cores. Oops. The magic value
comes from an older commit and remained as it was. Please refer to the full diff
on GitHub.

https://github.com/torvalds/linux/commit/acb4a848da821a095ae9e4d8b22ae2d9633ba5cd





An interesting setting is the minimum granularity. It is supposed to let a task
run for at least 0.75 milliseconds (or 3 ms on multicore systems) when the
system is overloaded (there are more tasks to run in the period than there are
CPUs available).
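
As a rough illustration of what the minimum granularity does, here is a sketch that gives each runnable task an equal share of the period but never less than the minimum. It is a simplification under stated assumptions: all tasks have equal weight, the function name is made up, and the real kernel logic is more involved. The 20 ms period and 3 ms minimum are the multicore figures quoted in this article.

```python
def time_slice_ms(period_ms, min_granularity_ms, runnable_tasks):
    """Equal share of the period per task, floored at the minimum granularity."""
    fair_share = period_ms / runnable_tasks
    return max(fair_share, min_granularity_ms)


# Multicore figures quoted in the text: 20 ms period, 3 ms minimum granularity.
for n in (2, 4, 10, 40):
    slice_ms = time_slice_ms(20, 3, n)
    print(f"{n:3d} tasks -> {slice_ms:.2f} ms each, full rotation {n * slice_ms:.0f} ms")
```

With few tasks, everyone gets a turn within the period. Once the fair share would drop below the minimum, each task still gets its 3 ms and the full rotation stretches beyond the period instead.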

The min_granularity setting was renamed to base_slice in this commit in the v6
kernel.

The comment says it scales with the CPU count, and the comment is incorrect. I
wonder whether kernel developers are aware of that mistake as they are rewriting
the scheduler!

 * Official comments in the code say it’s scaling with log2(1+cores) but it
   doesn’t.
 * All the comments in the code are incorrect.
 * Official documentation and man pages are incorrect.
 * Every blog article, stack overflow answer and guide ever published about the
   scheduler is incorrect.

https://github.com/torvalds/linux/commit/e4ec3318a17f5dcf11bc23b2d2c1da4c1c5bb507



Below is the function that does the scaling. This is the code where it’s been
hardcoded to a maximum of 8 cores.

The maximum scaling factor is 4 on multi core systems: 1+log2(min(cores, 8)).

The default is log scaling. You can adjust kernel settings to use linear scaling
instead, but it’s broken too, capped to 8 cores.

https://github.com/torvalds/linux/blame/master/kernel/sched/fair.c#L198
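
For readers who prefer code to prose, the scaling rule described above can be modelled as follows. This is a Python sketch of the formula 1+log2(min(cores, 8)) and of the linear alternative, not the kernel's C source; the function and parameter names are made up, and the logarithm is rounded down to an integer.

```python
CPU_CAP = 8  # the cap discussed in the text


def ilog2(n):
    """Integer floor of log2(n), for n >= 1."""
    return n.bit_length() - 1


def scaling_factor(online_cpus, mode="log", cap=CPU_CAP):
    """Factor applied to the scheduler timings for a given core count."""
    cpus = min(online_cpus, cap)  # this is where the 8-core cap bites
    if mode == "linear":
        return cpus
    return 1 + ilog2(cpus)  # default: logarithmic scaling


for cores in (1, 2, 4, 8, 32, 128):
    print(cores, scaling_factor(cores), scaling_factor(cores, "linear"))
```

From 8 cores upward the factor stays at 4 (log) or 8 (linear), whatever the real core count is.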





Recent AMD desktops have 32 threads; recent AMD servers have 128 threads per CPU
and often have multiple physical CPUs.

It’s problematic that the kernel was hardcoded to a maximum of 8 cores (scaling
factor of 4). It can’t be good to reschedule hundreds of tasks every few
milliseconds, maybe on a different core, maybe on a different die. It can’t be
good for performance and cache locality.
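
To put numbers on it, the sketch below compares the minimum granularity with and without the cap, assuming log scaling and the 0.75 ms single-core base quoted earlier. The uncapped column is hypothetical: it shows what the documented formula would give if the cap did not exist, not a value any kernel actually uses.

```python
BASE_MIN_GRANULARITY_MS = 0.75  # single-core value quoted in the text


def log_factor(cores, cap=None):
    """1 + floor(log2(cores)), optionally capping the core count first."""
    if cap is not None:
        cores = min(cores, cap)
    return 1 + (cores.bit_length() - 1)


for cores in (8, 32, 128):
    capped = BASE_MIN_GRANULARITY_MS * log_factor(cores, cap=8)
    uncapped = BASE_MIN_GRANULARITY_MS * log_factor(cores)  # hypothetical
    print(f"{cores:3d} cores: capped {capped:.2f} ms, uncapped {uncapped:.2f} ms")
```

On a 128-core machine the capped value stays at 3 ms, while the uncapped formula would give 6 ms.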

To conclude, the kernel has been accidentally hardcoded to 8 cores for the past
15 years and nobody noticed.

Oops. ¯\_(ツ)_/¯

Linux kernel scheduler meets 128 cores AMD CPUs









bug, cpu, linux kernel, scheduler




9 THOUGHTS ON “THE LINUX KERNEL SCHEDULER HAS BEEN ACCIDENTALLY HARDCODED TO A
MAXIMUM OF 8 CORES FOR THE PAST 15 YEARS AND NOBODY NOTICED”

 1. Colin King says:
    14 November 2023 at 12:50
    
    I’ve filed a bug upstream to track this
    https://bugzilla.kernel.org/show_bug.cgi?id=218144
    
    
 2. Morten Fjord Christensen says:
    15 November 2023 at 06:58
    
    The comments in the kernel are misleading or straight up wrong, but the code
    isn’t. The time slices are limited to scale with up to 8 cores, to not get
    huge time slices for tons of cores, but the scheduler itself can handle as
    many cores as the system has.
    
    
 3. technicalparadox says:
    15 November 2023 at 07:31
    
    The 8 core limit only applies to scaling of the time slices.
    
    
 4. Paul Maidment says:
    15 November 2023 at 08:03
    
    Quite a find. Have you opened a bug, got a link to the bug?
    
    
 5. Kris says:
    15 November 2023 at 11:15
    
    Your headline does not correctly describe the behavior you describe. Which
    is a shame because it may be an important finding.
    
    But then “Linux Scheduler Optimized For 8 Cores” doesn’t sell as well.
    
    
 6. Zokkiretru says:
    15 November 2023 at 15:15
    
    Whoa, this is surprising!
    Have you tried removing the limit and benchmarking it? I am curious how much
    performance we are leaving on the table from that bug, but I don’t have any
    high core count machines.
    
    
 7. bmeneg says:
    15 November 2023 at 16:03
    
    This code is related only to the granularity time, increasing the number of
    CPUs would only allow tasks to run longer in a certain core, which is not
    the result you might want, since other tasks also want to run in that same
    core and would be stalled for more time. As you well pointed, is an act of
    balancing things, and having 8 as maximum number of CPUs for that specific
    calculation seems to still be a fine choice. Maybe a new benchmark would
    suggest another ceiling? Maybe. But that’s definitely not related to how
    many cores the scheduler see/use/consider/…
    
    Check what Peter Zijlstra replied to this exact subject a few years ago:
    https://lore.kernel.org/all/20211102160402.GX174703@worktop.programming.kicks-ass.net/
    
    
 8. auri0 says:
    15 November 2023 at 16:17
    
    Very interesting article! Thanks for sharing. This may be unrelated to your
    discovery, however on 23rd June 2021 I noticed that Linux (Ubuntu)
    multiprocessing times per number of CPUs enabled posed an anomaly … whereas
    Windows acted as (I) expected. My (rather amateur) finding is posted on
    github here: https://github.com/skyfielders/python-skyfield/issues/612
    
    My complaint about Linux is expressed near the following quoted text:
    “Windows processing for a Nautical Almanac would then consume a maximum of 2
    x 9 = 18 seconds. True! Windows processing of Event Time Tables would then
    consume a maximum of 2 x 16.5 seconds. True! But if you look at the Ubuntu
    times, this is NOT the case… which implies that something else is going on…
    interrupting the supposedly parallel processes. Very strange.”
    
    
 9. THRWY says:
    15 November 2023 at 17:28
    
    As discussed on the Hacker News thread for this article[0], the title and
    conclusion of this article are incorrect and seem to be based on a
    misunderstanding.
    
    What is capped to 8 is a scaling factor for the interval at which the
    scheduler switches tasks on any given core. Not the amount of cores the
    system can actually work with.
    
    As correctly stated in this article, the fewer cores you have the quicker
    those cores need to switch between tasks to maintain fluid, interactive
    “feel” for the user and to maintain performance across multiple running
    programs. However, as the core count increases from 8 to 16 to 128 and more,
    as some modern servers may have, you actually don’t want to continue
    increasing that interval factor – you don’t want individual cores bound to a
    single task for whole seconds! Especially when those machines are under high
    load, that would probably lead to reduced performance.
    
    In fact, the comment currently visible in the last screenshot used in this
    article states “but the relationship is not linear, so pick a second-best
    guess”.
    
    [0] https://news.ycombinator.com/item?id=38260935
    
    

