thehftguy.com
192.0.78.25
Public Scan
URL:
https://thehftguy.com/2023/11/14/the-linux-kernel-has-been-accidentally-hardcoded-to-a-maximum-of-8-cores-for-nearly-2...
Submission: On November 16 via manual from US — Scanned from DE
Text Content
THE HFT GUY: A DEVELOPER IN LONDON

THE LINUX KERNEL SCHEDULER HAS BEEN ACCIDENTALLY HARDCODED TO A MAXIMUM OF 8 CORES FOR THE PAST 15 YEARS AND NOBODY NOTICED

14 November 2023 (15 November 2023), thehftguy, 9 Comments

TL;DR This doesn’t mean that the scheduler can’t use more than 8 cores. The scheduler controls how tasks are allocated to the available cores, and scheduling particular workloads efficiently on the available hardware is a complex problem. There are settings and hardcoded timings that control the behavior of the scheduler. Some vary with the number of cores and some don’t. Unfortunately, the ones that are meant to scale don’t work as intended, because the scaling was capped to 8 cores. This had a rationale around 2005-2010, when the latest CPUs were the Core 2 Duo and Core 2 Quad on interactive desktops and “nobody will ever get more than 8 cores”. It doesn’t hold as well in 2023, when the baseline is 128 cores per CPU on non-interactive servers.

TL;DR Yes, this has performance implications, especially if you run compute clusters. No, your computer won’t get 20 times faster. Sorry. Either way, the comments in the kernel, the man pages and the sysctl documentation are wrong and should be corrected. For the last 15 years they have genuinely missed that there is a scaling factor at play and that it is capped to 8 cores.

A BIT OF HISTORY

I’ve been diving into the Linux kernel scheduler recently.

To give a brief introduction to scheduling, imagine a system with a single CPU and a single core. The operating system allocates time slices of a few milliseconds to run applications. If every application gets a few milliseconds now and then, the system feels interactive. To the user it feels like the computer is running hundreds of tasks, while it is only ever running one single task at any time.
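As a rough illustration of the time slicing described above, here is a minimal round-robin sketch for a single core. It is a toy model, not how the Linux scheduler is implemented, and the task names, durations and the `round_robin` helper are made up for the example.

```python
from collections import deque


def round_robin(tasks, slice_ms):
    """Simulate one core handing out fixed time slices in turn.

    tasks: dict mapping a (hypothetical) task name to the CPU time it needs, in ms.
    Returns a timeline of (start_ms, task_name, ran_ms) tuples.
    """
    queue = deque(tasks.items())
    timeline = []
    now = 0
    while queue:
        name, remaining = queue.popleft()
        ran = min(slice_ms, remaining)      # run for one slice, or less if the task finishes
        timeline.append((now, name, ran))
        now += ran
        if remaining > ran:                 # not finished: back to the end of the queue
            queue.append((name, remaining - ran))
    return timeline


if __name__ == "__main__":
    for start, name, ran in round_robin({"editor": 10, "browser": 5, "player": 7}, slice_ms=5):
        print(f"t={start:>2} ms: {name} runs for {ran} ms")
```

Only one task runs at any instant, but every task gets a turn within a few slices, which is what makes the system feel like it is doing everything at once.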
Then multi-core systems became affordable for consumers in the early 2000s. Most computers got 2 or 4 cores and the operating system could run multiple tasks in parallel for real. That changed things a bit but not too much, since servers could already have 2 or 4 physical CPUs. The number of cores increased over time and we’ve reached a state in the early 2020s where the baseline is AMD servers with 128 cores per CPU.

As an anecdote, historically on Windows the period was around 16 milliseconds. It had a funny side effect where an application doing a sleep(1ms) would resume around 16 milliseconds later.

LINUX SCHEDULER

Back to Linux, short and simplified version.

On Linux the kernel scheduler works in periods of 5 milliseconds (or 20 milliseconds on multicore systems) to ensure that every application has a chance to run in the next 5 milliseconds. A thread that takes more than 5 milliseconds is supposed to be pre-empted.

It’s important for end user latency. In layman’s terms, when you move your mouse around AND the pointer moves on the screen at least every few milliseconds, the computer feels responsive and it’s great. That’s why the Linux kernel was hardcoded to 5 milliseconds forever ago, to give a good experience to interactive desktop users.

It’s intuitive enough that the behavior of the scheduler had to change when CPUs got multiple cores. There is no need to aggressively interrupt tasks all the time to run something else when there are multiple CPUs to work with.

It’s a balancing act. You don’t want to reschedule tasks to another core every few milliseconds, because that stops the task, causes context switches and breaks caches, making all tasks slower. But you have to keep things responsive, especially if it’s a desktop with an end user. Is it a desktop though?

The scheduler can be tuned with sysctl settings.
There are many settings available. See this article for an intro, but know that all the scaling numbers in it are wrong, since the kernel was accidentally hardcoded to 8 cores: https://dev.to/satorutakeuchi/the-linux-s-sysctl-parameters-about-process-scheduler-1dh5

* sched_latency_ns
* sched_min_granularity_ns (renamed to base_slice in kernel v6)
* wakeup_granularity_ns

COMMITS

This commit added automated scaling of scheduler settings with the number of cores in 2009. It was accidentally hardcoded to a maximum of 8 cores. Oops. The magic value comes from an older commit and remained as it was. Please refer to the full diff on GitHub.

https://github.com/torvalds/linux/commit/acb4a848da821a095ae9e4d8b22ae2d9633ba5cd

An interesting setting is the minimum granularity. It is supposed to let a task run for at least 0.75 milliseconds (or 3 ms on multicore systems) when the system is overloaded (there are more tasks to run in the period than there are CPUs available).

The min_granularity setting was renamed to base_slice in this commit in the v6 kernel. The comment says it scales with the CPU count, and the comment is incorrect. I wonder whether kernel developers are aware of that mistake as they are rewriting the scheduler!

* Official comments in the code say it scales with log2(1+cores), but it doesn’t.
* All the comments in the code are incorrect.
* Official documentation and man pages are incorrect.
* Every blog article, Stack Overflow answer and guide ever published about the scheduler is incorrect.

https://github.com/torvalds/linux/commit/e4ec3318a17f5dcf11bc23b2d2c1da4c1c5bb507

Below is the function that does the scaling. This is the code where it’s been hardcoded to a maximum of 8 cores. The maximum scaling factor is 4 on multi-core systems: 1+log2(min(cores, 8)). The default is log scaling. You can adjust kernel settings to use linear scaling instead, but that’s broken too: it’s capped to 8 cores.
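The arithmetic of that formula can be sketched in a few lines of Python. This is only an illustration of the 1+log2(min(cores, 8)) rule as described in this article, not the kernel’s C code. The names `scaling_factor`, `effective_ms` and `CPU_CAP`, and the "log" / "linear" mode labels, are made up for the example. The 0.75 ms base value is the minimum granularity quoted above.

```python
CPU_CAP = 8  # the hardcoded maximum this article is about


def ilog2(n):
    """Floor of log2(n) for an integer n >= 1."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return n.bit_length() - 1


def scaling_factor(online_cpus, mode="log", cap=CPU_CAP):
    """Factor applied to the base tunables; pass cap=None to see the uncapped value."""
    cpus = online_cpus if cap is None else min(online_cpus, cap)
    if mode == "linear":
        return cpus
    if mode == "log":
        return 1 + ilog2(cpus)
    raise ValueError(f"unknown mode: {mode!r}")


def effective_ms(base_ms, online_cpus, mode="log", cap=CPU_CAP):
    """Base value in milliseconds multiplied by the scaling factor."""
    return base_ms * scaling_factor(online_cpus, mode, cap)


if __name__ == "__main__":
    print("cpus  capped  uncapped  min_granularity_ms")
    for cpus in (1, 2, 4, 8, 32, 128):
        print(f"{cpus:>4}  {scaling_factor(cpus):>6}  "
              f"{scaling_factor(cpus, cap=None):>8}  {effective_ms(0.75, cpus):>18}")
```

With the cap in place the factor stops at 4 from 8 cores upward, so a 128-core machine gets the same 3 ms minimum granularity as an 8-core one. Without the cap, log scaling would give a factor of 8 at 128 cores. Whether a larger factor would actually be better is a separate question that this sketch does not answer.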
https://github.com/torvalds/linux/blame/master/kernel/sched/fair.c#L198

Recent AMD desktops have 32 threads. Recent AMD servers have 128 threads per CPU and often have multiple physical CPUs.

It’s problematic that the kernel was hardcoded to a maximum of 8 cores (scaling factor of 4). It can’t be good to reschedule hundreds of tasks every few milliseconds, maybe on a different core, maybe on a different die. It can’t be good for performance and cache locality.

To conclude, the kernel has been accidentally hardcoded to 8 cores for the past 15 years and nobody noticed.

Oops. ¯\_(ツ)_/¯

Linux kernel scheduler meets 128 cores AMD CPUs

9 THOUGHTS ON “THE LINUX KERNEL SCHEDULER HAS BEEN ACCIDENTALLY HARDCODED TO A MAXIMUM OF 8 CORES FOR THE PAST 15 YEARS AND NOBODY NOTICED”

1. Colin King says: 14 November 2023 at 12:50
I’ve filed a bug upstream to track this https://bugzilla.kernel.org/show_bug.cgi?id=218144

2. Morten Fjord Christensen says: 15 November 2023 at 06:58
The comments in the kernel are misleading or straight up wrong, but the code isn’t. The time slices are limited to scale with up to 8 cores, to not get huge time slices for tons of cores, but the scheduler itself can handle as many cores as the system has.

3. technicalparadox says: 15 November 2023 at 07:31
The 8 core limit only applies to scaling of the time slices.

4. Paul Maidment says: 15 November 2023 at 08:03
Quite a find. Have you opened a bug, got a link to the bug?

5. Kris says: 15 November 2023 at 11:15
Your headline does not correctly describe the behavior you describe. Which is a shame because it may be an important finding. But then “Linux Scheduler Optimized For 8 Cores” doesn’t sell as well.

6. Zokkiretru says: 15 November 2023 at 15:15
Whoa, this is surprising! Have you tried removing the limit and benchmarking it?
I am curious how much performance we are leaving on the table from that bug, but I don’t have any high core count machines.

7. bmeneg says: 15 November 2023 at 16:03
This code is related only to the granularity time, increasing the number of CPUs would only allow tasks to run longer in a certain core, which is not the result you might want, since other tasks also want to run in that same core and would be stalled for more time. As you well pointed, is an act of balancing things, and having 8 as maximum number of CPUs for that specific calculation seems to still be a fine choice. Maybe a new benchmark would suggest another ceiling? Maybe. But that’s definitely not related to how many cores the scheduler see/use/consider/…
Check what Peter Zijlstra replied to this exact subject a few years ago: https://lore.kernel.org/all/20211102160402.GX174703@worktop.programming.kicks-ass.net/

8. auri0 says: 15 November 2023 at 16:17
Very interesting article! Thanks for sharing. This may be unrelated to your discovery, however on 23rd June 2021 I noticed that Linux (Ubuntu) multiprocessing times per number of CPUs enabled posed an anomaly … whereas Windows acted as (I) expected. My (rather amateur) finding is posted on github here: https://github.com/skyfielders/python-skyfield/issues/612
My complaint about Linux is expressed near the following quoted text:
“Windows processing for a Nautical Almanac would then consume a maximum of 2 x 9 = 18 seconds. True! Windows processing of Event Time Tables would then consume a maximum of 2 x 16.5 seconds. True! But if you look at the Ubuntu times, this is NOT the case… which implies that something else is going on… interrupting the supposedly parallel processes. Very strange.”

9. THRWY says: 15 November 2023 at 17:28
As discussed on the Hacker News thread for this article[0], the title and conclusion of this article are incorrect and seem to be based on a misunderstanding.
What is capped to 8 is a scaling factor for the interval at which the scheduler switches tasks on any given core. Not the amount of cores the system can actually work with. As correctly stated in this article, the fewer cores you have the quicker those cores need to switch between tasks to maintain fluid, interactive “feel” for the user and to maintain performance across multiple running programs. However, as the core count increases from 8 to 16 to 128 and more, as some modern servers may have, you actually don’t want to continue increasing that interval factor – you don’t want individual cores bound to a single task for whole seconds! Especially when those machines are under high load, that would probably lead to reduced performance. In fact, the comment currently visible in the last screenshot used in this article states “but the relationship is not linear, so pick a second-best guess”.
[0] https://news.ycombinator.com/item?id=38260935