
PETERBE.COM

Peter Bengtsson's blog


COMPARING DIFFERENT EFFORTS WITH WEBP IN SHARP


OCTOBER 5, 2023
0 COMMENTS NODE, JAVASCRIPT

When you, in a Node program, use sharp to convert an image buffer to a WebP
buffer, you have the option of effort. The higher the number, the longer the
conversion takes, but the smaller the image it produces on disk.

I wanted to put some realistic numbers to this, so I wrote a benchmark and ran
it on my Intel MacBook Pro.


THE BENCHMARK

It looks like this:


import fs from "node:fs/promises";
import sharp from "sharp";

async function e6() {
  return await f("screenshot-1000.png", 6);
}
async function e5() {
  return await f("screenshot-1000.png", 5);
}
async function e4() {
  return await f("screenshot-1000.png", 4);
}
async function e3() {
  return await f("screenshot-1000.png", 3);
}
async function e2() {
  return await f("screenshot-1000.png", 2);
}
async function e1() {
  return await f("screenshot-1000.png", 1);
}
async function e0() {
  return await f("screenshot-1000.png", 0);
}

async function f(fp, effort) {
  const originalBuffer = await fs.readFile(fp);
  const image = sharp(originalBuffer);
  const { width } = await image.metadata();
  const buffer = await image.webp({ effort }).toBuffer();
  return [buffer.length, width, { effort }];
}


Then, I ran each function serially and measured how long each took, and
repeated that whole cycle 15 times. So, in total, each function was executed 15
times. The numbers were collected and the median (P50) is reported.
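
For clarity, here's a minimal sketch of such a harness (my reconstruction, not
the exact code I used):


const fns = [e0, e1, e2, e3, e4, e5, e6];
const durations = new Map(fns.map((fn) => [fn.name, []]));

for (let i = 0; i < 15; i++) {
  for (const fn of fns) {
    const t0 = performance.now();
    await fn();
    durations.get(fn.name).push(performance.now() - t0);
  }
}

function median(numbers) {
  const sorted = [...numbers].sort((a, b) => a - b);
  return sorted[Math.floor(sorted.length / 2)];
}

for (const [name, times] of durations) {
  console.log(name, `${median(times).toFixed(1)}ms`);
}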


A 2000X2000 PIXEL PNG IMAGE

1. e0: 191ms                   235KB
2. e1: 340.5ms                 208KB
3. e2: 369ms                   198KB
4. e3: 485.5ms                 193KB
5. e4: 587ms                   177KB
6. e5: 695.5ms                 177KB
7. e6: 4811.5ms                142KB

What it means is that with {effort: 6}, the conversion of a 2000x2000 PNG took
4.8 seconds, but the resulting WebP buffer became 142KB, compared to 235KB with
the least effort. That's roughly 25x the time for a roughly 40% smaller file.



This graph demonstrates how the time (blue) goes up, and the final size (red)
goes down, the more effort you put in.


A 1000X1000 PIXEL PNG IMAGE

1. e0: 54ms                    70KB
2. e1: 60ms                    66KB
3. e2: 65ms                    61KB
4. e3: 96ms                    59KB
5. e4: 169ms                   53KB
6. e5: 193ms                   53KB
7. e6: 1466ms                  51KB


A 500X500 PIXEL PNG IMAGE

1. e0: 24ms                    23KB
2. e1: 26ms                    21KB
3. e2: 28ms                    20KB
4. e3: 37ms                    19KB
5. e4: 57ms                    18KB
6. e5: 66ms                    18KB
7. e6: 556ms                   18KB


CONCLUSION

It's up to you but clearly, {effort: 6} is to be avoided if you're worried
about the conversion taking a huge amount of time.

Perhaps the takeaway is this: if you run these operations in a build step, such
that you never have to do them again, the maximum effort is worth it. Beyond
that, find a sweet spot for your particular environment and challenge.

Please post a comment if you have thoughts or questions


ZIPPING FILES IS APPENDING BY DEFAULT - WATCH OUT!


OCTOBER 4, 2023
0 COMMENTS LINUX

This is not a bug in the age-old zip Linux program. It's maybe a bug in its
intuitiveness.

I have a piece of automation that downloads a zip file from a file storage cache
(GitHub Actions actions/cache in this case). Then it unpacks it and plucks some
of the files from it into another fresh new directory. Lastly, it creates a new
.zip file with the same name. The same name because that way, when the process
is done, it uploads the new .zip file into the file storage cache. But be
careful: does it really create a new .zip file?

To demonstrate the surprise:


$ cd /tmp/
$ mkdir somefiles
$ touch somefiles/file1.txt
$ touch somefiles/file2.txt
$ zip -r somefiles.zip somefiles
  adding: somefiles/ (stored 0%)
  adding: somefiles/file1.txt (stored 0%)
  adding: somefiles/file2.txt (stored 0%)


Now we have a somefiles.zip to work with. It has 2 files in it.

Next session. Let's say it's another day, with a fresh new /tmp directory, and
the somefiles.zip from the first session has been downloaded. This time we want
to create a new somefiles directory but, in it, only have file2.txt from before
and a new file, file3.txt.


$ rm -fr somefiles
$ unzip somefiles.zip
Archive:  somefiles.zip
   creating: somefiles/
 extracting: somefiles/file1.txt
 extracting: somefiles/file2.txt
$ rm somefiles/file1.txt
$ touch somefiles/file3.txt
$ zip -r somefiles.zip somefiles
updating: somefiles/ (stored 0%)
updating: somefiles/file2.txt (stored 0%)
  adding: somefiles/file3.txt (stored 0%)


And here comes the surprise. Let's peek into the newly zipped-up somefiles.zip
(which was made from the somefiles/ directory, which only contained file2.txt
and file3.txt):


$ rm -fr somefiles
$ unzip -l somefiles.zip
Archive:  somefiles.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2023-10-04 16:06   somefiles/
        0  2023-10-04 16:05   somefiles/file1.txt
        0  2023-10-04 16:06   somefiles/file2.txt
        0  2023-10-04 16:06   somefiles/file3.txt
---------                     -------
        0                     4 files


I did not see that coming! The command zip -r somefiles.zip somefiles doesn't
create a fresh new .zip file based on recursively walking the somefiles
directory. If the archive already exists, it appends to it by default!

The solution is easy: right before the zip -r somefiles.zip somefiles command,
do a rm somefiles.zip.
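
For example (using rm -f so it doesn't fail if the archive doesn't exist yet):


$ rm -f somefiles.zip
$ zip -r somefiles.zip somefiles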

Please post a comment if you have thoughts or questions


INTRODUCING HYLITE - A NODE CODE-SYNTAX-TO-HTML HIGHLIGHTER WRITTEN IN BUN


OCTOBER 3, 2023
0 COMMENTS NODE, BUN, JAVASCRIPT

hylite is a command-line tool for syntax highlighting code into HTML. You feed
it a file or some snippet of code (plus what language it is) and it returns a
string of HTML.

Suppose you have:


❯ cat example.py
# This is example.py
def hello():
    return "world"


When you run this through hylite you get:


❯ npx hylite example.py
<span class="hljs-keyword">def</span> <span class="hljs-title function_">hello</span>():
    <span class="hljs-keyword">return</span> <span class="hljs-string">&quot;world&quot;</span>


Now, combined with the necessary CSS, it can finally render this:


# This is example.py
def hello():
    return "world"
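
If you're wiring that up yourself, the idea is roughly this (a sketch; the
stylesheet href is a placeholder, and any highlight.js theme should work since
hylite emits highlight.js's hljs- classes):


<link rel="stylesheet" href="/path/to/a/highlight.js/theme.css">
<pre><code class="hljs"><span class="hljs-keyword">def</span> <span class="hljs-title function_">hello</span>():
    <span class="hljs-keyword">return</span> <span class="hljs-string">&quot;world&quot;</span></code></pre>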


(Note: at the time of writing this, npx hylite --list-css and npx hylite --css
don't work unless you've git cloned the github.com/peterbe/hylite repo.)


HOW I USE IT

This originated because I loved how highlight.js works. It supports numerous
languages, can even guess the language, is fast as heck, and the HTML output is
compact.

Originally, my personal website, whose backend is in Python/Django, was using
Pygments to do the syntax highlighting. The problem with that is it doesn't
support JSX (or TSX). For example:


export function Bell({ color }: {color: string}) {
  return <div style={{ backgroundColor: color }}>Ding!</div>
}


The problem is that Python != Node, so to call out to hylite I use a
subprocess. At the moment, I can't use bunx or npx because they depend on $PATH
and other things that the server doesn't have. Here's how I call hylite from
Python:


import subprocess

from django.conf import settings

# `language` and `code` come from the surrounding context
command = settings.HYLITE_COMMAND.split()
assert language
command.extend(["--language", language, "--wrapped"])
process = subprocess.Popen(
    command,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
    cwd=settings.HYLITE_DIRECTORY,
)
process.stdin.write(code)
output, error = process.communicate()


The settings are:


HYLITE_DIRECTORY = "/home/django/hylite"
HYLITE_COMMAND = "node dist/index.js"



HOW I BUILT HYLITE

What's different about hylite, compared to other JavaScript packages and CLIs
like this, is that its development requires Bun. It's lovely because Bun has a
built-in test runner and a TypeScript transpiler, and it's just so fast at
starting for anything you do with it.

In my current view, I see Bun as an equivalent of TypeScript: it's convenient
when developing, but once stripped away it's just good old JavaScript and you
don't have to worry about compatibility.

So I use bun for manual testing, like bun run src/index.ts < foo.go, but when
it comes time to ship, I run bun run build (which executes, with bun, the
src/build.ts). That builds a dist/index.js file which you can run with either
node or bun anywhere.
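
For what it's worth, a build script like that can be as small as this (a sketch
of the shape, not necessarily the repo's actual src/build.ts):


// src/build.ts (a sketch)
await Bun.build({
  entrypoints: ["src/index.ts"],
  outdir: "dist",
  target: "node", // so dist/index.js runs under plain node as well as bun
});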

By the way, the README has a section on Benchmarking. It concludes two things:

 1. node dist/index.js has the same performance as bun run dist/index.js
 2. bunx hylite is 7x faster than npx hylite, but that's bullcrap because
    bunx doesn't check the network for a new version (...until you restart
    your computer)

Please post a comment if you have thoughts or questions


SHALLOW CLONE VS. DEEP CLONE, IN NODE, WITH BENCHMARK


SEPTEMBER 29, 2023
0 COMMENTS NODE, JAVASCRIPT

A very common way to create a "copy" of an Object in JavaScript is to copy all
its properties into an empty object. Example:


const original = {foo: "Foo"}
const copy = Object.assign({}, original)
copy.foo = "Bar"
console.log([original.foo, copy.foo])


This outputs:


[ 'Foo', 'Bar' ]


Obviously the problem with this is that it's a shallow copy, best demonstrated
with an example:


const original = { names: ["Peter"] }
const copy = Object.assign({}, original)
copy.names.push("Tucker")
console.log([original.names, copy.names])


This outputs:


[ [ 'Peter', 'Tucker' ], [ 'Peter', 'Tucker' ] ]


which is arguably counter-intuitive, especially since the variable was named
"copy".

Generally, I think Object.assign({}, someThing) is often a red flag, because if
not today, then maybe in some future, the thing you're copying might have
mutables within.

The "solution" is to use structuredClone which has been available since Node 16.
Actually, it was introduced within minor releases of Node 16, so be a little bit
careful if you're still on Node 16.
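
If you need to be defensive about exactly which Node 16 you're on, a cheap
guard could look like this (a sketch; the JSON fallback comes with the caveats
discussed below):


const deepCopy =
  typeof structuredClone === "function"
    ? structuredClone
    : (obj) => JSON.parse(JSON.stringify(obj)); // fallback, with caveats (see below)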

Same example:


const original = { names: ["Peter"] };
// const copy = Object.assign({}, original);
const copy = structuredClone(original);
copy.names.push("Tucker");
console.log([original.names, copy.names]);


This outputs:


[ [ 'Peter' ], [ 'Peter', 'Tucker' ] ]


Another deep-copy solution is to turn the object into a string, using
JSON.stringify, and turn it back into a (deeply copied) object using
JSON.parse. It works like structuredClone but is full of caveats, such as
values like NaN and Infinity becoming null, undefined properties being dropped,
and, not to mention, Date objects ceasing to be Date objects and instead
becoming strings.
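
For example, a quick demonstration of the Date caveat:


const original = { when: new Date() };

const viaJson = JSON.parse(JSON.stringify(original));
console.log(viaJson.when instanceof Date); // false (it became an ISO string)

const viaClone = structuredClone(original);
console.log(viaClone.when instanceof Date); // true (still a real Date)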


BENCHMARK

Given how much "better" structuredClone is in that it's more intuitive and
therefore less dangerous for sneaky nested mutation bugs. Is it fast? Before
even running a benchmark; no, structuredClone is slower than Object.assign({},
...) because of course. It does more! Perhaps the question should be: how much
slower is structuredClone? Here's my benchmark code:


import fs from "fs"
import assert from "assert"

import Benchmark from "benchmark"

const obj = JSON.parse(fs.readFileSync("package-lock.json", "utf8"))

function f1() {
  const copy = Object.assign({}, obj)
  copy.name = "else"
  assert(copy.name !== obj.name)
}

function f2() {
  const copy = structuredClone(obj)
  copy.name = "else"
  assert(copy.name !== obj.name)
}

function f3() {
  const copy = JSON.parse(JSON.stringify(obj))
  copy.name = "else"
  assert(copy.name !== obj.name)
}

new Benchmark.Suite()
  .add("f1", f1)
  .add("f2", f2)
  .add("f3", f3)
  .on("cycle", (event) => {
    console.log(String(event.target))
  })
  .on("complete", function () {
    console.log("Fastest is " + this.filter("fastest").map("name"))
  })
  .run()


The results:

❯ node assign-or-clone.js
f1 x 8,057,542 ops/sec ±0.84% (93 runs sampled)
f2 x 37,245 ops/sec ±0.68% (94 runs sampled)
f3 x 37,978 ops/sec ±0.85% (92 runs sampled)
Fastest is f1

In other words, Object.assign({}, ...) is about 200 times faster than
structuredClone. By the way, I re-ran the benchmark with a much smaller object
(using the package.json instead of the package-lock.json) and then
Object.assign({}, ...) was only about 20 times faster.

Mind you! They're both ridiculously fast in the grand scheme of things.

If you do this...


for (let i = 0; i < 10; i++) {
  console.time("f1")
  f1()
  console.timeEnd("f1")

  console.time("f2")
  f2()
  console.timeEnd("f2")

  console.time("f3")
  f3()
  console.timeEnd("f3")
}


the last bit of output of that is:

f1: 0.006ms
f2: 0.06ms
f3: 0.053ms

which means that it took 0.06 milliseconds for structuredClone to make a
convenient deep copy of an object that is 5KB as a JSON string.


CONCLUSION

Yes, Object.assign({}, ...) is ridiculously faster than structuredClone, but
structuredClone is the better choice.

Please post a comment if you have thoughts or questions


PIP-OUTDATED.PY WITH INTERACTIVE UPGRADE


SEPTEMBER 21, 2023
0 COMMENTS PYTHON

Last year I wrote a nifty script called Pip-Outdated.py: "Pip-Outdated.py - a
script to compare requirements.in with the output of pip list --outdated". It
basically runs pip list --outdated but filters the result down to the packages
mentioned in your requirements.in. For people familiar with Node, it's like
checking whether any of the installed packages in node_modules have upgrades,
but filtered down to only those mentioned in your package.json.
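
The core idea is roughly this (a sketch, not the actual script):


import json
import subprocess

# What pip thinks can be upgraded
outdated = json.loads(
    subprocess.check_output(
        ["pip", "list", "--outdated", "--format=json"], text=True
    )
)
# Only the packages pinned in requirements.in
with open("requirements.in") as f:
    wanted = {line.partition("==")[0].strip() for line in f if "==" in line}
for pkg in outdated:
    if pkg["name"] in wanted:
        print(pkg["name"], pkg["version"], "->", pkg["latest_version"])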

I use this script often enough that I added a little interactive prompt that
asks, for each possible upgrade, whether it should edit requirements.in for
you. It looks like this:


❯ Pip-Outdated.py
black               INSTALLED: 23.7.0    POSSIBLE: 23.9.1
click               INSTALLED: 8.1.6     POSSIBLE: 8.1.7
elasticsearch-dsl   INSTALLED: 7.4.1     POSSIBLE: 8.9.0
fastapi             INSTALLED: 0.101.0   POSSIBLE: 0.103.1
httpx               INSTALLED: 0.24.1    POSSIBLE: 0.25.0
pytest              INSTALLED: 7.4.0     POSSIBLE: 7.4.2

Update black from 23.7.0 to 23.9.1? [y/N/q] y
Update click from 8.1.6 to 8.1.7? [y/N/q] y
Update elasticsearch-dsl from 7.4.1 to 8.9.0? [y/N/q] n
Update fastapi from 0.101.0 to 0.103.1? [y/N/q] n
Update httpx from 0.24.1 to 0.25.0? [y/N/q] n
Update pytest from 7.4.0 to 7.4.2? [y/N/q] y


and then,


❯ git diff requirements.in | cat
diff --git a/requirements.in b/requirements.in
index b7a246e..0e996e5 100644
--- a/requirements.in
+++ b/requirements.in
@@ -9,7 +9,7 @@ python-decouple==3.8
 fastapi==0.101.0
 uvicorn[standard]==0.23.2
 selectolax==0.3.16
-click==8.1.6
+click==8.1.7
 python-dateutil==2.8.2
 gunicorn==21.2.0
 # I don't think this needs `[secure]` because it's only used by
@@ -18,7 +18,7 @@ requests==2.31.0
 cachetools==5.3.1

 # Dev things
-black==23.7.0
+black==23.9.1
 flake8==6.1.0
-pytest==7.4.0
+pytest==7.4.2
 httpx==0.24.1


That's it. Then if you want to actually make these upgrades you run:


❯ pip-compile --generate-hashes requirements.in && pip install -r requirements.txt


To install it, download the script from
https://gist.github.com/peterbe/a2b158c39f1f835c0977c82befd94cdf, put it in
your ~/bin, and make it executable. Now go into a directory that has a
requirements.in and run Pip-Outdated.py.
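
In other words, something like this (assuming ~/bin is on your $PATH):


$ # save the gist as ~/bin/Pip-Outdated.py, then:
$ chmod +x ~/bin/Pip-Outdated.py
$ cd ~/some/project
$ Pip-Outdated.py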

Please post a comment if you have thoughts or questions


PARSE A CSV FILE WITH BUN


SEPTEMBER 13, 2023
0 COMMENTS BUN

I'm really excited about Bun and look forward to trying it out more and more.
Today I needed a quick script to parse a CSV file and compute some simple
arithmetic on some numbers in it.

To do that, here's what I did:


bun init
bun install csv-simple-parser
code index.ts


And the code:


import parse from "csv-simple-parser";

console.time("total");
const numbers: number[] = [];
const file = Bun.file(process.argv.slice(2)[0]);
type Rec = {
  Pageviews: string;
};
const csv = parse(await file.text(), { header: true }) as Rec[];
for (const row of csv) {
  numbers.push(parseInt(row["Pageviews"] || "0", 10));
}
console.timeEnd("total");
console.log("Mean  ", numbers.reduce((a, b) => a + b, 0) / numbers.length);
// Note: the comparator matters, or the numbers would get sorted as strings
console.log("Median", numbers.sort((a, b) => a - b)[Math.floor(numbers.length / 2)]);


And running it:


❯ wc -l file.csv
   13623 file.csv

❯ /usr/bin/time bun run index.ts file.csv
[8.20ms] total
Mean   7.205534757395581
Median 1
        0.04 real         0.03 user         0.01 sys


(On my Intel MacBook Pro.) Reading in the file and parsing the 13k lines took
8.2 milliseconds. The whole execution took 0.04 seconds. Pretty neat.

Please post a comment if you have thoughts or questions
