wincent.dev



MOVING TO WINCENT.DEV

a week ago · 1 minute read

Please update any bookmarks you may have for wincent.com to point at wincent.dev
instead.

If you cloned a Git repository from git.wincent.com, you should update the
remote URL to use git.wincent.dev or GitHub instead. For example, if you cloned
Command-T from git.wincent.com, you would use a command like one of the
following:

# To set the "origin" remote's URL to git.wincent.dev:
git remote set-url origin git.wincent.dev:public/command-t.git

# To set the "origin" remote's URL to GitHub:
git remote set-url origin git@github.com:wincent/command-t.git




AI

4 weeks ago · 13 minute read

Once in a while I take a stab at a big, uncertain topic — like COVID or Bitcoin
— as a way of recording a snapshot of my thinking. Now it’s time to do the same
for AI (Artificial Intelligence), in a post that will surely be massively
out-of-date almost as soon as I’ve published it. Doomed enterprise though it
may be, I still want to do this, if nothing else because my own job as a
software engineer is among those most likely to be dramatically
affected by the rise of AI. And while I could go on about image and video
generation or any of a number of other applications of the new wave of AI
products, I’m mostly going to focus on the area that is currently most relevant
to the business of software engineering; that is, LLMs (Large Language Models)
as applied to tasks in and around software development.


THE CURRENT STATE OF AI IN SOFTWARE DEVELOPMENT

In the world of programming, LLMs are being crammed and wedged into every
available gap. I say "crammed" because the textual, conversational model doesn’t
always feel like a natural fit within our existing user interfaces.
Products like GitHub Copilot seek to make the interaction as natural as possible
— for example, proposing completions for you when you do something like type a
comment describing what the code should do — but fundamentally the LLM paradigm
imposes a turn-based, conversational interaction pattern. You ask for something
by constructing a prompt, and the LLM provides a (hopefully) reasonable
continuation. In various places you see products trying to make this interaction
seem more seamless and less turn-like — sometimes the AI agent is hidden behind
a button, a menu, or a keyboard shortcut — but I generally find these attempts
to be clumsy and intrusive.

And how good is this state of affairs? At the time of writing, the answer is "it
depends". There are times when it can produce appallingly buggy but
reasonable-seeming code (note: humans can do this too), and others where it
knocks out exactly what you would have written yourself, given enough time. Use
cases that have felt anywhere from "good" to "great" for me have been things
like:

 1. Low-stakes stuff like Bash and Zsh scripts for local development. Shell
    scripts that run locally, using trusted input only, doing
    not-mission-critical things. Shells have all sorts of esoteric features and
    hard-to-remember syntax that an LLM can generally churn out quite rapidly;
    and even if it doesn’t work, the code it gives you is often close enough
    that it can give you an idea of what to do, or a hint about what part of the
    manual page you should be reading to find out about, say, a particular
    parameter expansion feature (there’s a small example of the kind of syntax I
    mean just after this list). The conversational model lends itself well to
    clarifying questions too. You might ask it to give you the incantation
    needed for your fancy shell prompt, and when it gives you something that
    looks indistinguishable from random noise, you can ask it to explain each
    part.
 2. React components. Once again, for low-stakes things (side-projects, for
    example), the LLM is going to do just fine here. I remember using an LLM
    after a period of many months of not doing React, and it helped me rapidly
    flesh out things like Error Boundary components that I would otherwise have
    had to read up on in order to refresh my memory.
 3. Dream interpretation. Ok, so I snuck in a non-programming use case. If
    you’ve ever had a weird dream and asked Google for help interpreting it,
    you’ll find yourself with nothing more than a bunch of links to low-quality
    "listicles" and SEO-motivated goop that you’ll have to wade into like a
    swamp, with little hope of actually coming out with useful answers; ask an
    LLM on the other hand, and you’ll obtain directed, on-point answers of a
    calibre equal to that of an experienced charlatan professional dream
    interpreter.
 4. Writing tests. Tests are often tedious things filled with painful
    boilerplate, but you want them to be that way (ie. if they fail, you want to
    be able to jump straight to the failing test and be able to read it
    straightforwardly from top to bottom, as opposed to having to jump through
    hoops reverse-engineering layers of cleverness and indirection). An LLM is
    good for churning out these things, and the risk of it hallucinating and
    producing something that doesn’t actually verify the correct behavior is far
    more benign than a comparable flaw making it into the implementation code
    that’s going to run in production. The bar is lower here because humans are
    at least as capable of writing bad tests as LLMs are, probably because it’s
    harder to ship a flagrant implementation bug undetected: if anybody actually
    uses the software, the bug will be flushed out in short order. On the other
    hand, all manner of disgusting tests can get shipped and live on for extended
    periods in a test suite as long as they remain green. We’ve all seen
    ostensibly green tests that ended up
    verifying the wrong behavior, not verifying anything meaningful at all, or
    being mere facsimiles of the form and structure of the thing they purport to
    test, but utterly failing to express, exercise, specify, or constrain the
    expected behavior.
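
To make the first item in that list concrete, here is a rough sketch of the
kind of esoteric shell syntax I have in mind (the file names and variables here
are invented for illustration, but the parameter expansion features themselves
are standard Bash):

# Strip a suffix: "${var%pattern}" removes the shortest matching suffix.
file="notes.backup.txt"
echo "${file%.txt}"       # prints "notes.backup"

# Supply a default when a variable is unset or empty.
echo "${EDITOR:-vim}"     # prints "vim" unless $EDITOR is set

# Substitute the first occurrence of a pattern.
path="/usr/local/bin"
echo "${path/local/opt}"  # prints "/usr/opt/bin"

None of this is hard once you know it, but it’s exactly the sort of thing I’d
otherwise be re-reading the "Parameter Expansion" section of the Bash manual to
remember.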

But it’s not all roses. One of the problems with LLMs is they’re only as good as
the data used to train them. So, given a huge corpus of code written by humans
(code with bugs), it’s only to be expected that LLM code can be buggy too. The
dark art of tuning models can only get you so far, and curating the training
data is hard to scale up without a kind of chicken-and-egg problem in which you
rely on (untrustworthy) AI to select the best training material to feed into
your AI model. In my first experiences with LLMs, I found they had two main
failure modes: one was producing something that looked reasonable, appeared to
be what I asked for, and was indeed "correct", but was subtly ill-suited for the
task; the other was producing code that again had the right shape I’d expect to
see in a solution, but which actually had some fatal bug or flaw (ie. was
objectively "incorrect"). This means you have to be skeptical of everything
that comes out of an LLM; just because the tool seemed "confident" about it is
no guarantee of it actually being any good! And as anybody who has had an
interaction with an LLM has seen, the apparent confidence with which they answer
your questions is the flimsiest of veneers, rapidly blown away by the slightest
puff of questioning air:

> Programmer: Give me a function that sorts this list in descending order,
> lexicographically and case-insensitively.
> 
> Copilot: Sure thing, the function you ask for can be composed of the following
> elements… (shows and explains function in great detail).
> 
> Programmer: This function sorts the list in ascending order.
> 
> Copilot: Oh yes, that is correct. My apologies for farting out broken garbage
> like that. To correct the function, we must do the following…
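
Incidentally, the behavior the programmer wants in that imagined exchange is
only a flag away if you express it with standard Unix tools; here’s a small
shell illustration (using sort’s -f and -r flags) of the requested descending,
case-insensitive ordering alongside the ascending one the imaginary Copilot
delivered:

# What was asked for: fold case for comparison (-f), reverse for descending
# order (-r).
printf '%s\n' banana Apple cherry | sort -f -r
# cherry
# banana
# Apple

# What the imaginary Copilot produced: the same sort minus -r, ie. ascending.
printf '%s\n' banana Apple cherry | sort -f
# Apple
# banana
# cherry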

In practice, the double-edged sword of current LLMs means that I mostly don’t
use tools like GitHub Copilot in my day-to-day work, but I do make light use of
ChatGPT, as I described in a recent YouTube video. As I’ve hinted at already,
I’m more likely to use LLMs for low-stakes things (local scripts, tests), and
only ever as a scaffolding that I then scrutinize as closely or more closely
than I would code from a human colleague. Sadly, when I observe my own
colleagues’ usage of Copilot I see that not everybody shares my cautious
skepticism; some people are wary of the quality of LLM-generated code and vet it
carefully, but others gushingly accept whatever reasonable-seeming hallucination
it sharts out.

One thing I’m all too keenly aware of right now is that my approach to code
review will need to change. When I look at a PR, I still look at it with the
eyes of a human who thinks they are reading code written by another human. I
allow all sorts of circumstantial factors to influence my level of attention
(who wrote the code? what do I know about their strengths, weaknesses, and
goals? etc), and I rarely stop to think and realize that some or all of what I’m
reviewing may actually have been churned out by a machine. I’m sure this
awareness will come naturally to me over time, but for now maintaining it takes
a conscious effort.


AM I WORRIED ABOUT LOSING MY JOB?

I’m notoriously bad at predicting the future, but it seems it would be derelict
of me not to at least contemplate the possibility of workforce reductions in the
face of the rise of the AI juggernaut. I don’t think any LLM currently can
consistently produce the kind of results I’d expect of a skilled colleague, but
it’s certainly possible that that could change within a relatively short
time-scale. It seems that right now the prudent course is to judiciously use AI
to get your job done faster, allowing you to focus on the parts where you can
clearly add more value than the machine can.

At the moment, LLMs are nowhere near being able to do the hard parts of my job,
precisely because those parts require me to keep and access a huge amount of
context that is not readily accessible to the machine itself. In my daily work,
I routinely have to analyze and understand information coming from local sources
(source code files, diffs) and other sources spread out spatially and temporally
across Git repos (commit messages from different points in history, files spread
across repositories and organizations), pull requests, issues, Google Docs,
Slack conversations, documentation, lore, and many other places. It’s only a
matter of time before we can provide our LLMs with enough of that context for
them to become competitive with a competent human when it comes to
those tricky bug fixes, nuanced feature decisions, and cross-cutting changes
that require awareness not just of code but also of how distributed systems,
teams, and processes are structured.

It’s quite possible that, as with other forms of automation, AI will displace
humans when it comes to the low-level tasks, but leave room "up top" for human
decision-makers to specialize in high-leverage activities. That is, humans
getting the machines to do their bidding, or using the machines to imbue
themselves with apparent "superpowers" to get stuff done more quickly. Does this
mean that
the number of programming jobs will go down? Or that we’ll just find new things
— or harder things — to build with all that new capacity? Will it change the job
market, compensation levels, supply and demand? I don’t have the answer to any
of those questions, but it makes sense to remain alert and seek to keep pace
with developments so as not to be left behind.


WHERE WILL THIS TAKE US ALL?

There have been some Twitter memes going around about how AI is capable of
churning out essentially unreadable code, and how we may find ourselves in a
future where we no longer understand the systems that we maintain. To an extent,
it’s already true that we have systems large and complicated enough that they
are impossible for any one person to understand exhaustively, but AI might be
able to build something considerably worse: code that compiles and apparently
behaves as desired but is nevertheless not even readable at the local level,
when parts of it are examined in isolation. Imagine a future where, in the same
way that we don’t really know how LLMs "think", they write software systems for
us that we also can’t explain. I don’t really want to live in a world like that
(too scary), although it may be that that way lies the path to game-changing
achievements like faster-than-light travel, usable fusion energy,
room-temperature superconductors, and so on. I think that at least in the short
term we humans
have to impose the discipline required to ensure that LLMs are used for "good",
in the sense of producing readable, maintainable code. The end goal should be
that LLMs help us to write the best software that we can, the kind of software
we’d expect an expert human practitioner to produce. I am in no hurry to rush
forwards into a brave new world where genius machines spit out magical software
objects that I can’t pull apart, understand, or aspire to build myself.

The other thing I am worried about is what’s going to happen once the volume of
published code produced by LLMs exceeds that produced by humans, especially
given that we don’t have a good way of indicating the provenance of any
particular piece — everything is becoming increasingly mixed up, and it is
probably already too late to hope to rigorously label it all. I honestly don’t
know how we’ll train models to produce "code that does X" once our training data
becomes dominated by machine-generated examples of "code that does X". The
possibility that we might converge inescapably on suboptimal implementations is
just as concerning as the contrary possibility (that we might see convergence in
the direction of ever greater quality and perfection) is exciting. There could
well be an inflection point somewhere up ahead, if not a singularity, beyond
which all hope of making useful predictions breaks down.


WHERE WOULD THIS TAKE US ALL IN AN IDEAL WORLD?

At the moment, I see LLMs being used for many programming-adjacent applications;
for example, AI-summarization. There is something about these summaries that
drains my soul. They end up being so pedestrian, so bland. I would rather read a
thoughtful PR description written by a human than a mind-numbingly plain AI
summary any day. Yet, in the mad rush to lead the race into the new frontier
lands, companies are ramming things like summarization tools down our throats
with the promise of productivity, in the hope of becoming winners in the AI gold
rush.

Sadly, I don’t think the forces of free-market capitalism are going to drive AI
towards the kinds of applications I really want, at least not in the short term,
but here is a little wish list:

 * I’d like the autocomplete on my phone to be actually useful as opposed to
   excruciating. Relatedly, I’d like speech-to-text to be at least as good at
   hearing what I’m saying as a human listener. Even after all these years, our
   existing implementations feel like they’ve reached some kind of local maximum
   beyond which progress is exponentially harder. 99% of all messages I type on
   my phone require me to backspace and correct at least once. As things
   currently stand, I can’t imagine ever trusting a speech-to-text artifact
   without carefully reviewing it.
 * Instead of a web populated with unbounded expanses of soulless, AI-generated
   fluff, I want a search engine that can guide me towards the very best
   human-generated content. Instead of a dull AI summary, I’d like an AI that
   found, arranged, and quoted the best human content for me, in the same way a
   scholar or a librarian might curate the best academic source material.
 * If I must have an AI pair-programmer, I’d want it to be a whole lot more like
   a skilled colleague than things like Copilot currently are. Right now they
   feel like a student that’s trying to game the system, producing answers that
   will get them the necessary marks rather than deeply thinking about and
   caring about producing the right answer[1].
 * AI can be useful not just for guiding one towards the best information on the
   public internet. Even on my personal computing device, I already have an
   unmanageably large quantity of data. Consider, for example, the 50,000 photos
   I have on my laptop, taken over the last 20 years. I’d like a trustworthy
   ally that I can rely on to sort and classify these; not the relatively
   superficial things like face detection that software has been able to do for
   a while now, but something capable of reliably doing things like "thinning"
   the photo library guided only by vague instructions like "reduce the amount
   of near-duplication in here by identifying groups of similar photos taken
   around the same time and place, and keep the best ones, discarding the
   others". Basically, the kind of careful sorting you could do yourself if only
   you had a spare few dozen hours and the patience and resolve to actually get
   through it all.

I’m bracing myself for a period of intensive upheaval, and I’m not necessarily
expecting any of this transformation to lead humanity into an actually-better
place. Will AI make us happier? I’m not holding my breath. I’d give this an
excitement score of 4 out of 10. For comparison, my feelings around the birth of
personal computing (say, in the 1980s) were a 10 out of 10, and the mainstream
arrival of the internet (the 1990s) was a 9 out of 10. But to end on a positive
note, I will say that we’ll probably continue to have some beautiful,
monumental, human-made software achievements to be proud of and to continue
using into the foreseeable future (that is, during my lifetime): things like
Git, for example. I’m going to cherish those while I still can.

--------------------------------------------------------------------------------

 1. And yes, I know I’m anthropomorphizing AI agents by using words like
    "thinking". At the moment we have only a primitive understanding of how
    consciousness works, but it seems clear to me that in a finite timespan,
    machines will certainly pass all the tests that we might subject them to in
    order to determine whether they are conscious. At that point, the
    distinction becomes meaningless: what is consciousness? It’s that thing that
    agents who appear to have consciousness have. ↩︎



25 YEARS AND COUNTING

Created 27.10.2023, updated 28.10.2023 · 1 minute read

I’ve been publishing writing on the web for almost 25 years now — at least, the
oldest snapshot I can find for a website of mine dates back to December 3, 1998
(it’s possible I published this even earlier, because the footer notes that I
wrote the article in "November 1997"). I look back at those attempts at academic
writing by my 22-year-old self and sometimes have to grimace at how strained the
wording is, but I don’t feel that bad about it. I graduated in the end, after
all.

Lately, life has been too busy to write anything on here. I have a number of
topics I’d like to dip into, but I can’t do them justice in the time I’m willing
and able to allocate to them. So for now, this will have to do.



COLOPHON

Made by Greg Hurrell using React, Relay and GraphQL (with help from Git, Redis
and Neovim).