
Mike's Software Blog


GOING ALL-IN ON GPU PASSTHROUGH FOR SOFTWARE DEVELOPMENT

debian kvm meta
Posted on December 11, 2023

I recently spent some time improving my software development workflow at home,
since my previous setup was starting to limit me. I settled on a configuration
which uses GPU passthrough with the KVM hypervisor, running Debian as both a
host and a guest.

This post aims to show some of the benefits and drawbacks of this more complex
setup for my use case, as well as a specific combination of hardware and
software which can be made to work.


BACKGROUND AND PREVIOUS SETUP

I was using Debian testing as a host system, and provisioning project-specific
virtual machines to work in via virt-manager. I use two monitors, and typically
had an IDE (in the VM) open on one monitor, and a web browser (on the host) open
on the other.



The setup was simple and effective:

 * The VMs ran in the QEMU user session – previously blogged about here.
 * The path /home/mike/workspace on the host was shared with every VM via a 9p
   fileshare.
 * Each desktop was set to 1920 x 1080 resolution, with auto-resize disabled. I
   kept the VMs at this lower resolution to get acceptable graphical
   performance on 4K monitors when I set this up.
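For reference, the guest side of a 9p share like this can be mounted from /etc/fstab with a line along these lines. The mount tag `workspace` and the option set are my assumptions here; the tag must match whatever was configured for the share in virt-manager:

```
workspace  /home/mike/workspace  9p  trans=virtio,version=9p2000.L  0  0
```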

This allowed me to keep the host operating system uncluttered, which I value for
both maintainability and security. Modern software development involves running
a lot of code, such as dependencies pulled in via a language-specific package
manager, random binaries from GitHub, or tools installed via a sketchy curl ...
| sudo sh command. Better to keep that in a VM.

The main drawbacks are with the interface into the VM.

 * Due to scaling, I had imperfect image quality. I also had a small but
   noticeable input lag when working in my IDE.
 * Development involving 3D graphics was impractical due to the performance
   difference. I reported a trivial bug in Blender earlier this year but didn’t
   have an environment suitable for more extensive development on that codebase.
 * There was an audio delay when testing within the VM. I created a NES demo
   last year and this delay was less than ideal when it came to adding sound.

I was at one point triple-booting Debian, Ubuntu 22.04 and Windows, so that I
could also run some software which wouldn’t work well in this setup.


HARDWARE

GPU passthrough on consumer hardware is highly dependent on motherboard and BIOS
support. The most critical components in my setup are:

 * Motherboard: ASUS TUF X470-PLUS GAMING
 * CPU: AMD Ryzen 7 5800X
 * Monitors: 2 x LG 27UD58
 * Graphics card: AMD Radeon RX 6950 XT
   * Installed at PCIEX16_1, directly connected to the CPU
   * Connected to DisplayPort inputs on the two monitors

I added one component specifically for this setup:

 * Second graphics card: AMD Radeon RX 6400
   * Installed at PCIEX16_2, connected to chipset
   * Connected to HDMI inputs on the two monitors (one directly, one via an
     adapter)

The second graphics card will be used by the host. The RX 6400 is a low-cost,
low-power, low-profile card which supports UEFI, Vulkan and the modern amdgpu
driver.

In the slot where I’ve installed it, it’s limited to PCIe 2.0 x4, and 1920 x
1080 is the highest resolution I can run at 60 Hz on these monitor inputs. I
only need to use this to set up the VMs, or as a recovery interface, so I’m not
expecting this to be a major issue.


INSPECTING IOMMU GROUPS

On a fresh install of Debian 12, I started walking through the PCI passthrough
via OVMF guide on the Arch Wiki, adapting it for my specific setup as necessary.

I verified that I was using UEFI boot, had SVM and IOMMU enabled, and I also
enabled resizable BAR. I did not set any IOMMU-related Linux boot options, and
also could not find a firmware setting to select which graphics card to use
during boot.

Using the script on the Arch Wiki, I then printed out the IOMMU groups on my
hardware, which are as follows:

IOMMU Group 0:
    00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU Group 1:
    00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU Group 2:
    00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU Group 3:
    00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU Group 4:
    00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU Group 5:
    00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU Group 6:
    00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU Group 7:
    00:05.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU Group 8:
    00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU Group 9:
    00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU Group 10:
    00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU Group 11:
    00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU Group 12:
    00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 61)
    00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU Group 13:
    00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 0 [1022:1440]
    00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 1 [1022:1441]
    00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 2 [1022:1442]
    00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 3 [1022:1443]
    00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 4 [1022:1444]
    00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 5 [1022:1445]
    00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 6 [1022:1446]
    00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 7 [1022:1447]
IOMMU Group 14:
    01:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
IOMMU Group 15:
    02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43d0] (rev 01)
    02:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8] (rev 01)
    02:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Bridge [1022:43c6] (rev 01)
    03:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
    03:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
    03:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
    03:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
    03:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
    03:09.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
    04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
    08:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev c7)
    09:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
    0a:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 24 [Radeon RX 6400/6500 XT/6500M] [1002:743f] (rev c7)
    0a:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
    0b:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Blue SN550 NVMe SSD [15b7:5009] (rev 01)
IOMMU Group 16:
    0c:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev c0)
IOMMU Group 17:
    0d:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
IOMMU Group 18:
    0e:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6950 XT] [1002:73a5] (rev c0)
IOMMU Group 19:
    0e:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
IOMMU Group 20:
    0f:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
IOMMU Group 21:
    10:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
IOMMU Group 22:
    10:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
IOMMU Group 23:
    10:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
IOMMU Group 24:
    10:00.4 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller [1022:1487]


Passthrough for an IOMMU group is all-or-nothing. For example, all of the
chipset-connected devices are grouped together, and I can’t pass any of them
through to a VM unless I pass them all through.

I’m mainly interested in a graphics card, an audio device, and a USB controller,
and helpfully I have one of each isolated in its own group, presumably because
they are connected to PCIe lanes which go directly to the CPU.
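The Arch Wiki script I used is short; a near-equivalent sketch (assuming `pciutils` is installed for `lspci`) looks like this:

```shell
#!/bin/sh
# List each IOMMU group and the PCI devices inside it, in the format shown above.
# Adapted from the Arch Wiki's example script; reads standard sysfs paths.
base=/sys/kernel/iommu_groups
found=0
for g in $(find "$base" -maxdepth 1 -mindepth 1 -type d 2>/dev/null | sort -V); do
    found=1
    echo "IOMMU Group ${g##*/}:"
    for d in "$g"/devices/*; do
        [ -e "$d" ] || continue
        # lspci -nns prints the numeric-and-text description for one device address
        echo "    $(lspci -nns "${d##*/}")"
    done
done
[ "$found" -eq 1 ] || echo "No IOMMU groups found (is the IOMMU enabled?)"
```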




GRAPHICS

I created a virtual machine, also Debian 12, in the QEMU system session. The
only important settings for now are the chipset and firmware: I selected Q35
with the OVMF UEFI firmware, since GPU passthrough will not work if a virtual
machine boots with legacy BIOS.



I use the WebGL Aquarium as a simple test for whether 3D acceleration is
working. It runs much faster on the host system (this is using the RX 6400). The
copy in the VM runs at just 3 frames per second, using SPICE display and virtio
virtual GPU at this stage.



The next step was to isolate the GPU from the host. The relevant line from lspci
-nn is:

0e:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6950 XT] [1002:73a5] (rev c0)


The vendor and device ID pair for this card is shown at the end of the line:
1002:73a5.
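To pull that ID pair out programmatically, something like the following works; the sample line is copied from the lspci output above:

```shell
# Extract the [vendor:device] pair from an `lspci -nn` line.
line='0e:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6950 XT] [1002:73a5] (rev c0)'
# The class code [0300] has no colon inside it, so this pattern only
# matches the vendor:device bracket.
id=$(printf '%s\n' "$line" | grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' | tail -n 1 | tr -d '[]')
echo "$id"   # 1002:73a5
```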

This needs to be added to the vfio-pci.ids boot option, which in my case
involves updating /etc/default/grub.

GRUB_CMDLINE_LINUX="vfio-pci.ids=1002:73a5"


This is then applied by running update-grub2, and rebooting. It’s apparently
possible to accomplish this without a reboot, but I’m following the easy path
for now.

I verified that it worked by checking that the vfio-pci kernel module is in use
for this device.

$ lspci -v
0e:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6950 XT] (rev c0) (prog-if 00 [VGA controller])
...
        Kernel driver in use: vfio-pci
        Kernel modules: amdgpu


Before booting up the VM, I added the card as a “Host PCI Device”. It was then
visible in lspci output within the VM, but was not being used for video output
yet.
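Behind the scenes, adding a “Host PCI Device” in virt-manager produces a libvirt `<hostdev>` entry in the domain XML roughly like the following. The address matches 0e:00.0 from the listing above; the exact attributes may differ on other setups:

```xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0e' slot='0x00' function='0x0'/>
  </source>
</hostdev>
```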



To encourage the VM to output to the physical graphics card, I switched the
virtualised video card model from “virtio” to “None”.



Booting up the VM after this change, the SPICE display no longer produces
output, but USB redirection still works. I passed through the keyboard, then the
mouse.



I then switched monitor inputs to the VM, now equipped with a graphics card, and
the WebGL aquarium test ran at 30 FPS.



Now that I was switching between near-identical Debian systems, I started using
different desktop backgrounds to stay oriented.


AUDIO OUTPUT

I needed to make sure that audio output worked reliably in the virtual machine.

Sound was working out of the box through the emulated AC97 device, but I checked
an online Audio Video Sync Test, and confirmed that there was a significant
delay, somewhere in the 200-300 ms range. This is not good enough, so I deleted
the emulated device and decided to try some other options.

I first tried passing through the audio device associated with the graphics
card, identified as 1002:ab28.

0e:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]


I took the ID, and added it to the option in /etc/default/grub. Now that there
are multiple devices, they are separated by commas.

GRUB_CMDLINE_LINUX="vfio-pci.ids=1002:73a5,1002:ab28"


As before, I ran update-grub2, rebooted, added the device to the VM through
virt-manager, booted up the VM, and tried to use the device.

With that change, I could connect headphones to the monitor and the audio was no
longer delayed. This environment would now be viable for developing apps with
sound, or following along with a video tutorial for example.


AUDIO INPUT

Next I attempted to pass through the on-board audio controller to see if I could
get both audio input and output. Discussions online suggest that this doesn’t
always work, but there is no harm in trying.

I’ll skip through the exact process this time, but I again identified the
device, isolated it from the host, and passed it through to a VM.

10:00.4 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller [1022:1487]


Output worked immediately, but the settings app did not show any microphone
level, and attempts to capture would immediately stop. I did some basic reading
and troubleshooting, but didn’t have a good idea of what was happening.

What worked for me was blindly upgrading my way out of the problem by switching
to the Debian testing rolling release in the guest VM.

I was then able to see the input level in settings, and capture the audio with
Audacity.



Audio input is not critical for me, but it does allow me to move additional
use-cases into a virtual machine without switching to a USB sound card.


USB

My computer has two USB controllers. One is available for passthrough, while the
other is in the same IOMMU group as all other chipset-attached devices.

02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43d0] (rev 01)
10:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]


The controller I passed through is the “Matisse USB 3.0 Host Controller”, device
ID 1022:149c, which I initially expected would have plenty of USB 3 ports
attached to it.

Reading the manual for my motherboard more carefully, I discovered that this
controller is only responsible for 2 USB-A ports and 1 USB-C port, or 20% of the
USB ports on the system.



I use a lot of USB peripherals for hardware development. More physical USB ports
would be ideal, but I can work around this by using a USB hub.

I’ll also continue to use USB passthrough via libvirt to get the keyboard and
mouse into the VM. Instead of manually passing these through each time I start
the VM, I added each device in the configuration as a “Physical USB Device”.
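In the domain XML, each “Physical USB Device” ends up as a `<hostdev>` entry selected by vendor and product ID, roughly like this (the IDs below are hypothetical placeholders, not my actual keyboard):

```xml
<hostdev mode='subsystem' type='usb' managed='yes'>
  <source>
    <vendor id='0x046d'/>
    <product id='0xc31c'/>
  </source>
</hostdev>
```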



This automatically connects the device when the VM boots, and returns it to the
host when the VM shuts down, without the SPICE console needing to be open. If I
need to control the host and guest at the same time, I can connect a different
mouse and keyboard temporarily – I’ve got plenty of USB ports on the host after
all.

If this ever becomes a major issue, it should also be possible to switch this
setup around, and pass through the IOMMU group containing all chipset-attached
PCIe devices instead of the devices I’ve chosen. This would provide extensive
I/O and expansion options to the VM, at the cost of things like on-board
networking, SATA ports, and an NVMe slot on the host. The 3 USB ports on the
“Matisse USB 3.0 Host Controller”, if left to the host, would be just enough for
a mouse, keyboard, and USB-C Ethernet adapter.


FILE SHARE

On my previous setup, I used a 9p fileshare to map a directory on the host to
the same path on every VM, allowing an easy way to exchange files.

The path was /home/mike/workspace – a carry-over from when I used Eclipse. In
practice it has been slower to work on a fileshare, so I’ll switch to developing
in a local directory.

I’ll still set up a fileshare, but with two changes:

 * I’ll map a share to a more generic /home/mike/Shared, and start to back up
   anything that lands there.
 * I’ll use virtiofs instead of virtio-9p. This claims to be more performant,
   and it’s apparently possible to get this working on Windows as well.

This is added as a hardware device in virt-manager on the host.



In the guest, I added the following line to /etc/fstab.

shared /home/mike/Shared/ virtiofs
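Written out with explicit mount options and fsck fields, the same entry might look like this. The `nofail` option is my own addition, so the guest still boots if the share is ever absent:

```
# <tag>  <mount point>       <type>    <options>        <dump> <pass>
shared   /home/mike/Shared   virtiofs  defaults,nofail  0      0
```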


To activate this change, I would previously have created the mount-point and run
mount -a. I recently learned that systemd creates mount-points automatically on
modern systems, so I instead ran the correct incantation to trigger this
process, which is:

$ sudo systemctl daemon-reload
$ sudo systemctl restart local-fs.target



POWER MANAGEMENT

In my current setup, sleep/wake causes instability. I initially disabled idle
suspend on the host in GNOME:



While testing a VM which I had configured to start on boot, the system decided
to go to sleep while I was using it. In hindsight this makes complete sense: as
far as the host could tell, it was sitting on the login page with no mouse or
keyboard input, and had been configured to go to sleep after an idle time-out
(the setting within GNOME only applies after login).

Dec 05 16:24:43 mike-kvm systemd-logind[706]: The system will suspend now!


To avoid this, I additionally disabled relevant systemd units (source of this
command).

$ sudo systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target
Created symlink /etc/systemd/system/sleep.target → /dev/null.
Created symlink /etc/systemd/system/suspend.target → /dev/null.
Created symlink /etc/systemd/system/hibernate.target → /dev/null.
Created symlink /etc/systemd/system/hybrid-sleep.target → /dev/null.


I’ve also configured each VM to simply blank the screen when idle, since
allowing a guest to suspend causes its own problems.




RECAP OF UNEXPECTED ISSUES

Despite my efforts to plan around the limitations of my hardware, I did hit four
unanticipated problems, mostly highlighted above.

 * I assumed that there would be a setting in my BIOS to change which graphics
   card to use during boot, but I couldn’t find one.
 * The audio devices associated with both of my graphics cards had the same
   vendor and device ID. The way I have it configured, no audio output is
   available on the host as a result.
 * Microphone input didn’t work on the HD Audio Controller until I upgraded the
   guest operating system. The usual workaround for this is apparently to use a
   USB sound card.
 * I misunderstood the limitations of my USB setup, so I have fewer USB ports
   directly available in the VM than I had hoped for.


WRAP-UP AND FUTURE IDEAS

I really like the idea of switching into an environment which only has the tools
I need for the task at hand. Compared with my previous setup for developing in a
VM, GPU passthrough (and sound controller passthrough, and USB controller
passthrough) is a huge improvement.

I’ve manually provisioned three VMs, including a general-purpose development VM
which has a mix of basic tools so that I can get started.



Since I can only have one VM using the graphics card at a time, this setup works
similarly to having multiple operating systems installed in a multi-boot
configuration. It does, however, have far better separation between the systems
– they can’t read and write each other’s disks, for example.

The next steps for me will be to streamline the process of switching VMs and
shutting down the system. Both of these currently require me to manually switch
monitor inputs to interact with the host, which I would prefer to avoid.


BUILDING A 1U QUIET NAS

backup hardware kvm linux sysadmin
Posted on October 22, 2023 (updated October 26, 2023)

I recently built a compact, quiet rackmount NAS for home. I haven’t seen any
builds quite like it online, so I’m writing a bit about how it came together.




THE PROBLEM

My old backup was a mirrored pair of 2 TB hard drives in an old desktop
computer, with a portable 2 TB hard drive as an off-site copy. The disks are now
9 years old, and 2 TB is small enough that I need to ration the space. I also
recently repurposed most of the components in that system, so I no longer have
an up-to-date backup.

I want to solve this properly, and hopefully build a replacement setup which I
can install in my network cabinet, to run 24/7.

But if I’m going to do that, the must-haves are:

 * It fits in a 1U rack-mount form-factor with maximum 25cm depth
 * It’s quiet
 * It has front USB for making an offline copy of the backups
 * It has 2 SATA disks – I don’t want to be running a NAS off USB-SATA adapters,
   for example
 * It has wired Ethernet – my network is 1 Gigabit

Nice-to-haves would be:

 * Fast front USB
 * Hot-swap disks or more disks
 * Faster network

I’ve worked with servers, and I’ve built small-form-factor computers, so how
hard can it be to build a small-form-factor server?


PARTS LIST

I spent a lot of time sketching out possible builds in LibreOffice Draw. Every
combination of parts had some compromise, which is a familiar theme from
small-form-factor PC building.



I ended up deciding on parts which follow normal PC standards, hopefully giving
me a good chance of keeping it working for many years.

 * Case: Athena Power RM1UC138
 * Power supply: HDPLEX GAN 250W
 * Motherboard: Topton N6005 Mini ITX
 * RAM: Crucial 16GB (2 x 8GB) DDR4 3200 SO-DIMM
 * Boot disk: Samsung 970 EVO Plus 1TB M.2 SSD
 * Storage disks: 2 x Samsung 870 QVO 8TB 2.5″ SSD
 * Fans: 4 x Noctua A4x20 PWM
 * Drive cage: Icy Dock MB608SP-B – 6 x 2.5″ SATA bays
 * Set of 6 x 50cm SATA cables

There are some compatibility issues in the above part list, and it took a little
bit of problem-solving to get everything working together.


CLOSER LOOK AT THE ATHENA POWER RM1UC138

Short-depth 1U computer cases are nearly impossible to find in Australia. I
ordered the Athena Power RM1UC138 from the United States, which is an OEM case
with some flexibility built in.

On the front it has a 5.25″ bay, which I plan to use to add a drive cage. The
front USB is only USB 2.0, which would normally be a disadvantage, but in this
instance is a good match for the motherboard I’m using. The power button looks
like a toggle switch, but is actually a momentary switch.



The side shows that the rack ears can be put on the front or back, allowing the
case to be mounted in either direction.



On the back there is a wire mesh panel for cutting out a custom I/O shield,
which is handy since I have an off-brand motherboard.



On the inside, it’s configured for 2 x 3.5″ hard drives by default. It also
includes 2 x 2-pin 12v fans. They are loud like you would find in most managed
network switches, but are not jet-engine loud like most servers. The fan
controller simply distributes 12v, and has no speed control.



This is not a very common case, and I read everything I could find online about
it. In order to help the next person who is searching for it, here are two more
random pieces of information which I could not confirm until I had the case
in-hand:

 * Stand-offs on the case are all 4mm tall and non-removable
 * Screw spacing of USB 2.0 front panel connector is approx 30.5mm (from centre
   of each screw). Stacked USB 3 headers that are 30mm spacing are available
   online and could be made to work.


CLOSER LOOK AT THE TOPTON N6005 MINI ITX

I chose to use a Topton motherboard with a built-in Intel N6005 CPU for this
build, since the alternatives were either too tall, use a socketed CPU, use an
old CPU, would require add-in cards to get multiple SATA ports, or were not sold
in Australia. All of these would make it far more difficult to complete the
small, quiet build which I was aiming for.



From the few threads online about this board, I gathered that it is fussy about
RAM compatibility, so I booted it up at the first opportunity with an SSD
containing Pop!_OS to check that it worked. It’s not my use case, but this
motherboard would definitely be viable as a lightweight desktop.



The specific memory I used was a 16 GB kit with the model number CT2K8G4SFRA32A.
According to Intel’s product documentation, the N6005 only supports 16 GB
maximum, and while I could find claims that higher-capacity memory does work, I
couldn’t find anybody who posted actual part numbers.

I was happy to find that the built-in cooler is inaudible at idle loads. This
was a bit of a risk: the cooler doesn’t have standard dimensions, so I couldn’t
have easily replaced it with an alternative if it was noisy. Based on other
people’s experiences with this board, I re-pasted the cooler with Noctua NT-H1
thermal paste, to hopefully help keep temperatures down at higher loads so that
the fan will not need to spin up as much. I also avoided using the M.2 slot
which receives the most hot air from the cooler.

Topton also sells an N5105 variant which appears to be more popular (more info
here), as well as an alternative layout which has a PCIe slot instead of a
second M.2 slot.


CUSTOM POWER SUPPLY ADAPTER PLATE

The case is designed for a Flex ATX power supply, which is not what I’m using.
I’ve instead opted to use a passively-cooled HDPLEX GAN 250W power supply, which
ships with both an IEC C6 and IEC C14 cable.

I needed to choose one of these cables, and figure out how to securely mount it
to the case.



I designed an adapter plate in FreeCAD around the included IEC C6 cable, since
it had threaded holes, and screws were included.



I ordered it from a prototype supplier in laser cut 1mm steel, painted in matte
black.



This is the first time I’ve used FreeCAD to design a part, and parametric CAD
certainly has a learning curve. For this build, it was well worth it, since the
result is better (and safer) than anything I could have improvised.



At the time of writing this post, HDPLEX sells plates for mounting their IEC C14
cables in cases accepting SFX and ATX power supplies, but none for cases which
accept Flex ATX power supplies.


CUSTOM FAN CONTROLLER

Cooling this build quietly was always going to be a challenge. The case shipped
with 2 x 12 V fans, and had a simple splitter which ran them at max speed, which
was just too loud.

I designed a replacement fan controller in KiCad, which allows me to upgrade to
high-quality 4-pin fans with PWM speed control, and to set the speed using a
potentiometer. I wrote about prototyping this in a separate blog post.



This photo shows the custom controller alongside the original one it replaces.



As you can probably guess, I’ve designed this to use the same mounting location,
at the front of the case. My power supply has no Molex connectors, so I’m using
a SATA-Molex power adapter.



The main drawback to this simple design is that once I close up the case, I can
no longer adjust the fan speed.


FINAL ASSEMBLY

Before continuing any further, I took apart the case completely and deleted
three standoffs with a belt sander, to leave a flat area for power supply
installation later.



Once I got the case back together, the motherboard went in first. I raised it by
1mm using plastic washers, hoping to line it up better with the I/O shield
included with the motherboard.

The I/O opening for 1U servers is narrower than standard PC builds, so I needed
to cut the I/O shield, which I unfortunately did not do correctly.



Since that did not work, I carefully marked and cut out the wire mesh I/O shield
included with the case instead. I still left the motherboard raised up on 1mm
washers, though this is not necessary anymore. You need to use slightly longer
screws if you try this.



After that I installed the four case fans, plus the fan controller. I’m using a
front-to-back airflow, with 2 x 40mm fans mounted at the front, and 2 x 40mm
fans at the back. I added a Y splitter to the back fans, which did not have long
enough cables to reach the fan controller.

The next component I installed was the drive cage. It’s worth mentioning at this
point that the drive cage also has a fan header, the same as a 3-pin header you
would find on a PC motherboard. It supplies a different voltage for each of the
speed settings: medium is approximately 7.5 volts and is relatively quiet with
the included “Good quality DC fan”, and high speed is 12 volts. I set mine to
off but left the fan installed.



To install items into the 5.25″ bay in this case, you attach a bracket, then
fasten it from above. The bracket allows the depth to be adjusted as well.



I also installed disks in the drive cage at this point, and numbers on the
front. Disk 1 is connected to SATA0 on the motherboard, disk 2 is connected to
SATA1, and so on.



Next was the power supply. I installed the custom plate for the power connector,
and also installed the mounting plate on the bottom of the PSU so that it would
have a flat surface. After confirming that it would fit, I cleaned both surfaces
with alcohol, and applied double-sided tape.



I then followed a rehearsed path to drop the PSU into place. There is no
opportunity to adjust it once it sticks.



At this point I connected everything up and booted up the system to start
checking for problems, since it’s easier to troubleshoot in this state. Two
modifications I made here were to disconnect the bright red HDD LED, and to
introduce a SATA power Y splitter, because the power supply SATA cables were
stretched to the limit.



It took a lot of work (and cable ties) to arrange the cables flat so that the
case could close. In defence of cable ties, they do make maintenance more
difficult, but that’s a worthwhile trade-off for keeping cables clear of airflow
paths, fan blades, and the guillotine-like action of the top cover sliding shut.




COMPLETED BUILD

After closing the case, the build measures 434 mm × 254 mm × 44 mm, or 4.8 litres,
excluding rack ears.

This is how it appears from the front.



And this is how it appears from the back.




SOFTWARE

I’m starting with Proxmox, with OpenMediaVault deployed as a virtual machine. I
haven’t used either of these before, but both are Debian-based and provide
convenient web front-ends to the tools I would otherwise be configuring on the
command-line.

I’m passing through the disks as block devices. Running the NAS like this should
make it possible to provision extra workloads which need their own SATA disks in
future, or to switch from OpenMediaVault to stock Debian if necessary, all
without connecting a monitor.



Within OpenMediaVault, I’ve configured Linux software RAID, with an ext4
filesystem, shared via Samba, and can access that file share over the network.



I’ve enabled some basic power management features such as C-states. The system
idles in the range of 12-14 watts measured from the wall, and goes up to 20
watts when moving files around.

I don’t need a lot of disk capacity, so I’ve been able to preserve a useful
property of my old setup, where every disk in the system has a full copy of the
data, in a format which can be understood by a normal Linux system. This makes
single-disk recovery possible using any surviving disk from the system on
practically any computer, and that disk can be from either an offline copy or
one of the disks in the RAID mirror.

I haven’t tested the process of making an offline copy of the backup volume, but
that will be up next.


WRAP-UP

This is possibly the most effort I’ve ever put into a PC build. The only
unexpected issue I encountered is how heavy it is, and I won’t be rack-mounting it
until I get some generic rails.

The computer uses a strange mix of parts, but meets my requirements well. I hope
that by writing this up, I’ll be providing some useful notes to anybody
attempting to build something similar.

This project also gave me a chance to practice my entry-level CAD skills to
build something which I’ll actually be using. I find a lot of utility in paper
prototyping, and printed each design at 1:1 scale to check the physical
dimensions before ordering anything.



For the circuit board, I used a print-out to check each part footprint, as well
as the hole locations for fitting it in the case.



As with many of the projects which I blog about, I’ve put the design files up on
GitHub. The fan controller is at mike42/fan-controller-athena-power, while the
Flex ATX adapter plate is at mike42/flexatx-adapter-hdplex.


CONTROLLING COMPUTER FANS WITH A MICROCONTROLLER

C++ hardware python software
Posted on September 21, 2023 (updated September 20, 2023)

I’m currently working on building a small computer, and want to add some 4-pin
computer fans, running quietly at a fixed speed.

This blog post is just a quick set of notes from prototyping, since it covers a
few topics which I haven’t written about before.




THE PROBLEM

The speed of 4-pin computer fans can be controlled by varying the duty cycle on
a 25 kHz PWM signal. This signal normally comes from a 4-pin case fan header on
the motherboard, which will not be available in this situation. Rather than run
the fans at 100%, I’m going to try to generate my own PWM signal.

Two main considerations led me to choose to use a microcontroller for this:

 * I’ll need some way to adjust the PWM duty cycle after building the circuit,
   because I don’t know which value will give the best trade-off between airflow
   and noise yet.
 * Fans need a higher PWM duty cycle to start than they do to run. If I want to
   run the fans at a very low speed, then I’ll need them to ramp up on start-up,
   before slowing to the configured value.

It’s worth noting that I’m a complete beginner with microcontrollers. I’ve run
some example programs on the Raspberry Pi Pico, but that’s it.


FIRST ATTEMPT

I already had MicroPython running on my Raspberry Pi Pico, so that was my
starting point.

For a development environment, I installed the MicroPython plugin for PyCharm,
since I already use JetBrains IDEs for programming. Most guides for beginners
suggest using Thonny.



There are introductory examples on GitHub at
raspberrypi/pico-micropython-examples which showed me everything I needed to
know. I was able to combine an ADC example and a PWM example within a few
minutes.

import time
from machine import Pin, PWM, ADC, Timer

level_input = ADC(0)
pwm_output = PWM(Pin(27))
timer = Timer()

# 25 kHz
pwm_output.freq(25000)


def update_pwm(timer):
    """ Update PWM duty cycle based on ADC input """
    duty = level_input.read_u16()
    pwm_output.duty_u16(duty)


# Start with 50% duty cycle for 2 seconds (to start fan)
pwm_output.duty_u16(32768)
time.sleep(2)

# Update from ADC input after that
timer.init(mode=Timer.PERIODIC, period=100, callback=update_pwm)


On my oscilloscope, I could confirm that the PWM signal had a 25 kHz frequency,
and that the code was adjusting the duty cycle as expected. When the analog
input (above) is set to a high value, the PWM signal has a high duty cycle.



When set to a low value, the PWM signal has a low duty cycle.




BUT CAN IT CONTROL FANS?

I wired up a PC fan to 12 V power, and also sent it this PWM signal, but the
speed didn’t change. This was expected, since I knew that I would most likely
need to convert the 3.3 V signal up to 5 V.

I ran it through a 74LS04 hex inverter with VCC at 5 V, which did the trick. I
could then adjust a potentiometer, and the fan would speed up or slow down.

I captured the breadboard setup in Fritzing. Just note that there are floating
inputs on the 74LS04 chip (not generally a good idea) and that the part is
incorrectly labelled 74HC04, when the actual part I used was a 74LS04.



This shows a working setup, but it’s got more components than I would like. I
decided to implement it a second time, on a simpler micro-controller which can
work at 5 V directly.


PORTING TO THE ATTINY85

For a second attempt, I tried using an ATtiny85. This uses the AVR architecture
(of Arduino fame), which I’ve never looked at before.



This chip is small, widely available, and can run at 5 V. I can also program it
using the TL-866II+ rather than buying into the Arduino ecosystem with dedicated
development boards or programmers.



I found GPL-licensed code by Marcelo Aquino for controlling 4-wire fans.

After a few false starts trying to compile manually, I followed this guide to
get the ATtiny85 ‘board’ definition loaded into the Arduino IDE.



From there I was able to build and export an Intel HEX file, using the “Export
compiled binary” feature.


WRITING FUSES

The code assumes an 8 MHz clock. The ATtiny85 ships from the factory with a
“divide by 8” fuse active. This needs to be turned off, otherwise the clock will
be 1 MHz. This involves setting some magic values, separate to the program code.

I found these values using a fuse calculator.

The factory default for this chip is:

lfuse 0x62, hfuse 0xdf, efuse 0xff.


To disable the divide-by-8 clock but leave all other values default, this needs
to be:

lfuse 0xe2, hfuse 0xdf, efuse 0xff.
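
The only difference between the two low-fuse values is a single bit. On the
ATtiny85, bit 7 of the low fuse is CKDIV8, and AVR fuse bits are active-low
(0 means “programmed”), which can be sanity-checked in a few lines of Python:

```python
# ATtiny85 low fuse: bit 7 is CKDIV8. Fuse bits are active-low,
# so 0 means "programmed" (divide-by-8 enabled).
FACTORY_LFUSE = 0x62  # CKDIV8 programmed -> 8 MHz / 8 = 1 MHz
TARGET_LFUSE = 0xE2   # CKDIV8 unprogrammed -> full 8 MHz
CKDIV8_BIT = 7

# The two values differ only in the CKDIV8 bit
assert FACTORY_LFUSE ^ TARGET_LFUSE == 1 << CKDIV8_BIT

def clock_divided_by_8(lfuse: int) -> bool:
    """A 0 bit means the fuse is programmed, i.e. divide-by-8 is active."""
    return (lfuse >> CKDIV8_BIT) & 1 == 0

print(clock_divided_by_8(FACTORY_LFUSE))  # True: chip runs at 1 MHz
print(clock_divided_by_8(TARGET_LFUSE))   # False: chip runs at 8 MHz
```

The remaining CKSEL bits (the internal 8 MHz RC oscillator selection) stay the same.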


I am using the minipro open source tool to program the chip, via a TL-866II+
programmer. First, to get the fuses:

$ minipro -p ATTINY85@DIP8 -r fuses.txt -c config
Found TL866II+ 04.2.126 (0x27e)
Warning: Firmware is out of date.
  Expected  04.2.132 (0x284)
  Found     04.2.126 (0x27e)
Chip ID: 0x1E930B  OK
Reading config... 0.00Sec  OK


This saves the current values to a text file.

lfuse = 0x62
hfuse = 0xdf
efuse = 0x00
lock = 0xff


I then set lfuse = 0xe2, and wrote the values back with this command.

$ minipro -p ATTINY85@DIP8 -w fuses.txt -c config
Found TL866II+ 04.2.126 (0x27e)
Warning: Firmware is out of date.
  Expected  04.2.132 (0x284)
  Found     04.2.126 (0x27e)
Chip ID: 0x1E930B  OK
Writing fuses... 0.01Sec  OK
Writing lock bits... 0.01Sec  OK



WRITING CODE

Now the micro-controller is ready to accept the exported binary containing the
program.

minipro -s -p ATTINY85@DIP8 -w sketch_pwm.ino.tiny8.hex -f ihex
Found TL866II+ 04.2.126 (0x27e)
Warning: Firmware is out of date.
  Expected  04.2.132 (0x284)
  Found     04.2.126 (0x27e)
Chip ID: 0x1E930B  OK
Found Intel hex file.
Erasing... 0.01Sec OK
Writing Code...  1.09Sec  OK
Reading Code...  0.45Sec  OK
Verification OK


With the chip fully programmed, I wired it up on a breadboard with 5 V power.



Checking the output again, the signal was the correct amplitude this time, but
the frequency does move around a bit. This is likely because I’m using the
internal RC oscillator on the chip, which is not very accurate. My understanding
is that anything near 25 kHz will work fine.




UPDATES

I made only one change to Marcelo’s code, which is to spin the fan at 50% for a
few seconds before using the potentiometer-set value. This is to avoid any
issues where the fans fail to start because I’ve set them to run at a very low
PWM value.

/*
 *                         ATtiny85
 *                      -------u-------
 *  RST - A0 - (D 5) --| 1 PB5   VCC 8 |-- +5V
 *                     |               |
 *        A3 - (D 3) --| 2 PB3   PB2 7 |-- (D 2) - A1  --> 10K Potentiometer
 *                     |               | 
 *        A2 - (D 4) --| 3 PB4   PB1 6 |-- (D 1) - PWM --> Fan Blue wire
 *                     |               |      
 *              Gnd ---| 4 GND   PB0 5 |-- (D 0) - PWM --> Disabled
 *                     -----------------
 */

// Normal delay() won't work anymore because we are changing Timer0 behaviour
#define F_CPU 8000000      // 8 MHz clock; must be defined before delay.h
#include <util/delay.h>    // Adds _delay_ms and _delay_us functions

const int PWMPin = 1;  // Only works with Pin 1(PB1)
const int PotPin = A1;

void setup()
{
  pinMode(PWMPin, OUTPUT);
  // Phase Correct PWM Mode, no Prescaler
  // PWM on Pin 1(PB1), Pin 0(PB0) disabled
  // 8 MHz / 160 / 2 = 25 kHz
  TCCR0A = _BV(COM0B1) | _BV(WGM00);
  TCCR0B = _BV(WGM02) | _BV(CS00); 
  // Set TOP and initialize duty cycle to 50%
  OCR0A = 160;  // TOP - DO NOT CHANGE, SETS PWM PULSE RATE
  OCR0B = 80; // duty cycle for Pin 1(PB1)
  // initial bring-up: leave at 50% for 4 seconds
  _delay_ms(4000);
}

void loop()
{
  int in, out;
  // follow potentiometer-set speed from there
  in = analogRead(PotPin);
  out = map(in, 0, 1023, 0, 160);
  OCR0B = out;
  _delay_ms(200);
}
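
The timer arithmetic in the comments above can be sanity-checked quickly. With
phase-correct PWM and no prescaler, the counter runs up to TOP and back down,
so the output frequency is F_CPU / (2 × TOP). A short Python check, restating
the values from the sketch above:

```python
F_CPU = 8_000_000  # internal RC oscillator, 8 MHz
TOP = 160          # OCR0A: sets the PWM pulse rate

# Phase-correct PWM counts up to TOP and back down, so one
# period is 2 * TOP timer ticks (no prescaler).
pwm_freq = F_CPU / (2 * TOP)
print(pwm_freq)  # 25000.0 -> the 25 kHz the fan spec calls for

# Arduino's map() scales the 10-bit ADC reading onto OCR0B (0..TOP):
def adc_to_duty(reading: int) -> int:
    """Integer equivalent of map(reading, 0, 1023, 0, 160)."""
    return reading * TOP // 1023

print(adc_to_duty(1023))  # 160 -> 100% duty
print(adc_to_duty(512))   # 80 -> 50% duty
```
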


The wiring of the breadboard is shown below. The capacitor is 0.1 µF for
decoupling.



This is far more compact than the Raspberry Pi Pico prototype. I could also
miniaturise it further by simply using surface-mount versions of the same
components, whereas using the RP2040 microcontroller from the Pico directly on
a custom board would involve some extra design effort.


NEXT STEPS & LESSONS LEARNED

Although this project is simple, I had to learn quite a few things to prototype
it successfully. Using a generic chip programmer like the TL-866II+ appears to
be uncommon in the AVR world, and most online guides instead suggest repurposing
an Arduino development board or using an Arduino ICSP programmer to program the
ATtiny85. I was glad to confirm that I could use my existing hardware instead of
buying ecosystem-specific items which I would have no other use for. I find the
development experience to be far better with the Raspberry Pi Pico, and that’s
what I would be choosing for a more complex project.

I also captured the breadboard wiring in Fritzing for this blog post. The
diagrams are clearer than a photo of a breadboard, but I’m not confident that
they communicate information about the circuit as well as alternative
approaches. For future posts, I’ll return to using KiCad EDA for schematic
capture, unless there is some reason to highlight the physical layout of a
breadboard prototype.

As a next step, I’ll be building a simple break-out PCB for a specific computer
case to power the fans and supply a PWM signal, based on the ATtiny85 prototype
shown here.


CONVERTING MY 65C816 COMPUTER PROJECT TO 3.3 V

65C816 hardware
Posted on January 12, 2023 (updated January 29, 2023)

I recently spent some time re-building my 65C816 computer project to run at 3.3
volts, instead of 5 volts which it used previously. This blog post covers some
of things I needed to take into consideration for the change.

It involved making use of options which I added when I designed this test board,
as well as re-visiting all of the mistakes in that design.




RE-CAP: WHY 3.3 V IS USEFUL

One of the goals of this project is to build a modern computer which uses a
65C816 CPU, using only in-production parts.

A lot of interesting retro chips run at 5 V, but it’s much easier to find modern
components which run at 3.3 V. I can make use of these options without adding
level-shifting if I can switch important buses and control signals to 3.3 V.


ROM

The ROM chip is the most visible change, because the chip has a different
footprint. When assembling this board for 5 V use, I used an AT28C256 parallel
EEPROM in a PDIP-28 ZIF socket. I couldn’t find a 3.3 V drop-in replacement for
this, so I added a footprint for an SST39LF010 flash chip, which
has a similar-enough interface.

This was the first time I’ve soldered a surface-mount PLCC socket. I attempted
to solder this with hot air, which was unsuccessful, so I instead cut the center
of the socket out so that I could use a soldering iron. I then added a small
square of 1000 gsm card (approx. 1 mm thick) as a spacer under the chip.



I also added a new make target to build the system ROM for this chip, which
involves padding the file to a larger size, and invoking minipro with a
different option so that it could write the file. I’m using a TL866II+ for
programming, and adapter boards are available for programming PLCC-packaged
chips.

$ make flashrom_sst39lf010
cp -f rom.bin rom_sst39lf010.bin
truncate -s 131072 rom_sst39lf010.bin
minipro -p "SST39LF010@PLCC32" -w rom_sst39lf010.bin
Found TL866II+ 04.2.126 (0x27e)
Warning: Firmware is newer than expected.
  Expected  04.2.123 (0x27b)
  Found     04.2.126 (0x27e)
Chip ID OK: 0xBFD5
Erasing... 0.40Sec OK
Writing Code...  8.38Sec  OK
Reading Code...  1.18Sec  OK
Verification OK



UART

The UART chip is an NXP SC16C752B, which is 3.3 V or 5 V compatible. I rescued
one of these from an adapter board (a previous experiment). This was my first
time de-soldering a component with a hot air gun. Hopefully it still works!



My FTDI-based USB/UART adapter has a configuration jumper which I adjusted to
3.3 V.


CLOCK, RESET AND ADDRESS DECODE

I again used a MIC2775 supervisory IC for power-on reset, though the exact part
was different. I previously used a MIC2775-46YM5 (4.6 V threshold), while the
re-build uses a MIC2775-31YM5 (3.1 V threshold).

I needed a 1.8432 MHz oscillator for the UART. The 3.3 V surface-mount part I
substituted in had a different footprint, which I had prepared for.

I also prepared for this change by using an ATF22LV10 PLD for address decoding,
which is both 3.3 V and 5 V compatible. The ATF22V10 which I used for earlier
experiments works at 5 V only.




SD CARD

I installed the DEV-13743 SD card module more securely this time by soldering it
in place and adding some double-sided tape as a spacer. This module is
compatible with 3.3 V or 5 V.



The level-shifting and voltage regulation on this module is now superfluous, so
I could probably simplify things by adding an SD card slot and some resistors
directly in a later revision.


MODIFICATIONS

There are three errors in the board which I know about (all described in my
earlier blog post). This is my second time fixing them, so I tried to make sure
the modifications were neat and reliable this time.

Firstly, the flip-flop used as a clock divider is wired up incorrectly, so I cut
a trace and ran a wire under the board.

Second, the reset button pin assignments are incorrect. Simply rotating the
button worked on the previous board, but it wasn’t fitted securely. This time I
cut one of the legs and ran a short wire under the board, and the
modification is barely noticeable.



Lastly, the address bus is wired incorrectly into the chip which selects I/O
devices. I previously worked around this in software, but this time I cut two
traces and ran two wires (the orange wires in the picture). I made an equivalent
modification to the 5 V board, so that I could update the software and test it.



The wire I was using for these mods is far too thick and inflexible. I’ve added
some 30 AWG silicon-insulated wire to my inventory for the next time I need to
do this type of work, which should be more suitable.


WRAP-UP

I transferred components from the old board, and attempted to boot it every time
I made a change. The power-on self-test routine built into the ROM showed that
each new chip was being detected, and I soon had a working 65C816 test board
again, now running at 3.3 V. I’m using my tiny linear power supply module to
power it.

I can now interface to a variety of chips which only run at 3.3 V. This opens up
some interesting possibilities for adding peripherals, which I hope to explore
in future blog posts.

It will be more difficult to remove and re-program the ROM chip going forward
though. This is hopefully not a problem, since I can now run code from an SD card
or serial connection as well (part 1, part 2).


PORTING THE AMIGA BOUNCING BALL DEMO TO THE NES

6502 just-for-fun software
Posted on November 23, 2022 (updated November 22, 2022)

Earlier this year, I ported the Amiga bouncing ball demo to the Nintendo
Entertainment System. This is a video capture from a NES emulator.



I completed this back in January, but I’m only publishing this blog post now,
because I was considering entering it into a demo competition.


HOW IT WORKS

If you are familiar with NES development, then you will probably notice that
there is nothing ground-breaking going on here. The ball is made up of 64 8×8
sprites, bouncing around the screen over a static background.

There are two different versions of the ball in sprite memory, and with some
palette swapping, this can be stretched to 4 frames of animation, just enough to
make the ball appear to spin.

There is enough sprite memory to extend this to 8 frames of animation without
any banking tricks, but I would need to start again from scratch to achieve
that.


PRE-RENDERED 3D

I drew the ball in Blender, with twice the number of segments required for the
final image. This is a UV sphere with 8 rings, and 32 segments.



I coloured each face with one of 4 colours, then rendered it with the Workbench
renderer, which I had configured to use Flat lighting, no anti-aliasing, and the
texture colour.

I also tilted the ball.



This gave me a crisp, high-resolution ball with solid colours.



To test the idea, I processed this down to a 64×64 image on a transparent
background. I needed to substitute colours, so there is no antialiasing here.



I then wrote up a Python script to swap out colours to make the 4-frame pattern.
Note that frames 3 and 4 are the same as frames 1 and 2, but with white and red
swapped.
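
The original script isn’t shown here, but the swap itself is only a few lines.
A minimal sketch of the idea, operating on a flat list of RGB pixels (in
practice this would be wrapped around an image library such as Pillow; the
exact colour values here are assumptions, not the ones from the real script):

```python
# Hypothetical sketch: frames 3 and 4 are frames 1 and 2 with
# white and red exchanged. Colour values are assumed for illustration.
WHITE = (255, 255, 255)
RED = (181, 49, 32)  # an NES-palette red; assumed, not the original value

def swap_white_red(pixels):
    """Return a copy of the frame with white and red exchanged."""
    swap = {WHITE: RED, RED: WHITE}
    return [swap.get(p, p) for p in pixels]

frame1 = [WHITE, RED, (0, 0, 0), WHITE]
frame3 = swap_white_red(frame1)
print(frame3)  # the same pixels with white and red exchanged
```

Applying the swap twice returns the original frame, which is a handy property to
test against.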



I ran this through ImageMagick to convert it into a GIF preview.

convert -delay 3.5 -loop 0 *.png animation.gif




The result appeared workable, so I went about making the same animation in a NES
rom.


NES IMPLEMENTATION

On the NES, it’s not possible to create the spinning ball effect with palette
changes only, because it would require four colours plus transparency. To
overcome this, I split the image into two frames, each stored as two colours
plus transparency, which is possible.



To start the code, I checked out a fresh copy of Brad Smith’s NES example
project, then deleted things that I didn’t need.

When working with sprites on the NES, it is typical to store object attributes
in RAM, then perform a DMA operation once per frame to copy this over to the
Picture Processing Unit (PPU). For this project, I wanted to try setting up two
different copies of the object attribute memory in RAM – one for each of the two
rotations.

This should allow me to set any of the four frames by choosing between two
possible colour palettes, and two possible DMA source addresses. This worked,
and the first milestone was a spinning ball.
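
That combination amounts to a simple lookup: for each of the four animation
frames, pick one of the two OAM copies in RAM and one of the two palettes. A
sketch of the selection logic in Python (the page numbers and the exact
frame-to-palette mapping are hypothetical, for illustration only):

```python
# Hypothetical sketch of frame selection: two OAM copies in RAM
# x two palettes = four visible frames. Page values are assumed.
OAM_PAGE_A = 0x02  # DMA source page for rotation 1 ($0200) - assumed
OAM_PAGE_B = 0x03  # DMA source page for rotation 2 ($0300) - assumed

def select_frame(frame: int):
    """Map animation frame 0-3 to (DMA source page, palette index)."""
    oam_page = OAM_PAGE_A if frame % 2 == 0 else OAM_PAGE_B
    palette = 0 if frame < 2 else 1  # frames 2,3 reuse 0,1 with colours swapped
    return oam_page, palette

for f in range(4):
    print(f, select_frame(f))
```
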



Switching between two DMA sources did not save much effort in the end, because I
still needed quite a lot of code to set the X/Y positions of 64 sprites each
frame. I set up some assembler macros to help with this.


PHYSICS AND SOUND

I have made one NES game before, and I was not happy with the physics. For this
project, I wanted to do a bit better.

For the X position, I use a fixed speed, and simple collision detection with
left/right boundaries. The animation frame is calculated from the X position, so
the ball changes rotation depending on which direction it is moving, just like
the Amiga demo.

The Y location of the ball is a simple loop, and follows the absolute value of a
sine wave. I pre-computed this with some Python.
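
The original script isn’t included, but generating such a table is
straightforward. A sketch of the approach, with the table length and amplitude
as assumed values (the real demo may use different ones):

```python
import math

# Hypothetical parameters: 64 steps per bounce, 112-pixel travel.
STEPS = 64
AMPLITUDE = 112

# One bounce follows |sin(t)| over half a period.
table = [round(abs(math.sin(math.pi * i / STEPS)) * AMPLITUDE)
         for i in range(STEPS)]

# Emit as ca65 .byte directives, 16 values per line.
for i in range(0, STEPS, 16):
    print(".byte " + ", ".join(str(v) for v in table[i:i + 16]))
```

The table starts at 0, peaks at the amplitude mid-bounce, and returns toward 0,
so the assembly code only needs to step an index through it each frame.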

My last project also had no sound, so I read up on the NES APU, and added some
noise when the ball changes direction.


DEVELOPMENT SETUP

Since my last NES project, I have improved my development setup quite a bit. As
always, I am using the ca65 assembler on Linux.

Previously, I was using a text editor. I have since moved to a custom 6502
assembly plugin on PyCharm, which allows me to quickly jump to
definitions/usages. I developed this for my other 65C02 and 65C816 projects,
which you can read about on this blog.

When I click run, I’ve set up the project to assemble a NES ROM, then launch
with fceux, which has debugging features.



The debug features were previously only available on Windows builds of fceux, so
when I made this demo, I was running the emulator via WINE. This has been added
to the Linux builds as of v2.5.0, so I’ll most likely switch to that for my next
project.


WRAP-UP

I learn something new every time I write code for the NES, and it’s a lot of fun
to make simple demos for these old systems. It’s also refreshing to create
something standalone which I can explain through screenshots, instead of pages
of assembly code.

For those who do want to read pages of assembly code, though, this project is up
on GitHub at mike42/nes-ball-demo.


LET’S MAKE A BOOTLOADER – PART 2

65C816 software
Posted on September 22, 2022 (updated October 17, 2022)

In my previous blog post, I wrote about loading and executing a 512 byte program
from an SD card for my 65C816 computer project.

For part 2, I’ll look at what it takes to turn this into a bootloader, which can
load a larger program from the SD card. I want the bootloader to control where
to read data from, and how much data to load, which will allow me to change the
program structure/size without needing to update the code on ROM.


GETTING MORE DATA

The code in my previous post used software interrupts to print some text to the
console, and my first challenge was to implement a similar routine for loading
additional data from external storage.

The interface from the bootloader is quite simple: it sets a few registers to
specify the source (a block number), destination memory address, and number of
blocks to read, then triggers a software interrupt. The current implementation
can read up to 64 KiB of data from the SD card each time it is called.

ROM_READ_DISK := $03
; other stuff ...
    ldx #$0000                      ; lower 2 bytes of destination address
    lda #$01                        ; block number
    ldy #$10                        ; number of blocks to read - 8 KiB
    cop ROM_READ_DISK               ; read kernel to RAM
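
For the register values above, the arithmetic works out like this (a quick
sketch; the 512-byte block size is the same one used by the SD card routines):

```python
BLOCK_SIZE = 512  # SD card block size, as used elsewhere in this project

def read_range(block: int, count: int):
    """Byte range on the SD card covered by one ROM_READ_DISK call."""
    start = block * BLOCK_SIZE
    length = count * BLOCK_SIZE
    assert length <= 64 * 1024, "the ROM routine reads at most 64 KiB per call"
    return start, length

# The call above: block $01, $10 (16) blocks.
start, length = read_range(0x01, 0x10)
print(start, length)  # 512 bytes in, 8192 bytes (8 KiB) read
```
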



HIGHER MEMORY ADDRESSES

To load data into higher banks (memory addresses above $ffff), I set the data
bank register.

I needed to update a lot of my code to use absolute long (24-bit) addressing for
I/O access, where it was previously using absolute (16-bit) addressing, for
example:

sta $0c00


In my assembler, ca65, I already use the a: prefix to specify 16-bit addressing.
I learned here that I can use the f: prefix for 24-bit addressing.

sta f:$0c00


Without doing this, the assembler chooses the smallest possible address size.
This is fine in a 65C02 system, but I find it less helpful on the 65C816, where
the meaning of a 16-bit address depends on the data bank register, and the
meaning of an 8-bit address depends on the direct page register.


NEW BOOTLOADER

With the ROM code sorted out, I went on to write the new bootloader.

This new code sets the data bank register, then uses a software interrupt to load
8 KiB of data to address $010000.

; boot.s: 512-byte bootloader. This utilizes services defined in ROM.
ROM_PRINT_STRING := $02
ROM_READ_DISK := $03

.segment "CODE"
    .a16
    .i16
    jmp code_start

code_start:
    ; load kernel
    ldx #loading_kernel             ; Print loading message (kernel)
    cop ROM_PRINT_STRING
    phb                             ; Save current data bank register
    .a8                             ; Set data bank register to 1
    php
    sep #%00100000
    lda #$01
    pha
    plb
    plp
    .a16
    ldx #$0000                      ; lower 2 bytes of destination address
    lda #$01                        ; block number
    ldy #$10                        ; number of blocks to read - 8 KiB
    cop ROM_READ_DISK               ; read kernel to RAM
    plb                             ; Restore previous data bank register

    ldx #boot                       ; Print boot message
    cop ROM_PRINT_STRING
    jml $010000

loading_kernel: .asciiz "Loading kernel\r\n"
boot: .asciiz "Booting\r\n"

.segment "SIGNATURE"
    wdm $42                         ; Ensure x86 systems don't recognise this as bootable.


The linker configuration for the bootloader is unchanged from part 1.


A PLACEHOLDER KERNEL

The bootloader needed some code to load. I don’t have any real operating system
to load, so I created the “Hello World” of kernels to work with in the meantime.

This is the simplest possible code I can come up with to test the bootloader.
This assembles to 3 machine-language instructions, which occupy just 6 bytes. It
is also position-independent, and will work from any memory bank.

; kernel_tmp.s: A temporary placeholder kernel to test boot process.

ROM_PRINT_CHAR   := $00

.segment "CODE"
    .a16
    .i16
    lda #'z'
    cop ROM_PRINT_CHAR
    stp


The linker configuration for this, kernel_tmp.cfg, creates an 8 KiB binary.

MEMORY {
    ZP:     start = $00,    size = $0100, type = rw, file = "";
    RAM:    start = $0200,  size = $7e00, type = rw, file = "";
    PRG:    start = $e000,  size = $2000, type = ro, file = %O, fill = yes, fillval = $00;
}

SEGMENTS {
    ZEROPAGE: load = ZP,  type = zp;
    BSS:      load = RAM, type = bss;
    CODE:     load = PRG, type = ro,  start = $e000;
}


The commands I used to assemble and link the bootloader are:

ca65 --feature string_escapes --cpu 65816 --debug-info boot.s
ld65 -o boot.bin -C boot.cfg boot.o


The commands I used to assemble and link the placeholder kernel are:

ca65 --feature string_escapes --cpu 65816 --debug-info kernel_tmp.s
ld65 -o kernel_tmp.bin -C kernel_tmp.cfg kernel_tmp.o


I assembled the final disk image by concatenating these files together.

cat bootloader/boot.bin kernel_tmp/kernel_tmp.bin > disk.img
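
The kernel will only land at block 1 if boot.bin fills block 0 exactly. As a
hypothetical sanity check (not part of the original build), something like this
could run before the cat command:

```python
import os

# Hypothetical helper: the bootloader must occupy exactly the first
# 512-byte block, so the kernel lands at block 1 where it is expected.
def check_boot_size(path: str = "bootloader/boot.bin") -> None:
    size = os.path.getsize(path)
    assert size == 512, f"bootloader is {size} bytes, expected exactly 512"
```

The boot.cfg linker script should already guarantee this, so the check only
guards against a broken build.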


I used the GNOME disk utility to write the image to an SD card, which helpfully
reports that I’m not using all of the available space.




RESULT

It took me many attempts to get the ROM code right, but this did work. This
screen capture shows the test kernel source code on the left, which is being
executed at the end of the boot-up process.



If I wanted to load the kernel from within a proper filesystem on the SD card
(eg. FAT32), then I would need to update the starting block in the bootloader
(hard-coded to 1 at the moment).

The limitations of this mechanism are that the kernel needs to be stored
contiguously, be sector-aligned, and located within the first 64 MiB of the SD
card.


SPEED INCREASE

This was the first time that I wrote code for this system and needed to wait for
it to execute. The CPU was clocked at just 921.6 kHz. My SPI/SD card code was
also quite generic, and was optimised for ease of debugging rather than
execution speed.

I improved this from two angles. My 65C816 test board allows me to use a
different clock speed for the CPU and UART chip, so I sped the CPU up to 4 MHz
by dropping in an 8 MHz oscillator (it is halved to produce a two-phase clock).
As I sped this up, I also needed to add more retries to the SD card
initialisation code, since it does not respond immediately on power-up.

I also spent a lot of time hand-optimising the assembly language routine to
receive a 512-byte block of data from the SD card. There is room to speed this
up further, but it’s fast enough to not be a problem for now.

I had hoped to load an in-memory filesystem (ramdisk) alongside the test kernel,
but I’ve deferred this until I can compress it, since reading from SD card is
still quite slow.


A DEBUGGING DETOUR

Writing bare-metal assembly with no debugger is very rewarding when it works,
and very frustrating when there is something I’m not understanding correctly.

I ran into one debugging challenge here which is obvious in hindsight, but which
I almost couldn’t solve at the time.

This code is (I assure you) a bug-free way to print one hex character, assuming
the code is loaded in bank 0. I was using this to hexdump the code I was loading
into memory.

.a8                             ; Accumulator is 8-bit
.i16                            ; Index registers are 16-bit
lda #0                          ; A is 0 for example
and #$0f                        ; Take low-nibble only
tax                             ; Transfer to X
lda f:hex_chars, X              ; Load hex char for X
jsr uart_print_char             ; Print hex char for X

hex_chars: .byte '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'


The problem I had was that if the data bank register was 0, this would print
ASCII character ‘0’ as expected. But if the data bank register was 1, it would
print some binary character.

Nothing in this code should be affected by the data bank register, so I went
through a methodical debugging process to try to list and check all of my
assumptions. At one point, I even checked that the assembler was producing the
correct opcode for this lda addressing mode (there are red herrings on the
mailing lists about this).

I was able to narrow down the problem by writing different variations of the
code which should all do the same thing, but used different opcodes to get
there. This quickly revealed that it was the tax instruction which did not work
as I thought, after finding that I could get the code working if I avoided it:

.a8                             ; Accumulator is 8-bit
.i16                            ; Index registers are 16-bit
ldx #0                          ; X is 0 for example
lda f:hex_chars, X              ; Load hex char for X
jsr uart_print_char             ; Print hex char for X

hex_chars: .byte '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'


My first faulty assumption was that if I set the accumulator to 0, then
transferring the accumulator value to X would set the X register to 0 as well.

.a8                             ; Accumulator is 8-bit
.i16                            ; Index registers are 16-bit
lda #0                          ; Accumulator is 0
tax                             ; X is 0?


On this architecture, two bits of the CPU status register specify the sizes of
the accumulator and index registers. When the accumulator is 8-bit, only the
lower 8 bits of the 16-bit value are set by lda. I hadn’t realised that when
the index registers are set to 16-bit, the tax instruction transfers all 16
bits of the accumulator to the X register (regardless of the accumulator size),
which was causing surprising results.

Far away from the bug-free code I was focusing on, I had used a lazy method to
set the data bank register, which involved setting the accumulator to $0101.

My second faulty assumption was that the data bank register was involved at all
– in fact lda #$0101 would have been enough to break the later code.

.a16
lda #$0101                      ; Set data bank register to 1
pha
plb
plb
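As an aside, the reason this trick needs two plb instructions: in 16-bit accumulator mode, pha pushes two bytes, and each plb pulls a single byte into the data bank register. A rough Python model of the stack traffic (my own sketch):

```python
# Model of the 'lda #$0101 / pha / plb / plb' trick: pha pushes both
# bytes of the 16-bit accumulator, and each plb pulls one byte into the
# data bank register, leaving the stack balanced afterwards.
stack = []
a = 0x0101
stack.append((a >> 8) & 0xFF)   # pha: high byte pushed first
stack.append(a & 0xFF)          # ...then low byte
b = stack.pop()                 # first plb: data bank is now $01
b = stack.pop()                 # second plb: still $01, stack balanced
print(hex(b), len(stack))       # 0x1 0
```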


To fix this, when switching to an 8-bit accumulator, I now zero out the high
8 bits.

.a8
lda #0                          ; zero out the B accumulator
xba



ALTERNATIVE BOOT METHOD

I also added an option to boot from serial, based on my implementation of an
XMODEM receive function for the 6502 processor.

This is a bit like PXE boot for retro computers. From a program such as minicom,
you tell the ROM to boot from serial, then upload the kernel via an XMODEM send.
It loads up to 64 KiB to bank 1 (the same location where the SD card
bootloader would place it), then executes it.



This is an important option for testing, since it’s a fast way to get some code
running while I don’t have an operating system, and writing to an SD card from a
modern computer is not a fast process. It may also be an important fallback if I
need to troubleshoot SD card routines.
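The wire format of classic XMODEM is simple enough to sketch. This Python fragment (an illustration of the standard checksum variant, not the actual 6502 receive code) shows how each 128-byte packet is framed by the sender:

```python
# Frame one XMODEM packet (checksum variant): SOH, block number, its
# one's complement, 128 bytes of payload, then an 8-bit checksum.
SOH = 0x01

def xmodem_packet(block_num, payload):
    """Build the 132-byte packet for one 128-byte block."""
    assert len(payload) == 128
    header = bytes([SOH, block_num & 0xFF, (block_num & 0xFF) ^ 0xFF])
    checksum = sum(payload) & 0xFF  # simple 8-bit sum of the payload
    return header + payload + bytes([checksum])

pkt = xmodem_packet(1, bytes(128))
print(len(pkt))  # 132
```

The receiver just needs to check the block number, its complement, and the checksum before writing each 128-byte payload to RAM, which keeps the ROM-side implementation small.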


WRAP-UP

Previously, I needed to use a ROM programmer to load code onto this computer. I
can now use an SD card or serial connection, and have a stable interface for the
bootstrapping code to access some minimal services provided by the ROM.

It is also the first time I’m running code outside of bank 0, breaking through
the 64 KiB limit of the classic 6502. There is an entire megabyte of RAM on this
test board.

Of course, this computer still does nothing useful, but at least it is now
controlled by an SD card rather than by re-flashing a ROM. On the hardware side of the
project, this will help me to convert the design from 5 V to 3.3 V. I’ll need to
convert the ROM to a PLCC-packaged flash chip for that, which is not something
I’ll want to be programming frequently.

As far as software goes, my plan is to work on some more interesting demo
programs, so that I can start to build a library of open source 65C816 code to
work with. The hardware design, software and emulator for this computer can be
found on GitHub, at mike42/65816-computer.


LET’S MAKE A BOOTLOADER – PART 1

65C816 software
Posted on September 1, 2022September 13, 2022

I’ve been working on a homebrew computer based on the 16-bit 65C816 CPU. All of
my test programs so far have run from an EEPROM chip, which I need to remove and
re-program each time I need to make a change.



Plenty of retro systems ran all of their programs from ROM, but I only want to
use it for bootstrapping this computer. I’ve got 8 KiB of space for ROM-based
programs in the memory map, which should be plenty to check the hardware and
load software from disk.

In this two-part blog post, I’ll take a look at handing over control from the
ROM to a program loaded from an SD card.


TECHNICAL BACKGROUND

I did some quick reading about the process for bootstrapping a PC during the
16-bit era. My homebrew computer is a completely different architecture to an
IBM-compatible PC, but I’m planning to follow a few of the conventions from this
ecosystem, since I’ll have some similar challenges.

The best resources on this topic are aimed at bootloader developers, such as
this wikibooks page.

For a disk to be considered bootable, the first 512-byte sector needs to end
with hex 0xAA55 (little-endian). This is 01010101 10101010 in binary (a great
test pattern!). My system is not x86-based, so I’ll store a different value
there.

If a disk is bootable, then the BIOS will transfer the 512 byte boot sector to
$7C00 and jump to it. The only assumption that the BIOS seems to make about the
bootloader structure is that it starts with executable code. I’ll do the same on
my system.
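The PC convention boils down to a two-byte check. This Python sketch (mine, for illustration) tests the last two bytes of a 512-byte sector:

```python
# A sector is PC-bootable if it ends with bytes $55, $AA
# (the 16-bit value 0xAA55 stored little-endian).
def is_pc_bootable(sector):
    return len(sector) == 512 and sector[510:512] == b"\x55\xaa"

sector = bytearray(512)
sector[510:512] = b"\x55\xaa"
print(is_pc_bootable(bytes(sector)))  # True
```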

It’s worth noting that the first sector may also contain data structures for a
filesystem or partitioning scheme, and it’s up to the bootloader code to work
around that. For now, my SD card will contain only a bootloader, which does
simplify things a bit.

Most bootloaders will then make a series of BIOS calls via software interrupts,
which enables them to produce text output or load additional data from disk.
is where I’ll have the biggest challenge, since my ROM has no stable interface
for a bootloader to call.


RE-VISITING SD CARD HANDLING

My first task was to load the bootloader itself from SD card, storing the first
512 bytes from the disk to RAM, at address $7C00 onwards. This should be
straightforward, since I have working 6502 assembly routines for reading from an
SD card, and I’ve added a port for an SD card module to my 65C816 test board.



I came up with a routine which prints the contents of the boot sector, then
prompts for whether to execute it. My ROM code is not checking the signature at
this stage, and is not aware that the sector in this screen capture is a FAT32
boot sector containing x86 machine code, but this is a good start.



It took quite a few revisions to get this working, since my old 65C02 code for
reading from SD produced strange output on this system. On my 65C816 test board,
it showed almost the right values, but it was jumbled up, and mixed with SPI
fill bytes ($FF). The below screen capture shows a diff between the expected and
actual output of the ROM.



After a long process to rule out other programming and hardware errors, I
finally noticed that I was writing the data starting from address $0104, which
was never going to work. The default stack pointer on this CPU is $01ff and
grows down, so writing 512 bytes to $0104 would always corrupt the stack after a
few hundred bytes.
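The overlap is easy to confirm with a little arithmetic:

```python
# A 512-byte write starting at $0104 runs through to $0303, straight
# across the $0100-$01FF page where the stack pointer starts at $01FF.
start = 0x0104
end = start + 512 - 1
stack_bytes_hit = 0x01FF - start + 1  # bytes landing in the stack page
print(hex(end), stack_bytes_hit)      # 0x303 252
```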

At this stage I was using the assembler to statically allocate a 512 byte space
for IO. It appeared in code like this:

.segment "BSS"
io_block_id:              .res 4
io_buffer:                .res 512


The error was in the linker configuration, which I updated to start assigning
RAM addresses from $0200 onwards.

 MEMORY {
     ZP:     start = $00,    size = $0100, type = rw, file = "";
-    RAM:    start = $0100,  size = $7e00, type = rw, file = "";
+    RAM:    start = $0200,  size = $7d00, type = rw, file = "";
     PRG:    start = $e000,  size = $2000, type = ro, file = %O, fill = yes, fillval = $00;
 }




The full SD card handling code is too long to post in this blog, but now allows
any 512-byte segment from the first 32 MB of the SD card (identified by a 16-bit
segment ID) to be loaded into an arbitrary memory address.
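The 32 MB figure follows directly from the block size and the ID width. A quick check in Python:

```python
# 2^16 block IDs, each addressing a 512-byte block, covers exactly 32 MiB.
BLOCK_SIZE = 512

def block_to_byte_offset(block_id):
    """Byte offset on the SD card for a 16-bit block ID."""
    return (block_id & 0xFFFF) * BLOCK_SIZE

total = block_to_byte_offset(0xFFFF) + BLOCK_SIZE
print(total == 32 * 1024 * 1024)  # True
```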


MAKING AN API

My next challenge was to define an API for the bootloader to call into the ROM
to perform I/O.

I considered using a jump table, but decided to use the cop instruction instead.
This triggers a software interrupt on the 65C816, and has parallels to how the
int instruction is used to trigger BIOS routines on x86 systems.

I defined a quick API for four basic routines, passing data via registers.

 * print char
 * read char
 * print string
 * load data from storage

The caller would need to set some registers, then call cop from assembly
language. Any return data would also be passed via registers.

The cop instruction takes a one-byte operand, which in this case specifies the
ID of the function to call.

cop $00


To prove that the interface would work, I implemented just the routine for
printing strings.

; interrupt.s: Handling of software interrupts, the interface into the ROM for
; software (eg. bootloaders)
;
; Usage: Set registers and use 'cop' to trigger software interrupt.
; Eg:
;   ldx #'a'
;   cop ROM_PRINT_CHAR
; CPU should be in native mode with all registers 16-bit.

.import uart_printz, uart_print_char
.export cop_handler
.export ROM_PRINT_CHAR, ROM_READ_CHAR, ROM_PRINT_STRING, ROM_READ_DISK

; Routines available in ROM via software interrupts.
; Print one ASCII char.
;   A is char to print
ROM_PRINT_CHAR   := $00

; Read one ASCII char.
;   Returns typed character in A register
ROM_READ_CHAR    := $01

; Print a null-terminated ASCII string.
;   X is address of string, use data bank register for addresses outside bank 0.
ROM_PRINT_STRING := $02

; Read data from disk to RAM in 512 byte blocks.
;   X is address to write to, use data bank register for addresses outside bank 0.
;   A is low 2 bytes of block number
;   Y is number of blocks to read
ROM_READ_DISK    := $03

.segment "CODE"
; table of routines
cop_routines:
.word rom_print_char_handler
.word rom_read_char_hanlder
.word rom_print_string_handler
.word rom_read_disk_handler

cop_handler:
    .a16                            ; use 16-bit accumulator and index registers
    .i16
    rep #%00110000
    ; Save caller context to stack
    pha                             ; Push A, X, Y
    phx
    phy
    phb                             ; Push data bank, direct register
    phd
    ; Set up stack frame for COP handler
    tsc                             ; WIP set direct register to equal stack pointer
    sec
    sbc #cop_handler_local_vars_size
    tcs
    phd
    tcd
caller_k := 15
caller_ret := 13
caller_p := 12
caller_a := 10
caller_x := 8
caller_y := 6
caller_b := 5
caller_d := 3
cop_call_addr := 0
    ; set up 24 bit pointer to COP instruction
    ldx <frame_base+caller_ret
    dex
    dex
    stx <frame_base+cop_call_addr
    .a8                             ; Use 8-bit accumulator
    sep #%00100000
    lda <frame_base+caller_k
    sta <frame_base+cop_call_addr+2
    .a16                            ; Revert to 16-bit accumulator
    rep #%00100000

    ; load COP instruction which triggered this interrupt to figure out routine to run
    lda [<frame_base+cop_call_addr]
    xba                             ; interested only in second byte
    and #%00000011                  ; mask down to final two bits (there are only 4 valid functions at the moment)
    asl                             ; multiply by 2 to index into table of routines
    tax
    jsr (cop_routines, X)

    ; Remove stack frame for COP handler
    pld
    tsc
    clc
    adc #cop_handler_local_vars_size
    tcs

    ; Restore caller context from stack, reverse order
    pld                             ; Pull direct register, data bank
    plb
    ply                             ; Pull Y, X, A
    plx
    pla
    rti

cop_handler_local_vars_size := 3
frame_base := 1

rom_print_char_handler:
    ldx #aa
    jsr uart_printz
    rts

rom_read_char_hanlder:
    ldx #bb
    jsr uart_printz
    rts

rom_print_string_handler:
    ; Print string from X register
    ldx <frame_base+caller_x
    jsr uart_printz
    rts

rom_read_disk_handler:
    ldx #cc
    jsr uart_printz
    rts

aa: .asciiz "Not implemented A\r\n"
bb: .asciiz "Not implemented B\r\n"
cc: .asciiz "Not implemented C\r\n"


This snippet is quite dense, and uses several features which are new to the
65C816, not just the cop instruction.

I’m relocating the direct page to use as a stack frame, which is an idea I got
from reading the output of the WDC 65C816 C compiler. Pointers are much easier
to work with on the direct page.

This is the first snippet I’ve shared which uses a 24-bit pointer, via “direct
page, indirect long” addressing. The pointer is used to load the instruction
which triggered the interrupt, so that the code can figure out which function to
call.

lda [<frame_base+cop_call_addr]


This snippet is also the first time I’ve used the jump to subroutine instruction
(jsr) with the “absolute indirect, indexed with X” address mode. On the 65C02, I
could only use this addressing mode on the jmp instruction. The only example of
that on this blog is also an interrupt handling example.

jsr (cop_routines, X)
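To make the dispatch logic concrete, here is a Python model of what the handler does with the saved return address (the names and the example address are mine; the table mirrors cop_routines above):

```python
# Model of the COP dispatch: the return address pushed by the interrupt
# points just past the 2-byte cop instruction, so the operand byte sits
# at return_addr - 1. Mask it down to the 4 implemented routines.
routine_table = ["rom_print_char_handler", "rom_read_char_handler",
                 "rom_print_string_handler", "rom_read_disk_handler"]

def cop_dispatch(memory, return_addr):
    operand = memory[return_addr - 1]
    function_id = operand & 0b00000011  # only 4 valid functions for now
    return routine_table[function_id]

# 'cop $02' assembled at a hypothetical address $7C00: opcode $02, operand $02
memory = {0x7C00: 0x02, 0x7C01: 0x02}
print(cop_dispatch(memory, 0x7C02))  # rom_print_string_handler
```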



THE “HELLO WORLD” OF BOOTLOADERS

My next goal was to load a small program from disk, and show that it can call
routines from the ROM. For now it is just a program on the boot sector on an SD
card, which demonstrates that the new software interrupt API works.

This assembly file boot.s prints out two strings, so that I can be sure that the
ROM is returning control back to the bootloader after a software interrupt
completes.

ROM_PRINT_STRING := $02

.segment "CODE"
    .a16
    .i16
    ldx #test_string_1
    cop ROM_PRINT_STRING
    ldx #test_string_2
    cop ROM_PRINT_STRING
    stp

test_string_1: .asciiz "Test 1\r\n"
test_string_2: .asciiz "Test 2\r\n"

.segment "SIGNATURE"
    wdm $42                         ; Ensure x86 systems don't recognise this as bootable.


The linker configuration which goes with this is boot.cfg:

MEMORY {
    ZP:     start = $00,    size = $0100, type = rw, file = "";
    RAM:    start = $7e00,  size = $0200, type = rw, file = "";
    PRG:    start = $7c00,  size = $0200, type = rw, file = %O, fill = yes, fillval = $00;
}

SEGMENTS {
    ZEROPAGE:   load = ZP,  type = zp;
    BSS:        load = RAM, type = bss;
    CODE:       load = PRG, type = rw,  start = $7c00;
    SIGNATURE:  load = PRG, type = rw,  start = $7dfe;
}


The commands to assemble and link this are:

ca65 --feature string_escapes --cpu 65816 boot.s
ld65 -o boot.bin -C boot.cfg boot.o


This produces a 512 byte file, which I wrote to SD card.
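Before writing the image out, it's worth sanity-checking the layout. This sketch (the byte values are built in place here rather than read from a real boot.bin) confirms the size and that the wdm $42 signature (opcode $42, operand $42) occupies the last two bytes instead of the PC signature:

```python
# The linked image should be exactly 512 bytes, with 'wdm $42' (bytes
# $42 $42) at offset 510 instead of the PC 0xAA55 boot signature.
image = bytearray(512)
image[510:512] = b"\x42\x42"  # stand-in for the assembled boot.bin
assert len(image) == 512
assert image[510:512] != b"\x55\xaa"  # x86 BIOSes won't boot this
print("image layout ok")
```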



This is the first time this computer is running code from RAM, which is an
important milestone for this project.


EDITOR IMPROVEMENTS

I needed to do some work on my 6502 assembly plugin for IntelliJ during this
process, since it didn’t understand the square brackets used for the long
address modes.

    lda [<frame_base+cop_call_addr]


While I was fixing this, I also implemented an auto-format feature. This saves
me the manual effort of lining up all the comments in a column, as is typical in
assembly code.



Lastly, I added support for jumping to unnamed labels, which are a ca65-specific
feature.




NEXT STEPS

In the second half of this blog post, I’ll get the bootloader to load a larger
program from the SD card. I’m hoping to allow the bootloader to control how much
code to load, and where to load it from.


BUILDING A SIMPLE POWER SUPPLY MODULE

hardware
Posted on August 18, 2022August 18, 2022

I recently put together a tiny module to power my electronics projects. This is
a 3cm x 4cm circuit board, and can be assembled to deliver either a fixed 3.3V
or 5V output.




BACKGROUND: BREADBOARD POWER SUPPLIES

I start a lot of my projects on a breadboard, and previously used a breadboard
power supply. The voltage regulators on these are not particularly resilient,
and I’ve damaged two of them by wiring something up incorrectly and drawing too
much current.

They do not exactly fail safe, and now pass through the input voltage to
whatever I’m working on. This is 12 V in my case, which is enough to fry most of
the components I work with.

A bit of searching shows that this is a common issue. The first power supply
which did this was a Bud Industries BBP-32701, which uses an LM1117 regulator.



The second one was a DFRobot DFR0140, which uses the AMS1117 regulator.



For the past year or so, I’ve used an L7805 regulator connected to a breadboard
instead. This is not particularly efficient, but it works well, and I’ve caused
enough short circuits to know that the over-current protection works.




CONVERTING TO A PCB

My only goal was to take the exact parts I’ve already got, and make them into a
more compact module for permanent use.

I used KiCad to capture the schematic and design the board. The minimum order is
generally 5 circuit boards, so I planned to assemble some with an L7805 (5 V),
and others with a LD33V (3.3 V) regulator. These have different pin-outs, so I
added a series of solder jumpers so that I can use the same board for either
regulator.



Layout for this PCB was straightforward. It’s a two-layer board, and I added a
ground pour on the bottom layer, and the output voltage on most of the top
layer.



I also decided to switch off thermal relief on the copper pours, to see if a
direct connection to a ground plane really makes soldering more difficult (it
does! lesson learned).



I tried to make this board quite dense, but didn’t go completely overboard: It’s
useful to have holes to install standoffs, plus a small area to write on, so
that I can distinguish the 5 V and 3.3 V modules later.




WRAP-UP

I’m still a little surprised that custom PCBs are so accessible for hobby use.
The unit price of these boards was just 1.36 AUD before shipping, which is
cost-competitive with blank prototyping board.



It’s easy to get assembled DC-DC converters, so I don’t suggest building any of
these modules yourself. However, as with many of the things I blog about, I’ve
uploaded the project files to GitHub under an open source license. The KiCad
project, Gerber files and parts list for this project can be found at
mike42/simple-power-supply.


LET’S IMPLEMENT POWER-ON SELF TEST (POST)

65C816 hardware software
Posted on June 16, 2022December 4, 2023

I recently implemented a simple power-on self-test (POST) routine for my 65C816
test board, so that it can stop and indicate a hardware failure before
attempting to run any normal code.



This was an interesting adventure for two main reasons:

 * This code needs to work with no RAM present in the computer.
 * I wanted to try re-purposing the emulation status (E) output on the 65C816
   CPU to blink an LED.


BACKGROUND

Even modern computers will stop and provide a blinking light or series of beeps
if you don’t install RAM or a video card. This is implemented in the BIOS or
UEFI.

I got the idea to use the E or MX CPU outputs for this purpose from this thread
on the 6502.org forums. This method would allow me to blink a light with just a
CPU, clock input, and ROM working.

My main goal is to perform a quick test that each device is present, so that
start-up fails in a predictable way if I’ve connected something incorrectly.
This is much simpler than the POST routine from a real BIOS, because I’m not
doing device detection, and I’m not testing every byte of memory.


BOOT-UP PROCESS

On my test board, I’ve connected an LED directly to the emulation status (E)
output on the 65C816 CPU. The CPU starts in emulation mode (E is high). However
I have noticed that on power-up, the value of E appears to be random until /RES
goes high. If I were wiring this up again, I would also prevent the LED from
lighting up while the CPU is in reset:



The first thing the CPU does is read an address from ROM, called the reset
vector, which tells it where to start executing code.

In my case, the first two instructions, clc (clear carry) and xce (exchange
carry with emulation), switch the CPU to native mode.

.segment "CODE"
reset:
    .a8
    .i8
    clc                            ; switch to native mode
    xce
    jmp post_check_loram


By default, the accumulator and index registers are 8-bit; the .a8 and .i8
directives simply tell the assembler, ca65, that this is the case.

Next, the code will jmp to the start of the POST process.


CHECKING LOW RAM

The first part of the POST procedure checks whether the lower part of RAM is
available, by writing values to two addresses and checking that the same values
can be read back.

Note that a:$00 causes the assembler to interpret $00 as an absolute address.
It would otherwise be interpreted as a direct-page address, which is not what’s
intended here.

post_check_loram:
    ldx #%01010101                 ; Power-on self test (POST) - do we have low RAM?
    ldy #%10101010
    stx a:$00                      ; store known values at two addresses
    sty a:$01
    ldx a:$00                      ; read back the values - unlikely to be correct if RAM not present
    ldy a:$01
    cpx #%01010101
    bne post_fail_loram
    cpy #%10101010
    bne post_fail_loram
    jmp post_check_hiram
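The idea behind this check can be modelled in a few lines of Python (my own sketch; treating a missing chip as reading $FF is an assumption for illustration, since real open-bus behaviour varies):

```python
# Write complementary test patterns to two addresses and read them back.
# Working RAM returns what was stored; a missing chip typically returns
# something else (modelled here as an open-bus read of $FF).
def check_ram(ram):
    ram[0x00], ram[0x01] = 0b01010101, 0b10101010
    return ram[0x00] == 0b01010101 and ram[0x01] == 0b10101010

class MissingRam:
    def __setitem__(self, addr, value):
        pass                  # writes go nowhere
    def __getitem__(self, addr):
        return 0xFF           # assumed open-bus read

print(check_ram(bytearray(2)), check_ram(MissingRam()))  # True False
```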


If this fails, then the boot process stops, and the emulation LED blinks in a
distinctive pattern (two blinks) forever.

post_fail_loram:                   ; blink emulation mode output with two pulses
    pause 8
    blink
    blink
    jmp post_fail_loram            ; repeat indefinitely



MACROS: PAUSE AND BLINK

It’s a mini-challenge to write code to blink an LED in a distinctive pattern
without assuming that RAM works. This means no stack operations (eg. jsr and rts
instructions), and that I need to store anything I need in 3 bytes: the A, X, Y
registers. A triple-nested loop is the best I can come up with.

I wrote a pause macro, which runs a time-wasting loop for the requested duration
– approximately a multiple of 100ms at this clock speed. Every time this macro
is used, the len value is substituted in, and this code is included in the
source file. This example also uses unnamed labels, which is a ca65 feature for
writing messy code.

.macro pause len                   ; time-wasting loop as macro
    lda #len
:
    ldx #64
:
    ldy #255
:
    dey
    cpy #0
    bne :-
    dex
    cpx #0
    bne :--
    dec
    cmp #0
    bne :---
.endmacro
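A back-of-envelope cycle count shows why one unit of len is on the order of 100 ms. Assuming roughly 7 cycles per inner loop iteration (dey, cpy, bne — an approximation) and the 921.6 kHz clock mentioned later in this post:

```python
# Rough estimate of the pause macro's delay per unit of 'len': the inner
# loop body runs 64 * 255 times, at an assumed ~7 cycles per iteration,
# with a 921.6 kHz CPU clock.
CPU_HZ = 921_600
inner_iterations = 64 * 255
cycles_per_iteration = 7  # dey + cpy + bne, approximate
seconds_per_unit = inner_iterations * cycles_per_iteration / CPU_HZ
print(round(seconds_per_unit * 1000))  # roughly 124 ms
```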


The second macro I wrote is blink, which briefly lights up the LED attached to
the E output by toggling emulation mode. I’m using the pause macro from both
native mode and emulation mode in this snippet, so I can only treat A, X and Y
as 8-bit registers.

.macro blink
    sec                            ; switch to emulation mode
    xce
    pause 1
    clc                            ; switch to native mode
    xce
    pause 2
    sec                            ; switch to emulation mode
.endmacro



CHECKING HIGH RAM

There is also a second RAM chip, and this process is repeated with some
differences. For one, I can now use the stack, which is how I set the data bank
register in this snippet.

Here a:$01 is important, because with direct page addressing, $01 means $000001
at this point in the code, where I want to test that I can write to the memory
address $080001.

post_check_hiram:
    ldx #%10101010                 ; Power-on self test (POST) - do we have high RAM?
    ldy #%01010101
    lda #$08                       ; data bank to high ram
    pha
    plb
    stx a:$00                      ; store known values at two addresses
    sty a:$01
    ldx a:$00                      ; read back the values - unlikely to be correct if RAM not present
    ldy a:$01
    cpx #%10101010
    bne post_fail_hiram
    cpy #%01010101
    bne post_fail_hiram
    lda #$00                       ; reset data bank to boot-up value
    pha
    plb
    jmp post_check_via
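The 24-bit address being exercised here is formed from the data bank register and the 16-bit absolute address:

```python
# The data bank register supplies bits 16-23 of the effective address,
# so bank $08 with absolute address $0001 targets $080001.
def effective_address(bank, addr16):
    return ((bank & 0xFF) << 16) | (addr16 & 0xFFFF)

print(hex(effective_address(0x08, 0x0001)))  # 0x80001
```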


The failure here is similar, but the LED will blink 3 times instead of 2.

post_fail_hiram:                   ; blink emulation mode output with three pulses. we could use RAM here?
    pause 8
    blink
    blink
    blink
    jmp post_fail_hiram            ; repeat indefinitely


To make sure that I was writing to different chips, I installed the RAM chips
one at a time, first observing the expected failures, and then observing that
the code continued past this point with the chip installed.

I also checked with an oscilloscope that both RAM chips are now being accessed
during start-up. Now that I’ve got some confidence that the computer
requires both chips to start, I can skip a few debugging steps if I’ve got code
that isn’t working later.


CHECKING THE VERSATILE INTERFACE ADAPTER (VIA)

The third chip I wanted to add to the POST process is the 65C22 VIA. I kept this
check simple, because one read to check for a start-up default is sufficient to
test for device presence.

VIA_IER = $c00e
post_check_via:                    ; Power-on self test (POST) - do we have a 65C22 Versatile Interface Adapter (VIA)?
    lda a:VIA_IER
    cmp #%10000000                 ; start-up state: IER bit 7 reads as 1, with all interrupt sources (IER0-6) disabled.
    bne post_fail_via
    jmp post_ok


This stops and blinks 4 times if it fails. I recorded the GIF at the top of this
blog post by removing the component which generates a chip-select for the VIA,
which causes this code to trigger on boot.

post_fail_via:
    pause 8
    blink
    blink
    blink
    blink
    jmp post_fail_via



BEEP FOR GOOD MEASURE

At the end of the POST process, I put in some code to generate a short beep.

This uses the fact that the 65C22 can toggle the PB7 output each time a certain
number of clock-cycles pass. I’ve connected a piezo buzzer to that output, which
I’m using as a PC speaker. The 65C22 is serving the role of a programmable
interrupt timer from the PC world.

VIA_DDRB = $c002
VIA_T1C_L = $c004
VIA_T1C_H = $c005
VIA_ACR = $c00b
BEEP_FREQ_DIVIDER = 461            ; 1KHz, formula is CPU clock / (desired frequency * 2), or 921600 / (1000 * 2) ~= 461
post_ok:                           ; all good, emit celebratory beep, approx 1KHz for 1/10th second
    ; Start beep
    lda #%10000000                 ; VIA PIN PB7 only
    sta VIA_DDRB
    lda #%11000000                 ; set ACR. first two bits = 11 is continuous square wave output on PB7
    sta VIA_ACR
    lda #<BEEP_FREQ_DIVIDER        ; set T1 low-order counter
    sta VIA_T1C_L
    lda #>BEEP_FREQ_DIVIDER        ; set T1 high-order counter
    sta VIA_T1C_H
    ; wait approx 0.1 seconds
    pause 1
    ; Stop beep
    lda #%11000000                 ; set ACR. returns to a one-shot mode
    sta VIA_ACR
    stz VIA_T1C_L                  ; zero the counters
    stz VIA_T1C_H
    ; POST is now done
    jmp post_done
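The divider arithmetic from the comment above checks out:

```python
# CPU clock / (desired frequency * 2): the timer toggles PB7 each time
# it counts down, so two toggles make one square-wave period.
cpu_clock = 921_600  # Hz
target_hz = 1_000
divider = round(cpu_clock / (target_hz * 2))
actual_hz = cpu_clock / (divider * 2)
print(divider, round(actual_hz, 1))  # 461 999.6
```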


The post_done label points to the start of the old ROM code, which is currently
just a “hello world” program.


NEXT STEPS

I’m now able to lock in some of my assumptions about what should be available in
software, so that I can write more complex programs without second-guessing the
hardware.

Once the boot ROM is interacting with more hardware, I may add additional
checks. I will probably need to split this into different sections, and make use
of jsr/rts once RAM has been tested, because the macros are currently generating
a huge amount of machine code. I have 8 KiB of ROM in the memory map for this
computer, and the code on this page takes up around 1.1 KiB.


BUILDING A 65C816 TEST BOARD

65C816 hardware
Posted on June 2, 2022December 4, 2023

In the last few months, I’ve been learning about the 65C816 processor, and
trying to build a working computer which uses it. My latest breadboard-based
prototype was not reliable, and I decided to convert it to a PCB to hopefully
eliminate the problem, or to at least identify it.




QUICK GOALS

I was aiming to make a debug-friendly 4-layer PCB, the size of two standard
breadboards. This will be my first time designing a 4-layer board, and also my
first time using KiCad 6 to create a PCB.

I didn’t have a working prototype, so I built in test points for connecting an
oscilloscope or logic analyser to troubleshoot. In case of errors, I made it
possible to leave some components unpopulated, and instead drive signals
externally.

I also left some extra footprints which I might use for future improvements, or
can fall back to if some of my ideas don’t work out.


THE CLOCK

Some components require the inverse of the CPU clock, and there was previously a
small delay from inverting the signal. I don’t think this is a problem on its
own, but I introduced a D-type flip-flop to create a proper two-phase clock.

This does halve the CPU clock speed, so I added jumpers to allow an alternative
clock source to be selected for the CPU, where previously the UART and CPU
clocks needed to be the same. Note that there is an error in the wiring here,
which I only discovered later.



The oscillator I’m using on the breadboard prototype is in a DIP-8 package. A
wider variety of oscillators is available in a 5x7mm QFN package, where the
four corner pins have the same function as the DIP-8 version. I added an
alternative footprint, which fits without taking up any extra board space. I
expect that I’ll only use this if I attempt to run the board at 3.3V later.



Some 6-pin QFN oscillators provide an inverted clock output on one of the pins,
and it would be worth using one of those in a future design to reduce the
component count.


NEW RESET CONTROLLER

Several components on the board require a reset input. These are mostly
active-low inputs, but one component has an active-high reset. I discovered the
tiny MIC2775 supervisory IC, which provides both.



I haven’t used this part before, so in case of problems, I also made it possible
to remove this part and use a DS1813, which generates an active-low reset on my
prototype.



These parts work by sensing voltage, and drop-in alternatives exist for either
part if I move to 3.3V.


ROM

I am currently using a parallel EEPROM to store code for my prototype, and will
add this to the PCB for this test board. As with the clock and reset
controller, I’m working with 5V components today, but considering what I would
need to change to run the whole board at 3.3V.

The few EEPROMs which are available at 3.3V use a different package, and are
quite slow (200ns or worse). Parallel flash chips seem to be the most promising
alternative, since they have a similar interface, although I haven’t confirmed
that I can program them yet. I added a footprint for the SST39LF010, which also
has a 5V variant with the same pin-out.




OTHER FOOTPRINTS

I also added footprints so that I could connect the SD card module which I used
on my 6502 computer, a piezo buzzer, and some electrolytic capacitors if needed.


DESIGN LESSONS LEARNED

Although my first attempt at making a PCB was a success, I learned a few things
which I’m using here.

For that previous project, I found that machined-pin sockets are not very
tolerant of used chips with bent legs, so I’ve used cheaper stamped pin sockets
for this test board. I’ve also left spacing at the ends of each socketed IC, so
that they can be pulled out more easily.

I made an effort to keep unrelated traces away from points which I need to
solder, because I had some problems with this on my last board.

I also avoided creating one large expansion header, and instead exposed signals
in places which are easy to route. For example, the I/O device select signals
are exposed on a pin header right beside the chip which generates them.




PCB LAYOUT

The standard process is to make a schematic, then place components, then route
traces.

For this test board, I instead chose the physical dimensions of the board first,
and added components incrementally (particularly test points, expansion headers
and alternative footprints) until I was making good use of the available space.
KiCad has an item in the “Tools” menu to “Update PCB from Schematic”, which I
used extensively.

Just to see how it would look, I also added the letters “65C816” to the back
copper layer, which is visible in the top-right here.



I put a lot of time into labeling different parts of the board to help with
debug/assembly, and used the 3D view to check that it wasn’t too crowded.



Before sending the files to a manufacturer, I printed it at 1:1 scale for a
reality check.

Among other things, this confirmed that the ROM sockets had enough clearance
from other components.



It also confirmed that either DIP-8 or QFN-packaged oscillators would fit.




ASSEMBLY

I ordered the boards from a manufacturer which I’ve used before. They are
large-ish, lead-free 4-layer boards, but I’m not optimising for cost. I’ve left
a lot of options in the board, so I’m hoping to make use of several of these
PCBs in different configurations, depending on which direction this project
goes.



I assembled the board incrementally, starting with the power LED. Most of the
passive components on this board are 0603 (imperial) size surface-mount parts,
and I’ve used footprints with long pads for hand soldering.



The first problem I found was with the reset button – I had switched to using a
footprint with the correct pin spacing, but had assigned the pins incorrectly,
so I needed to cut/bend some legs and install it rotated 90 degrees.

I next found a problem with the clock, where I had wired up the flip-flop
incorrectly in the schematic – the /Q output should go to D. I could recover
from this by cutting a trace and running a short wire. Of course the board is
mirrored when flipped upside-down, so I cut the wrong trace first and needed to
repair that too.
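For reference, the corrected wiring is the standard divide-by-two arrangement: the flip-flop’s inverted output (/Q) feeds back into D, so Q toggles on every clock edge and the output runs at half the input frequency. A quick simulation (just a sketch, not tied to any particular part) shows the behaviour:

```python
# Simulate a D flip-flop wired as a divide-by-two counter:
# the inverted output (/Q) is fed back into D, so Q toggles
# on every rising edge of the input clock.
def divide_by_two(edges):
    q = 0
    out = []
    for _ in range(edges):  # one iteration per rising clock edge
        d = 1 - q           # D is driven by /Q
        q = d               # Q latches the value of D on the edge
        out.append(q)
    return out

print(divide_by_two(8))  # [1, 0, 1, 0, 1, 0, 1, 0] -- half the input frequency
```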



I added enough components to get the CPU to run NOP instructions from ROM, then
built up to running some test programs which I’ve blogged about here. The
final mistake I discovered was a mix-up with some of the lines used for
selecting I/O devices, which means that devices are not mapped to their intended
addresses. I can work around this in software.
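As an illustration of that first bring-up step, a free-running NOP test only needs a ROM image full of NOP opcodes plus a reset vector. The script below is a sketch only: the 32 KB ROM size and the $8000 reset target are assumptions for illustration, not necessarily the values used on this board.

```python
# Build a minimal ROM image that makes the CPU execute NOPs forever.
# Assumptions (not from the board's actual memory map): a 32 KB ROM
# mapped at $8000-$FFFF, so the reset vector at $FFFC/$FFFD falls near
# the end of the image.
ROM_SIZE = 32 * 1024
RESET_TARGET = 0x8000            # assumed start of ROM in CPU address space

rom = bytearray([0xEA] * ROM_SIZE)   # 0xEA is the 65C816 NOP opcode

# The reset vector at $FFFC/$FFFD (little-endian) corresponds to
# offsets ROM_SIZE - 4 and ROM_SIZE - 3 within the image.
rom[ROM_SIZE - 4] = RESET_TARGET & 0xFF
rom[ROM_SIZE - 3] = (RESET_TARGET >> 8) & 0xFF

with open("nop-test.bin", "wb") as f:
    f.write(rom)
```

On reset the CPU fetches the vector, lands at the start of ROM, and simply increments through NOPs, which is easy to confirm by watching the address bus count up.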

This is already an improvement over the breadboard prototype, because I can
quickly swap the ROM chip without accidentally disconnecting anything.



The board looked fairly complete at this point, and the only major component
missing was the UART chip. This was the part which did not work reliably in my
prototype, so I was prepared to do some debugging here. Note that I’ve got
orange test points all over the board to connect important signals to an
oscilloscope, with a few black test points for GND connections. All of the
74-series chips are in the 74AC logic family, and I sourced 74HC versions as
well in case I needed to switch any to see the difference.

However, I was able to run more or less the same test program I used before, and
it now works reliably. This is captured through a Cypress FX2-based logic
analyser using Sigrok.



It’s great that this works, but I don’t know for sure why this was so unreliable
on my previous prototype. Several possible causes were eliminated through this
process, since I used a new UART chip and a freshly programmed address-decode PLD,
and eliminated a possible timing issue. On a PCB, I’m also able to make better
electrical connections, and add a ground plane, which is an immediate advantage
over breadboards.


THE DESIGN AS IT STANDS

Now that I am back to having a working prototype, I’ll take this chance to post
some updated schematics.

Just a note of caution: this is a snapshot of a work-in-progress learning project.
I’m absolutely aware that there are errors in here, and that the layout is quite
messy. Still, I hope that this is useful to anybody else who is attempting to
use this relatively obscure CPU.

The design no longer fits on one page, so I’ve split it into 3 sheets.




CPU

The CPU sheet contains the circuitry for de-multiplexing the data bus and bank
address byte. All power, reset, and clock components are in here as well, along
with pin headers for the address & data bus, and all those test points.

There are a lot of “just in case” components as well, such as pull-up resistors
on the data bus, which I have not fitted.




MEMORY

The memory section is quite straightforward. I’ve got a PLD generating
chip-selects for ROM, RAM0, or RAM1, with some extra components to add some
flexibility.
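The PLD equations themselves aren’t reproduced here, but the decode idea can be sketched in a few lines. The split points below are assumptions for illustration (ROM in the top half of bank 0, two RAM regions below it, a small I/O window), not this board’s actual memory map:

```python
# Sketch of a bank-0 address decode producing active-low chip selects.
# The map here is assumed for illustration only: RAM0 at $0000-$3FFF,
# RAM1 at $4000-$7EFF, I/O at $7F00-$7FFF, ROM at $8000-$FFFF.
def chip_selects(addr):
    """Return active-low selects (rom, ram0, ram1, io) for a 16-bit address."""
    rom  = 0 if addr >= 0x8000 else 1
    io   = 0 if 0x7F00 <= addr <= 0x7FFF else 1
    ram0 = 0 if addr < 0x4000 else 1
    ram1 = 0 if 0x4000 <= addr < 0x7F00 else 1
    return rom, ram0, ram1, io

print(chip_selects(0xFFFC))  # (0, 1, 1, 1): a reset vector fetch selects ROM
```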




I/O

When I/O is selected, there are three possible choices: the 65C22 VIA, or one
of the two UART interfaces. Most of the other components on this sheet are
optional footprints or external ports. Note that the clock going into the UART
is mis-labelled on this sheet.




NEXT STEPS

I’m going to spend some time using this board as a development platform for
low-level software which targets the 65C816 CPU. I’ll most likely also use the
emulator which I put together a few weeks ago to speed up development, since
I’ve been able to confirm that the code I’m writing works on real hardware.

The basic functionality is now a lot more stable than what I had before, so this
test board will allow me to prototype some different hardware options once I’ve
got some simple text-based programs up and running.

The hardware design, software and emulator for this computer can be found in the
GitHub repository for this project. I’m updating the repository as I make
progress with this project, and the version used for this blog post is here.

