Doom on nRF5340

During the development of the nRF5340 - a chip that may power your next headphones or gaming mouse - an important question came up: can it run the classic game Doom? A fully functional version of the game, with little to no compromises? To run smoothly, Doom required an Intel 486 processor running at 66MHz. The application processor of the nRF5340 can run at 128MHz, the multiplier is single-cycle, and access to most of the RAM and NVM is single-cycle (with cache enabled). So we should have more than enough processing power. But Doom required 8MB of RAM, and the nRF5340 application core has only 512kB of internal RAM. That could be a challenge.

Despite the idea that Doom runs on everything, it's surprisingly hard to find an uncompromising yet minimal port of Doom targeted at memory constrained devices. It looked like we'd have to do most of that work ourselves.

First Steps

The first question was how to load the game data. All of the data necessary to play Doom is stored in a file called a WAD file. We copied this file to a microSD card, put it in an SD card breakout board, hooked it up to the nRF5340 and used the block device library in the nRF SDK to access the file. This part was easy.

Reducing RAM Requirements

Now we had to get the game to fit in less than 512kB of RAM. The lowest hanging fruit was to move data from RAM to flash NVM. In various parts of the game there are tables containing static information.

state_t	states[NUMSTATES] = {
    {SPR_TROO,0,-1,{NULL},S_NULL,0,0},	// S_NULL
    {SPR_SHTG,4,0,{A_Light0},S_NULL,0,0},	// S_LIGHTDONE
    ...

We just mark the data as "const" and the compiler takes care of keeping the data in flash memory. This memory can be accessed relatively quickly, especially with the cache enabled, so it shouldn't affect performance.

const state_t	states[NUMSTATES] = {
    {SPR_TROO,0,-1,{NULL},S_NULL,0,0},	// S_NULL
    {SPR_SHTG,4,0,{A_Light0},S_NULL,0,0},	// S_LIGHTDONE
    ...

To maximize the number of variables that could be constant, we removed the ability to configure various settings in the game, such as video resolution, key mappings and sound settings. Since we're locking the video resolution to the maximum supported by the game, we could also pre-generate a few lookup tables that depend on this resolution, and make this data constant as well.

The original Doom source code is heavily influenced by being more CPU-constrained than memory constrained. In several cases, computations are pre-calculated and stored in lookup tables. Since the nRF5340 is more RAM constrained than CPU constrained, this optimization can in some cases be reversed.
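As an illustration of reversing such an optimization, consider the renderer's ylookup table, which maps a screen row to the address where that row starts in the frame buffer. The sketch below shows a table-based variant next to a computed one. Whether this particular table was changed in the port is not stated here, and the helper names row_start_lut and row_start_calc are made up for the example.

```c
#include <stdint.h>

#define SCREENWIDTH  320
#define SCREENHEIGHT 200

/* Table-based variant: one pointer per screen row, filled in at start-up.
 * With 32-bit pointers this costs SCREENHEIGHT * 4 = 800 bytes of RAM. */
static uint8_t *ylookup[SCREENHEIGHT];

void init_ylookup(uint8_t *framebuffer)
{
    for (int y = 0; y < SCREENHEIGHT; y++) {
        ylookup[y] = framebuffer + y * SCREENWIDTH;
    }
}

/* Hypothetical helper: fetch the row address from the table. */
static inline uint8_t *row_start_lut(int y)
{
    return ylookup[y];
}

/* Hypothetical helper: compute the row address instead.
 * No RAM is used; the cost is one multiply and one add per call. */
static inline uint8_t *row_start_calc(uint8_t *framebuffer, int y)
{
    return framebuffer + y * SCREENWIDTH;
}
```

With a single-cycle multiplier the computed variant should cost about the same as the table load, so the RAM is saved almost for free.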

Another trick to reduce memory usage is to mark structs as packed. This means the fields in the struct are no longer aligned to word boundaries, which can in some cases make the data slower to access. But that's a good trade-off for this project. A similar optimization: many variables are declared as "int" (32-bit) even though their values always fit in a 16-bit or 8-bit integer. In these cases the variable was changed to "short" or "int8_t".
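The following sketch shows how the two tricks interact. The structs and field names are hypothetical and not taken from the Doom sources; the sizes in the comments assume GCC or Clang on a target where int32_t is 4-byte aligned, which includes the Cortex-M33.

```c
#include <stdint.h>

/* Hypothetical record with 32-bit fields everywhere: 12 bytes. */
typedef struct {
    int32_t ammo_type;  /* small enumeration, fits in 8 bits */
    int32_t x;          /* genuinely needs 32 bits */
    int32_t health;     /* never exceeds a 16-bit range */
} wide_t;

/* Narrower types, natural alignment: still 12 bytes, because the
 * compiler pads after ammo_type and again at the end of the struct. */
typedef struct {
    uint8_t ammo_type;
    int32_t x;
    int16_t health;
} narrow_t;

/* Narrower types and packed: 1 + 4 + 2 = 7 bytes, no padding. */
typedef struct __attribute__((packed)) {
    uint8_t ammo_type;
    int32_t x;
    int16_t health;
} narrow_packed_t;

/* x now sits at offset 1. The compiler knows the struct is packed
 * and emits an access sequence that is safe for unaligned data. */
int32_t read_x(const narrow_packed_t *p)
{
    return p->x;
}
```

Narrowing alone saves nothing in this layout; it is the combination with packing (or careful field ordering) that shrinks the struct.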

Using Quad-SPI Memory

When Doom loads a level, it copies a lot of data from the WAD file to structures in memory. The data in the WAD files are stored in little-endian format. This is lucky for us, as the Cortex-M33 as configured in nRF5340 is little-endian as well. So maybe we can access the game data directly from the WAD file instead of copying it to RAM?

The first step is to copy the WAD file to the external 8MiB Quad-SPI flash memory chip on the nRF5340 Development Kit. This happens to be just about big enough to store the whole WAD for the shareware version of Doom. Once we've done that, we can access the data on this external memory directly through the execute in place (XIP) functionality. The memory is then cached by the cache component of the application core, making most data accesses to this external memory relatively fast, despite the moderate bandwidth of the Quad-SPI link.
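To make the idea concrete, here is a minimal sketch of finding a lump inside a memory-mapped WAD and returning a pointer to it without copying the data. The wadinfo_t and filelump_t layouts follow the WAD file format; the function wad_find_lump and the way the base pointer is obtained are assumptions for the example, and the code assumes a little-endian CPU.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* WAD header, stored little-endian at the start of the file. */
typedef struct {
    char    identification[4];  /* "IWAD" or "PWAD" */
    int32_t numlumps;           /* number of directory entries */
    int32_t infotableofs;       /* byte offset of the directory */
} wadinfo_t;

/* One directory entry. */
typedef struct {
    int32_t filepos;            /* byte offset of the lump data */
    int32_t size;               /* lump size in bytes */
    char    name[8];            /* zero-padded, no terminator if 8 chars */
} filelump_t;

/* Hypothetical helper: return a pointer to the named lump inside the
 * mapped WAD, or NULL if it is missing. Only the small header and
 * directory entries are copied; the lump data stays where it is. */
const void *wad_find_lump(const uint8_t *wad, const char *name,
                          int32_t *size_out)
{
    wadinfo_t header;
    memcpy(&header, wad, sizeof header);
    if (memcmp(header.identification, "IWAD", 4) != 0 &&
        memcmp(header.identification, "PWAD", 4) != 0) {
        return NULL;
    }

    for (int32_t i = 0; i < header.numlumps; i++) {
        filelump_t entry;
        memcpy(&entry,
               wad + header.infotableofs + (size_t)i * sizeof entry,
               sizeof entry);
        if (strncmp(entry.name, name, sizeof entry.name) == 0) {
            if (size_out != NULL) {
                *size_out = entry.size;
            }
            return wad + entry.filepos;
        }
    }
    return NULL;
}
```

On the target, the wad argument would be the address at which the external flash is mapped for XIP. This sketch returns the first match, while the original game scans the directory from the end so that later lumps take precedence.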

Let's take a look at one of the structures used by the game:

typedef struct line_s
{
    vertex_t*	v1;
    vertex_t*	v2;
    fixed_t	dx;
    fixed_t	dy;
    short	flags;
    short	special;
    short	tag;
    short	sidenum[2];			
    fixed_t	bbox[4];
    slopetype_t	slopetype;
    sector_t*	frontsector;
    sector_t*	backsector;
    int		validcount;
    void*	specialdata;		
} line_t;

Most of this data is copied from the WAD, where it's stored in a struct called maplinedef_t. So we put a pointer to this data in the line_t structure and add a function to access the data through the maplinedef_t struct. In the end, we're left with the following, where we've also applied the packing optimization mentioned earlier.

typedef struct  __attribute__((packed)) line_s
{
    maplinedef_t* mld;
    byte          special; // NRFD-NOTE: Was short
    int8_t        validcount; // NRFD-NOTE: Was int 
} line_t;

// Function to access the "flags" field of line_t
short LineFlags(line_t *line)
{
    return SHORT(line->mld->flags);
}

The textures used for walls in Doom are often generated by stitching together smaller images. These composite textures are generated on demand in the original game. But to avoid having to store these composite textures in RAM, we pre-generate all of them when the game starts and store them in the QSPI flash memory as well.
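The stitching step can be sketched as follows. This is a deliberately simplified model: the patches here are solid rectangles, whereas real Doom patches store each column as a list of posts with transparent gaps. All type and function names are hypothetical, and the sketch composes into a RAM buffer; how the result is then programmed into the QSPI flash is not covered.

```c
#include <stdint.h>

/* Hypothetical simplified patch: a solid rectangle of palette
 * indices, stored column by column. */
typedef struct {
    int16_t        width;
    int16_t        height;
    const uint8_t *pixels;      /* width * height bytes */
} simple_patch_t;

/* Where one patch lands inside the composite texture. */
typedef struct {
    int16_t               originx;
    int16_t               originy;
    const simple_patch_t *patch;
} placement_t;

/* Draw each patch into tex (column-major, tex_w * tex_h bytes).
 * Later patches overwrite earlier ones; pixels that fall outside
 * the texture are clipped. */
void compose_texture(uint8_t *tex, int tex_w, int tex_h,
                     const placement_t *placements, int count)
{
    for (int i = 0; i < count; i++) {
        const simple_patch_t *p = placements[i].patch;
        for (int px = 0; px < p->width; px++) {
            int tx = placements[i].originx + px;
            if (tx < 0 || tx >= tex_w) {
                continue;
            }
            for (int py = 0; py < p->height; py++) {
                int ty = placements[i].originy + py;
                if (ty < 0 || ty >= tex_h) {
                    continue;
                }
                tex[tx * tex_h + ty] = p->pixels[px * p->height + py];
            }
        }
    }
}
```

Running this once per texture at start-up and storing the results in flash means the renderer only ever reads finished columns and needs no RAM cache for composites.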

Display

To play a game, you usually have to be able to see it, so we needed a display that we could drive from the nRF5340. We found this nice little 4.3" 800x480 LCD display with an FT810 display driver. The display can be driven over SPI at 30MHz, which happens to be just enough to handle the 320x200 frames from Doom, at 35 frames per second (the maximum frame rate in the original version of the game). The frames are transferred as 8-bit paletted images, and the FT810 takes care of doing palette lookup and scaling the image up to fill the display, at a 4:3 aspect ratio. While this does offload some processing, it is in line with the spirit of the project as this would be handled by the graphics card and the CRT monitor itself in a 486 computer.
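A quick back-of-the-envelope check of the bandwidth claim, written as code. It assumes a single SPI data line carrying one bit per clock and ignores command and addressing overhead, so the real margin is smaller than the numbers suggest. The function names are made up for the example.

```c
#include <stdint.h>

#define FRAME_WIDTH   320u
#define FRAME_HEIGHT  200u
#define FRAME_RATE    35u        /* frames per second */
#define SPI_CLOCK_HZ  30000000u  /* assumed: one bit per clock */

/* Pixel payload per second at one byte per pixel:
 * 320 * 200 * 8 * 35 = 17,920,000 bits. */
uint32_t required_bits_per_second(void)
{
    return FRAME_WIDTH * FRAME_HEIGHT * 8u * FRAME_RATE;
}

/* Time to clock out one frame, in whole microseconds:
 * 512,000 bits at 30 MHz is about 17,066 us. */
uint32_t frame_transfer_us(void)
{
    uint64_t bits = (uint64_t)FRAME_WIDTH * FRAME_HEIGHT * 8u;
    return (uint32_t)(bits * 1000000u / SPI_CLOCK_HZ);
}
```

At 35 frames per second each frame has roughly 28,571 us available, so the raw pixel transfer alone occupies about 60% of the link. A 4:3 image on the 480-line panel is 640 pixels wide, which corresponds to scaling the 320x200 frame by 2 horizontally and 2.4 vertically.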

Sound

The nRF5340 contains an I2S peripheral for interfacing with external DACs. The sound samples in Doom are 8-bit mono PCM audio with a sample rate of 11025Hz. These are scaled up to 16-bit stereo after processing in the game. The I2S DAC is driven in 16-bit stereo format at 10869.5Hz. The game has been tested with the Texas Instruments PCM5102 and the Maxim Integrated MAX98357 on simple breakout boards.
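The format conversion can be sketched as below. This shows only the widening from 8-bit mono to 16-bit stereo and leaves out the mixing, volume and panning that the game applies; it assumes the source samples are unsigned 8-bit, and the function name is hypothetical. As a side note, 10869.5Hz is consistent with a 32MHz clock divided by 23 and then by 128, although the clock configuration is not stated here; playback at that rate is about 1.4% slower than the nominal 11025Hz.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand unsigned 8-bit mono samples to interleaved signed 16-bit
 * stereo (left, right, left, right, ...). The out buffer must have
 * room for 2 * count samples. */
void expand_u8_mono_to_s16_stereo(const uint8_t *in, size_t count,
                                  int16_t *out)
{
    for (size_t i = 0; i < count; i++) {
        /* Re-centre around zero, then scale to the 16-bit range:
         * 0 -> -32768, 128 -> 0, 255 -> 32512. */
        int16_t sample = (int16_t)((in[i] - 128) * 256);
        out[2 * i]     = sample;  /* left channel */
        out[2 * i + 1] = sample;  /* right channel */
    }
}
```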

Gamepad

We need some way to control the game, and since this is a low-power wireless chip, we should obviously have wireless controls. We would like to have a Bluetooth keyboard and mouse of course, but when this project started we needed to keep it simple, so we used a BBC micro:bit, which uses the nRF51822, and a gamepad expansion we found on AliExpress. We hacked together some simple firmware for the micro:bit and the nRF53 network core using the Nordic proprietary radio. As a little bonus, the LED matrix on the micro:bit was used to display a picture of the Doom Slayer's face, which matches the expressions on the face in the game as you're playing.

Hardware Overview

The pictures below show how the display and the DAC are connected to the nRF53 development kit. There is more documentation of the hardware configuration and connections on GitHub.

Source Code

All the source code for Doom, the network core and the micro:bit - along with some build instructions - is published on GitHub. An unmodified copy of Chocolate Doom 3.0 was committed to the Git repo first, so all changes made for this port can be tracked in the commit log.

Videos

The following shows the latest version of the port, playing through E1M1 and E1M2.

Here's a video from a year ago, when we didn't have sound and there were still some unresolved graphical glitches.

Future Improvements

There are several things we feel are missing for a perfect port of Doom:

Music

Modern ports like Chocolate Doom have a software synthesizer for playback of the MIDI music. But considering that the original game would use external MIDI sound cards for music synthesis, it could be in the spirit of the original game to use an external MIDI synthesizer for playing music. Sending MIDI data to an external synthesizer shouldn't be too processor intensive. Another option might be to use the network core for MIDI synthesis, since it's currently underutilized.

Multiplayer

Doom is generally remembered as the game that popularized the FPS genre. But it was arguably also the game that popularized multiplayer gaming. Within hours of Doom's release, university networks were banning Doom multiplayer games, as a rush of players overwhelmed their systems. So is it really a proper Doom port if you can't do multiplayer? Who will be the first to play Doom over a Thread network?

Keyboard and Mouse

For some people, the only real way to play Doom is with keyboard and mouse. With the Zephyr RTOS we can easily make the nRF53 act as a Bluetooth central (host) and support HID devices like keyboard and mouse. It's about time Zephyr got a Doom port.

Prior Art

Doom was ported to the STM32 series of microcontrollers a while back, running on STM32F429, STM32F7 and even a thermostat. The port used a large external RAM chip so it wasn't so useful to us when trying to get our port to run entirely on internal RAM.

Just a week ago, a hack running Doom on the wireless microcontroller inside an IKEA Trådfri lamp was posted on next-hack.com. The port was based on GBADoom. Similar to ours, it ran mostly on internal RAM, using external SPI memory for read-only game data. The port has some clever memory optimizations - some similar to ours, and some we didn't think of - but it wasn't published when we started ours a couple of years ago. Unfortunately it seems the post, video and source code were removed from the internet before we published this article.

These ports illustrate that there's one more challenge for the world to solve: running a full version of Doom entirely within the on-chip memory of a microcontroller.

Anonymous