Jia Hao

Building a tiny remote control for Claude Code (and more)

My Playdate connects to a remote tmux session. Crank to scroll or push B to talk (and exit vim).

Over the past few weeks, I’ve been working on turning the Playdate game console into a remote control for my terminal. It has a Cortex-M7 CPU, a 1-bit display, Wi-Fi and a mic. There is a C SDK, and people have ported Doom onto it.

Here’s my architecture:

Playdate on the left connects to server over mTLS. Server runs MLX-Whisper and connects the Playdate pty to a shared tmux session. Channel between Playdate and server transfers pty, voice and commands.

Voice as the terminal keyboard

While the D-pad and crank map to obvious inputs like arrow keys and scrolling, I still needed a way to enter text and use other keyboard commands. The UX I settled on was a push-to-talk flow.

Sound and the text in the current viewport are multiplexed over the same TLS connection to the server. Two separate connections would be simpler but would overload the hardware. The sound chunks are uploaded in real time while the talk button is held, then transcribed with MLX-Whisper running locally once the user releases it. This turns out to be a second or so faster than streaming the sound to Gemini 3.1 Flash Live. Together, these optimizations bring the latency down from close to 10 seconds to ~2-3 seconds.
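As a rough illustration of multiplexing several streams over one connection, here is a minimal sketch of length-prefixed framing. The channel IDs, the 5-byte header layout and the names encode_frame and FrameDecoder are assumptions made for this example; the post does not describe the actual wire format.

```python
import struct

# Hypothetical channel IDs; the real protocol's values are not given in the post.
CH_PTY, CH_VOICE, CH_COMMAND = 0, 1, 2

# Assumed header: 1-byte channel ID, then 4-byte big-endian payload length.
HEADER = struct.Struct(">BI")


def encode_frame(channel: int, payload: bytes) -> bytes:
    """Prefix a payload with its channel ID and length."""
    return HEADER.pack(channel, len(payload)) + payload


class FrameDecoder:
    """Reassembles frames from arbitrarily sized chunks of a byte stream."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> list[tuple[int, bytes]]:
        """Add received bytes and return every frame that is now complete."""
        self._buf += data
        frames = []
        while len(self._buf) >= HEADER.size:
            channel, length = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # wait for the rest of this frame
            frames.append((channel, bytes(self._buf[HEADER.size:end])))
            del self._buf[:end]
        return frames
```

Because each frame carries its own channel and length, voice chunks and pty output can interleave freely on a single socket, and the receiver only has to buffer until a frame is complete.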

The converted transcript, terminal text, running process name and cursor position are fed as context to Gemini 3.1 Flash Lite. Naturally, the terminal context is critical for reasonable results. The model is prompted to either emit raw strings, or text commands like <esc> which will be interpreted as keystrokes. Results are sent back to the Playdate and appear as a preview; the user can press A to enter the command.
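To make the "raw strings or text commands" idea concrete, here is a minimal sketch of turning model output into bytes for the pty. Only <esc> is mentioned in the post; the other token names, the byte values chosen for them and the function name to_keystrokes are hypothetical.

```python
import re

# Only <esc> appears in the post; the other token names are hypothetical.
KEY_TOKENS = {
    "esc": b"\x1b",
    "enter": b"\r",
    "tab": b"\t",
    "ctrl-c": b"\x03",
}

TOKEN_RE = re.compile(r"<([a-z-]+)>")


def to_keystrokes(model_output: str) -> bytes:
    """Translate raw text plus <token> commands into bytes for the pty."""
    out = bytearray()
    pos = 0
    for match in TOKEN_RE.finditer(model_output):
        # Text between tokens is sent as-is.
        out += model_output[pos:match.start()].encode("utf-8")
        key = KEY_TOKENS.get(match.group(1))
        # Unknown tokens are passed through as literal text rather than dropped.
        out += key if key is not None else match.group(0).encode("utf-8")
        pos = match.end()
    out += model_output[pos:].encode("utf-8")
    return bytes(out)
```

With this mapping, a model response of `<esc>:wq<enter>` becomes the byte sequence that saves and quits vim, which is the kind of output the preview step lets the user inspect before pressing A.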

It’s not 100% accurate and may require a few attempts, but it is accurate enough to operate the terminal and even use TUIs like vim. I also set up some primitive evals to measure the accuracy of the generated commands. Latency is still much higher than I would like, though. There’s likely a lot of room for improvement here.

Aside: Something to consider here is that the terminal contents can potentially inject prompts into the Gemini call. I assume that the user looks at the screen before invoking push-to-talk and also checks the returned command in the preview before confirming.

Ghostty on bare metal

The Playdate app embeds ghostty-vt, with some tiny patches to support a freestanding target. Since ghostty-vt is a Zig library, I just went with it and let Claude Code convert the Playdate starter project to Zig, based on a Zig Playdate Template.

This change introduced a large number of relocations to the compiled binary, so I needed to compile in ReleaseSmall. The coding agent easily figured out most of this once the feedback loop was set up and robust enough (more on that later).

The only remaining quirks for terminal rendering are font rendering and glyphs. A subset of glyphs commonly used by TUIs is bundled into the font atlas. Bold text is implemented using double-strike (drawing the character repeatedly at small offsets), and lighter colored text is implemented using a small amount of dithering.
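The two tricks above can be sketched on a plain 0/1 bitmap. This is an illustration only, not the app's actual Zig renderer: the 1-pixel offset, the 2x2 Bayer matrix and the function names double_strike and dither are assumptions.

```python
Bitmap = list[list[int]]  # rows of 0/1 pixels, 1 = ink

BAYER_2X2 = [[0, 2], [3, 1]]  # standard 2x2 ordered-dither thresholds


def double_strike(glyph: Bitmap, dx: int = 1) -> Bitmap:
    """Bold: OR the glyph with a copy of itself shifted right by dx pixels.

    The output keeps the input width, so ink shifted past the right edge
    is clipped.
    """
    width = len(glyph[0])
    return [
        [row[x] | (row[x - dx] if x >= dx else 0) for x in range(width)]
        for row in glyph
    ]


def dither(glyph: Bitmap, level: int = 3) -> Bitmap:
    """Dim: keep an ink pixel only where its Bayer threshold is below level.

    level ranges from 0 (blank) to 4 (unchanged); 3 drops one pixel in four.
    """
    return [
        [px & int(BAYER_2X2[y % 2][x % 2] < level) for x, px in enumerate(row)]
        for y, row in enumerate(glyph)
    ]
```

On a 1-bit display there are no intermediate gray levels, so both effects have to be expressed purely as which pixels are on: thicker strokes for bold, sparser strokes for dim text.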

Screenshot illustrating rendering of various glyphs

The ghostty embedder is set up to connect to the pty of a tmux session running on the server. This means that I can attach to the same session, make changes to the session, and observe them in real time on the Playdate.

usessl is not actually secure

It would have been simpler to assume that the Playdate is always on a secured network, but I wanted to support connecting through a Wi-Fi hotspot or an unsecured public Wi-Fi. This meant that I needed a secure channel.

The simplest option here would be to run an embedded Tailscale client on the Playdate and connect it to the same Tailnet as the server. This doesn’t work because the available Tailscale libraries assume an OS with a real socket layer. The Playdate only exposes custom high-level HTTP / TCP APIs.

The approach I really wanted to go for was to make the Playdate an SSH client. The challenge was that the sources of entropy on the Playdate are not necessarily cryptographically secure. I would also have needed to implement a large portion of the SSH client from std.crypto primitives, or port something like libssh2 over. There were also some hiccups with using std.crypto that led to more hand-rolled workarounds, which made this option much less attractive.

To work around the need to hand-roll crypto primitives, I toyed around with offloading crypto to a remote server (similar to drand) since the Playdate appeared to support usessl in its networking APIs. The idea was to delegate crypto functionality that was difficult to implement to a fully featured remote server over a secure channel.

That was when I found that usessl provides confidentiality, but doesn’t actually validate the remote server’s certificate. Without that validation, a man-in-the-middle can impersonate the remote server, which invalidates the whole offloading crypto approach.

In the end, I settled on implementing mTLS in userspace with the BearSSL C library over the raw TCP APIs. The client and server validate each other’s identities using pinned credentials. The build process was set up to derive credentials from the .env and embed the relevant certs into both the server and client.
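One common form of pinning is to compare a hash of the peer's certificate against a value baked in at build time. The sketch below shows that check in isolation. It is an assumption about what "pinned credentials" could look like, not the project's BearSSL code, and the function names are hypothetical.

```python
import hashlib
import hmac


def fingerprint(cert_der: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(cert_der).hexdigest()


def peer_is_pinned(cert_der: bytes, pinned_fingerprint: str) -> bool:
    """Accept the peer only if its certificate matches the pinned fingerprint."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(fingerprint(cert_der), pinned_fingerprint.lower())
```

In a Python server, the DER bytes could come from `SSLSocket.getpeercert(binary_form=True)` after the handshake. The fingerprint check complements the handshake rather than replacing it: the handshake proves the peer holds the private key for the certificate, and the pin restricts which certificate is acceptable.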

Development loop

All the code was written with Claude Code and a bit of Amp Code. Code quality was not really a concern; keeping the architecture simple and secure was.

For development velocity, it is important to take as much manual work out of the loop as possible. Early on, I set up a hacky script for the agent to test its changes on both the Playdate simulator and hardware. This slowly grew into screenshot functionality with logs and stack traces (thanks to playdate-reverse-engineering). For example, it runs the app on the simulator, waits a few seconds, then prints out:

attach: tmux attach -t crankshell
Current screen shows:
  zig-out/pdsim.txt  (OCR text)
  zig-out/pdsim.png  (screenshot)
Sim log: zig-out/sim.log
Relay:   tmux attach -t crankshell

OCR of the screen’s contents is provided with claude --model haiku -p. Based on the task, the agent can decide whether to read the OCR text (more token-efficient), or read the screenshot directly.

In general, it is a workflow smell when the agent looks like it’s repeatedly running a set of long bespoke commands instead of running a pre-defined shell script. In CLAUDE.md, I have a line that says:

  • Watch for dev-loop friction during normal work — hand-chained commands that should be one tool, missing visibility (had to read a PNG to know X), repetitive cleanup, anything that turned a 5-minute task into 30 — and append it to todo.md with enough context to act on later. Observations evaporate between sessions.

Wrapping up

I’m still not pleased with the push-to-talk latency, but it’s been a fun way to spend my weekends justifying a Playdate that ended up sitting on a shelf for years.

If you have one, try it yourself (repo link).