The Devil Is in the AI Skills

Bernard Huang

May 25, 2026 · 6 min read

I wanted to experiment with game design, and I used Rebecca's and my engagement photo as the starting point.

Selfie of Bernard and Rebecca after the engagement — both smiling, Rebecca's left hand raised to show her ring. — The source photo. Goal: turn this into a walking 96×96 sprite of each of us.

Round one was generic stick figures with no glasses, no recognizable hair, and a walk cycle that looked like a Microsoft Bob avatar. Round two — same underlying image model, with an open-source skill wrapped around it — was us. By the end of the weekend, Bernard was running across an Austin side-scroller to reach Rebecca, with checkpoint lanterns, dash afterimages, and procedural WebAudio. Same flagship models. Different scaffolding. That's the whole thing.

TL;DR

Frontier AI models are commodities. The thing that determines what they can actually do for you is the layer of skills stacked on top, and right now that layer skews heavily toward business and code and is undercooked for creative work.

Stock Claude Design + GPT-5.5 produced sprites that didn't look like us. The model itself hedged: “solid usable draft, not yet artist-polished.”
I found agent-sprite-forge, an open-source skill by 0x0funky on GitHub. Same Gemini image model under the hood, but with prompt rules + deterministic postprocessing + a QA repair pass. Output looked like us.
Using the same sprites, I had GPT-5.5 build a playable Austin side-scroller. Play it →
The model didn't get better between attempts. The skill stack around it did. That gap is where the next two years of AI work lives.

Find the chauffeurs. Borrow their toolboxes.

Round one: stock AI

I fed the engagement photo to a flagship harness out of the box — Claude Design — and asked for a sprite sheet. Here is what came back.

Five pixel-art sprite frames of a generic male character — black hair, no glasses, simple grey shirt. No resemblance to Bernard. — Round one, “Bernard.” No glasses. No spiky hair. Could be any generic NPC.

Five pixel-art sprite frames of a generic female character in a black hood — no facial features, no recognizable hair shape. — Round one, “Rebecca.” A black hood. The last frame is the ring pose — it's the only thing that survived the abstraction.

The model itself knew. In the Slack thread where it dropped the package, it added a caveat:

This is a solid usable draft, not yet “artist-polished.” The next useful pass would be tightening likeness: her hair shape, your face/glasses, and maybe adding a couple's idle/emote sheet.

Translation: the model could see what was missing, and could enumerate what would fix it — it just couldn't do that work itself. It generated the cells, but it had no opinion about cell consistency, no eye for likeness, no game-engine convention to anchor on.

Why stock AI couldn't draw

Out-of-the-box AI harnesses ship with a skill library. That library is heavily biased toward business and code, because that's where the early enterprise demand was: PowerPoint decks, Word documents, spreadsheet ops, debugging sessions, code review, GitHub PRs, security scans. The default agent is a suit-and-tie consultant.

Creative work — pixel art, game feel, sprite rigging, animation timing — isn't in the standard kit. The model can wing it, but “wing it” without the right scaffolding is exactly what produces generic NPC walk cycles and faces that don't look like the people in the photo.

This isn't a model-capability problem. It's a model-skill problem. The same weights that can't draw my fiancée can draw her perfectly — if you give them the right wrapper.

Round two: a skill from GitHub

I went looking. A GitHub search for “sprite agent” surfaced 0x0funky/agent-sprite-forge, an open-source skill purpose-built for game-asset pixel art. The pipeline:

Prompt rules. Character-consistency scaffolding — same outfit, same proportions across all 16 cells (4 directions × 4 frames).
Gemini 3 Pro image model for generation. Same underlying weights anyone can call.
Forge postprocessor. Cell alignment, transparent background extraction, color quantization, frame normalization.
Deterministic QA repair / reassembly. Detects broken cells and patches them by re-rendering against the canonical pose, then reassembles the final sheet.
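
To make the last stage more concrete, here is a minimal sketch of what a deterministic QA check over sheet cells might look like. It is not agent-sprite-forge's actual code. The sheet representation (a 2D list of alpha values), the function names `cell_is_broken` and `find_broken_cells`, and the two failure rules (an empty cell, or a sprite touching the cell border) are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: one alpha value per pixel, 0 = transparent.
Grid = List[List[int]]


def cell_is_broken(sheet: Grid, x0: int, y0: int, size: int) -> bool:
    """Flag a cell that is empty or whose sprite touches the cell border."""
    opaque = 0
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            if sheet[y][x] > 0:
                opaque += 1
                on_border = x in (x0, x0 + size - 1) or y in (y0, y0 + size - 1)
                if on_border:
                    return True  # sprite is probably clipped or misaligned
    return opaque == 0  # nothing was drawn in this cell


def find_broken_cells(sheet: Grid, size: int) -> List[Tuple[int, int]]:
    """Return (row, col) for every cell that fails the checks above."""
    rows = len(sheet) // size
    cols = len(sheet[0]) // size
    return [
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if cell_is_broken(sheet, c * size, r * size, size)
    ]


# Tiny 2x2-cell sheet with 4-pixel cells, just to exercise the rules.
sheet = [[0] * 8 for _ in range(8)]
sheet[1][1] = 255  # cell (0, 0): pixel safely inside the cell
sheet[1][4] = 255  # cell (0, 1): pixel sits on the cell's left border
sheet[5][5] = 255  # cell (1, 1): pixel safely inside; cell (1, 0) stays empty

print(find_broken_cells(sheet, 4))  # [(0, 1), (1, 0)]
```

Because the rules are plain pixel checks rather than another model call, the same sheet always yields the same list of cells to repair, which is the appeal of a deterministic pass.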

I gave it the same engagement photo. It returned a 384×768 sheet plus per-direction walk-cycle previews.
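
The 384×768 figure lines up with the numbers above if each cell is 96×96: four frames across is 384 pixels, and four directions for each of two characters is 768 pixels tall. The sketch below works through that arithmetic. The layout (frames as columns, directions as rows, one character block stacked above the other) and the helper name `cell_rect` are assumptions for illustration, not something the skill documents here.

```python
CELL = 96        # sprite size named as the goal for the source photo
DIRECTIONS = 4   # rows per character (assumed row-per-direction layout)
FRAMES = 4       # columns (assumed column-per-frame layout)
CHARACTERS = 2   # assumption: one 4x4 block per person, stacked vertically

sheet_width = FRAMES * CELL
sheet_height = CHARACTERS * DIRECTIONS * CELL


def cell_rect(character: int, direction: int, frame: int):
    """Return (left, top, right, bottom) pixel bounds of one cell."""
    left = frame * CELL
    top = (character * DIRECTIONS + direction) * CELL
    return (left, top, left + CELL, top + CELL)


print(sheet_width, sheet_height)  # 384 768
print(cell_rect(1, 0, 3))         # (288, 384, 384, 480)
```

A rectangle in this form can be passed straight to an image library's crop call to pull out a single frame.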

The receipts

The agent-sprite-forge output: a 4-direction, 4-frame sprite sheet for Bernard (top half, glasses + spiky hair) and Rebecca (bottom half, long flowing hair, dark outfit). Both characters are recognizable as themselves. — Round two output. Bernard has glasses and spiky hair. Rebecca has the long flowing hair from the photo. The walk cycles read correctly in all four directions.

Animated GIF of the Bernard sprite walking right — four-frame cycle, glasses and spiky hair visible. — Bernard, walking right.

Animated GIF of the Rebecca sprite walking right — four-frame cycle, long dark hair, dark outfit. — Rebecca, walking right.

These sprites are now wandering at the bottom of my about page, next to my sign-off line. They walk back and forth across the row, bounce off the edges, occasionally pass each other. Scroll to the bottom of that page and you'll see them. Round-one sprites would have looked like a screensaver from 2003. Round-two sprites look like us.
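
The back-and-forth wandering needs very little logic. Here is a rough sketch of the idea, assuming a simple per-tick update; the function name `step`, the 96-pixel sprite width, and the reflect-at-the-edge rule are illustrative guesses, not the code running on the page.

```python
def step(x: float, vx: float, row_width: float, sprite_width: float = 96.0):
    """Advance one tick and reverse direction when the sprite reaches an edge."""
    x += vx
    max_x = row_width - sprite_width  # furthest left-edge position that still fits
    if x < 0:
        x, vx = -x, -vx               # reflect off the left edge
    elif x > max_x:
        x, vx = 2 * max_x - x, -vx    # reflect off the right edge
    return x, vx


print(step(300.0, 8.0, 400.0))  # (300.0, -8.0)
```

Flipping the sign of `vx` also tells the renderer which walk-cycle row to draw, left-facing or right-facing.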

Then I built a game

With characters that actually looked like us, the question stopped being “can the AI draw” and started being “what do you do with them?”

I had GPT-5.5 build a side-scroller. Bernard runs from SOCO past the South Congress food trucks, over Lady Bird Lake under the Congress Avenue bridge bats, past the Capitol dome, to Rebecca on a neon rooftop. Touching her ends the game. No flagpole — the win condition is reaching her.

Annotated level overview of the Run to Rebecca game showing the Austin-themed side-scroller from start to finish: SOCO sign, food trucks, Lady Bird Lake, Capitol dome silhouette, UT Tower, Frost Bank crown tower, and Rebecca on a rooftop at the end. Annotated with labels for MOVE (coyote + buffered jump), DASH (Shift J K + shake), SAVE (checkpoint lanterns), JUICE (particles + zone toasts), SOUND (procedural WebAudio), WIN (Rebecca's final stats). — The level. Austin landmarks, six checkpoint lanterns, three hearts and three faces collectible per run. Rebecca is the finish line.

Three iterations got it from prototype to “tiny finished game.” v2 was pixel sprites + tilemap. v3 added an HD painterly parallax-scrolling Austin backdrop. v4 was the polish pass:

Game feel: coyote time, jump buffering, variable jump release, dash with recharge, screen shake on dash and damage.
Save: six checkpoint lanterns that update the respawn point and soften the penalty for dying.
Juice: dash afterimages, speed streaks, landing dust, collectible bursts, checkpoint rings, a heart-and-confetti burst when Rebecca is reached.
Presentation: progress bar, run timer, zone title toasts, mute badge, end-of-run stats panel.
Audio: procedural WebAudio that boots after first input — ambient bed plus jump/dash/collect/checkpoint/hurt/win SFX.
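
Coyote time and jump buffering are the two items on that list that are easiest to misread, so here is a minimal sketch of how they are commonly implemented together. It is not the game's actual code: the class name `JumpState`, the `update` function, and both timing constants are assumptions picked for illustration.

```python
from dataclasses import dataclass

COYOTE_TIME = 0.10  # seconds after leaving a ledge that a jump still counts (assumed)
JUMP_BUFFER = 0.12  # seconds an early jump press is remembered (assumed)
NEVER = 1e9         # sentinel meaning "this has not happened recently"


@dataclass
class JumpState:
    since_grounded: float = NEVER  # time since the player last stood on ground
    since_pressed: float = NEVER   # time since jump was last pressed


def update(state: JumpState, dt: float, grounded: bool, jump_pressed: bool) -> bool:
    """Advance both timers by dt seconds; return True if a jump starts this frame.

    jump_pressed should be True only on the frame the key goes down.
    """
    state.since_grounded = 0.0 if grounded else state.since_grounded + dt
    state.since_pressed = 0.0 if jump_pressed else state.since_pressed + dt

    if state.since_grounded <= COYOTE_TIME and state.since_pressed <= JUMP_BUFFER:
        # Consume both timers so one press produces exactly one jump.
        state.since_grounded = NEVER
        state.since_pressed = NEVER
        return True
    return False


# Coyote time: the player walks off a ledge, then presses jump 80 ms later.
s = JumpState()
update(s, 0.016, grounded=True, jump_pressed=False)
update(s, 0.05, grounded=False, jump_pressed=False)
print(update(s, 0.03, grounded=False, jump_pressed=True))  # True
```

The buffer works the same way in reverse: a press that arrives slightly before landing is remembered and fires on the first grounded frame, which is why both checks can share one condition.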

Polish review board for Run to Rebecca v4, showing active gameplay screenshot, the Rebecca-reached win screen, the list of v4 polish items (Celeste-ish movement, dash with recharge, checkpoint lanterns, procedural WebAudio, presentation), and a panorama strip of the level with checkpoints and game-feel callouts. — The v4 polish board — the stuff that makes a prototype feel like a tiny finished game.

Win screen of Run to Rebecca: 'Rebecca reached' in white serif text on a dark overlay with a pink heart, run stats showing time, faces collected, and hearts collected, with the Austin neon rooftop visible behind the panel. — The win state. No flagpole needed.

Play Run to Rebecca →

None of this would have happened if round one had been good enough. Stick-figure sprites wouldn't have inspired me to spend the weekend on it. Recognizable sprites did.

The devil is in the AI skills

The model didn't get better between rounds. The skill stack around it did.

This is the thing nobody briefs you on when they sell you the flagship model. The model itself is a commodity: something newer will replace it within ninety days. What isn't a commodity is the layer of skills that determines what the model can actually do for you. agent-sprite-forge uses the same Gemini image weights anyone with an API key can call. The difference is the wrapper.

And right now the open-source skill ecosystem is heavily skewed toward business and code, because that's where the early demand was. The skill library is full of things like:

PPTX skills for generating slide decks
DOCX skills for writing reports and letters
XLSX skills for cleaning and analyzing spreadsheets
PDF skills for extraction and form-filling
Code-review, debug, GitHub-PR, security-audit skills

What's not in the default kit: pixel art, sprite rigging, game feel tuning, animation timing, audio mixing, video editing, design-system enforcement, music composition, voice direction. The creative side of the skill ecosystem is several years behind the business side. That gap is where the next two years of AI work lives.

Chauffeur knowledge

There's a parable about Max Planck and his chauffeur. After Planck won the Nobel Prize, he gave the same lecture so many times across Germany that his chauffeur memorized it. One night they swapped places: the chauffeur gave the lecture, and Planck sat in the audience in the chauffeur's hat. The talk went fine. Then a physicist in the front row asked a real question. The chauffeur shrugged and said, “I'm surprised to hear such a simple question in an advanced city like Munich. I'll let my chauffeur answer it.”

The usual moral is about real expertise vs. surface fluency. But there's an inverse moral worth holding onto: you don't have to be Planck. You just have to know who Planck is and how to find their chauffeur.

For AI skills, the practical version is: interview people who actually ship things in your domain. Ask which libraries they reach for. Ask what they wish existed. They will tell you faster than you can search the GitHub trending feed yourself. One conversation gets you eighty percent of the way.

I found agent-sprite-forge because I described the failure mode out loud to someone who'd already built sprites for a game in the same engine. He sent the link. Hours of fumbling through Hugging Face wouldn't have produced that link. The chauffeur did.

This is how I now do skill discovery for everything new I touch — reels, voice agents, music, 3D. Find the chauffeur. Borrow their toolbox. Skip the long tail.

The takeaway

Stock AI couldn't make a pixel version of my fiancée. A skill did.

The sprites wandering on /about/ are receipts. So is the Austin side-scroller at /games/run-to-rebecca/. Both came from finding the right skill, not from picking a better model.

The chauffeur of this month is whoever curated agent-sprite-forge on GitHub. The chauffeur of next month is whoever's curating the creative-stack skill you haven't found yet. Go find them.

The devil is in the AI skills. The model is the easy part.

— Bernard