Hard Lemonade: Three Fixes to Get Local AI Pouring on AMD
Running a local LLM server is the easy part. Getting three separate pieces of infrastructure to agree that a model is downloaded, reachable, and worth waiting for is where the afternoon goes. Over the past two weeks I shipped three fixes across two open-source projects to get AMD’s Lemonade serving models behind the Olla proxy on my Strix Halo box. None of them was hard in the algorithmic sense – the diffs are a struct field, a config key, and a prepended path. All three fixes came out of the same goal: point Olla at Lemonade on a Radeon and get a chat completion back.