Guide Labs open-sources Steerling-8B, an interpretable LLM with traceable outputs
Guide Labs released Steerling-8B, an 8B-parameter LLM built with a novel architecture that makes every generated token traceable to its training data origins. A concept layer buckets data into traceable categories, enabling developers to understand why the model produces specific outputs and reliably control behaviors like gender encoding.
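The announcement does not detail the concept layer's implementation, but the idea of bucketing activations into named, human-readable concepts that can be inspected and overridden resembles concept-bottleneck designs. Below is a minimal, hypothetical sketch of that pattern; all names (`concept_layer`, `steer`, the concept labels, and the weights) are illustrative assumptions, not Guide Labs' actual API.

```python
# Hypothetical concept-bottleneck-style layer: project a hidden vector onto
# named concepts, then optionally override concept activations to steer output.
# This is an illustrative sketch, NOT Steerling-8B's real architecture.

def concept_layer(hidden, concept_weights):
    """Score each named concept as a dot product with the hidden vector."""
    return {name: sum(h * w for h, w in zip(hidden, weights))
            for name, weights in concept_weights.items()}

def steer(concept_scores, overrides):
    """Replace selected concept activations to control model behavior."""
    return {name: overrides.get(name, score)
            for name, score in concept_scores.items()}

# Toy example: a 3-dim hidden state and two illustrative concepts.
concept_weights = {
    "gender":    [0.9, 0.1, 0.0],
    "formality": [0.0, 0.5, 0.5],
}
scores = concept_layer([1.0, 2.0, 3.0], concept_weights)
steered = steer(scores, {"gender": 0.0})  # zero out the gender concept
```

Because every downstream computation would pass through these named scores, a developer can both trace which concepts drove an output and clamp one (here, `gender`) to control behavior.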
Guide Labs' co-founder began this work during PhD research at MIT, co-authoring a 2018 paper showing that existing interpretability methods were unreliable. The approach flips traditional AI interpretability from post-hoc "neuroscience on a model" to transparency built in from the start.
View full digest for February 24, 2026