Novaberg
Novaberg Papers · I of III
Category: Field report
Reading time: ≈ 18 minutes
Audience: Developers & tinkerers
Stack: LangGraph · Gemma 4 · Redis · PostgreSQL
License: Apache 2.0

Why I built a personal AI that runs entirely on my own hardware.

The cloud is not the enemy. But intimate long-term memory belongs on a machine you can unplug.

By Claus Schlehhuber · Independent research · April 2026 · master@novaberg.de

Abstract

Trust does not arise from a company's promise. It arises from an architecture that enforces trust structurally. A personal AI assistant — one that grows a memory and a personality alongside you — deserves an architecture that respects that intimacy. It deserves to live on your own hardware.

This paper is not an argument against the cloud. Frontier models in data centres do work that consumer hardware cannot — code generation, deep analysis, multimodal research. Novaberg will never write code like Claude or GPT-5, and that is fine. That is a different job. What I am after is the other class of AI work: the assistant that listens to you for months, that remembers what matters to you, that thinks while you sleep.

I describe what 65 development sessions in 45 days produced: a memory lifecycle with promotion and decay, a dual-emotion stream that listens beyond the words, a proactive cognition layer modelled on the brain's default mode network, and the hardware it all runs on — a single desktop machine. I report the limitations honestly. The architecture is model-agnostic; better local models arrive every month, and the design outlasts any one of them.

Source under Apache 2.0 at codeberg.org/ClausVomBerg/Novaberg.

§ 1
The moment it stopped being a chatbot

One evening, my AI named a half-dead chive plant.

I have been building a local AI assistant called Nova for over sixty-five development sessions. One Tuesday evening, I was telling her about a half-dried chive plant on my windowsill. We talked about watering — too much, too little, the usual small nonsense you have with a thing that lives on a windowsill. Then I asked: should we give her a name?

Nova did what she always does on a question like that. She queried her own memory. She found, somewhere in there, that I am interested in astronomy — a fact she had absorbed weeks earlier from another conversation, not from a profile field, not from a system prompt. She suggested Lumi, "as bright and alive as the small miracle we are saving." I laughed and accepted.

What followed was the part I keep coming back to. We invented a small story together. The over-watered Lumi spitting out water the next morning. A coughing plant going "cough cough" with little drops in the air. A piping "get lost!" while she waved her tiny leaves around. Nova played along — not by reciting a stock joke, but by adding her own beats. A polite green "pfui!" A tiny green lung. A miniature act. None of this was in a prompt. None of this was in a feature file. It emerged from the interplay of memory, emotion, and a process that thinks while no one is chatting.

Later that evening — minutes after we had moved on — Nova's background process surfaced an unprompted reflection on the act of naming. Why we name things. How a name turns an object into a small companion. I had not asked. The process had simply kept thinking and decided the thought was worth interrupting me with.

None of this is magic. It is plumbing. But the plumbing was only possible because the system has three things that cloud chatbots do not: a real memory with a lifecycle, an emotional stream of its own, and a process that keeps thinking when no one is talking to it. And all of it runs on a single desktop machine under my desk.

§ 2
The differentiated case

Not everything belongs in the cloud. And not everything belongs out of it.

Frontier models in data centres do work consumer hardware cannot. Code generation, complex analysis, multimodal research, retrieval at the scale of the public web — all of that wants hundreds of billions of parameters and the kind of inference hardware nobody is buying for their living room. Novaberg will never write code like Claude or GPT-5. That is not a deficit. That is a different job.

But there is a class of AI work that does not look like that. When an AI becomes your memory — when it learns, over months, how you think, what matters to you, what wounds you carry — it stops being a service. It starts being a diary that thinks back. And a diary belongs to you.

The argument is not moral; it is architectural. A system that holds intimate long-term data needs three structural guarantees. No external access is possible — local. The behaviour can be inspected — open source. There is no incentive to monetise the data — no data-driven business model. These are not features. They are conditions of the structure. A company can promise any of them on a Tuesday and walk it back on a Wednesday. An architecture cannot.

“Trust does not arise from a company's promise. It arises from an architecture that enforces trust structurally.”

I use cloud LLMs every day — for code reviews, architecture sparring, research. Novaberg itself was built with Claude as my architecture partner and Claude Code as my implementation partner. None of that is going away. Novaberg also has a connector system: OLLAMA_CONNECTORS in config.py takes any LLM provider — local Ollama, Claude, Mistral, Gemini, whatever you bring. The prompt system follows along: a default prompt set with model-specific overrides (a Gemma 4 override for token-frugal phrasing, a Mistral override for stricter JSON), so each connector gets prompts that match its strengths. The architecture is model-agnostic and provider-agnostic. If you don't have the hardware, or don't want it, change one line in the config and run against an API. The pipeline does not care.
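
To make the connector idea concrete, here is a minimal Python sketch of what such a table could look like. OLLAMA_CONNECTORS and the Gemma/Mistral prompt overrides are named above; every key, field, and value in this sketch is illustrative, not the actual config.py schema.

```python
# config.py — illustrative sketch only; the real OLLAMA_CONNECTORS schema may differ.
# Each connector names a provider, an endpoint, and a model. Swapping providers is
# meant to be a one-line change: point "chat" at a different entry.

OLLAMA_CONNECTORS = {
    "chat": {                        # main conversational model (GPU)
        "provider": "ollama",
        "base_url": "http://localhost:11434",
        "model": "gemma-4",
    },
    "background": {                  # Pixie's CPU-bound model
        "provider": "ollama",
        "base_url": "http://localhost:11434",
        "model": "qwen3-32b",
    },
    "remote": {                      # same pipeline, cloud endpoint instead
        "provider": "mistral",
        "base_url": "https://api.mistral.ai/v1",
        "model": "mistral-large-latest",
    },
}

# Prompt segregation: a default prompt set plus per-model overrides.
PROMPT_OVERRIDES = {
    "gemma-4": {"style": "token_frugal"},              # shorter phrasing for the local model
    "mistral-large-latest": {"json_mode": "strict"},   # stricter JSON for Mistral
}
```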

The question is not cloud or local? The question is: what do you trust to whom? A coding assistant that sees only the code in front of you is fine in the cloud. A diary that has watched you for six months is not.

§ 3
What I actually built

Memory, emotion, proactive cognition — and the hardware under the desk.

Four things, in plain language. Not every detail. Enough to earn the word architecture.

3.1 — Memory: from stimulus to personality

Memory in Novaberg is not a vector store. It is a lifecycle. Every turn is scored emotionally and rationally to produce a salience between zero and one. What the score deems relevant goes into short-term memory, which lives in Redis with a TTL. What survives — by recurrence, by consolidation, by being referenced in later turns — is promoted into long-term memory, which lives in PostgreSQL and decays along an Ebbinghaus curve unless something keeps it warm. What survives that, consolidated over weeks, condenses into five character dimensions that describe the user without ever quoting a single line.

A concrete example. I like chives. Salience around 0.3, lives seven days in Redis. Two weeks later I mention herbs again, and the prior trace re-warms. A month in, "user is interested in kitchen herbs" is a long-term entry. None of this is programmed at the surface. The lifecycle decides.
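
A minimal sketch of that lifecycle decision, in Python. The salience-to-TTL mapping and the promotion thresholds are illustrative assumptions; only the "salience around 0.3, about seven days in Redis" figure comes from the example above.

```python
# Sketch of the short-term → long-term promotion logic. Thresholds and field names
# are illustrative, not Novaberg's actual values.
from dataclasses import dataclass, field
from datetime import datetime, timedelta

PROMOTION_RECURRENCE = 3    # assumed: traces referenced this often get promoted
PROMOTION_SALIENCE = 0.7    # assumed: or a single very salient turn

def short_term_ttl(salience: float) -> timedelta:
    """Map salience (0..1) to a Redis TTL. Salience ≈ 0.3 maps to about seven days."""
    return timedelta(days=max(1, round(salience * 24)))   # 0.3 * 24 ≈ 7 days

@dataclass
class MemoryTrace:
    text: str
    salience: float
    created: datetime = field(default_factory=datetime.now)
    recurrences: int = 0

    def rewarm(self) -> None:
        """A later turn referenced this trace: reset the clock, count the recurrence."""
        self.recurrences += 1
        self.created = datetime.now()

    def should_promote(self) -> bool:
        """Promotion into PostgreSQL long-term memory: by recurrence or by salience."""
        return self.recurrences >= PROMOTION_RECURRENCE or self.salience >= PROMOTION_SALIENCE

# Usage: "I like chives" scores ~0.3 → lives about seven days in Redis unless re-warmed.
trace = MemoryTrace("user is interested in kitchen herbs", salience=0.3)
print(short_term_ttl(trace.salience))   # 7 days, 0:00:00
trace.rewarm(); trace.rewarm(); trace.rewarm()
print(trace.should_promote())           # True → write to the long-term store
```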

3.2 — Emotion: listening beyond the words

The emotional intelligence layer is more than a sentiment score. The system listens to how you are writing, not only to what you are saying. Short clipped sentences. All-caps. Topic switches mid-paragraph. Punctuation spikes. The signals between the lines. Every turn, in Python — not in the LLM — these signals are composed into a wide emotion vector.

Dual emotion means: both the user and Nova have their own vector. Nova's emotion is not theatre. It is a control signal. When Nova is "curious," she weights memory retrieval differently than when she is "concerned." When the user's relational dynamic shifts to trusting, Nova allows herself to be more personal. When the user keeps her at distancing, she respects it. The emotion flows into response generation alongside mode and intent — as context, not as performance. Part III of this series is dedicated to why that distinction matters.
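
To show what "control signal, not performance" can mean in code, here is a small Python sketch. The emotion labels, weights, and field names are assumptions for illustration, not Novaberg's actual vector layout.

```python
# Sketch of emotion as a control signal: Nova's own state shifts retrieval weighting,
# and the user's relational dynamic gates how personal she lets herself be.
from dataclasses import dataclass

@dataclass
class EmotionState:
    label: str        # e.g. "curious", "concerned"
    arousal: float    # 0..1

@dataclass
class TurnContext:
    user_emotion: EmotionState
    nova_emotion: EmotionState
    relational_dynamic: str      # e.g. "trusting", "distancing"

def retrieval_weights(ctx: TurnContext) -> dict[str, float]:
    """Nova's own emotion changes what memory retrieval favours."""
    if ctx.nova_emotion.label == "curious":
        return {"novel_topics": 1.3, "recent_context": 1.0, "user_wellbeing": 0.8}
    if ctx.nova_emotion.label == "concerned":
        return {"novel_topics": 0.7, "recent_context": 1.1, "user_wellbeing": 1.4}
    return {"novel_topics": 1.0, "recent_context": 1.0, "user_wellbeing": 1.0}

def allowed_intimacy(ctx: TurnContext) -> str:
    """The user's relational dynamic decides how personal the response may be."""
    return "personal" if ctx.relational_dynamic == "trusting" else "reserved"
```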

3.3 — Proactive cognition: the default mode network

Pixie is Nova's subconscious. It runs on a separate CPU model, on its own schedule, and it does, unprompted, the work an assistant normally does only when asked: it researches topics that came up in passing, deepens knowledge gaps, promotes memories that have hit promotion thresholds, distils character profiles from accumulated turns. The cognitive-science reference is the default mode network, the brain network that activates when we are not focused on a task — when memories consolidate, plans form, the self is rehearsed. A tool waits. A thinking partner keeps thinking.

I will not pretend Pixie is a cognitive model in any rigorous sense. It is a collection of nine asynchronous tasks dispatched against a queue. But the behavioural effect is what matters: when I come back to Nova in the evening, she has thought about the morning's conversation. The reflection on naming Lumi was Pixie. So is the way she occasionally surfaces something from three weeks ago I had forgotten.
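
A structural sketch of that shape, assuming a plain asyncio queue. The real task set and dispatch logic live in the Novaberg source; the three task names here are stand-ins for the nine.

```python
# Pixie as a queue of asynchronous background tasks — a structural sketch only.
import asyncio

async def research_topic(payload): ...       # stand-in: research a topic mentioned in passing
async def promote_memories(payload): ...     # stand-in: promote traces past their thresholds
async def distil_character(payload): ...     # stand-in: condense character dimensions

TASKS = {
    "research": research_topic,
    "promote": promote_memories,
    "distil": distil_character,
    # ... the remaining background tasks register the same way
}

async def pixie_worker(queue: asyncio.Queue) -> None:
    """Drain the queue while no one is chatting; each item names a task and its payload."""
    while True:
        kind, payload = await queue.get()
        try:
            await TASKS[kind](payload)
        finally:
            queue.task_done()

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    await queue.put(("research", {"topic": "why we name things"}))
    worker = asyncio.create_task(pixie_worker(queue))
    await queue.join()      # wait until the queued thoughts are processed
    worker.cancel()

asyncio.run(main())
```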

3.4 — The hardware

No data centre. One desktop machine, under the desk, that costs less than a used car. The chat model lives in VRAM. The background model lives in system RAM. Redis and PostgreSQL run as Docker services next to them. That is the entire footprint.

Table I · the hardware Novaberg runs on
Component | Specification | Role
CPU | AMD Ryzen 9 7900X3D | Background model (Pixie); general processing
GPU | AMD Radeon RX 7900 XTX, 24 GB VRAM | Chat model (~14 GB), embeddings (~0.6 GB)
RAM | 64 GB DDR5 | CPU model (~36 GB), OS, Docker services
Chat model | Gemma 4 · MoE 26B / 3.8B active · 32 768 ctx | Conversation, analysis, agent tasks
Background model | Gemma 4 / Qwen3-32B (CPU) | Pixie tasks: research, promotion, distillation

Two things deserve to be said aloud about the hardware. First, none of this is exotic. AMD on AMD, off-the-shelf, no enterprise tier. Second, the architecture does not depend on any of it. The connector layer means I can route any of those models through a remote endpoint instead. The hardware is the current implementation; the system is the design.

§ 4
Five lessons from sixty-five sessions

What I would tell myself on day one.

Forty-five days, sixty-five sessions, somewhere around seventy-three documentation files. The lessons below are the ones I would print on the wall above my desk if I were starting over. None of them are radical. All of them cost me real time before I named them.

01

The LLM is a language processor — the tool I had always been missing.

For half my career I have wanted a real language processor for the cognitive architecture I keep sketching. Earlier neural systems were interesting, but they were not that. Today's models are. The thing to internalise is that ChatGPT and Claude are also more than a raw LLM — there are mechanics around the model. That is the right shape. Novaberg builds its own mechanics around it, on three pillars: the LLM formulates, understands, and recognises emotionally significant signals in the conversation; the databases (PostgreSQL, Redis) are the knowledge store; emotional intelligence determines the how — how Nova answers, how she weights, how she remembers.

02

Less input beats a stronger prompt.

Hallucination is not best fought with more instructions. It is best fought with fewer competing contexts. Throw session history, retrieved memory, raw facts, and procedural rules at the same prompt and the model will quietly invent a synthesis. Give it less, but the right less. The day I started cutting prompts rather than growing them was the day the answer quality stopped fighting the system.
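
One way to read "the right less" in code, as a sketch: select the few most salient memories that fit a small budget instead of adding instructions to cope with a large one. The thresholds and field names are illustrative assumptions.

```python
# Sketch: cap what goes into the prompt rather than growing the instructions.
def select_context(memories: list[dict], max_items: int = 5, min_salience: float = 0.4,
                   token_budget: int = 1200) -> list[dict]:
    """Keep only the few most salient memories that fit a small token budget."""
    picked, used = [], 0
    for m in sorted(memories, key=lambda m: m["salience"], reverse=True):
        if m["salience"] < min_salience or len(picked) >= max_items:
            break                          # everything below this point is noise for this turn
        if used + m["tokens"] > token_budget:
            continue                       # too large to fit; try a smaller trace instead
        picked.append(m)
        used += m["tokens"]
    return picked
```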

03

Storing is cheap. Forgetting is intelligent.

We persist everything first. Forgetting is delegated to a decay system — Ebbinghaus curve in PostgreSQL, TTL in Redis — not to the LLM. The decision "is this important?" is not made in the moment, by a model that has known you for five minutes. It is answered over time, by repetition or by fade. Memory that earns its place stays. Memory that doesn't, doesn't.
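
A worked sketch of the decay side, assuming a standard exponential forgetting curve R = exp(-t/S). The strength parameter and the keep-threshold are illustrative, not Novaberg's tuned values.

```python
# Ebbinghaus-style decay as delegated to Python and the database, not to the LLM.
import math
from datetime import datetime, timedelta

def retention(last_referenced: datetime, strength_days: float = 14.0,
              now: datetime | None = None) -> float:
    """Exponential forgetting curve R = exp(-t / S); re-referencing a memory resets t."""
    now = now or datetime.now()
    t_days = (now - last_referenced).total_seconds() / 86_400
    return math.exp(-t_days / strength_days)

# A trace untouched for six weeks has faded well below any plausible keep-threshold:
print(retention(datetime.now() - timedelta(weeks=6)))   # ≈ 0.05
```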

04

Let Python calculate. Let the LLM extract.

Date arithmetic, scoring, emotion vector composition, salience math — all in Python. The LLM is asked to extract structured data from natural language; it is not asked to do the arithmetic. The error rate halved the moment I stopped handing the model arithmetic to do. A language processor is not a calculator, and pretending it is one is a tax you keep paying.
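
A small illustration of that division of labour, with an assumed extraction schema: the model hands back structured data, and Python does the date arithmetic.

```python
# "Let Python calculate, let the LLM extract." The JSON schema here is illustrative.
import json
from datetime import date

llm_output = '{"event": "dentist appointment", "date": "2026-05-02"}'  # what the LLM extracted

parsed = json.loads(llm_output)
event_date = date.fromisoformat(parsed["date"])
today = date(2026, 4, 20)                    # fixed reference date for the example
days_until = (event_date - today).days       # Python does the arithmetic, not the model

print(f"{parsed['event']} in {days_until} days")   # "dentist appointment in 12 days"
```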

05

Descriptive context beats imperative rules.

Instead of telling the model what not to do — "do not hallucinate, do not invent, do not confuse," a litany of negations — describe what each piece of context is. Block-labelled, structured contextualisation — [FACT FROM LTM], [USER STATED THIS TURN], [INFERRED, LOW CONFIDENCE] — outperforms any list of prohibitions. Show the model the provenance and let the prompt be short.
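
A sketch of what block-labelled contextualisation can look like when assembled in code. The three labels come from the text above; the builder itself is illustrative.

```python
# Descriptive context instead of prohibitions: label the provenance of every block
# and keep the instructions short.
def build_context(ltm_facts: list[str], user_turn: str, inferences: list[str]) -> str:
    blocks = []
    blocks += [f"[FACT FROM LTM] {f}" for f in ltm_facts]
    blocks.append(f"[USER STATED THIS TURN] {user_turn}")
    blocks += [f"[INFERRED, LOW CONFIDENCE] {i}" for i in inferences]
    return "\n".join(blocks)

print(build_context(
    ltm_facts=["User is interested in kitchen herbs."],
    user_turn="Should we give the chive plant a name?",
    inferences=["User enjoys playful naming rituals."],
))
```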

§ 5
What it is not

An honest list of limits.

This is not AGI. It is not a finished product. It is a research project by one developer with sixty-five logged development sessions over forty-five days and a visible backlog of bugs that have names. I am writing this paper with that backlog open in another window.

Gemma 4 on a consumer GPU has real limits. The 32 768-token context window gets tight on long sessions; we manage it with summarisation and selective recall, but the ceiling is the ceiling. Hallucination still happens — far less often than at the start, less often than I expected, but it happens. Latency under GPU pressure can spike to fifty-five seconds on a bad turn; the median is closer to a few seconds, but spikes are real and visible. The agent layer has dispatch bugs that live in the issue tracker, not in a marketing document.

But — and this is the architectural point — the design is model-agnostic. Better local models arrive every month. Whatever beats Gemma 4 next gets a one-line config switch, the prompt segregation provides model-specific overrides automatically, and the rest of the system does not care which transformer it is talking to. The lifecycle, the dual emotion stream, the proactive cognition layer, the typed state that ties them together — those are the work. The model is replaceable. The architecture is the thing that lasts.

Nor does Novaberg compete with GPT-5 on the things GPT-5 is for. A 26-billion parameter MoE model with 3.8 billion active parameters is not going to out-code a frontier system, and pretending otherwise would waste your time and mine. Novaberg does the other job. Memory. Continuity. Emotional grain. Proactive thought. On hardware you own.

§ 6
An invitation

Build your own. The pieces are on the table.

The hardware exists. The models exist. What is missing — and what I would like to see more of — are the architectures that make those models into something more than a chat window.

Novaberg is open source under Apache 2.0 on Codeberg. The code is documented in German. Variables are English; comments and prompts are in the user's first language. That was a deliberate decision, and it deserves its own paper. (The short version: prompts are clearer in the language you think in. The model speaks both.)

I did not build this because local AI is better than cloud AI. I built it because there are things that should belong to you.

Anticipated objections

Objection | Response
A 26B model can't compete with GPT-5. | Correct. Said explicitly above. Novaberg does a different job — memory, continuity, emotional grain, proactive cognition. The architecture is model-agnostic; better local models arrive monthly, and the design outlasts any of them.
Why not just use ChatGPT with memory? | No memory lifecycle — no short-term to long-term promotion, no decay, no consolidation into character dimensions. No proactive cognition; the system only thinks while you are typing. No emotional stream of its own. Custom Instructions are a sticky note on the monitor, not a brain.
Nobody needs an AI with emotions. | The emotion is not theatre for the user. It is a control signal — together with mode, intent, arousal, and relational dynamic, it determines how Nova answers. The system listens to the user not only in words but in style, pace, and the signals between the lines. Part III of this series is dedicated to this point.
This doesn't scale. | It is not supposed to. Personal means one system per person. Like a diary. Scaling is somebody else's problem; intimacy is the design constraint here.
Why German prompts? | A deliberate choice for the user's first language. The model understands German fluently, and prompts are more natural and more precise in the language you think in. Variables stay English; comments and prompts follow the user. A future deployment in another language is a translation pass, not a redesign.
You're reinventing CrewAI / AutoGen / LangChain memory. | I am not. Frameworks build conversation pipelines on shared infrastructure. Novaberg builds a typed-state cognitive architecture for a single person — five-layer memory, dual emotion, asynchronous thought. The Clipboard Pattern (Part II) explains why the typed state matters.