Model setup (backend LLM) - Morpheus Lumerin Node

The Morpheus proxy-router does not run inference itself. It forwards prompts to whatever OpenAI-compatible HTTP endpoint you point it at via models-config.json. That endpoint is your “backend LLM” or “model server.”

This page is intentionally short. Picking, sizing, and operating an inference engine is its own discipline; we link to the canonical references rather than maintaining our own.

Common backends

llama.cpp / llama-server

Single-binary CPU/GPU inference. Bundled in our local-only demo.

vLLM

Production-grade GPU serving with continuous batching.

Ollama

Easy local model server, OpenAI-compatible.

Hosted reseller

Front Venice / OpenAI / Anthropic via apiUrl + apiKey in models-config.json. See Resale provider.

What the proxy-router needs from your backend

OpenAI-compatible route appropriate for the model type:
- LLM: /v1/chat/completions
- Embeddings: /v1/embeddings
- STT: /v1/audio/transcriptions
- TTS: /v1/audio/speech
A stable, private URL the proxy-router can reach (e.g. http://10.0.0.5:8080/v1/chat/completions).
Enough concurrency to satisfy the concurrentSlots you advertise in models-config.json.

Capacity recommendations

There is no one-size-fits-all sizing — start by measuring on your own hardware. The tech.mor.org calculators help estimate revenue and tokens-per-second across hardware tiers; mirror summary at tech.mor.org (mirror).

TEE backends

For full Phase 2 attestation, the backend itself must run inside a SecretVM-style TEE that exposes attestation endpoints on :29343 (/cpu, /gpu, /docker-compose). See:

TEE overview
TEE reference
Backend-side developer notes: proxy-router/docs/tee-backend-verification.md

Full P-Node quickstart Proxy-router on Docker

​Common backends

llama.cpp / llama-server

vLLM

Ollama

Hosted reseller

​What the proxy-router needs from your backend

​Capacity recommendations

​TEE backends

Common backends

What the proxy-router needs from your backend

Capacity recommendations

TEE backends