← Back to docs index

Design: PKMS (Personal Knowledge Management System)

Architecture Overview

Browser (HTMX + Alpine.js) ──HTTPS── nginx ──► FastAPI (pkms/server.py)
                                                    │
                          ┌─────────────────────────┼─────────────────────────┐
                          ▼                         ▼                         ▼
                    Wiki Store                  LLM Service              Source Store
                 /home/pankaj/pkms/         (litellm + deepseek)    /home/pankaj/pkms/raw/
                    wiki/*.md                                           *.pdf, *.txt
                    index.md
                    log.md
                    AGENTS.md (schema)

Components

1. Wiki Store (filesystem)

Data shapes are directories + markdown files. No database for wiki content — Git-friendliness and Obsidian compatibility are the point.

/home/pankaj/pkms/
├── AGENTS.md              # Schema: how the LLM behaves (Layer 3)
├── raw/                   # Immutable sources (Layer 1)
│   └── *.pdf, *.txt
├── wiki/                  # LLM-owned pages (Layer 2)
│   ├── index.md           # Master index, auto-updated
│   ├── log.md             # Operation log, appended on every ingest
│   └── ...                # LLM decides structure (concepts/, people/, etc.)
├── requirements/          # Project docs
├── sw_design/             # This document
└── pm/                    # Todo + status

Key decision: The LLM decides the wiki subdirectory structure based on content. No enforced taxonomy. The LLM reads the source, identifies natural groupings, and creates pages accordingly. The schema (AGENTS.md) provides guardrails (use [[wikilinks]], keep pages focused, update index.md) but not a rigid taxonomy.

2. Web Server (FastAPI, port 8890)

Serves the web UI and API endpoints. Runs behind nginx at pkms.hermesbillpay.com.

Endpoints:

| Method | Path | Purpose | |--------|------|---------| | GET | / | Wiki browser homepage — shows index | | GET | /wiki/{path} | View a wiki page (rendered .md → HTML) | | GET | /raw | List uploaded sources with status | | POST | /upload | Upload source file(s) to raw/ | | POST | /ingest/{filename} | Trigger LLM ingest of a raw source | | POST | /query | Ask a question, LLM reads wiki, returns answer | | POST | /lint | Trigger LLM audit, returns health report | | GET | /status/{op_id} | Poll operation status (for long-running ingest) |

Frontend pages (HTMX + Alpine.js):

3. LLM Service (pure functions, litellm)

All LLM interactions go through litellm with the DeepSeek API key from ~/.hermes/.env.

Data shapes:

# Input to ingest
@dataclass(frozen=True)
class IngestInput:
    source_path: str       # e.g. "raw/bitter-lesson.pdf"
    source_content: str    # extracted text
    wiki_pages: dict[str, str]  # path → content of all existing wiki pages

# Output from ingest
@dataclass(frozen=True)
class IngestOutput:
    pages: dict[str, str]  # path → content for new/updated pages
    log_entry: str         # appended to log.md

# Input to query
@dataclass(frozen=True)
class QueryInput:
    question: str
    wiki_pages: dict[str, str]

# Output from query
@dataclass(frozen=True)
class QueryOutput:
    answer: str            # HTML-safe markdown
    citations: list[str]   # wiki page paths

# Input to lint
@dataclass(frozen=True)
class LintInput:
    wiki_pages: dict[str, str]

# Output from lint
@dataclass(frozen=True)
class LintOutput:
    contradictions: list[dict]
    orphans: list[str]
    missing_concepts: list[str]
    stale: list[str]

Functions (pure, no side effects):

def build_ingest_prompt(inp: IngestInput, schema: str) -> str
    """Construct the LLM prompt for ingest operation."""

def parse_ingest_response(response: str) -> IngestOutput
    """Parse LLM response into structured pages. Expects `### FILE: path`
    markers between pages."""

def build_query_prompt(inp: QueryInput) -> str
    """Construct prompt: 'Answer using ONLY the wiki pages below. Cite with [[links]].'"""

def parse_query_response(response: str) -> QueryOutput
    """Extract answer and [[citation]] links."""

def build_lint_prompt(inp: LintInput) -> str
    """Construct prompt asking LLM to find contradictions, orphans, missing concepts."""

def parse_lint_response(response: str) -> LintOutput
    """Parse the lint report."""

def extract_text(filepath: str) -> str
    """Extract readable text from PDF, TXT, or markdown files. Uses pymupdf for PDFs."""

def load_wiki_pages() -> dict[str, str]
    """Read all .md files from wiki/ directory. Returns {path: content}."""

LLM call:

def call_llm(prompt: str, system: str = "") -> str:
    """Call litellm with deepseek provider. Reads DEEPSEEK_API_KEY from env."""

4. Operations (wire functions to persistence)

These are the side-effectful functions that wire LLM calls to the filesystem:

def ingest_source(filename: str) -> str:
    """1. Extract text from raw/<filename>
       2. Load all wiki pages
       3. Build prompt → call LLM → parse response
       4. Write new/updated pages to wiki/
       5. Append to log.md
       6. Return operation summary"""

def query_wiki(question: str) -> QueryOutput:
    """1. Load all wiki pages
       2. Build prompt → call LLM → parse response
       3. Return answer + citations"""

def lint_wiki() -> LintOutput:
    """1. Load all wiki pages
       2. Build prompt → call LLM → parse response
       3. Return report"""

Data Flow

Ingest Flow

User uploads bitter-lesson.pdf
  → POST /upload → saved to raw/bitter-lesson.pdf
  → User clicks "Ingest"
  → POST /ingest/bitter-lesson.pdf
  → extract_text("raw/bitter-lesson.pdf") → source_content
  → load_wiki_pages() → wiki_pages
  → build_ingest_prompt(IngestInput(...), AGENTS.md) → prompt
  → call_llm(prompt) → response
  → parse_ingest_response(response) → IngestOutput
  → write pages to wiki/
  → append to log.md
  → redirect to /

Query Flow

User types: "How do Sutton and Karpathy agree?"
  → POST /query {question}
  → load_wiki_pages() → wiki_pages
  → build_query_prompt(QueryInput(...)) → prompt
  → call_llm(prompt) → response
  → parse_query_response(response) → QueryOutput
  → return HTML fragment with answer + [[citations]]

Lint Flow

User clicks "Lint Wiki"
  → POST /lint
  → load_wiki_pages() → wiki_pages
  → build_lint_prompt(LintInput(...)) → prompt
  → call_llm(prompt) → response
  → parse_lint_response(response) → LintOutput
  → return HTML fragment with report

Physical Architecture

| Component | Location | Details | |-----------|----------|---------| | Wiki files | /home/pankaj/pkms/wiki/ | Markdown, Obsidian-compatible | | Raw sources | /home/pankaj/pkms/raw/ | Immutable, PDF/TXT/MD | | Schema | /home/pankaj/pkms/AGENTS.md | LLM behavior rules | | Web server | /home/pankaj/pkms/server.py | FastAPI, port 8890 | | Venv | /home/pankaj/commerce-agent/.venv | Shared (reuse litellm etc.) | | LLM | litellm → DeepSeek API | Key from ~/.hermes/.env | | nginx | /etc/nginx/sites-enabled/pkms.hermesbillpay | Proxy to 8890 | | systemd | pkms-server.service | Auto-restart |


LLM Prompt Strategy

Ingest Prompt Pattern

You are a disciplined wiki maintainer. Your job is to read a source document
and compile it into the wiki.

Rules (from AGENTS.md):
- Use [[wikilinks]] for all cross-references
- One concept per page
- Update index.md to include new pages
- Append to log.md with what you did
- If the source contradicts an existing wiki page, flag it in log.md

Existing wiki pages:
[file path]: [content]
...

Source to ingest:
[source content]

Respond with updated/new pages using this format:
### FILE: wiki/path/to/page.md
[markdown content]

### FILE: wiki/log.md
[updated log content]

### FILE: wiki/index.md
[updated index content]

Query Prompt Pattern

Answer the question using ONLY the wiki pages below. Cite specific pages
using [[page path]] notation. If the wiki doesn't contain the answer, say so.

Wiki pages:
[file path]: [content]
...

Question: [user's question]

Lint Prompt Pattern

Audit the wiki below. Report:
1. Contradictions — two pages that say conflicting things
2. Orphan pages — pages with no [[links]] pointing to them
3. Missing concepts — important terms mentioned but lacking their own page
4. Stale content — pages that reference outdated information

Wiki pages:
[file path]: [content]
...

Dependencies

fastapi, uvicorn, python-multipart  (already in venv)
mistune                            (already in venv)
litellm                            (need to install — already a dep of hermes-agent)
pymupdf                            (PDF text extraction — install)

Build Order (HLI preview)

  1. llm_service.py — pure functions: prompt builders, response parsers, call_llm()
  2. wiki_store.pyload_wiki_pages(), extract_text(), file write helpers
  3. operations.pyingest_source(), query_wiki(), lint_wiki()
  4. server.py — FastAPI app, endpoints, HTML templates
  5. Wire to nginx — update pkms config to proxy to port 8890
  6. systemd — pkms-server.service