Alexander Rogov 0b698ccfcc fix repo URL and branch refs

2026-06-12 18:21:11 +03:00

Yandex Cloud Production Matrix Cluster — Migration Plan

0. Current State

Existing Clusters

Cluster	Type	Nodes	Purpose
k3s homelab (`~/infra/k3s`)	self-managed k3s	2 (oracle, sentinel)	GitOps-based, hosts test `mrt0rtikize.ru` ESS Community
Yandex Cloud prod (`kubectl` default)	managed k8s v1.32.1	3x 2CPU/6GB	Manual Helm, hosts 3 production Matrix instances

Current Prod Load (Yandex Cloud)

Node	CPU	RAM	What runs there
`ofem`	6%	32%	element-web x3, element-call x2, well-known x3, alloy, cert-manager-cainjector
`uzig`	19%	94%	Grafana(682Mi) + Prometheus(499Mi) + VictoriaMetrics(327Mi) + Synapse-t0rt1k(582Mi) + 3x PostgreSQL + Alertmanager + 3x Redis + 3x LiveKit SFU + Traefik
`efur`	10%	51%	Synapse-roglog(134Mi) + Synapse-uretra(137Mi) + Loki(304Mi) + 2x mas-postgresql + cert-manager + coredns

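Key issue: uzig is at 94% RAM. The monitoring stack (Grafana, Prometheus, VictoriaMetrics, about 1.5Gi combined) shares the node with the busiest Synapse (t0rt1k, 582Mi) and all three PostgreSQL instances.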
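
The tally behind that observation, as a small sketch. The per-pod figures come from the node table; the node size of 6144 Mi is an assumption (the table only says 6GB).

```shell
# Memory figures (Mi) from the node table; node size assumed to be 6 GiB = 6144 Mi.
node_mi=6144
monitoring_mi=$((682 + 499 + 327))   # Grafana + Prometheus + VictoriaMetrics
synapse_mi=582                        # Synapse-t0rt1k

echo "monitoring: ${monitoring_mi}Mi, $((monitoring_mi * 100 / node_mi))% of the node"
echo "synapse:    ${synapse_mi}Mi, $((synapse_mi * 100 / node_mi))% of the node"
```

Under that assumption the monitoring stack alone takes roughly a quarter of the node, which is the argument for moving it away from the databases and Synapse.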

Prod Matrix Instances

Property	`matrix-t0rt1k`	`matrix-roglog`	`matrix-uretra`
Domain	`t0rt1k.tech`	`roglog.space`	`uretra.space`
Age	87d	82d	82d
Synapse RAM	582Mi	134Mi	137Mi
Storage	10+8+1 Gi (HDD)	10+8+1 Gi (HDD)	10+8+1 Gi (HDD)
Helm chart	`matrix-2.9.17` (NOT ESS)	same	same
MAS migration	Failed (`syn2mas` job)	OK	OK
Components per instance	Synapse, Element Web, Element Call, LiveKit SFU + Redis + JWT, MAS, 2x PostgreSQL, well-known	same	same
Ingress	Traefik, `158.160.164.95`	same LB	same LB
TLS	cert-manager + `letsencrypt-production`	same	same

Test Instance (k3s homelab)

Property	Value
Domain	`mrt0rtikize.ru`
Chart	ESS Community (`oci://ghcr.io/element-hq/ess-helm/matrix-stack` v26.6.1)
Namespace	`matrix-mrt0rtikize`
Components	Synapse, MAS, Element Web, Element Admin, Matrix RTC, Hookshot, HAProxy
PostgreSQL	Built-in (chart-managed)
Storage	Longhorn
GitOps	ArgoCD, repo at `gitea.mrt0rtikize.ru`

1. New Cluster Architecture

1.1 Platform

Yandex Cloud Managed Kubernetes (new cluster)
Ability to add external nodes in future (supported experimentally, not needed now)
Managed control plane, self-managed worker nodes

1.2 GitOps Foundation

Component	How	Notes
Gitea	`kubectl apply` from `bootstrap/gitea/`	Self-hosted git server, deployed first (before ArgoCD)
ArgoCD	`helm install` via `bootstrap/argocd/install.sh`	Installed with `--insecure` (same as k3s), points to Gitea
Root App	`argocd/app-of-apps.yaml`	Scans `argocd/apps/*.yaml` recursively, deploys everything else

1.3 Infrastructure Components

Component	Type	Values	Notes
cert-manager	ArgoCD Helm app	`installCRDs: true` (`crds.enabled: true` on cert-manager 1.15+); `ClusterIssuer` `letsencrypt-production` applied as a separate manifest	TLS for all ingresses
CloudNativePG (operator + `shared-pg` cluster)	ArgoCD Helm app	`cluster.instances: 3`, `storageClass: yc-network-ssd`, `size: 50Gi`, `podAntiAffinityType: required`	HA PostgreSQL for all Matrix instances
Prometheus Stack	ArgoCD Helm app	Ported from `k3s/manifests/metrics/kube-prometheus-stack-values.yaml`, remoteWrite to VictoriaMetrics	Monitoring + Alertmanager
VictoriaMetrics	ArgoCD Helm app	Ported from `k3s/manifests/metrics/victoria-metrics-single-values.yaml`	Long-term metrics storage
Loki	ArgoCD Helm app	Log aggregation	—
Alloy/Grafana Alloy	ArgoCD Helm app	Agent for metrics/logs forwarding	—
Traefik	Managed by Yandex (or DaemonSet)	Cluster's built-in ingress controller	LB external IP provisioned by Yandex Cloud

1.4 ESS Instances

Each Matrix homeserver is a separate ArgoCD Application referencing the ESS chart:

argocd/apps/
├── matrix-mrt0rtikize.yaml   (first, test migration)
├── matrix-t0rt1k.yaml        (production, after procedure proven)
├── matrix-roglog.yaml
└── matrix-uretra.yaml

Each uses the shared CloudNativePG cluster (not built-in PostgreSQL).

1.5 Directory Structure

~/infra/yandex-prod/
├── bootstrap/
│   ├── gitea/
│   │   ├── namespace.yaml
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   ├── ingress.yaml
│   │   └── pvc.yaml
│   └── argocd/
│       ├── install.sh
│       └── values.yaml
├── argocd/
│   ├── app-of-apps.yaml
│   └── apps/
│       ├── cert-manager.yaml
│       ├── cnpg-operator.yaml
│       ├── cnpg-cluster.yaml
│       ├── monitoring.yaml
│       ├── loki.yaml
│       ├── matrix-mrt0rtikize.yaml
│       ├── matrix-t0rt1k.yaml
│       ├── matrix-roglog.yaml
│       └── matrix-uretra.yaml
└── manifests/
    ├── cnpg/
    │   ├── namespace.yaml
    │   ├── databases.yaml         # Database CRs per homeserver
    │   └── secrets.yaml           # PG credentials per homeserver (or generated)
    └── matrix-mrt0rtikize/
        └── (supplemental manifests, if any)

1.6 CloudNativePG Architecture

CloudNativePG Cluster "shared-pg" (namespace: cnpg, 3 instances)
├── Instance 1 (node A)
├── Instance 2 (node B)   ← anti-affinity ensures spread
└── Instance 3 (node C)

Databases (one pair per homeserver):
├── synapse_mrt0rtikize  (owner: synapse_mrt0rtikize)
├── mas_mrt0rtikize      (owner: mas_mrt0rtikize)
├── synapse_t0rt1k       (owner: synapse_t0rt1k)
├── mas_t0rt1k           (owner: mas_t0rt1k)
├── synapse_roglog       (owner: synapse_roglog)
├── mas_roglog           (owner: mas_roglog)
├── synapse_uretra       (owner: synapse_uretra)
└── mas_uretra           (owner: mas_uretra)

Service: shared-pg-rw.cnpg.svc.cluster.local:5432  (primary, read-write)
         shared-pg-ro.cnpg.svc.cluster.local:5432  (replicas, read-only)

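Each homeserver has a dedicated PostgreSQL role and database within the same cluster. Databases are created via CNPG Database CRs (see manifests/cnpg/databases.yaml). A Database CR does not create its owner role, so the roles must exist first, for example declared under spec.managed.roles of the Cluster.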
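
A minimal sketch of generating the contents of manifests/cnpg/databases.yaml from the naming scheme above. The helper database_cr is hypothetical, and the spec field names (template, encoding, localeCollate, localeCType) reflect my understanding of the CNPG Database CRD, so check them against the CRD installed by the operator. The C locale is set for Synapse databases because Synapse refuses to start on a database with a different collation unless explicitly told to allow it.

```shell
# database_cr <kind> <homeserver>   (kind: synapse | mas)
# Prints one CNPG Database manifest. Hypothetical helper; field names are
# assumptions to verify against the installed CRD.
database_cr() {
  kind=$1
  hs=$2
  cat <<EOF
---
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: ${kind}-${hs}
  namespace: cnpg
spec:
  cluster:
    name: shared-pg
  name: ${kind}_${hs}
  owner: ${kind}_${hs}
EOF
  if [ "$kind" = synapse ]; then
    # Synapse expects UTF8 encoding with the C collation.
    printf '  template: template0\n  encoding: UTF8\n  localeCollate: "C"\n  localeCType: "C"\n'
  fi
}

# Print manifests for all four homeservers.
for hs in mrt0rtikize t0rt1k roglog uretra; do
  database_cr synapse "$hs"
  database_cr mas "$hs"
done
```

The object name uses a hyphen because Kubernetes object names cannot contain underscores, while the PostgreSQL database and role keep the underscore form from the list above.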

Credentials are stored in per-homeserver Kubernetes Secrets (manifests/cnpg/secrets.yaml) and referenced from the ESS values by Secret name and key.
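
One way such a per-homeserver Secret could be produced, as a sketch only. The Secret name and the key names synapse and mas follow the ESS values in section 1.7; the helper names and the matrix-<name> namespace for the production instances are assumptions.

```shell
# gen_password: 32 random alphanumeric characters
gen_password() {
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 32
}

# pg_creds_secret <homeserver>
# Prints a Secret manifest with one password per database role.
# Hypothetical helper; namespace naming is assumed to be matrix-<homeserver>.
pg_creds_secret() {
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: $1-pg-creds
  namespace: matrix-$1
type: Opaque
stringData:
  synapse: $(gen_password)
  mas: $(gen_password)
EOF
}

# Example: pg_creds_secret t0rt1k | kubectl apply -f -
```

The same passwords have to be set on the matching PostgreSQL roles, otherwise Synapse and MAS cannot authenticate. Piping the output straight to kubectl keeps the plaintext out of git.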

1.7 ESS Configuration (per instance)

# Shared across all instances:
serverName: <domain>
certManager:
  clusterIssuer: letsencrypt-production
ingress:
  className: traefik  # or whatever Yandex provides

# PostgreSQL — external, shared CNPG cluster:
postgres:
  enabled: false

synapse:
  postgres:
    host: shared-pg-rw.cnpg.svc.cluster.local
    database: synapse_<name>
    user: synapse_<name>
    password:              # ESS credential form: Secret name + key (confirm with helm show values)
      secret: <name>-pg-creds
      secretKey: synapse
  media:
    storage:
      size: 10Gi           # adjustable per instance load
      storageClassName: yc-network-hdd  # media is fine on HDD
  ingress:
    host: matrix.<domain>

matrixAuthenticationService:
  postgres:
    host: shared-pg-rw.cnpg.svc.cluster.local
    database: mas_<name>
    user: mas_<name>
    password:
      secret: <name>-pg-creds
      secretKey: mas
  ingress:
    host: account.<domain>

elementWeb:
  ingress:
    host: chat.<domain>

elementAdmin:
  ingress:
    host: admin.<domain>

matrixRTC:
  ingress:
    host: mrtc.<domain>

hookshot:
  enabled: true
  # ingress host if needed for webhooks

1.8 Boot Order

Step 1: kubectl apply bootstrap/gitea/
Step 2: helm install argocd (bootstrap/argocd/install.sh)
Step 3: git push manifests/ + argocd/ to Gitea
Step 4: kubectl apply argocd/app-of-apps.yaml
Step 5: ArgoCD syncs cert-manager, CNPG operator, CNPG cluster, databases, monitoring, ESS
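
The first four steps could be wrapped in a script with a dry-run switch, sketched below. The gitea namespace and deployment name, the git remote origin and the branch main are assumptions; step 5 happens inside ArgoCD and is not scripted.

```shell
# run: execute a command, or only print it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# bootstrap: steps 1-4 of the boot order, stopping at the first failure.
# Assumed names: namespace/deployment "gitea", remote "origin", branch "main".
bootstrap() {
  run kubectl apply -f bootstrap/gitea/ &&
  run kubectl -n gitea rollout status deployment/gitea --timeout=300s &&
  run sh bootstrap/argocd/install.sh &&
  run git push origin main &&
  run kubectl apply -f argocd/app-of-apps.yaml
}

# Preview:  DRY_RUN=1 bootstrap
# Execute:  bootstrap
```

The rollout wait between steps 1 and 2 is there because ArgoCD is configured to point at Gitea, so Gitea should be serving before the push in step 3.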

2. Migration Procedure: `mrt0rtikize.ru` (test instance)

Perform on the test instance first to validate the procedure before touching production.

2.1 Backup (on k3s homelab)

NS=matrix-mrt0rtikize

# 1. Stop Synapse + MAS
# If the ArgoCD Application syncs automatically with selfHeal, disable that first,
# otherwise ArgoCD restores the replica counts.
kubectl scale sts -l "app.kubernetes.io/component=matrix-server" -n "$NS" --replicas=0
kubectl scale deploy -l "app.kubernetes.io/component=matrix-authentication" -n "$NS" --replicas=0

# 2. Dump PostgreSQL (chart-managed PG)
# Pod names depend on the ESS release name, so look the pod up by label:
PG_POD=$(kubectl get pods -n "$NS" -l "app.kubernetes.io/name=postgres" -o name | head -1)
kubectl exec -n "$NS" "$PG_POD" -- pg_dumpall -U postgres > dump-mrt0rtikize.sql

# 3. Backup generated secrets (CRITICAL — contains signing key, MAS encryption key)
kubectl get secret matrix-mrt0rtikize-generated -n $NS -o yaml > secrets-mrt0rtikize.yaml

# 4. Backup deployment markers
kubectl get configmap \
  -l "app.kubernetes.io/managed-by=matrix-tools-deployment-markers" \
  -n $NS -o yaml > markers-mrt0rtikize.yaml

# 5. Backup media files
# PVs are cluster-scoped (no -n flag). Find the PV bound to the media PVC:
kubectl get pv -o yaml | grep -A5 "synapse-media"
# Copy from the reported path on the node to a safe location

# 6. Save ESS values (from the ArgoCD Application or helm get values)
kubectl get application matrix-mrt0rtikize -n argocd -o yaml > app-mrt0rtikize.yaml

Critical data that MUST be preserved:

Data	Location	Why
`SYNAPSE_SIGNING_KEY`	`matrix-mrt0rtikize-generated` secret	Federation identity — all other servers know this key. Lose it = all rooms break.
`MAS_ENCRYPTION_SECRET`	same secret	User session encryption. Lose it = all users must re-login.
`MAS_RSA_PRIVATE_KEY`	same secret	OIDC signing. Lose it = re-auth needed.
`SYNAPSE_MACAROON`	same secret	Macaroon signing secret (not an admin API token). Lose it = tokens signed with it stop validating.
PostgreSQL dump	`dump-mrt0rtikize.sql`	All user accounts, rooms, messages.
Media files	Synapse media PV	Uploaded images/files/avatars.
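
A quick check that the exported Secret actually contains the keys from the table, sketched under the assumption that the key names in the Secret match the table exactly. The helper name is hypothetical.

```shell
# check_secret_backup <file>
# Fails if any critical key from the table is missing from the exported Secret.
check_secret_backup() {
  missing=0
  for key in SYNAPSE_SIGNING_KEY MAS_ENCRYPTION_SECRET MAS_RSA_PRIVATE_KEY SYNAPSE_MACAROON; do
    if ! grep -q "^[[:space:]]*${key}:" "$1"; then
      echo "missing: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_secret_backup secrets-mrt0rtikize.yaml && echo "secret backup looks complete"
```

This only proves the keys are present, not that the values are intact, so keep the backup file alongside a checksum.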

2.2 Restore (on new Yandex cluster)

# 1. Create secrets in the matrix-mrt0rtikize namespace
NS=matrix-mrt0rtikize
kubectl create ns $NS

# Apply the generated secrets (signing key etc — DO NOT let initSecrets regenerate it)
kubectl apply -f secrets-mrt0rtikize.yaml
kubectl apply -f markers-mrt0rtikize.yaml

# 2. Restore PostgreSQL dumps
# CNPG service: shared-pg-rw.cnpg.svc.cluster.local
# pg_dumpall writes every database into one plain-SQL file. Split it into
# per-database files first; pg_restore cannot read plain SQL.
# (synapse-mrt0rtikize.sql and mas-mrt0rtikize.sql are placeholder names for those files.)
# Target the primary, since replicas are read-only:
PG_POD=$(kubectl get pods -n cnpg -l "cnpg.io/cluster=shared-pg,cnpg.io/instanceRole=primary" -o name | head -1)

# Restore Synapse DB (-i forwards stdin to psql; without it the dump is never read):
kubectl exec -i -n cnpg "$PG_POD" -- psql -U synapse_mrt0rtikize \
  -d synapse_mrt0rtikize < synapse-mrt0rtikize.sql

# Restore MAS DB:
kubectl exec -i -n cnpg "$PG_POD" -- psql -U mas_mrt0rtikize \
  -d mas_mrt0rtikize < mas-mrt0rtikize.sql

# 3. Restore media files
# Copy from backup to the new PV (path depends on storage class)
# For Yandex Cloud CSI: mount the PV on a temp pod and copy files in

# 4. Deploy ESS via ArgoCD
# The Application was already committed to git (argocd/apps/matrix-mrt0rtikize.yaml).
# ArgoCD syncs it. Since secrets + markers are pre-loaded, the chart initializes
# with the existing signing key and database credentials.

# 5. Verify
# - Log in with an existing user
# - Check federation: https://federationtester.matrix.org/?server_name=mrt0rtikize.ru
# - Test Element Call (VoIP)
# - Monitor logs for errors
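
Step 2 needs one SQL file per database rather than the combined pg_dumpall output. A rough sketch of such a split follows. It assumes the dump uses the simple "\connect name" form (database names without special characters) and the "-- Database "name" dump" section headers that recent pg_dumpall versions emit; the helper name is hypothetical.

```shell
# split_dumpall <dump.sql> <outdir>
# Writes <outdir>/<database>.sql for every database section in a pg_dumpall file.
# Globals (roles) and CREATE DATABASE statements are skipped, because the
# target databases and roles already exist on the CNPG cluster.
split_dumpall() {
  mkdir -p "$2" &&
  awk -v dir="$2" '
    /^-- Database ".*" dump$/ { out = ""; next }                  # next section starts
    /^\\connect /             { out = dir "/" $2 ".sql"; next }   # section body follows
    out != ""                 { print > out }
  ' "$1"
}

# Example: split_dumpall dump-mrt0rtikize.sql split/
```

Ownership statements inside each file still name the source roles, so expect to adjust them (or restore as a superuser and reassign ownership) if the role names on the old instance differ from synapse_mrt0rtikize and mas_mrt0rtikize.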

2.3 DNS Cutover

Once validated:

Old records:  mrt0rtikize.ru → k3s cluster IP
              *.mrt0rtikize.ru → k3s cluster IP

New records:  mrt0rtikize.ru → new cluster Traefik LB IP
              matrix.mrt0rtikize.ru → new cluster LB
              account.mrt0rtikize.ru → new cluster LB
              chat.mrt0rtikize.ru → new cluster LB
              admin.mrt0rtikize.ru → new cluster LB
              mrtc.mrt0rtikize.ru → new cluster LB

Lower DNS TTLs (for example to 300 seconds) at least 24h before cutover, so that resolvers have already expired the old long-lived records when the switch happens.
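
After the records are changed, a loop like the following can confirm that every hostname from the list above resolves to the new load balancer. This is a sketch: the function names are hypothetical, and it assumes dig is installed and that each name ends in a single A record.

```shell
# ess_hosts <domain>: the apex plus the five service hostnames listed above
ess_hosts() {
  echo "$1"
  for sub in matrix account chat admin mrtc; do
    echo "${sub}.$1"
  done
}

# resolve <host>: last line of the answer, i.e. the address after any CNAME chain
resolve() {
  dig +short A "$1" | tail -n 1
}

# check_dns <domain> <expected-ip>: non-zero exit if any host points elsewhere
check_dns() {
  rc=0
  for host in $(ess_hosts "$1"); do
    actual=$(resolve "$host")
    if [ "$actual" = "$2" ]; then
      echo "ok   $host"
    else
      echo "FAIL $host -> ${actual:-no answer}"
      rc=1
    fi
  done
  return "$rc"
}

# Example: check_dns mrt0rtikize.ru <new cluster LB IP>
```

Running it against a public resolver as well as the local one gives a better picture of propagation than a single lookup.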

2.4 Rollback

If migration fails:

Scale down Synapse + MAS on new cluster
Revert DNS to k3s cluster IP
Scale up Synapse + MAS on k3s homelab

The old instance on k3s should still be functional (just stopped, not deleted).
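
The rollback can be kept ready as a script so that it is not improvised during an incident. A sketch, reusing the label selectors from section 2.1; the kubeconfig context names yandex-new and k3s-homelab are hypothetical, and the DNS change stays manual.

```shell
NEW_CTX=${NEW_CTX:-yandex-new}    # hypothetical kubeconfig context names
OLD_CTX=${OLD_CTX:-k3s-homelab}
NS=matrix-mrt0rtikize

# run: execute a command, or only print it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# scale_matrix <context> <replicas>: scale Synapse and MAS in one cluster
scale_matrix() {
  run kubectl --context "$1" -n "$NS" scale sts \
    -l "app.kubernetes.io/component=matrix-server" --replicas="$2" &&
  run kubectl --context "$1" -n "$NS" scale deploy \
    -l "app.kubernetes.io/component=matrix-authentication" --replicas="$2"
}

rollback() {
  scale_matrix "$NEW_CTX" 0 &&
  echo "Now revert DNS to the k3s cluster IP (manual step)" &&
  scale_matrix "$OLD_CTX" 1
}

# Preview: DRY_RUN=1 rollback
```

Scaling the new cluster down first avoids having two Synapse processes with the same signing key answering federation traffic at the same time.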

3. Production Migration (outline)

Repeat steps from Section 2 for each production instance, one at a time.

3.1 Order

#	Instance	Synapse Load	Complexity
1	`mrt0rtikize.ru`	Minimal (test)	Low — prove procedure
2	`t0rt1k.tech`	582Mi (busiest)	High — schedule during low-traffic, may need extended downtime
3	`roglog.space`	134Mi	Medium
4	`uretra.space`	137Mi	Medium

3.2 Pre-migration Checklist (per instance)

[ ] Announce maintenance window to users
[ ] Lower DNS TTLs (24h before)
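[ ] Full PostgreSQL dump + verify (a plain-SQL pg_dumpall file should end with "-- PostgreSQL database cluster dump complete"; pg_restore --list only reads pg_dump -Fc archives)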
[ ] Backup media files + verify checksums
[ ] Backup generated secrets (verify signing key matches federation)
[ ] Save current Helm values (helm get values)
[ ] Document current ingress/DNS/Certificate setup
[ ] Prepare rollback procedure
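
For the checksum item, one approach is to seal the whole backup directory before it leaves the old cluster and verify it again on the machine that performs the restore. A sketch with hypothetical helper names, assuming GNU or BusyBox sha256sum is available:

```shell
# seal_backup <dir>: record a checksum for every file under <dir>
seal_backup() (
  cd "$1" || exit 1
  find . -type f ! -name SHA256SUMS -exec sha256sum {} + > SHA256SUMS
)

# verify_backup <dir>: non-zero exit if any file is missing or changed
verify_backup() (
  cd "$1" || exit 1
  sha256sum -c SHA256SUMS > /dev/null
)

# Example:
#   seal_backup backup-t0rt1k/      # on the source side, after dump + media copy
#   verify_backup backup-t0rt1k/    # on the destination, before restoring
```

Both functions run in a subshell, so the cd does not change the caller's working directory.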

3.3 Migration Steps (per instance)

1. Stop Synapse + MAS on old cluster
2. Create CNPG databases on new cluster
3. Restore PostgreSQL dump to CNPG
4. Restore media files to new PV
5. Apply secrets (signing key, MAS keys, macaroon)
6. Apply deployment markers
7. Deploy ESS via ArgoCD on new cluster
8. Wait for pods healthy, certs issued
9. Test: login, federation, Element Call
10. Cut over DNS
11. Monitor for 24h
12. If stable: remove old instance resources from old cluster

3.4 Special Considerations for Prod Instances

t0rt1k.tech (busiest instance, 582Mi Synapse):

Uses older matrix-2.9.17 chart (NOT ESS). Migration means switching to ESS Community chart.
Has a failed syn2mas job — MAS migration was incomplete. When deploying ESS which bundles MAS, the migration may need to be completed or re-done.
The 582Mi memory usage suggests many concurrent users/rooms — dump may be large. Allocate enough storage and time for the SQL dump/restore.
Consider running the new ESS in parallel (different hostnames) first, then switching DNS once proven.

roglog.space and uretra.space:

Lower load (134Mi/137Mi) — quicker backups, less downtime risk.
Same chart switch (matrix-2.9.17 → ESS).
Can be done in shorter windows.

Chart migration (matrix-2.9.17 → ESS):

The old chart uses separate Helm releases per component (chat, element-call, livekit).
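ESS bundles everything into one chart. The Synapse database schema is tied to the Synapse version rather than to the chart, so confirm that ESS ships the same or a newer Synapse than matrix-2.9.17 does; Synapse upgrades the schema on startup but does not support downgrades.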
Key difference: ESS uses MAS for auth (Matrix 2.0), old chart may use legacy Synapse auth.
May need to run syn2mas migration or manual user migration. Investigate per-instance before cutover.

4. PostgreSQL Backup (ongoing)

CloudNativePG has built-in backup to S3-compatible storage. Configure once for automatic daily backups:

apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: shared-pg-daily
  namespace: cnpg
spec:
  schedule: "0 0 3 * * *"         # 03:00 UTC daily; CNPG cron has six fields, seconds first
  backupOwnerReference: self
  cluster:
    name: shared-pg
  immediate: false
  target: prefer-standby

CNPG also supports continuous WAL archiving to S3 for point-in-time recovery. Configure Yandex Object Storage as the S3 target.

5. Architecture Diagram (text)

┌────────────────────────────────────────────────────────────┐
│              Yandex Cloud Managed K8s                       │
│                                                             │
│  ┌───────────────────┐   ┌──────────────────────────────┐  │
│  │   Infrastructure   │   │        Matrix Layer           │  │
│  │                    │   │                               │  │
│  │  Gitea (git)       │   │  ┌─────────────────────────┐ │  │
│  │  ArgoCD (gitops)   │   │  │ matrix-mrt0rtikize (ESS)│ │  │
│  │  cert-manager      │   │  │ - Synapse                │ │  │
│  │  Traefik (LB)      │   │  │ - MAS                    │ │  │
│  │  Prometheus/Grafana │   │  │ - Element Web/Admin      │ │  │
│  │  VictoriaMetrics    │   │  │ - Matrix RTC (LiveKit)   │ │  │
│  │  Loki               │   │  │ - Hookshot               │ │  │
│  │  Alloy              │   │  │ - HAProxy                │ │  │
│  └───────────────────┘   │  └──────────┬──────────────┘ │  │
│                           │             │                 │  │
│  ┌───────────────────┐   │  ┌──────────▼──────────────┐ │  │
│  │   CNPG Cluster     │   │  │ matrix-t0rt1k (ESS)    │ │  │
│  │   (3 nodes, SSD)   │◄──┤  │ (same structure)       │ │  │
│  │                    │   │  └─────────────────────────┘ │  │
│  │  synapse_mrt0rtikize│  │  ┌─────────────────────────┐ │  │
│  │  mas_mrt0rtikize   │   │  │ matrix-roglog (ESS)    │ │  │
│  │  synapse_t0rt1k    │   │  │ (same structure)       │ │  │
│  │  mas_t0rt1k        │   │  └─────────────────────────┘ │  │
│  │  synapse_roglog    │   │  ┌─────────────────────────┐ │  │
│  │  mas_roglog        │   │  │ matrix-uretra (ESS)    │ │  │
│  │  synapse_uretra    │   │  │ (same structure)       │ │  │
│  │  mas_uretra        │   │  └─────────────────────────┘ │  │
│  └───────────────────┘   └──────────────────────────────┘  │
│                                                             │
│  External LB: <Yandex provisioned IP>                        │
└────────────────────────────────────────────────────────────┘

6. Implementation Notes

6.1 Secrets Management

ESS initSecrets generates 14 credentials. For migration, these MUST be restored from backup (not regenerated).
SYNAPSE_SIGNING_KEY is the most critical — it identifies the server to the federation. Changing it breaks all existing rooms and federation relationships.
The matrix*-generated secret and deployment markers ConfigMap must be applied before the first ArgoCD sync, so the ESS chart does not generate new (wrong) ones.
For fresh ESS instances (new homeservers, not migrations), let initSecrets generate them normally.

6.2 Image Registry

ESS pulls from oci.element.io (Synapse, Element Web, Element Admin, lk-jwt-service), ghcr.io (matrix-tools, hookshot), and docker.io (livekit, postgres, redis).
oci.element.io S3 backend (oci-element-io-images-storage-prod.s3.eu-central-1.amazonaws.com) was observed to fail intermittently from Russia with "connection reset by peer". Images eventually pulled on retry, but consider:
- Setting image.pullPolicy: IfNotPresent to reduce re-pulls
- Setting up a containerd registry mirror or local pull-through cache for oci.element.io
- Pre-pulling images to nodes during initial setup
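
Since the pulls did succeed on retry, a pre-pull step can lean on a small retry wrapper. A sketch: the helper name and the RETRY_DELAY variable are hypothetical, and the image reference in the example is a placeholder.

```shell
# retry <attempts> <command...>
# Re-runs the command until it succeeds or the attempt budget is used up.
# RETRY_DELAY (seconds between attempts) defaults to 5.
retry() {
  attempts=$1
  shift
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    sleep "${RETRY_DELAY:-5}"
    n=$((n + 1))
  done
}

# Example, run on a node (image reference is a placeholder):
#   retry 5 crictl pull oci.element.io/<image>:<tag>
```

A fixed delay is enough for intermittent connection resets; a pull-through cache remains the more durable fix.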

6.3 Resource Limits

Set resources.requests and resources.limits on all ESS components to prevent the 94% node issue seen in prod:

synapse:
  resources:
    requests:
      memory: 256Mi
      cpu: 100m
    limits:
      memory: 1Gi
      cpu: 1000m

Do similar for MAS, element-web, livekit-sfu, etc. ESS chart supports per-component resource configuration.

6.4 Storage Classes

Workload	Storage Class	Reason
PostgreSQL (CNPG)	`yc-network-ssd`	Database — needs low latency / high IOPS
Synapse media	`yc-network-hdd` (default)	Media files — sequential access, SSD benefit is marginal
Prometheus TSDB	`yc-network-ssd`	Time-series DB — random writes benefit from SSD
Loki chunks	`yc-network-hdd`	Log storage — sequential writes, HDD is fine

7. Next Steps (for next session)

When the new cluster is ready, open a new session and point to this file. The next session should:

Read this plan
Explore the new cluster (nodes, storage classes, ingress config)
Implement Phase 0 (bootstrap GitOps foundation):
- Create ~/infra/yandex-prod/ directory structure
- Write bootstrap/gitea/ manifests
- Write bootstrap/argocd/install.sh + values.yaml
- Write argocd/app-of-apps.yaml
- Write infrastructure apps (cert-manager, CNPG, monitoring)
- Write ESS apps
- Push to Gitea
Execute Phase 1 (backup mrt0rtikize.ru from k3s)
Execute Phase 2 (restore mrt0rtikize.ru to new cluster)
Validate and plan DNS cutover
