2026-06-12 18:21:11 +03:00


# Yandex Cloud Production Matrix Cluster — Migration Plan
## 0. Current State
### Existing Clusters
| Cluster | Type | Nodes | Purpose |
|---------|------|-------|---------|
| k3s homelab (`~/infra/k3s`) | self-managed k3s | 2 (oracle, sentinel) | GitOps-based (ArgoCD), hosts the test `mrt0rtikize.ru` ESS Community instance |
| Yandex Cloud prod (`kubectl` default) | managed k8s v1.32.1 | 3x 2CPU/6GB | Manual Helm, hosts 3 production Matrix instances |
### Current Prod Load (Yandex Cloud)
| Node | CPU | RAM | What runs there |
|------|-----|-----|-----------------|
| `ofem` | 6% | 32% | element-web x3, element-call x2, well-known x3, alloy, cert-manager-cainjector |
| `uzig` | 19% | **94%** | Grafana(682Mi) + Prometheus(499Mi) + VictoriaMetrics(327Mi) + Synapse-t0rt1k(582Mi) + 3x PostgreSQL + Alertmanager + 3x Redis + 3x LiveKit SFU + Traefik |
| `efur` | 10% | 51% | Synapse-roglog(134Mi) + Synapse-uretra(137Mi) + Loki(304Mi) + 2x mas-postgresql + cert-manager + coredns |
Key issue: **uzig at 94% RAM** — monitoring stack competes with busiest Synapse on same node.
### Prod Matrix Instances
| | `matrix-t0rt1k` | `matrix-roglog` | `matrix-uretra` |
|---|---|---|---|
| Domain | `t0rt1k.tech` | `roglog.space` | `uretra.space` |
| Age | 87d | 82d | 82d |
| Synapse RAM | 582Mi | 134Mi | 137Mi |
| Storage | 10+8+1 Gi (HDD) | 10+8+1 Gi (HDD) | 10+8+1 Gi (HDD) |
| Helm chart | `matrix-2.9.17` (NOT ESS) | same | same |
| MAS migration | Failed (`syn2mas` job) | OK | OK |
| Components per instance | Synapse, Element Web, Element Call, LiveKit SFU + Redis + JWT, MAS, 2x PostgreSQL, well-known | same | same |
| Ingress | Traefik, `158.160.164.95` | same LB | same LB |
| TLS | cert-manager + `letsencrypt-production` | same | same |
### Test Instance (k3s homelab)
| Property | Value |
|----------|-------|
| Domain | `mrt0rtikize.ru` |
| Chart | ESS Community (`oci://ghcr.io/element-hq/ess-helm/matrix-stack` v26.6.1) |
| Namespace | `matrix-mrt0rtikize` |
| Components | Synapse, MAS, Element Web, Element Admin, Matrix RTC, Hookshot, HAProxy |
| PostgreSQL | Built-in (chart-managed) |
| Storage | Longhorn |
| GitOps | ArgoCD, repo at `gitea.mrt0rtikize.ru` |
---
## 1. New Cluster Architecture
### 1.1 Platform
- Yandex Cloud Managed Kubernetes (new cluster)
- Ability to add external nodes in future (supported experimentally, not needed now)
- Managed control plane, self-managed worker nodes
### 1.2 GitOps Foundation
| Component | How | Notes |
|-----------|-----|-------|
| Gitea | `kubectl apply` from `bootstrap/gitea/` | Self-hosted git server, deployed first (before ArgoCD) |
| ArgoCD | `helm install` via `bootstrap/argocd/install.sh` | argocd-server runs in insecure mode (`configs.params."server.insecure": true`, same as k3s), so TLS is terminated at the ingress; points to Gitea |
| Root App | `argocd/app-of-apps.yaml` | Scans `argocd/apps/*.yaml` recursively, deploys everything else |
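A minimal sketch of the root Application, rendered from a shell function so the Gitea URL can be filled in once it is known. It assumes the repo tracks a `main` branch and that child Applications live under `argocd/apps/`; `directory.recurse` and `directory.include` are the Argo CD source fields that implement the "scan recursively" behaviour described in the table.

```bash
# Render argocd/app-of-apps.yaml. The repo URL is an argument because the Gitea
# address is not fixed yet; tracking the "main" branch is an assumption.
render_app_of_apps() {
  local repo_url="$1"
  cat <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-of-apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ${repo_url}
    targetRevision: main
    path: argocd/apps
    directory:
      recurse: true      # descend into subdirectories of argocd/apps/
      include: '*.yaml'  # only pick up YAML manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true        # remove Applications whose manifest was deleted from git
      selfHeal: true     # revert manual drift of the Application objects
EOF
}
```

Usage, with a hypothetical in-cluster Gitea address: `render_app_of_apps http://gitea-http.gitea.svc.cluster.local:3000/infra/yandex-prod.git > argocd/app-of-apps.yaml`. Automated sync on the root only keeps the set of Applications in step with git; if a child Matrix Application also enables self-heal, ArgoCD will undo a manual `kubectl scale` during a migration, so disable automated sync on that app first.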
### 1.3 Infrastructure Components
| Component | Type | Values | Notes |
|-----------|------|--------|-------|
| cert-manager | ArgoCD Helm app | `crds.enabled: true` (`installCRDs: true` on older chart versions) | TLS for all ingresses; the `letsencrypt-production` ClusterIssuer is a separate manifest, not a chart value |
| CloudNativePG Operator | ArgoCD Helm app (`cnpg-operator.yaml`) | Chart defaults | Installs the CNPG CRDs and controller |
| CNPG Cluster `shared-pg` | ArgoCD app (`cnpg-cluster.yaml`) | `instances: 3`, `storage.storageClass: yc-network-ssd`, `storage.size: 50Gi`, `affinity.podAntiAffinityType: required` | HA PostgreSQL for all Matrix instances; `required` anti-affinity needs at least 3 schedulable nodes |
| Prometheus Stack | ArgoCD Helm app | Ported from `k3s/manifests/metrics/kube-prometheus-stack-values.yaml`, remoteWrite to VictoriaMetrics | Monitoring + Alertmanager |
| VictoriaMetrics | ArgoCD Helm app | Ported from `k3s/manifests/metrics/victoria-metrics-single-values.yaml` | Long-term metrics storage |
| Loki | ArgoCD Helm app | TBD | Log aggregation |
| Alloy/Grafana Alloy | ArgoCD Helm app | TBD | Agent for metrics/logs forwarding |
| Traefik | ArgoCD Helm app (Yandex Managed K8s does not preinstall an ingress controller; verify on the new cluster) | `service.type: LoadBalancer` | LB external IP provisioned by Yandex Cloud |
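A sketch of the `shared-pg` Cluster manifest with the values from the table, plus per-homeserver login roles declared under `spec.managed.roles` (a CNPG `Database` CR creates a database but not its owner role). The `pg-role-*` Secret naming is a hypothetical convention; Kubernetes Secret names cannot contain underscores, hence the `tr`.

```bash
# Render the shared CNPG Cluster manifest.
# Arguments: homeserver short names. Each one gets two login roles,
# synapse_<name> and mas_<name>, whose passwords are read from basic-auth
# Secrets named pg-role-<role, with "_" replaced by "-"> (hypothetical naming).
render_shared_pg() {
  local hs role secret
  cat <<'EOF'
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: shared-pg
  namespace: cnpg
spec:
  instances: 3
  storage:
    storageClass: yc-network-ssd
    size: 50Gi
  affinity:
    podAntiAffinityType: required  # never place two instances on the same node
  managed:
    roles:
EOF
  for hs in "$@"; do
    for role in "synapse_${hs}" "mas_${hs}"; do
      secret="pg-role-$(printf '%s' "$role" | tr '_' '-')"
      cat <<EOF
      - name: ${role}
        ensure: present
        login: true
        passwordSecret:
          name: ${secret}
EOF
    done
  done
}
```

For example `render_shared_pg mrt0rtikize t0rt1k roglog uretra > manifests/cnpg/cluster.yaml` (hypothetical path). With `required` anti-affinity and 3 instances, an instance stays Pending if fewer than 3 nodes can host it; that is the intended trade-off, since no two replicas ever share a node.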
### 1.4 ESS Instances
Each Matrix homeserver is a separate ArgoCD Application referencing the ESS chart:
```
argocd/apps/
├── matrix-mrt0rtikize.yaml (first, test migration)
├── matrix-t0rt1k.yaml (production, after procedure proven)
├── matrix-roglog.yaml
└── matrix-uretra.yaml
```
Each uses the **shared CloudNativePG cluster** (not built-in PostgreSQL).
### 1.5 Directory Structure
```
~/infra/yandex-prod/
├── bootstrap/
│ ├── gitea/
│ │ ├── namespace.yaml
│ │ ├── deployment.yaml
│ │ ├── service.yaml
│ │ ├── ingress.yaml
│ │ └── pvc.yaml
│ └── argocd/
│ ├── install.sh
│ └── values.yaml
├── argocd/
│ ├── app-of-apps.yaml
│ └── apps/
│ ├── cert-manager.yaml
│ ├── cnpg-operator.yaml
│ ├── cnpg-cluster.yaml
│ ├── monitoring.yaml
│ ├── loki.yaml
│ ├── matrix-mrt0rtikize.yaml
│ ├── matrix-t0rt1k.yaml
│ ├── matrix-roglog.yaml
│ └── matrix-uretra.yaml
└── manifests/
├── cnpg/
│ ├── namespace.yaml
│ ├── databases.yaml # Database CRs per homeserver
│ └── secrets.yaml # PG credentials per homeserver (or generated)
└── matrix-mrt0rtikize/
└── (supplemental manifests, if any)
```
### 1.6 CloudNativePG Architecture
```
CloudNativePG Cluster "shared-pg" (namespace: cnpg, 3 instances)
├── Instance 1 (node A)
├── Instance 2 (node B) ← anti-affinity ensures spread
└── Instance 3 (node C)
Databases (one pair per homeserver):
├── synapse_mrt0rtikize (owner: synapse_mrt0rtikize)
├── mas_mrt0rtikize (owner: mas_mrt0rtikize)
├── synapse_t0rt1k (owner: synapse_t0rt1k)
├── mas_t0rt1k (owner: mas_t0rt1k)
├── synapse_roglog (owner: synapse_roglog)
├── mas_roglog (owner: mas_roglog)
├── synapse_uretra (owner: synapse_uretra)
└── mas_uretra (owner: mas_uretra)
Service: shared-pg-rw.cnpg.svc.cluster.local:5432 (primary, read-write)
shared-pg-ro.cnpg.svc.cluster.local:5432 (replicas, read-only)
```
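Each homeserver has a dedicated PostgreSQL role and database within the same cluster.
Roles are declared in the Cluster's `spec.managed.roles`; databases are created via CNPG `Database` CRs
(see `manifests/cnpg/databases.yaml`), which create the database but not its owner role.
Synapse additionally requires its database to use `UTF8` encoding with `C` collation and ctype.
Role passwords are stored in per-homeserver `kubernetes.io/basic-auth` Secrets in the `cnpg` namespace
(`manifests/cnpg/secrets.yaml`). A Secret can only be referenced from its own namespace, so ESS needs a copy
of the same passwords in each homeserver namespace (`<name>-pg-creds`, referenced from the ESS values in 1.7).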
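A sketch that renders the two `Database` CRs for one homeserver. The field names follow the CNPG `Database` CRD as of CNPG 1.25+ (an assumption; verify against the CRD installed by the operator). The `template0` plus `C` locale settings reflect Synapse's database requirements; the MAS database is left at the defaults.

```bash
# Render the Synapse and MAS Database CRs for homeserver $1.
# Synapse needs UTF8 + C collation/ctype; a non-default locale can only be set
# when the database is cloned from template0.
render_databases() {
  local hs="$1"
  cat <<EOF
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: synapse-${hs}
  namespace: cnpg
spec:
  cluster:
    name: shared-pg
  name: synapse_${hs}
  owner: synapse_${hs}
  template: template0
  encoding: UTF8
  localeCollate: C
  localeCType: C
---
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: mas-${hs}
  namespace: cnpg
spec:
  cluster:
    name: shared-pg
  name: mas_${hs}
  owner: mas_${hs}
EOF
}
```

All four homeservers can then be written in one go: `for hs in mrt0rtikize t0rt1k roglog uretra; do render_databases "$hs"; echo '---'; done > manifests/cnpg/databases.yaml`.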
### 1.7 ESS Configuration (per instance)
```yaml
# Shared across all instances:
serverName: <domain>
certManager:
  clusterIssuer: letsencrypt-production
ingress:
  className: traefik # must match the IngressClass installed on the new cluster
# PostgreSQL: external, shared CNPG cluster
postgres:
  enabled: false
synapse:
  postgres:
    host: shared-pg-rw.cnpg.svc.cluster.local
    port: 5432
    database: synapse_<name>
    user: synapse_<name>
    password: # ESS credential format (secret + secretKey); verify against the chart's values
      secret: <name>-pg-creds
      secretKey: synapse
  media:
    storage:
      size: 10Gi # adjustable per instance load
      storageClassName: yc-network-hdd # media is fine on HDD
  ingress:
    host: matrix.<domain>
matrixAuthenticationService:
  postgres:
    host: shared-pg-rw.cnpg.svc.cluster.local
    port: 5432
    database: mas_<name>
    user: mas_<name>
    password:
      secret: <name>-pg-creds
      secretKey: mas
  ingress:
    host: account.<domain>
elementWeb:
  ingress:
    host: chat.<domain>
elementAdmin:
  ingress:
    host: admin.<domain>
matrixRTC:
  ingress:
    host: mrtc.<domain>
hookshot:
  enabled: true
  # ingress host if needed for webhooks
```
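A sketch that renders the credential Secrets for one homeserver: the `<name>-pg-creds` Secret with `synapse` and `mas` keys that the values above reference, plus the `kubernetes.io/basic-auth` Secrets CNPG reads role passwords from. The `pg-role-*` names are hypothetical and must match the `passwordSecret` names used in the Cluster's `managed.roles`. The output contains plaintext passwords (`stringData`), so handle the rendered file like any other secret.

```bash
# Render the PostgreSQL credential Secrets for homeserver $1.
#   $2 = Synapse DB password, $3 = MAS DB password
# CNPG reads role passwords from basic-auth Secrets in its own namespace; ESS
# reads them from a Secret in the homeserver namespace, so each password is
# written to both places.
render_pg_secrets() {
  local hs="$1" syn_pw="$2" mas_pw="$3"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: pg-role-synapse-${hs}
  namespace: cnpg
type: kubernetes.io/basic-auth
stringData:
  username: synapse_${hs}
  password: "${syn_pw}"
---
apiVersion: v1
kind: Secret
metadata:
  name: pg-role-mas-${hs}
  namespace: cnpg
type: kubernetes.io/basic-auth
stringData:
  username: mas_${hs}
  password: "${mas_pw}"
---
apiVersion: v1
kind: Secret
metadata:
  name: ${hs}-pg-creds
  namespace: matrix-${hs}
type: Opaque
stringData:
  synapse: "${syn_pw}"
  mas: "${mas_pw}"
EOF
}
```

For example `render_pg_secrets roglog "$(openssl rand -hex 24)" "$(openssl rand -hex 24)" | kubectl apply -f -`; hex passwords avoid any YAML quoting issues.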
### 1.8 Boot Order
```
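Step 1: kubectl apply bootstrap/gitea/
Step 2: helm install argocd (bootstrap/argocd/install.sh)
Step 3: git push manifests/ + argocd/ to Gitea (through kubectl port-forward if needed:
        cert-manager does not exist yet, so the Gitea ingress has no certificate)
Step 4: register the Gitea repo in ArgoCD (if private), then kubectl apply argocd/app-of-apps.yaml
Step 5: ArgoCD syncs cert-manager, CNPG operator, CNPG cluster, databases, monitoring, ESS
        (order them with argocd.argoproj.io/sync-wave annotations so the CRDs exist before
        the Cluster/Database CRs, and the databases exist before ESS)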
```
---
## 2. Migration Procedure: `mrt0rtikize.ru` (test instance)
> Perform on the test instance first to validate the procedure before touching production.
### 2.1 Backup (on k3s homelab)
```bash
NS=matrix-mrt0rtikize
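# 0. If the ArgoCD Application uses automated sync with self-heal, disable it first;
#    otherwise ArgoCD scales Synapse/MAS straight back up.
kubectl patch application matrix-mrt0rtikize -n argocd --type merge \
  -p '{"spec":{"syncPolicy":{"automated":null}}}'
# 1. Stop Synapse + MAS
kubectl scale sts -l "app.kubernetes.io/component=matrix-server" -n $NS --replicas=0
kubectl scale deploy -l "app.kubernetes.io/component=matrix-authentication" -n $NS --replicas=0
# 2. Dump PostgreSQL (built-in PG; the release name is "ess" but pods are named
#    matrix-mrt0rtikize-*, so look the PG pod up by label)
PG_POD=$(kubectl get pods -n $NS -l "app.kubernetes.io/name=postgres" -o name | head -1)
# Full plain-SQL dump (roles + all databases) as a safety copy:
kubectl exec -n $NS $PG_POD -- pg_dumpall -U postgres > dump-mrt0rtikize.sql
# Per-database custom-format dumps for pg_restore; list the actual DB names first:
kubectl exec -n $NS $PG_POD -- psql -U postgres -l
SYNAPSE_DB="<synapse db from psql -l>"
MAS_DB="<mas db from psql -l>"
kubectl exec -n $NS $PG_POD -- pg_dump -U postgres -Fc -d "$SYNAPSE_DB" > synapse-mrt0rtikize.dump
kubectl exec -n $NS $PG_POD -- pg_dump -U postgres -Fc -d "$MAS_DB" > mas-mrt0rtikize.dump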
# 3. Backup generated secrets (CRITICAL — contains signing key, MAS encryption key)
kubectl get secret matrix-mrt0rtikize-generated -n $NS -o yaml > secrets-mrt0rtikize.yaml
# 4. Backup deployment markers
kubectl get configmap \
-l "app.kubernetes.io/managed-by=matrix-tools-deployment-markers" \
-n $NS -o yaml > markers-mrt0rtikize.yaml
# 5. Backup media files
# Longhorn stores volumes as replicated block-device images, not as a plain directory
# on the node. Find the media PVC, mount it in a temporary pod and stream it out with tar:
kubectl get pvc -n $NS
# 6. Save ESS values. ArgoCD renders the chart with `helm template`, so no Helm release
#    exists and `helm get values` does not apply; the values live in the Application:
kubectl get application matrix-mrt0rtikize -n argocd -o yaml > app-mrt0rtikize.yaml
```
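Objects exported with `kubectl get -o yaml` carry server-assigned metadata (`resourceVersion`, `uid`, `creationTimestamp`) that does not belong in another cluster, and the API server rejects a create that sets `resourceVersion`. A small filter for the secret and marker exports above, assuming the default `-o yaml` layout. If an export contains an `ownerReferences` block, delete that block by hand: this line-based filter would only strip its `uid` line.

```bash
# Drop server-assigned metadata lines from `kubectl get -o yaml` output so the
# objects can be created in a different cluster. Reads stdin, writes stdout.
strip_server_fields() {
  sed -E '/^[[:space:]]+(resourceVersion|uid|creationTimestamp|selfLink):/d'
}
```

For example: `kubectl get secret matrix-mrt0rtikize-generated -n "$NS" -o yaml | strip_server_fields > secrets-mrt0rtikize.yaml`, and the same for the markers ConfigMap export.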
**Critical data that MUST be preserved:**
| Data | Location | Why |
|------|----------|-----|
| `SYNAPSE_SIGNING_KEY` | `matrix-mrt0rtikize-generated` secret | Federation identity — all other servers know this key. Lose it = all rooms break. |
| `MAS_ENCRYPTION_SECRET` | same secret | User session encryption. Lose it = all users must re-login. |
| `MAS_RSA_PRIVATE_KEY` | same secret | OIDC signing. Lose it = re-auth needed. |
| `SYNAPSE_MACAROON` | same secret | Synapse `macaroon_secret_key`, which signs macaroon-based tokens. Changing it invalidates them. |
| PostgreSQL dump | `dump-mrt0rtikize.sql` | All user accounts, rooms, messages. |
| Media files | Synapse media PV | Uploaded images/files/avatars. |
### 2.2 Restore (on new Yandex cluster)
```bash
# 1. Create secrets in the matrix-mrt0rtikize namespace
NS=matrix-mrt0rtikize
kubectl create ns $NS
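# Apply the generated secrets first (signing key etc.) so initSecrets does not regenerate them.
# Remove metadata.resourceVersion, uid and creationTimestamp from the exported files
# beforehand: the API server rejects creates that carry a resourceVersion.
kubectl apply -f secrets-mrt0rtikize.yaml
kubectl apply -f markers-mrt0rtikize.yaml
# 2. Restore PostgreSQL into CNPG (service: shared-pg-rw.cnpg.svc.cluster.local)
# Needs per-database custom-format dumps (pg_dump -Fc) from the old cluster; a
# pg_dumpall file is plain SQL and cannot be read by pg_restore. The Database CRs
# for synapse_mrt0rtikize and mas_mrt0rtikize must already be synced.
# Select the primary explicitly (replicas are read-only); older CNPG versions label it role=primary.
PG_POD=$(kubectl get pods -n cnpg -l "cnpg.io/cluster=shared-pg,cnpg.io/instanceRole=primary" -o name | head -1)
# -i streams the local file to stdin; --no-owner/--no-privileges/--role make the restored
# objects belong to the per-homeserver role instead of the old cluster's users.
kubectl exec -i -n cnpg $PG_POD -- pg_restore -U postgres -d synapse_mrt0rtikize \
  --no-owner --no-privileges --role=synapse_mrt0rtikize < synapse-mrt0rtikize.dump
kubectl exec -i -n cnpg $PG_POD -- pg_restore -U postgres -d mas_mrt0rtikize \
  --no-owner --no-privileges --role=mas_mrt0rtikize < mas-mrt0rtikize.dump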
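# 3. Restore media files
# For Yandex Cloud CSI: mount the media PVC in a temporary pod and copy files in
# (tar over `kubectl exec -i`). The PVC is created by the chart in step 4, so either
# restore right after the first sync (before the DNS cutover, with Synapse scaled to 0
# so the ReadWriteOnce volume can attach to the helper pod's node) or pre-create it
# under the name the chart expects.
# 4. Deploy ESS via ArgoCD
# The Application was already committed to git (argocd/apps/matrix-mrt0rtikize.yaml).
# ArgoCD syncs it. Since the generated secret and markers are pre-loaded, the chart
# reuses the existing signing key and MAS keys; database credentials come from the
# mrt0rtikize-pg-creds secret.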
# 5. Verify
# - Log in with an existing user
# - Check federation: https://federationtester.matrix.org/?server_name=mrt0rtikize.ru
# - Test Element Call (VoIP)
# - Monitor logs for errors
```
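A sketch of the temporary-pod approach for the media PVC, usable for both the backup on k3s (Longhorn) and the restore on Yandex (CSI). The pod name `media-copy` and the `busybox:1.36` image are arbitrary choices, and the PVC name has to be looked up with `kubectl get pvc`. Synapse should be scaled to 0 while copying so the ReadWriteOnce volume can attach to the helper pod's node.

```bash
# Print a throwaway pod manifest that mounts PVC $2 (namespace $1) at /data.
render_media_pod() {
  local ns="$1" claim="$2"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: media-copy
  namespace: ${ns}
spec:
  restartPolicy: Never
  containers:
    - name: shell
      image: busybox:1.36
      command: ["sleep", "86400"]
      volumeMounts:
        - name: media
          mountPath: /data
  volumes:
    - name: media
      persistentVolumeClaim:
        claimName: ${claim}
EOF
}

# Stream the PVC contents to a local tarball: $1=namespace $2=pvc $3=output file.
media_backup() {
  local rc
  render_media_pod "$1" "$2" | kubectl apply -f - &&
    kubectl wait --for=condition=Ready pod/media-copy -n "$1" --timeout=300s &&
    kubectl exec -n "$1" media-copy -- tar czf - -C /data . > "$3"
  rc=$?
  kubectl delete pod media-copy -n "$1" --ignore-not-found
  return "$rc"
}

# Unpack a local tarball into the PVC: $1=namespace $2=pvc $3=input file.
# -i is required so the tarball reaches tar's stdin inside the pod.
media_restore() {
  local rc
  render_media_pod "$1" "$2" | kubectl apply -f - &&
    kubectl wait --for=condition=Ready pod/media-copy -n "$1" --timeout=300s &&
    kubectl exec -i -n "$1" media-copy -- tar xzf - -C /data < "$3"
  rc=$?
  kubectl delete pod media-copy -n "$1" --ignore-not-found
  return "$rc"
}
```

Run `media_backup` against the k3s context and `media_restore` against the new cluster. Since the helper runs as root, the archive's file ownership is likely kept on extraction; check that it matches the user the Synapse container runs as.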
### 2.3 DNS Cutover
Once validated:
```
Old records: mrt0rtikize.ru → k3s cluster IP
*.mrt0rtikize.ru → k3s cluster IP
New records: mrt0rtikize.ru → new cluster Traefik LB IP
matrix.mrt0rtikize.ru → new cluster LB
account.mrt0rtikize.ru → new cluster LB
chat.mrt0rtikize.ru → new cluster LB
admin.mrt0rtikize.ru → new cluster LB
mrtc.mrt0rtikize.ru → new cluster LB
```
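A sketch for checking the cutover from a workstation, assuming `dig` is installed and the hostnames match the records listed above (the apex plus `matrix`, `account`, `chat`, `admin` and `mrtc`).

```bash
# List the hostnames that must point at the new load balancer for domain $1.
matrix_hostnames() {
  local domain="$1" sub
  echo "$domain"
  for sub in matrix account chat admin mrtc; do
    echo "${sub}.${domain}"
  done
}

# Compare each hostname's A record with the expected LB IP: $1=domain $2=IP.
# Returns non-zero if any record is missing or points elsewhere.
check_dns() {
  local host got rc=0
  for host in $(matrix_hostnames "$1"); do
    got="$(dig +short A "$host" | tail -n1)"  # last line = final A after any CNAMEs
    if [ "$got" = "$2" ]; then
      echo "OK   $host -> $got"
    else
      echo "FAIL $host -> ${got:-<none>} (want $2)"
      rc=1
    fi
  done
  return "$rc"
}
```

`check_dns mrt0rtikize.ru <new LB IP>` keeps failing while any record still points at k3s. It asks the local resolver, so cached answers can lag until the old TTL expires.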
Lower DNS TTLs (e.g. to 300 s) at least 24h before cutover, or one full old-TTL period if that is longer, so resolvers pick up the new records quickly.
### 2.4 Rollback
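If migration fails:
1. Scale down Synapse + MAS on new cluster
2. Revert DNS to k3s cluster IP
3. Scale up Synapse + MAS on k3s homelab (and re-enable ArgoCD automated sync there if it was disabled for the backup)

The old instance on k3s should still be functional (just stopped, not deleted). Rollback is only lossless before the new server accepts traffic: anything written on the new cluster after cutover (messages, uploads, logins) does not exist on the old one.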
---
## 3. Production Migration (vague plan)
> Repeat steps from Section 2 for each production instance, one at a time.
### 3.1 Order
| # | Instance | Synapse Load | Complexity |
|---|----------|--------------|------------|
| 1 | `mrt0rtikize.ru` | Minimal (test) | Low — prove procedure |
| 2 | `t0rt1k.tech` | **582Mi** (busiest) | High — schedule during low-traffic, may need extended downtime |
| 3 | `roglog.space` | 134Mi | Medium |
| 4 | `uretra.space` | 137Mi | Medium |
### 3.2 Pre-migration Checklist (per instance)
```
[ ] Announce maintenance window to users
[ ] Lower DNS TTLs (24h before)
[ ] PostgreSQL dump per database in custom format (pg_dump -Fc) + verify (pg_restore --list)
[ ] Backup media files + verify checksums
[ ] Backup generated secrets (verify signing key matches federation)
[ ] Save current Helm values (helm get values, once per release: chat, element-call, livekit)
[ ] Document current ingress/DNS/Certificate setup
[ ] Prepare rollback procedure
```
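One way to do the "verify signing key matches federation" check: Synapse stores signing keys as `<algorithm> <version> <base64 seed>`, and a server publishes its current key IDs at `/_matrix/key/v2/server`. The sketch assumes federation is reachable over HTTPS on `matrix.<domain>`. It only compares key IDs, and a match inside `old_verify_keys` would also pass, so treat it as a sanity check rather than proof.

```bash
# Print the key ID ("ed25519:<version>") of a Synapse signing key line.
signing_key_id() {
  printf '%s\n' "$1" | awk 'NF >= 3 { print $1 ":" $2; exit }'
}

# Check that the backed-up key ID is published by the live server.
#   $1 = signing key text, $2 = host serving the federation API
check_signing_key() {
  local id
  id="$(signing_key_id "$1")"
  [ -n "$id" ] || { echo "could not parse signing key" >&2; return 1; }
  if curl -fsS "https://$2/_matrix/key/v2/server" | grep -qF "\"${id}\""; then
    echo "OK: $id is published by $2"
  else
    echo "MISMATCH: $id not found at https://$2/_matrix/key/v2/server" >&2
    return 1
  fi
}
```

For the ESS test instance: `KEY="$(kubectl get secret matrix-mrt0rtikize-generated -n matrix-mrt0rtikize -o jsonpath='{.data.SYNAPSE_SIGNING_KEY}' | base64 -d)"`, then `check_signing_key "$KEY" matrix.mrt0rtikize.ru`. The prod instances use the old chart, so locate the secret holding their signing key first.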
### 3.3 Migration Steps (per instance)
```
1. Stop Synapse + MAS on old cluster
2. Create CNPG databases on new cluster
3. Restore PostgreSQL dump to CNPG
4. Restore media files to new PV
5. Apply secrets (signing key, MAS keys, macaroon)
6. Apply deployment markers
7. Deploy ESS via ArgoCD on new cluster
8. Wait for pods healthy, certs issued
9. Test: login, federation, Element Call
10. Cut over DNS
11. Monitor for 24h
12. If stable: remove old instance resources from old cluster
```
### 3.4 Special Considerations for Prod Instances
**`t0rt1k.tech` (busiest instance, 582Mi Synapse):**
- Uses older `matrix-2.9.17` chart (NOT ESS). Migration means switching to ESS Community chart.
- Has a **failed `syn2mas` job** — MAS migration was incomplete. When deploying ESS which bundles MAS, the migration may need to be completed or re-done.
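- The 582Mi memory usage suggests a larger user/room count, but memory is only a rough proxy for database size. Measure the actual size first (e.g. `SELECT pg_size_pretty(pg_database_size(current_database()));` in the Synapse database) and allocate enough storage and time for the dump/restore.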
- Consider running the new ESS in parallel (different hostnames) first, then switching DNS once proven.
**`roglog.space` and `uretra.space`:**
- Lower load (134Mi/137Mi) — quicker backups, less downtime risk.
- Same chart switch (`matrix-2.9.17` → ESS).
- Can be done in shorter windows.
**Chart migration (`matrix-2.9.17` → ESS):**
- The old chart uses separate Helm releases per component (`chat`, `element-call`, `livekit`).
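- ESS bundles everything into one chart. The chart itself does not define the database schema; Synapse and MAS do, and Synapse migrates its schema forward on startup. Downgrading is only supported in limited cases, so the Synapse shipped by ESS should be at least as new as the one currently running.
- Key difference: ESS uses MAS for auth (Matrix 2.0). `roglog` and `uretra` already completed `syn2mas`; `t0rt1k`'s job failed, so it may still depend on legacy Synapse auth.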
- May need to run `syn2mas` migration or manual user migration. Investigate per-instance before cutover.
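Because Synapse migrates its schema forward on startup, it is worth confirming before each cutover that the ESS release does not ship an older Synapse than the instance currently runs. A sketch of that comparison using the unauthenticated `/_matrix/federation/v1/version` endpoint; it assumes GNU `sort -V` and federation served on `matrix.<domain>`.

```bash
# Succeed if version $1 >= version $2 (GNU sort -V ordering).
version_ge() {
  # The input is "$2 then $1"; sort -C succeeds only if that is already ordered.
  printf '%s\n%s\n' "$2" "$1" | sort -V -C
}

# Print the Synapse version reported by federation host $1.
synapse_version() {
  curl -fsS "https://$1/_matrix/federation/v1/version" \
    | sed -E 's/.*"version" *: *"([^"]+)".*/\1/'
}
```

For example: `OLD="$(synapse_version matrix.t0rt1k.tech)"`, set `NEW` to the Synapse version listed for the ESS release being deployed, then `version_ge "$NEW" "$OLD" || echo "ESS Synapse is older than the live one"`.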
---
## 4. PostgreSQL Backup (ongoing)
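CloudNativePG has built-in backup to S3-compatible storage. A `ScheduledBackup` only schedules base backups: the Cluster itself must first have a backup target configured (`spec.backup.barmanObjectStore`, or the Barman Cloud plugin on recent CNPG versions). With that in place, automatic daily backups look like:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: shared-pg-daily
  namespace: cnpg
spec:
  schedule: "0 0 3 * * *" # CNPG cron has a leading seconds field: 03:00:00 daily (operator time zone, normally UTC)
  backupOwnerReference: self
  cluster:
    name: shared-pg
  immediate: false
  target: prefer-standby
```
Configuring the object store also enables continuous WAL archiving, which is what makes point-in-time recovery possible.
Configure Yandex Object Storage as the S3 target.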
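A sketch of the backup target the ScheduledBackup depends on, to be merged under `spec:` of the `shared-pg` Cluster manifest. Assumptions: the bucket already exists, a Secret named `yc-s3-creds` (hypothetical) in `cnpg` holds a static access key for a service account allowed to write to it, and 30-day retention is acceptable. It uses the in-tree `barmanObjectStore` section, which recent CNPG releases deprecate in favour of the Barman Cloud plugin; check which one the installed operator expects.

```bash
# Print a spec.backup stanza (indented two spaces, to sit under "spec:")
# pointing CNPG at bucket $1 in Yandex Object Storage.
render_backup_stanza() {
  local bucket="$1"
  cat <<EOF
  backup:
    retentionPolicy: "30d"
    barmanObjectStore:
      destinationPath: s3://${bucket}/shared-pg
      endpointURL: https://storage.yandexcloud.net
      s3Credentials:
        accessKeyId:
          name: yc-s3-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: yc-s3-creds
          key: ACCESS_SECRET_KEY
      wal:
        compression: gzip
EOF
}
```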
---
## 5. Architecture Diagram (text)
```
┌────────────────────────────────────────────────────────────┐
│ Yandex Cloud Managed K8s │
│ │
│ ┌───────────────────┐ ┌──────────────────────────────┐ │
│ │ Infrastructure │ │ Matrix Layer │ │
│ │ │ │ │ │
│ │ Gitea (git) │ │ ┌─────────────────────────┐ │ │
│ │ ArgoCD (gitops) │ │ │ matrix-mrt0rtikize (ESS)│ │ │
│ │ cert-manager │ │ │ - Synapse │ │ │
│ │ Traefik (LB) │ │ │ - MAS │ │ │
│ │ Prometheus/Grafana │ │ │ - Element Web/Admin │ │ │
│ │ VictoriaMetrics │ │ │ - Matrix RTC (LiveKit) │ │ │
│ │ Loki │ │ │ - Hookshot │ │ │
│ │ Alloy │ │ │ - HAProxy │ │ │
│ └───────────────────┘ │ └──────────┬──────────────┘ │ │
│ │ │ │ │
│ ┌───────────────────┐ │ ┌──────────▼──────────────┐ │ │
│ │ CNPG Cluster │ │ │ matrix-t0rt1k (ESS) │ │ │
│ │ (3 nodes, SSD) │◄──┤ │ (same structure) │ │ │
│ │ │ │ └─────────────────────────┘ │ │
│ │ synapse_mrt0rtikize│ │ ┌─────────────────────────┐ │ │
│ │ mas_mrt0rtikize │ │ │ matrix-roglog (ESS) │ │ │
│ │ synapse_t0rt1k │ │ │ (same structure) │ │ │
│ │ mas_t0rt1k │ │ └─────────────────────────┘ │ │
│ │ synapse_roglog │ │ ┌─────────────────────────┐ │ │
│ │ mas_roglog │ │ │ matrix-uretra (ESS) │ │ │
│ │ synapse_uretra │ │ │ (same structure) │ │ │
│ │ mas_uretra │ │ └─────────────────────────┘ │ │
│ └───────────────────┘ └──────────────────────────────┘ │
│ │
│ External LB: <Yandex provisioned IP> │
└────────────────────────────────────────────────────────────┘
```
---
## 6. Implementation Notes
### 6.1 Secrets Management
- ESS `initSecrets` generates 14 credentials. For migration, these MUST be restored from backup (not regenerated).
- `SYNAPSE_SIGNING_KEY` is the most critical — it identifies the server to the federation. Changing it breaks all existing rooms and federation relationships.
- The `matrix*-generated` secret and deployment markers ConfigMap must be applied **before** the first ArgoCD sync, so the ESS chart does not generate new (wrong) ones.
- For fresh ESS instances (new homeservers, not migrations), let `initSecrets` generate them normally.
### 6.2 Image Registry
- ESS pulls from `oci.element.io` (Synapse, Element Web, Element Admin, lk-jwt-service), `ghcr.io` (matrix-tools, hookshot) and `docker.io` (livekit, postgres, redis).
- `oci.element.io` S3 backend (`oci-element-io-images-storage-prod.s3.eu-central-1.amazonaws.com`) was observed to fail intermittently from Russia with "connection reset by peer". Images eventually pulled on retry, but consider:
- Setting `image.pullPolicy: IfNotPresent` to reduce re-pulls
- Setting up a containerd registry mirror or local pull-through cache for `oci.element.io`
- Pre-pulling images to nodes during initial setup
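A sketch for building the pre-pull list: render the chart locally and extract the unique image references. It assumes `image:` keys appear one per line, as `helm template` emits them.

```bash
# Read rendered manifests on stdin, print each distinct image reference once.
list_images() {
  sed -nE 's/^[[:space:]]*-?[[:space:]]*image:[[:space:]]*"?([^"[:space:]]+)"?.*$/\1/p' \
    | sort -u
}
```

For example `helm template ess oci://ghcr.io/element-hq/ess-helm/matrix-stack --version 26.6.1 -f values.yaml | list_images` (values file name illustrative); the list can then be pulled on each node, e.g. with `crictl pull`, or used to seed a registry mirror.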
### 6.3 Resource Limits
Set `resources.requests` and `resources.limits` on all ESS components to prevent the 94% node issue seen in prod:
```yaml
synapse:
  resources:
    requests:
      memory: 256Mi
      cpu: 100m
    limits:
      memory: 1Gi
      cpu: 1000m
```
Do similar for MAS, element-web, livekit-sfu, etc. ESS chart supports per-component resource configuration.
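When choosing requests and limits, it helps to add up what will share one 6 GB node, as in the `uzig` row of Section 0. A small helper for that arithmetic; it handles only integer quantities with `Ki`/`Mi`/`Gi` suffixes or plain bytes (decimal `M`/`G` and fractional values are not supported).

```bash
# Convert a Kubernetes memory quantity to whole MiB.
to_mib() {
  local q="$1"
  case "$q" in
    *Gi) echo $(( ${q%Gi} * 1024 )) ;;
    *Mi) echo "${q%Mi}" ;;
    *Ki) echo $(( ${q%Ki} / 1024 )) ;;
    *)   echo $(( $q / 1048576 )) ;;  # plain bytes
  esac
}

# Sum several quantities, e.g. the memory requests planned for one node.
sum_mib() {
  local total=0 q
  for q in "$@"; do
    total=$(( total + $(to_mib "$q") ))
  done
  echo "$total"
}
```

`sum_mib 682Mi 499Mi 327Mi 582Mi` prints `2090`: Grafana, Prometheus, VictoriaMetrics and the busiest Synapse on `uzig` already account for about 2 GiB before PostgreSQL, Redis and LiveKit are counted.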
### 6.4 Storage Classes
| Workload | Storage Class | Reason |
|----------|--------------|--------|
| PostgreSQL (CNPG) | `yc-network-ssd` | Database — needs low latency / high IOPS |
| Synapse media | `yc-network-hdd` (default) | Media files — sequential access, SSD benefit is marginal |
| Prometheus TSDB | `yc-network-ssd` | Time-series DB — random writes benefit from SSD |
| Loki chunks | `yc-network-hdd` | Log storage — sequential writes, HDD is fine |
---
## 7. Next Steps (for next session)
When the new cluster is ready, open a new session and point to this file. The next session should:
1. Read this plan
2. Explore the new cluster (nodes, storage classes, ingress config)
3. Implement Phase 0 (bootstrap GitOps foundation):
- Create `~/infra/yandex-prod/` directory structure
- Write `bootstrap/gitea/` manifests
- Write `bootstrap/argocd/install.sh` + `values.yaml`
- Write `argocd/app-of-apps.yaml`
- Write infrastructure apps (cert-manager, CNPG, monitoring)
- Write ESS apps
- Push to Gitea
4. Execute Phase 1 (backup `mrt0rtikize.ru` from k3s)
5. Execute Phase 2 (restore `mrt0rtikize.ru` to new cluster)
6. Validate and plan DNS cutover