Operating on CI/CD Platform

Companion operations guide for CI/CD Platform — Gitea, Tekton, Zot, and Cosign.

Quick Health Check

# All CI/CD pods
kubectl get pods -n gitea -o wide
kubectl get pods -n tekton-pipelines -o wide
kubectl get pods -n zot -o wide

# ArgoCD app status
kubectl get applications -n argocd gitea gitea-extras \
  tekton-pipelines tekton-triggers tekton-dashboard tekton-extras \
  zot zot-extras

# ExternalSecrets sync status
kubectl get externalsecret -n gitea
kubectl get externalsecret -n tekton-pipelines
kubectl get externalsecret -n zot

Healthy state: all pods Running on pc-1, all ArgoCD apps Synced/Healthy, all ExternalSecrets SecretSynced.

$ kubectl get pods -n gitea -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP            NODE   NOMINATED NODE   READINESS GATES
gitea-6d7d457c49-t5hsj   1/1     Running   0          7h40m   10.244.7.57   pc-1   <none>           <none>

$ kubectl get pods -n tekton-pipelines -o wide
NAME                                                 READY   STATUS    RESTARTS        AGE     IP              NODE     NOMINATED NODE   READINESS GATES
el-gitea-listener-7d85fd8b75-8w9z6                   1/1     Running   3 (7h41m ago)   7d7h    10.244.7.226    pc-1     <none>           <none>
tekton-dashboard-774bff7cc-92cmn                     1/1     Running   0               22d     10.244.8.175    mini-3   <none>           <none>
tekton-events-controller-5cbc777ccd-7gvq7            1/1     Running   0               22d     10.244.8.144    mini-3   <none>           <none>
tekton-pipelines-controller-7496c46798-kgphw         1/1     Running   0               7d10h   10.244.12.233   mini-1   <none>           <none>
tekton-pipelines-webhook-75cd84877-tgctj             1/1     Running   0               22d     10.244.12.139   mini-1   <none>           <none>
tekton-triggers-controller-66fd74568d-m8zsv          1/1     Running   0               22d     10.244.8.254    mini-3   <none>           <none>
tekton-triggers-core-interceptors-66456f8cf6-blt4s   1/1     Running   0               22d     10.244.8.181    mini-3   <none>           <none>
tekton-triggers-webhook-55c8dd895f-j95ft             1/1     Running   0               22d     10.244.8.187    mini-3   <none>           <none>

$ kubectl get pods -n zot -o wide
NAME                   READY   STATUS    RESTARTS   AGE     IP             NODE   NOMINATED NODE   READINESS GATES
zot-68c79b95f9-6vbhh   1/1     Running   0          7h41m   10.244.7.202   pc-1   <none>           <none>

Gitea Operations

Mirror Sync Status

# Check mirror sync time via API
GITEA_URL="http://192.168.55.209:3000"
curl -s "$GITEA_URL/api/v1/repos/tekton-bot/frank" | jq '{
  mirror: .mirror,
  updated_at: .updated_at,
  mirror_interval: .mirror_interval
}'

Expected: mirror: true, updated_at within the last 10 minutes.
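
A quick staleness check, as a minimal sketch (assumes GNU date and the GITEA_URL set above):

# Flag the mirror as stale if updated_at is older than 10 minutes
UPDATED=$(curl -s "$GITEA_URL/api/v1/repos/tekton-bot/frank" | jq -r .updated_at)
AGE=$(( $(date +%s) - $(date -d "$UPDATED" +%s) ))
[ "$AGE" -gt 600 ] && echo "STALE: last sync ${AGE}s ago" || echo "OK: last sync ${AGE}s ago"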

Force Mirror Sync

# Trigger immediate mirror sync
ADMIN_TOKEN=$(kubectl get secret -n gitea gitea-secrets -o jsonpath='{.data.admin-password}' | base64 -d)
curl -sf -X POST "$GITEA_URL/api/v1/repos/tekton-bot/frank/mirror-sync" \
  -H "Authorization: token $ADMIN_TOKEN"

Gitea Logs

kubectl logs -n gitea deploy/gitea --tail=50
kubectl logs -n gitea deploy/gitea -f  # Follow

Restart Gitea

kubectl rollout restart deploy/gitea -n gitea
kubectl rollout status deploy/gitea -n gitea

Note: Gitea uses strategy: Recreate (RWO PVC) — expect a brief downtime window during restart.
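
To confirm the strategy in effect before a restart:

kubectl get deploy -n gitea gitea -o jsonpath='{.spec.strategy.type}{"\n"}'
# Expect: Recreate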

Common Gitea Issues

| Symptom | Cause | Fix |
| --- | --- | --- |
| OIDC login fails | Authentik provider misconfigured or down | Check kubectl logs -n gitea deploy/gitea \| grep oauth |
| Mirror not updating | GitHub PAT expired or rate-limited | Verify GITHUB_MIRROR_TOKEN in Infisical is valid |
| Webhook delivery fails | ALLOWED_HOST_LIST missing cluster DNS | Add *.svc.cluster.local to gitea.config.webhook.ALLOWED_HOST_LIST |
| Pod stuck in Pending | PVC can’t mount (pc-1 down) | Check kubectl get pv and node status |

Tekton Operations

List Recent PipelineRuns

# All PipelineRuns, most recent first
kubectl get pipelinerun -n tekton-pipelines --sort-by=.metadata.creationTimestamp

# Only failed runs
kubectl get pipelinerun -n tekton-pipelines \
  -o jsonpath='{range .items[?(@.status.conditions[0].status=="False")]}{.metadata.name}{"\t"}{.status.conditions[0].message}{"\n"}{end}'

View Pipeline Logs

# Latest PipelineRun logs (requires tkn CLI)
tkn pipelinerun logs -n tekton-pipelines --last

# Without tkn — find the pod and read logs per step
kubectl get pods -n tekton-pipelines -l tekton.dev/pipelineRun --sort-by=.metadata.creationTimestamp | tail -5
kubectl logs -n tekton-pipelines <pod-name> -c step-clone
kubectl logs -n tekton-pipelines <pod-name> -c step-test
kubectl logs -n tekton-pipelines <pod-name> -c step-build-and-push
kubectl logs -n tekton-pipelines <pod-name> -c step-sign

Cancel a Running PipelineRun

kubectl patch pipelinerun -n tekton-pipelines <name> \
  --type=merge -p '{"spec":{"status":"CancelledRunFinally"}}'
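
The run's terminal condition should then report a Cancelled reason; confirm with:

kubectl get pipelinerun -n tekton-pipelines <name> \
  -o jsonpath='{.status.conditions[0].reason}{"\n"}'
# Expect: Cancelled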

Clean Up Old PipelineRuns

A daily CronJob (apps/tekton/manifests/pipelinerun-ttl-gc.yaml) sweeps PipelineRuns whose status.completionTime is older than 7 days. Runs at 04:30 UTC. Inspect:

# Last GC run
kubectl get cronjob -n tekton-pipelines pipelinerun-ttl-gc
kubectl get jobs -n tekton-pipelines | grep pipelinerun-ttl-gc | tail -3

# What it would delete now (manual dry-run via the same logic)
CUTOFF=$(date -u -d "7 days ago" +%Y-%m-%dT%H:%M:%SZ)
kubectl get pipelinerun -n tekton-pipelines \
  -o jsonpath='{range .items[?(@.status.completionTime)]}{.metadata.name}{"\t"}{.status.completionTime}{"\n"}{end}' \
  | awk -F'\t' -v c="$CUTOFF" '$2 < c {print $1}'

Force a sweep now (e.g. for a noisy cleanup before a demo):

kubectl create job -n tekton-pipelines --from=cronjob/pipelinerun-ttl-gc pipelinerun-ttl-gc-manual-$(date +%s)

Why TTL matters beyond clutter: until 2026-05-14 the Layer 25 alert rule used kube_pod_status_ready{condition="true"} per-pod, which treated post-completion task pods as a degradation signal — accumulating PipelineRuns silently raised the false-positive rate. The alert query was rewritten to use kube_deployment_status_replicas_unavailable (Deployment-scoped, naturally excludes task pods), but the TTL is the proper hygiene either way.

Manual delete by criterion, when you want finer control than the CronJob:

# All failed PRs older than 1 day
kubectl get pipelinerun -n tekton-pipelines \
  -o jsonpath='{range .items[?(@.status.conditions[0].status=="False")]}{.metadata.name}{"\n"}{end}' \
  | xargs -r kubectl delete pipelinerun -n tekton-pipelines

EventListener Health

# EventListener pod logs
kubectl logs -n tekton-pipelines -l app.kubernetes.io/managed-by=EventListener --tail=30

# EventListener service reachability (from within cluster)
kubectl run test-curl --rm -it --image=curlimages/curl --restart=Never -- \
  curl -s -o /dev/null -w "%{http_code}" \
  http://el-gitea-listener.tekton-pipelines.svc.cluster.local:8080
# Expect: 200 or 202

Tekton Dashboard

Access at http://192.168.55.217:9097 or https://tekton.cluster.derio.net (Authentik forward-auth).

The dashboard is read-only — it shows PipelineRuns, TaskRuns, and logs. Useful for non-CLI users or quick visual debugging.
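
If neither entry point is reachable, a port-forward works as a fallback (assuming the Service carries the upstream default name tekton-dashboard):

kubectl port-forward -n tekton-pipelines svc/tekton-dashboard 9097:9097
# Then browse http://localhost:9097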

Common Tekton Issues

| Symptom | Cause | Fix |
| --- | --- | --- |
| PipelineRun stuck in Pending | PVC can’t provision (pc-1 down or Longhorn unhealthy) | Check kubectl get pv and Longhorn UI |
| ComparisonError in ArgoCD | Task YAML uses resources instead of computeResources | Fix the Task manifest and re-sync |
| report-success not firing | $(tasks.status) is "Completed" not "Succeeded" | Check the when clause accepts both values |
| Step fails with permission denied | Missing fsGroup: 65534 on PipelineRun pod template | Add it to the TriggerTemplate’s taskRunTemplate.podTemplate.securityContext |
| git config fails | HOME=/ is read-only for UID 65534 | Set HOME=/tekton/home env var on the step |
| PodSecurity violation | Missing securityContext on Task steps | Add runAsNonRoot, capabilities.drop, seccompProfile |
| Webhook doesn’t trigger pipeline | Gitea sends X-Gitea-Event, not X-GitHub-Event | Use a CEL interceptor, not the github interceptor |

Zot Registry Operations

Registry Health

# API check (anonymous read)
curl -sk https://192.168.55.210:5000/v2/
# Expect: {}

# List all repositories
curl -sk https://192.168.55.210:5000/v2/_catalog
# Expect: {"repositories":["test/alpine","..."]}

# List tags for a repo
curl -sk https://192.168.55.210:5000/v2/test/alpine/tags/list
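
To enumerate tags across every repository in one pass, a small loop over the same two endpoints:

for repo in $(curl -sk https://192.168.55.210:5000/v2/_catalog | jq -r '.repositories[]'); do
  echo "== $repo"
  curl -sk "https://192.168.55.210:5000/v2/$repo/tags/list" | jq -r '.tags[]?'
done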

Test Push

# Push a test image (requires credentials)
ZOT_PASS=$(kubectl get secret -n tekton-pipelines zot-push-creds \
  -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq -r '.auths["192.168.55.210:5000"].password')

# Using crane (lightweight OCI tool)
crane auth login 192.168.55.210:5000 -u tekton-push -p "$ZOT_PASS" --insecure
# crane push expects an image tarball, so synthesize a minimal one-empty-layer image with append
crane append -t 192.168.55.210:5000/test/healthcheck:latest \
  -f <(tar -cf - --files-from /dev/null) --insecure

TLS Certificate Status

# cert-manager Certificate status
kubectl get certificate -n zot zot-tls
# Expect: Ready=True

# Certificate details
kubectl describe certificate -n zot zot-tls

# Check cert expiry
kubectl get secret -n zot zot-tls -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates
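
For a pass/fail variant that scripts can consume, openssl's -checkend exits non-zero when the cert expires within the given window:

# Fail if the cert expires within 14 days
kubectl get secret -n zot zot-tls -o jsonpath='{.data.tls\.crt}' | base64 -d \
  | openssl x509 -noout -checkend $((14*24*3600)) \
  && echo "cert valid for 14+ days" || echo "cert expires within 14 days"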

Registry Logs

kubectl logs -n zot deploy/zot --tail=50

Restart Zot

kubectl rollout restart deploy/zot -n zot
kubectl rollout status deploy/zot -n zot

Like Gitea, Zot uses strategy: Recreate for RWO PVC safety.

Common Zot Issues

| Symptom | Cause | Fix |
| --- | --- | --- |
| 401 Unauthorized on push | Wrong password or stale htpasswd hash | Regenerate htpasswd if ZOT_PUSH_PASSWORD changed in Infisical |
| x509: certificate signed by unknown authority | Self-signed cert not trusted by client | Use the --insecure flag or add the cert to the trust store |
| Containerd pull fails on nodes | Talos mirror patch not applied | Apply patches/phase06-cicd/06-cluster-zot-registry.yaml via Omni |
| Pod stuck (PVC) | pc-1 down or Longhorn unhealthy | Check node and Longhorn status |

Cosign Operations

Verify an Image Signature

cosign verify --key apps/tekton/cosign.pub \
  --insecure-ignore-tlog --allow-insecure-registry \
  192.168.55.210:5000/<repo>/<image>:<tag>

Expected: Verification for 192.168.55.210:5000/... -- The following checks were performed: ...

Check if an Image Is Signed

# List signature artifacts for an image
crane ls 192.168.55.210:5000/<repo>/<image> --insecure
# Look for sha256-*.sig tags
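
Since cosign derives the signature tag from the image digest (sha256-<hex>.sig), you can test a specific image rather than scanning the tag list. A sketch:

REF=192.168.55.210:5000/<repo>/<image>:<tag>
DIGEST=$(crane digest "$REF" --insecure)     # e.g. sha256:abc123...
SIG_TAG="${DIGEST/:/-}.sig"                  # cosign's signature tag convention
crane ls "${REF%:*}" --insecure | grep -q "$SIG_TAG" \
  && echo "signed ($SIG_TAG)" || echo "NOT signed"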

Rotate Signing Keys

  1. Generate new key pair: COSIGN_PASSWORD='' cosign generate-key-pair
  2. Store new private key in Infisical as COSIGN_KEY
  3. Commit new cosign.pub to apps/tekton/cosign.pub
  4. Wait for the ExternalSecret to refresh (5 minutes) or force a sync (see below)
  5. New images will be signed with the new key; old signatures remain valid with old public key
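
Step 4's force sync is an annotation bump: the operator reconciles an ExternalSecret whenever its force-sync annotation changes. The resource name below is illustrative; list candidates with kubectl get externalsecret -n tekton-pipelines.

# Force an immediate reconcile (resource name is an assumption)
kubectl annotate externalsecret -n tekton-pipelines cosign-key \
  force-sync=$(date +%s) --overwrite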

End-to-End Webhook Test

To verify the full pipeline chain from Gitea to signed image:

# 1. Trigger a mirror sync (or push directly to Gitea)
curl -sf -X POST "http://192.168.55.209:3000/api/v1/repos/tekton-bot/frank/mirror-sync" \
  -H "Authorization: token $ADMIN_TOKEN"

# 2. Watch for new PipelineRun
kubectl get pipelinerun -n tekton-pipelines -w

# 3. Check Gitea commit status (after pipeline completes)
curl -s "http://192.168.55.209:3000/api/v1/repos/tekton-bot/frank/statuses/<SHA>" \
  -H "Authorization: token $TOKEN" | jq '.[0].state'
# Expect: "success"

GitHub-primary Repos (agentic-stoa) Operations

The original direction (Gitea-primary, Tekton-on-Gitea-push, status-back-to-Gitea) still describes Frank’s own repo. The three repos under agentic-stoa/* (hum, content-factory, stoa-blog) run the inverted direction — see Direction Inversion in the Building post for the architecture. This section covers the day-to-day operational commands specific to that path.

The chain at a glance

GitHub webhook → Caddy on Hop (webhooks.hop.derio.net) → Tailscale mesh →
  el-github-listener (192.168.55.223:8080) → github-pull-sync → <repo>-ci → dual-status

Four moving parts whose health you may want to inspect independently: Caddy’s webhook route on Hop, the el-github-listener EventListener on Frank, the github-pull-sync Pipeline runs, and the per-repo CI Pipelines (hum-ci, content-factory-ci, stoa-blog-ci).

Inspect github-listener events

The EventListener pod logs every webhook it receives, the interceptor decision, and the resulting PipelineRun creation. Two grep patterns cover most of what you’ll want:

# Tail the github-listener pod (last 30 events)
kubectl logs -n tekton-pipelines -l eventlistener=github-listener --tail=30

# Just the trigger-firing decisions (one line per webhook)
kubectl logs -n tekton-pipelines -l eventlistener=github-listener --tail=200 \
  | grep -E "Triggered|interceptor|HMAC"

If a delivery shows up in GitHub’s webhook UI as 200 OK but no PipelineRun appears, the EventListener received it but the interceptor or trigger filter rejected it. Look for interceptor stopped trigger processing in the logs — usually a missing/wrong X-GitHub-Event header (Caddy strips it) or an HMAC mismatch (secret rotation drift between GitHub’s webhook config and the stoa-github-webhook-secret Secret in Frank).
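
To rule out HMAC drift without replaying a delivery, recompute the signature GitHub would have sent for a saved payload and compare it to the X-Hub-Signature-256 header shown in the delivery details. A sketch; the key name inside the Secret (secret) is an assumption, so check with kubectl describe first:

# payload.json: the raw request body copied from GitHub's Recent Deliveries view
WEBHOOK_SECRET=$(kubectl get secret -n tekton-pipelines stoa-github-webhook-secret \
  -o jsonpath='{.data.secret}' | base64 -d)   # key name "secret" is an assumption
printf 'sha256=%s\n' "$(openssl dgst -sha256 -hmac "$WEBHOOK_SECRET" payload.json | awk '{print $NF}')"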

To check the EventListener service is reachable from inside the cluster:

kubectl run el-test --rm -it --image=curlimages/curl --restart=Never -- \
  curl -s -o /dev/null -w "%{http_code}\n" \
  http://el-github-listener.tekton-pipelines.svc.cluster.local:8080
# Expect: 200 (no body posted, but the listener responds 200 on GET)

Manually re-trigger pull-sync

Use cases: GitHub webhook missed delivery (rare — GitHub retries), a transient pull-sync failure that you want to retry without re-pushing the PR, or you’ve just rotated STOA_GITHUB_TOKEN and want to confirm the auth side still works.

# Replay the last delivery from GitHub's webhook UI
# GitHub repo → Settings → Webhooks → click the webhook → Recent Deliveries → Redeliver

# Or fire pull-sync directly with a known SHA (skips the PR/main inference):
kubectl create -n tekton-pipelines -f - <<'EOF'
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: github-pull-sync-manual-
spec:
  pipelineRef:
    name: github-pull-sync
  params:
    - name: github-repo
      value: agentic-stoa/hum                    # change me
    - name: gitea-repo
      value: agentic-stoa/hum                    # change me
    - name: ref-from
      value: refs/heads/main                     # or refs/pull/<N>/head
    - name: ref-to
      value: refs/heads/main                     # or refs/pull/<N>/head (mirror exact)
    - name: sha
      value: <commit-sha>                        # change me
  workspaces:
    - name: shared-workspace
      volumeClaimTemplate:
        spec:
          accessModes: [ReadWriteOnce]
          storageClassName: longhorn-cicd
          resources:
            requests:
              storage: 1Gi
    - name: ssh-creds
      secret:
        secretName: stoa-bot-ssh-key
        defaultMode: 0400
EOF

After it completes, verify Gitea picked up the SHA:

GITEA_HEAD=$(curl -sf "http://192.168.55.209:3000/api/v1/repos/agentic-stoa/hum/branches/main" \
  -H "Authorization: token $STOA_GITEA_TOKEN" | jq -r .commit.id)
echo "Gitea main: $GITEA_HEAD"

Inspect Caddy logs on Hop when GitHub webhook delivery fails

If GitHub’s UI shows a non-2xx (typically 502 from Caddy or a TLS failure), the issue is upstream of Frank. Three layers to check, in order:

# 1. Caddy access log on Hop — was the request even received?
source .env_hop
kubectl logs -n caddy-system deploy/caddy --tail=50 \
  | grep -E "webhooks.hop.derio.net|192.168.55.223"

# 2. From the Caddy pod, is the upstream reachable over the mesh?
kubectl exec -n caddy-system deploy/caddy -- nc -vz 192.168.55.223 8080
# Expect: succeeded
# Failure usually means Tailscale `--accept-routes` regressed — check tailscale-client DaemonSet:
kubectl logs -n headscale-system ds/tailscale-client --tail=20 | grep -i route

# 3. Is the LB IP actually allocated on Frank?
source .env
kubectl get svc -n tekton-pipelines el-github-listener-lb \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}{"\n"}'
# Expect: 192.168.55.223

If Caddy logs show the request as 502 Bad Gateway, layer (2) is the failure — Hop can’t reach Frank’s LB IP through the mesh. The fix is on Hop (Tailscale flags), not Frank.

Inspect dual-status posts after a CI run

Both posts are visible by API. After <repo>-ci completes for a SHA:

SHA=<commit-sha>
REPO=agentic-stoa/hum

# GitHub side (mandatory post)
gh api repos/$REPO/commits/$SHA/statuses --jq '.[] | select(.context=="tekton/ci") | {state, target_url, updated_at}'

# Gitea side (best-effort post)
curl -sf "http://192.168.55.209:3000/api/v1/repos/$REPO/statuses/$SHA" \
  -H "Authorization: token $STOA_GITEA_TOKEN" \
  | jq '.[] | select(.context=="tekton/ci") | {state, target_url, updated_at}'

If the GitHub side is missing entirely after a CI run, the github-status Task failed and the PipelineRun should be marked failed too (this is the design — github-status is mandatory). If the Gitea side is missing while GitHub is present, the gitea-status Task hit onError: continue (best-effort) — check kubectl logs -n tekton-pipelines <pod> -c step-post-status. The most common cause is tekton-bot no longer being a member of the agentic-stoa Gitea org (re-add via Gitea UI → Organization Members).
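
The membership check is scriptable via Gitea's org members endpoint:

# tekton-bot should appear in the member list
curl -sf "http://192.168.55.209:3000/api/v1/orgs/agentic-stoa/members" \
  -H "Authorization: token $STOA_GITEA_TOKEN" | jq -r '.[].login' \
  | grep -qx tekton-bot && echo "member" || echo "tekton-bot is NOT a member"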

Common GitHub-primary issues

| Symptom | Cause | Fix |
| --- | --- | --- |
| GitHub webhook UI shows 200 but no PipelineRun | EventListener accepted the delivery but the interceptor/trigger filter rejected it | Check kubectl logs -l eventlistener=github-listener for “interceptor stopped” or HMAC mismatch lines |
| GitHub webhook UI shows 502 Bad Gateway | Hop’s Caddy can’t reach 192.168.55.223 over the mesh | Verify --accept-routes on Hop’s tailscale-client DaemonSet |
| Hop’s Caddy returns 401 | Caddy validates X-Hub-Signature-256 with a stale secret | Rotate STOA_GITHUB_WEBHOOK_SECRET in Infisical to match GitHub’s webhook config |
| github-pull-sync fails on git fetch | STOA_GITHUB_TOKEN PAT expired or missing repo scope | Regenerate the PAT with repo scope (covers both fetch read and statuses:write) |
| github-pull-sync fails on git push to Gitea | stoa-bot SSH key rotated; Gitea side has a stale fingerprint | Re-add the public key in Gitea → stoa-bot → Settings → SSH Keys |
| github-status Task posts but the PR shows no check | Fine-grained PAT missing Commit statuses: Read and write | Add the scope; the x-accepted-github-permissions: statuses=write header in the 403 response is the smoking gun |
| gitea-status posts 404 | tekton-bot isn’t a member of the agentic-stoa org | Add membership via Gitea UI → Organization Members |

References