Most people know me as **cedi**.

I'm a **Staff Site Reliability Engineer and Technical Lead** at [Celonis](https://www.celonis.com), where I focus on building resilient distributed systems that scale.

I specialize in building and maintaining **large-scale distributed systems** that serve hundreds of millions of users, with deep expertise in chaos engineering, observability, and resilience testing. My passion lies in making systems more reliable, teams more effective, and turning incidents into learning opportunities.
With **13 years** of experience in Software Engineering and Site Reliability Engineering, I've designed and operated **distributed systems at global scale** (across millions of servers), led critical infrastructure migrations for major cloud platforms, and built reliability practices that have become organizational standards at companies like Microsoft Azure.
### Community & Advocacy
#### Conference Speaking
Regular speaker at technology conferences on SRE, observability, and Kubernetes topics. Recent talks include:

- **[Site Reliability Engineering Explained](https://media.ccc.de/v/gpn21-48-site-reliability-engineering-explained-an-exploration-of-devops-platform-engineering-and-sre)** - Exploring DevOps vs Platform Engineering vs SRE
- **[Modern Observability with LGTM Stack](https://media.ccc.de/v/gpn21-47-modern-observability-scalable-observability-with-the-lgtm-stack-harnessing-the-power-of-loki-grafana-tempo-and-mimir)** - Scalable observability architecture
- **[Understanding Alerting](https://media.ccc.de/v/gpn20-22-understanding-alerting-how-to-come-up-with-a-good-enough-alerting-strategy)** - Building effective alerting strategies
#### Open Source & Community

- Active contributor to **[SpechtLabs](https://github.com/specht-labs)**, **[Compute Blade Community](https://github.com/compute-blade-community)**, and **[SierraSoftworks](https://github.com/SierraSoftworks)** projects
- Help organize infrastructure for events like **Chaos Communication Congress**
- Regular technical conference speaker and community contributor
## Personal Projects & Interests
### Home Lab & Tinkering

- **Raspberry Pi K3s cluster** with **CEPH distributed storage** for hands-on distributed systems experience
- **Stratum 1 NTP/PTP time server** for precision timing
- **Cluster API** managed cloud Kubernetes running full **Grafana LGTM stack**
- **Advanced chaos engineering** experiments across the cluster
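
Those chaos experiments can be expressed declaratively. As an illustrative sketch only (assuming a tool like Chaos Mesh runs on the cluster; the namespace and labels are hypothetical):

```yaml
# Kill one random pod matching the selector, to verify the
# workload self-heals without user-visible impact.
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: web-pod-kill
  namespace: demo
spec:
  action: pod-kill
  mode: one            # pick a single random pod from the selection
  selector:
    namespaces:
      - demo
    labelSelectors:
      app: web
```

Running small, scoped experiments like this against non-critical workloads first is what makes the practice safe to extend across the cluster.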
### Analog Photography
When I'm not writing YAML, I'm a hobbyist **analog photographer** with a collection of 35mm and medium format cameras (Leica M6, Hasselblad 500C/M, Canon A-1). I develop film at home in my tiny darkroom with a 35mm enlarger.
## Engineering Philosophy

::: tip Core Principle
**Be excellent to each other** 🤝
:::
::: note Focus Areas
- Focus on **fundamentals** over chasing hype
- Alert on **symptoms**, not vitals
- **Incidents are learning opportunities**
- There is no single "root cause"
:::
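
As one sketch of what "alert on symptoms" means in practice: a Prometheus alerting rule can page on the user-visible failure rate rather than host vitals like CPU or memory. The metric names below are hypothetical, not taken from any real deployment:

```yaml
groups:
  - name: symptom-alerts
    rules:
      - alert: HighErrorRate
        # Page on what users actually experience (failed requests),
        # not on internal vitals like CPU or memory pressure.
        expr: |
          sum(rate(http_requests_total{code=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.01
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "More than 1% of requests failing for 10 minutes"
```

A host can run hot on CPU all day without anyone noticing; a sustained 5xx rate is something users feel, which is exactly when a human should be woken up.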
::: details Essential Reading & Frameworks
- [How Complex Systems Fail](https://how.complexsystems.fail) is required reading
- System architecture exists mostly **in your head** and fails differently than expected
85
+
- Recommend the [Above/Below the Line framework](https://snafucatchers.github.io/#2_3_The_above-the-line/below-the-line_framework) for incident analysis

:::