The cost of your CI/CD pipeline is a hidden tax on development. After auditing our workflows, I found concrete ways to streamline builds, reduce failed deployments, and significantly lower our compute costs.
Lessons from a Collector: Why Passion Isn’t Enough Without Optimization
Imagine spending years assembling a meticulously curated, high-value collection. You’ve invested time, money, and emotional energy into it. Then you try to share it with others—only to be met with blank stares, polite nods, or worse: indifference. Sound familiar? This isn’t just about coin collecting; it’s a metaphor for how many engineering teams approach their CI/CD pipelines.
For years, we treated our CI/CD system like a personal passion project—building it out with enthusiasm, adding features, integrating tools, scaling runners, and tweaking configurations. But when leadership looked at our DevOps ROI, the numbers didn’t justify the effort. Our pipeline was bloated, slow, and riddled with deployment failures. Just like a collector whose work goes unnoticed, we weren’t optimizing for impact—we were optimizing for complexity.
From ‘No One Cares’ to ‘Everyone Depends On It’
What changed? We shifted from a hobbyist mindset to an SRE-driven efficiency model. Instead of treating the pipeline as a set of scripts to run, we started treating it like a production service—monitored, optimized, and measured against SLIs (Service Level Indicators) and SLOs (Service Level Objectives).
Our goal wasn’t to impress with technical jargon or tool count. It was to deliver faster, cheaper, and more reliably. And the results? A 30% reduction in compute costs, 40% fewer failed deployments, and a 50% drop in average build time over six months.
Audit Your Pipeline Like a Collector Audits Their Set
Every collector knows that not all coins are equal. Some are rare, some are common. Some are overpriced, others undervalued. The same applies to CI/CD pipeline components. Your first step: inventory and prioritize.
1. Map Your Current Workflow (The ‘Type Set’ Approach)
Start by documenting every stage of your pipeline—from commit to production. Use tools like:
- GitLab CI Editor for visual mapping
- Jenkins Pipeline Visualizer
- GitHub Actions Insights
Look for:
- Stages that run unconditionally (e.g., `post { always { ... } }` blocks in Jenkins)
- Jobs that require manual approval without risk analysis
- Parallel jobs running on oversized instances
- Redundant testing phases (unit + integration + e2e on every push?)
Example: Identifying Waste in GitHub Actions
```yaml
name: CI
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run all tests
        run: |
          npm test
          npm run integration
          npm run e2e
```
This runs all tests on every push. But what if you’re only touching the frontend? Or a config file? Refactor using `paths` filters and conditional steps:
```yaml
name: CI
on:
  push:
    paths:
      - 'src/**'
      - 'tests/**'
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # contains() on an array only matches whole elements, so serialize the
      # modified-file lists from every pushed commit and do a substring match
      - name: Run unit tests
        if: contains(toJSON(github.event.commits.*.modified), 'src/')
        run: npm test
      - name: Run integration tests
        if: contains(toJSON(github.event.commits.*.modified), 'tests/integration/')
        run: npm run integration
```
2. Right-Size Your Compute Resources
Just like a collector doesn’t spend $10K on a common penny, don’t overprovision pipeline runners. We found that:
- 70% of our GitHub Actions jobs ran on `ubuntu-latest` runners (8 vCPUs, 16 GB RAM) but consumed only 1.2 vCPUs on average
- Our GitLab runners were `t3.xlarge` instances but ran just as efficiently on `t3.large`
- Jenkins agents stayed on 24/7 instead of autoscaling
Actionable fix: Use autoscaling runners with cloud billing alerts. For AWS:
```bash
aws ec2 create-fleet \
  --launch-template-configs 'LaunchTemplateSpecification={LaunchTemplateId=lt-123456,Version=1}' \
  --target-capacity-specification 'TotalTargetCapacity=10,DefaultTargetCapacityType=spot'
```
Pair with k6 or Locust load testing to benchmark actual resource usage per job. Then downsize—or use spot instances for non-critical jobs.
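k6 and Locust exercise the application; for the runner itself, you can also wrap the workload in GNU time to capture per-job numbers. A minimal sketch, assuming an npm build (the `profile-build` job name and `usage.txt` file are placeholders):

```yaml
jobs:
  profile-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build with resource accounting
        run: |
          sudo apt-get install -y time   # GNU time; skip if already on the image
          # -v reports peak RSS and "Percent of CPU this job got"
          /usr/bin/time -v npm run build 2> usage.txt
          grep -E 'Maximum resident set size|Percent of CPU' usage.txt
```

If peak memory and CPU sit well below the runner's specs across a week of builds, that job is a safe candidate for a smaller instance or a spot runner.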
Reduce Deployment Failures with SRE Principles
Failed deployments are the ‘broken coin’ of CI/CD—rarely useful, often costly. We reduced our failure rate by applying site reliability engineering (SRE) practices:
1. Introduce Canary Deployments (Even for Small Changes)
Instead of deploying to 100% of instances immediately, use canary releases. Tools like:
- GitLab Canary Deployments
- Jenkins with Spinnaker
- GitHub Actions + AWS CodeDeploy
Example GitHub Actions snippet:
```yaml
- name: Canary deploy to 10% of instances
  run: |
    # CodeDeployDefault.HalfAtATime would release to 50% of the fleet at once;
    # "Canary10Percent" is a custom config created beforehand with
    # `aws deploy create-deployment-config` to limit the first wave to 10%.
    aws deploy create-deployment \
      --application-name MyApp \
      --deployment-group-name Production \
      --deployment-config-name Canary10Percent \
      --file-exists-behavior OVERWRITE \
      --s3-location bucket=my-bucket,key=app.zip,bundleType=zip
```
Monitor for 5 minutes. If error rate < 0.1%, proceed to full rollout. If not, roll back automatically.
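That monitoring step can live in the same workflow. A minimal sketch, assuming a hypothetical `ErrorRate` CloudWatch metric under a `MyApp` namespace and a `DEPLOYMENT_ID` variable captured from the create-deployment call:

```yaml
- name: Gate full rollout on canary error rate
  run: |
    sleep 300   # observe the canary for 5 minutes
    ERRORS=$(aws cloudwatch get-metric-statistics \
      --namespace MyApp --metric-name ErrorRate \
      --start-time "$(date -u -d '5 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
      --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
      --period 300 --statistics Average \
      --query 'Datapoints[0].Average' --output text)
    # Roll back automatically if the error rate breaches 0.1%
    if (( $(echo "$ERRORS > 0.1" | bc -l) )); then
      aws deploy stop-deployment --deployment-id "$DEPLOYMENT_ID" --auto-rollback-enabled
      exit 1
    fi
```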
2. Automate Rollbacks with Health Checks
Every deployment should have a ‘circuit breaker’:
- Run health checks every 30s post-deploy
- If 3 consecutive checks fail, trigger rollback
- Notify Slack/Teams with an `@oncall` tag
In Jenkins, orchestrate this from a scripted Pipeline (the older Build Flow plugin is deprecated):

```groovy
// Requires the Slack Notification plugin for slackSend.
// propagate: false returns the downstream result instead of failing this build outright.
def result = build(job: 'Deploy-to-Prod', propagate: false)
if (result.result == 'FAILURE') {
    build(job: 'Rollback-to-Previous')
    slackSend channel: '#alerts', message: 'Deployment failed. Rolling back. @oncall'
}
```
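If you orchestrate deploys from GitHub Actions instead, the same breaker fits in one step. A minimal sketch, assuming the service exposes a hypothetical `/healthz` endpoint at `myapp.example.com`:

```yaml
- name: Post-deploy health gate
  run: |
    FAILURES=0
    for i in $(seq 1 10); do            # watch the release for ~5 minutes
      if curl -fsS --max-time 5 https://myapp.example.com/healthz > /dev/null; then
        FAILURES=0                      # a healthy check resets the streak
      else
        FAILURES=$((FAILURES + 1))
      fi
      if [ "$FAILURES" -ge 3 ]; then    # 3 consecutive failures trip the breaker
        echo "Health checks failing; triggering rollback" >&2
        exit 1                          # a later step with `if: failure()` runs the rollback
      fi
      sleep 30                          # check every 30s post-deploy
    done
```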
Build Automation: From Manual Tinkering to Self-Healing Pipelines
We used to ‘tweak’ pipelines for every new service—like a collector hunting for obscure coins just because they’re rare. Now, we focus on automation and standardization.
1. Templatize Your CI/CD Jobs
Use GitLab CI templates or GitHub Actions reusable workflows:
```yaml
# .github/workflows/frontend.yml
name: Frontend CI
on: [push]
jobs:
  call-reusable:
    uses: my-org/.github/.github/workflows/frontend-template.yml@v1
    with:
      node-version: 18
```
This ensures consistency, reduces maintenance, and speeds up onboarding.
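For reference, the called template declares its inputs under a `workflow_call` trigger. A minimal sketch of what a hypothetical frontend-template.yml might contain:

```yaml
# Hypothetical my-org/.github repo: .github/workflows/frontend-template.yml
name: Frontend Template
on:
  workflow_call:
    inputs:
      node-version:
        type: number
        required: true
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ inputs.node-version }}
      - run: npm ci
      - run: npm test
```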
2. Implement Self-Testing Pipelines
Add a pipeline linter that runs on every change:
- Checks for deprecated syntax
- Enforces timeouts (e.g., no job > 20 mins)
- Validates resource allocations
Use tools like actionlint for GitHub Actions or yamllint for GitLab.
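A minimal linting gate might look like this, assuming you're comfortable pulling the rhysd/actionlint Docker image:

```yaml
name: Lint Workflows
on:
  push:
    paths:
      - '.github/workflows/**'
jobs:
  actionlint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run actionlint against every workflow file
        run: docker run --rm -v "$PWD:/repo" -w /repo rhysd/actionlint:latest -color
```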
Measure DevOps ROI Like a Collector Measures Value
Collecting is great—but what’s the point if you can’t quantify its worth? We track:
- Cost per build (cloud spend / number of builds)
- Mean time to deployment (MTTD)
- Deployment failure rate (DFR)
- Mean time to recovery (MTTR)
Present these to leadership quarterly. Show how a 10% reduction in MTTR saved $X in downtime. Prove that autoscaling cut compute costs by $Y/month.
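Automating the report keeps it honest. A minimal sketch of a scheduled workflow, assuming a hypothetical MONTHLY_CI_SPEND secret you update from billing exports and the gh CLI that ships on GitHub-hosted runners:

```yaml
name: Monthly CI Cost Report
on:
  schedule:
    - cron: '0 8 1 * *'   # first day of each month
jobs:
  cost-per-build:
    runs-on: ubuntu-latest
    steps:
      - name: Compute cost per build for last month
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          SPEND: ${{ secrets.MONTHLY_CI_SPEND }}   # hypothetical secret, e.g. "1400"
        run: |
          START=$(date -u -d 'last month' +%Y-%m-01)
          END=$(date -u -d "$START +1 month -1 day" +%Y-%m-%d)
          # Count workflow runs created in that window
          RUNS=$(gh api "repos/${{ github.repository }}/actions/runs?created=${START}..${END}" \
            --jq '.total_count')
          echo "Cost per build: \$$(echo "scale=2; $SPEND / $RUNS" | bc)"
```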
“The best pipeline isn’t the one with the most tools. It’s the one that disappears into the background—just like a well-curated collection should.”
Conclusion: Optimize for Impact, Not Complexity
Whether you’re collecting coins or building CI/CD pipelines, the goal isn’t to impress with sheer volume. It’s to create value—efficiency, reliability, and cost savings. By adopting an SRE mindset, right-sizing resources, automating rollbacks, and measuring ROI, we transformed our pipeline from a ‘no one cares’ liability into a strategic asset.
Key takeaways:
- Audit your pipeline like a collector audits their set
- Right-size compute to avoid waste
- Use canaries and automated rollbacks to reduce failures
- Standardize with templates and linters
- Measure and report on DevOps ROI
Stop building pipelines that no one understands. Start building ones that everyone depends on—quietly, efficiently, and at a fraction of the cost.
Related Resources
You might also find these related articles helpful:
- How Show-and-Tell Culture Can Slash Your AWS, Azure, and GCP Costs (A FinOps Approach) – Ever notice how your cloud bill creeps up like that one colleague who always “forgets” to refill the office …
- How to Seamlessly Integrate and Scale a PCGS Slabbed Type Set Platform in Large Enterprises – Bringing a PCGS slabbed type set platform into a large company? It’s not just about the coins. It’s about fitting seamle…
- Why Rare Coin Authentication Skills Are the High-Income Tech Skill Developers Should Master Next – The tech skills that pay the most today won’t be the same ones paying top dollar in 3–5 years. I’ve spent mo…