September 30, 2025

Let me share something I learned the hard way: CI/CD pipeline costs aren’t just about compute bills. They’re about wasted developer time, failed deployments, and the constant pressure to move faster. As a DevOps lead, I spent months tuning our pipelines. The result? 30% lower costs and a team that trusts their deployments. Here’s how we got there using GitLab, Jenkins, and GitHub Actions – no magic, just practical fixes.
How I Stopped Worrying About CI/CD Costs
Our first pipeline audit was eye-opening. We were measuring the wrong things. Build times? Deployment frequency? Those were just the surface. The real problems were:
- Tests running twice because we couldn’t share artifacts
- Developers waiting 20 minutes for builds to finish
- Production deployments failing 1 in 5 times
- Cloud bills that kept growing despite “optimization efforts”
We flipped the script. Instead of chasing speed metrics, we focused on what actually impacted our team’s productivity and bottom line.
What We Actually Measured
We started tracking three simple things across all our pipelines:
- Build Time: From commit to ready-for-review
- Failure Rate: How often deployments broke production
- Actual Resource Use: What we paid for vs. what we used
The data showed us three culprits: redundant testing, bloated containers, and runners that were way too big for the jobs.
Making Builds Actually Fast
Build automation isn’t about automation for automation’s sake. It’s about getting developers back to what they do best – writing code, not waiting.
GitLab: Stop Reinstalling Dependencies
Our builds were wasting 8 minutes per run just on `npm install`. The fix? GitLab’s cache and artifacts. Now we store `node_modules` between jobs and share built code across stages.
job_build:
  stage: build
  cache:
    paths:
      - node_modules/   # reuse installed dependencies across pipeline runs
  script:
    - npm install
    - npm run build
  artifacts:
    paths:
      - dist/           # hand the built output to later stages instead of rebuilding
Result? Builds that used to take 12 minutes now take 7. That’s 40% faster, every single run.
Jenkins: Run Tests in Parallel, Not in Line
We had integration tests that took 15 minutes because they ran one after another. Jenkins’ parallel stages let us run them side-by-side.
stage('Test') {
  parallel {
    stage('Unit Tests') {
      steps {
        sh 'npm run test:unit'
      }
    }
    stage('Integration Tests') {
      steps {
        sh 'npm run test:integration'
      }
    }
  }
}
Cut that 15-minute wait to 8 minutes. Simple, but it adds up.
GitHub Actions: Stop Copying Configs
We had 20 repos with slightly different workflows. Maintenance was a nightmare. Reusable workflows fixed that:
jobs:
  build:
    uses: myorg/ci-templates/.github/workflows/build.yml@main
    with:
      node-version: '18'
One place to update, 20 repos stay consistent.
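For completeness, the template side is just a workflow that declares `workflow_call` inputs. Here’s a sketch of what the shared `build.yml` can look like – the steps are illustrative, not our exact template:

# .github/workflows/build.yml in the myorg/ci-templates repo
on:
  workflow_call:
    inputs:
      node-version:
        type: string
        default: '18'

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ inputs.node-version }}
      - run: npm ci
      - run: npm run build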
When Deployments Actually Work
Nothing kills momentum like a broken deployment. We reduced failures by 25% with two changes:
Canary Deployments: Test in Production (Safely)
Instead of “big bang” releases, we started small:
- First, 10% of production nodes get the update
- Watch error rates, latency, and CPU
- Only if all looks good, roll to 50%, then 100%
Caught a memory leak on 100 nodes instead of 1000. Huge win.
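If you want to see the shape of that progression in config, here’s a sketch using Argo Rollouts. The tool choice, pause durations, and image name are stand-ins – any deployment system with weighted rollouts supports the same 10% → 50% → 100% pattern:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
spec:
  replicas: 10
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myorg/myapp:1.2.3   # placeholder image tag
  strategy:
    canary:
      steps:
        - setWeight: 10              # 10% of pods get the new version
        - pause: { duration: 15m }   # watch error rates, latency, CPU
        - setWeight: 50
        - pause: { duration: 15m }
        - setWeight: 100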
Rollbacks That Actually Work
We built automatic rollback into our pipelines. If a deployment fails health checks, it auto-reverts.
- name: Rollback on Failure
  if: ${{ failure() }}
  run: kubectl rollout undo deployment/myapp
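The `failure()` there keys off whatever ran earlier in the job – for us, a post-deploy health check. A sketch of that preceding step (the URL, retry count, and sleep interval are placeholders):

- name: Post-deploy health check
  run: |
    # poll the service; a non-zero exit fails the job, which the rollback step picks up
    for i in $(seq 1 10); do
      curl -fsS https://myapp.example.com/healthz && exit 0
      sleep 15
    done
    exit 1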
Most rollbacks now happen before anyone notices. Team confidence skyrocketed.
Spending Less on Compute (Without Breaking Things)
As an SRE, I care about reliability. But reliability shouldn’t cost a fortune.
Right-Sizing: Pay for What You Use
We analyzed a month of pipeline data. Turns out:
- 80% of our jobs ran fine on smaller instances
- We were paying for idle time in 30% of our builds
- Cluster autoscaling was working but tuned too conservatively
We switched to smaller runners, used spot instances for non-critical jobs, and tuned autoscaling. 30% lower cloud costs. Same reliability.
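One concrete piece of that, for GitLab: route interruptible jobs to cheaper capacity with runner tags. The `spot` tag below is a label we’d attach to self-managed runners backed by spot or preemptible instances – it’s not a GitLab built-in, so treat this as a sketch:

integration_tests:
  stage: test
  tags:
    - spot        # custom runner tag: self-managed runners on spot instances
  retry: 2        # spot capacity can vanish mid-job; retries absorb that
  script:
    - npm run test:integration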
Docker Images That Don’t Waste MBs
Our images were huge because we included dev dependencies and intermediate files. Multi-stage builds fixed that:
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Final image: only the built output, no dev dependencies (adjust the entry point to your app)
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/index.js"]
Smaller images mean faster deploys and less storage. Every little bit helps.
Know When Things Go Wrong
A reliable pipeline needs eyes on it. We track:
- Build success rate
- How often we trigger rollbacks
- CPU, memory, disk use during builds
- Total time from commit to deploy
And we alert on what matters: deployment failure rates over 5% get the on-call SRE’s attention immediately. No noise, just the critical stuff.
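As an illustration, that alert expressed as a Prometheus rule might look like the following – the metric names are placeholders for whatever your deployment tooling actually exports:

groups:
  - name: deployments
    rules:
      - alert: DeploymentFailureRateHigh
        # placeholder metrics; substitute what your CD system exposes
        expr: sum(rate(deployments_failed_total[1h])) / sum(rate(deployments_total[1h])) > 0.05
        for: 15m
        labels:
          severity: page
        annotations:
          summary: "Deployment failure rate above 5% over the last hour"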
What Worked for Us
These changes didn’t happen overnight. We prioritized based on impact:
- Cache dependencies → 40% faster builds
- Right-size runners → 30% lower compute costs
- Canary + auto-rollback → 25% fewer failed deploys
- Reusable workflows → less config drift, fewer surprises
The key? Measure what actually affects your team’s productivity and costs. Optimize for those things first.
Whether you’re using GitLab, Jenkins, or GitHub Actions, the principles stay the same: faster builds, reliable deployments, and resources that match your actual needs. That’s how we got to 30% lower costs – and a team that trusts their pipeline.
Related Resources
You might also find these related articles helpful:
- How a FinOps Approach with Legend Can Slash Your Multi-Cloud AWS/Azure/GCP Costs
- Building a High-Impact Training Program for Rapid Tool Adoption in Engineering Teams
- The Enterprise Architect’s Guide to Scalable Tool Integration: A Case Study with Legend