DevOps Engineer — AWS, Observability & Security
Who we are:
Collaboration.Ai is a mission-focused, AI-powered software and services company based in Minnesota, with employees, partners, and customers around the world.
We unite people, technology, and purpose to accelerate breakthroughs that transform industries, empower communities, and create a more sustainable future. We collaborate with customer teams across a broad spectrum of public and private sector organizations, helping them navigate complex challenges and drive transformative change.
To learn more about us, visit collaboration.ai
Our product lineup:
NetworkOS is an AI-powered platform that aligns people, purpose, ideas, and expertise in real time, generating actionable insights to propel movements forward.
CrowdVector is an integrated solution marketplace and innovation management platform that rapidly uncovers new ideas and advances breakthroughs to fuel movements.
About the Role
You’ll own CAI’s cloud infrastructure end-to-end: the AWS footprint, CI/CD pipelines, container orchestration, and observability platforms that let our engineering teams ship with confidence.
This isn’t a “keep the lights on” role. You’ll architect infrastructure that meets federal compliance requirements by design, build out our DataDog monitoring from the ground up, and lay the foundation for AI/ML workloads. You’ll work closely with our Product Dev teams, Security and Compliance, and Platform Architect to keep our infrastructure secure, efficient, and developer-friendly, supporting customers who turn ideas, networks, and data into real outcomes.
High autonomy. Real ownership. Infrastructure that matters.
What You’ll Do
- Architect AWS infrastructure: Design and manage our cloud footprint for scalability, high availability, and security controls aligned with NIST 800-53 and SOC 2
- Build reliable CI/CD pipelines: GitHub Actions with automated quality gates (testing, coverage, security scanning) that let developers ship to Kubernetes with high confidence
- Codify everything in Terraform: Establish IaC standards, enforce peer review, detect drift, and maintain consistent patterns across dev, nonprod, and prod
- Run Kubernetes at scale: Operate and optimize EKS clusters with zero-downtime upgrades, right-sized resources, and security policies
- Build out DataDog observability: Create dashboards, alerts, and integrations that give teams actionable insight into infrastructure health
- Embed security in pipelines: Snyk scanning, container image validation, AWS security baselines, and compliance-ready documentation
- Support AI workloads: Build infrastructure for LLM Ops, including compute scaling, model deployment pipelines, and cost optimization
Our Tech Stack
- Cloud: AWS (EKS, RDS, S3, KMS, Secrets Manager, VPC, ALB, CloudTrail, Security Hub)
- IaC: Terraform, GitOps workflows
- Containers: Kubernetes (EKS), Docker, Helm, ArgoCD
- CI/CD: GitHub Actions, Codecov, Amazon ECR
- Security: Snyk (SAST, container scanning, dependency scanning), AWS security controls, TLS 1.3
- Observability: DataDog (infrastructure monitoring, dashboards, alerts), OpenTelemetry
- Compliance: NIST 800-53, NIST 800-171, SOC 2, CMMC Level 2, FedRAMP High readiness
- Future: AWS GovCloud (IL2/IL4)
What We’re Looking For
Must-Haves
- 5+ years of hands-on DevOps or Platform Engineering experience with AWS
- Production Terraform (or OpenTofu) expertise for infrastructure as code
- Strong Docker and Kubernetes (EKS) experience in production
- Experience implementing security controls aligned with NIST 800-171 or NIST 800-53
- Hands-on CI/CD pipeline design (GitHub Actions preferred)
- Experience with observability platforms (DataDog strongly preferred)
- Understanding of GitOps practices and deployment automation
- US citizenship (required for DoD contracting and FedRAMP compliance)
Nice-to-Haves
- Location in the Minneapolis/Saint Paul, MN area for coworking opportunities; remote candidates are also encouraged to apply
- LLM Ops or AI/ML infrastructure experience
- Advanced AWS certifications (DevOps Engineer Professional, Security Specialty)
- High-growth startup or B2B SaaS background
Why Join Collaboration.Ai?
Modern stack, real problems. Kubernetes, Terraform, DataDog, GitHub Actions — no legacy infrastructure, no manual deployments. You’ll build on tools you actually want to use.
AI-native culture. We build with AI, not just for AI. Claude Code, agentic workflows, and AI-assisted infrastructure automation are how we work daily. You’ll be expected to push the boundaries of what’s possible with AI tooling in your domain, and you’ll have the freedom to do it.
Work that matters. Our customers span defense, pharma, aerospace, global enterprises, and universities: complex organizations where the stakes are real. The infrastructure you build will meet the most demanding compliance requirements in the industry: FedRAMP High, CMMC Level 2, SOC 2.
Own it. Small, senior team. High autonomy. Your infrastructure decisions are visible, valued, and directly tied to company outcomes.