
Preparing for a DevOps or AWS interview? This guide compiles 18 scenario-based questions inspired by real-world challenges and solutions from seasoned engineers. The questions span Kubernetes, CI/CD, Infrastructure as Code, observability, and cloud migrations, and suit candidates who want to showcase hands-on expertise.
⚙️ Containerization & Kubernetes
1. How can you reduce the size of a Docker image without losing functionality?
Answer: Use multi-stage builds, smaller base images like Alpine, a .dockerignore file, and consolidated RUN commands. Consider distroless images for production.
2. What would you do if some Kubernetes nodes are overutilized while others are underutilized?
Answer: Check node taints, pod affinity and anti-affinity rules, and resource requests and limits. Apply topology spread constraints and autoscalers.
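To make the topology spread idea concrete, here is a minimal sketch of the skew check that a maxSkew constraint implies: a pod may land in a zone only if that zone's pod count, after placement, exceeds the least-loaded zone by no more than max_skew. This is a simplified model, not the scheduler's code; the function name and the zone labels are hypothetical.

```python
from collections import Counter


def placement_allowed(pod_zones, all_zones, candidate_zone, max_skew=1):
    """Simplified maxSkew check for one candidate zone.

    pod_zones: zone label of every matching pod that is already running.
    all_zones: every zone that is eligible for scheduling.
    """
    counts = Counter(pod_zones)
    # The least-loaded eligible zone sets the baseline for the skew.
    global_min = min(counts.get(zone, 0) for zone in all_zones)
    # Skew if the new pod were placed in the candidate zone.
    skew = counts.get(candidate_zone, 0) + 1 - global_min
    return skew <= max_skew


running = ["zone-a", "zone-a", "zone-b"]
zones = ["zone-a", "zone-b", "zone-c"]
print(placement_allowed(running, zones, "zone-c"))
print(placement_allowed(running, zones, "zone-a"))
```

With two pods in zone-a, one in zone-b, and none in zone-c, only zone-c accepts the next pod, which is how the constraint pushes work toward underutilized nodes.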
3. How would you handle container security vulnerabilities in CI/CD?
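Answer: Integrate image scanning tools (Trivy, Clair), use minimal base images, enforce Pod Security Standards through Pod Security Admission (PodSecurityPolicy was removed in Kubernetes 1.25), and monitor runtime behavior with tools like Falco.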
🚀 CI/CD Best Practices
4. Your CI/CD pipeline takes too long. How would you optimize it?
Answer: Parallelize steps, use caching (BuildKit, dependency caches), optimize tests, and improve infrastructure.
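Dependency caching usually hinges on a cache key that changes only when the dependencies change. The sketch below derives such a key from the contents of a lockfile; the function name, the prefix, and the runner_os parameter are illustrative assumptions, not the API of any particular CI system.

```python
import hashlib


def cache_key(prefix, lockfile_bytes, runner_os="linux"):
    """Build a cache key that changes only when the lockfile changes."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()[:16]
    return f"{prefix}-{runner_os}-{digest}"


old = cache_key("deps", b"requests==2.31.0\n")
same = cache_key("deps", b"requests==2.31.0\n")
new = cache_key("deps", b"requests==2.32.0\n")
print(old == same)  # identical lockfile, so the cache is reused
print(old == new)   # changed lockfile, so the cache is rebuilt
```

Keying on the lockfile rather than on the commit hash means most pipeline runs restore the cache instead of reinstalling every dependency.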
5. A production deployment caused bugs. What is your rollback strategy?
Answer: Roll back with kubectl rollout undo or by switching traffic back in a blue-green deployment. Limit the blast radius of future releases with canaries and feature flags, backed by enhanced monitoring.
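Feature flags allow a change to be switched off without redeploying. A minimal sketch of a percentage-based flag follows, assuming users are identified by a string ID; the flag name and helper functions are hypothetical and do not belong to any specific feature-flag product.

```python
import hashlib


def bucket(flag_name, user_id):
    """Map a user to a stable bucket in the range 0..99 for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(flag_name, user_id, rollout_percent):
    """Enable the flag for roughly rollout_percent of users."""
    return bucket(flag_name, user_id) < rollout_percent


# Setting the percentage to 0 acts as an instant rollback for every user.
print(is_enabled("new-checkout", "user-42", 0))
print(is_enabled("new-checkout", "user-42", 100))
```

Because the bucket is derived from a hash rather than a random draw, a given user sees consistent behavior across requests, and raising the percentage never turns the feature off for someone who already had it.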
6. How would you achieve zero downtime during deployment?
Answer: Implement blue-green deployments with traffic shifting, health checks, and gradual rollout using Istio or Flagger.
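The gradual rollout can be pictured as a loop that raises the new version's traffic share only while it stays healthy. The sketch below shows one step of that decision; the step size, the tolerance, and the function name are assumptions chosen for illustration, and tools such as Flagger implement their own, more elaborate analysis.

```python
def next_canary_weight(current_weight, canary_error_rate, baseline_error_rate,
                       step=10, tolerance=0.01):
    """Return the next traffic percentage (0-100) for the new version.

    Returns 0 (abort and send all traffic to the old version) when the
    canary's error rate exceeds the baseline by more than the tolerance.
    """
    if canary_error_rate > baseline_error_rate + tolerance:
        return 0
    return min(100, current_weight + step)


print(next_canary_weight(10, 0.001, 0.001))  # healthy: advance one step
print(next_canary_weight(50, 0.05, 0.01))    # unhealthy: abort
```

The old version keeps serving the remaining traffic throughout, so an aborted rollout causes no downtime.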
🏛️ Infrastructure as Code (IaC)
7. How do you provision isolated AWS environments for multiple clients?
Answer: Use modular Terraform code, workspaces, and CI/CD for automation. Enforce tagging and policy compliance.
8. How do you prevent and fix configuration drift in production?
Answer: Use AWS Config, a scheduled daily terraform plan to detect drift, and IAM restrictions to block manual changes. Remediate with automation or approval workflows.
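A scheduled drift check can lean on terraform plan's -detailed-exitcode flag, which exits with 0 when there is nothing to change, 2 when the plan contains changes, and 1 on error. The wrapper below is a minimal sketch: the function names and status labels are hypothetical, and a non-empty plan only indicates drift when the code on the checked branch has already been applied.

```python
import subprocess


def classify_plan_exit(code):
    """Translate a `terraform plan -detailed-exitcode` exit code."""
    if code == 0:
        return "in_sync"
    if code == 2:
        return "changes_pending"
    return "error"


def check_drift(workdir):
    """Run a plan in workdir and report whether state and code agree."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode), result.stdout


# Example (requires an initialized Terraform working directory):
# status, plan_output = check_drift("./environments/prod")
```

A scheduled job could call check_drift and open a ticket or trigger an approval workflow whenever the status is not in_sync.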
9. What’s your approach to secure secret management in IaC?
Answer: Use Vault, AWS KMS, ephemeral secrets, Terraform Vault provider, and avoid hardcoding sensitive values.
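One way to enforce the "no hardcoded values" rule is a pre-commit or pipeline check that scans IaC files for obvious secrets. The sketch below is a deliberately small illustration with two assumed patterns (an AWS access key ID prefix and a quoted literal assigned to a password-like name); dedicated scanners cover far more cases.

```python
import re

# Illustrative patterns only; real scanners ship much larger rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "quoted_literal": re.compile(r'(?i)(password|secret|token)\s*=\s*"[^"]+"'),
}


def find_hardcoded_secrets(text):
    """Return (line_number, pattern_name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


sample = (
    'password = "hunter2"\n'
    "password = var.db_password\n"
    'access_key = "AKIAIOSFODNN7EXAMPLE"\n'
)
print(find_hardcoded_secrets(sample))
```

The second line of the sample passes because the value comes from a variable, which is where a Vault- or KMS-backed lookup would be wired in.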
10. How do you test infrastructure code before production?
Answer: Use tfsec/checkov for static analysis, Terratest for unit/integration testing, and performance/security validation.
🔍 Monitoring & Observability
11. How would you reduce alert fatigue?
Answer: Classify alerts (P1–P4), use dynamic thresholds, composite alerts, and establish SLOs with observability tools.
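SLO-based alerting is often expressed as a burn rate: the observed error ratio divided by the error budget (1 minus the SLO target). The sketch below pages only when both a short and a long window burn fast, which is one form of composite alert. The default threshold of 14.4 is an assumption borrowed from common multiwindow practice (about 2% of a 30-day budget consumed in one hour); the function names are hypothetical.

```python
def burn_rate(error_ratio, slo_target):
    """How many times faster than allowed the error budget is being spent."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_page(short_window_error_ratio, long_window_error_ratio,
                slo_target, threshold=14.4):
    """Page only when both windows exceed the burn-rate threshold."""
    return (
        burn_rate(short_window_error_ratio, slo_target) >= threshold
        and burn_rate(long_window_error_ratio, slo_target) >= threshold
    )


# 2% errors in both windows against a 99.9% SLO burns the budget about 20x too fast.
print(should_page(0.02, 0.02, 0.999))
# A brief spike that the long window does not confirm stays quiet.
print(should_page(0.02, 0.001, 0.999))
```

Requiring both windows to agree filters out short spikes, which tends to remove a large share of low-value pages.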
12. Users report slowness but monitoring looks fine. What next?
Answer: Use distributed tracing (Jaeger), synthetic monitoring, DB profiling, and check hidden bottlenecks.
13. How do you manage logs at scale (TBs/day)?
Answer: Use tiered log storage (hot, warm, cold), Fluentd/Vector, index only critical logs, and compress & route efficiently.
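The tiering and selective-indexing rules can be written down as two small policy functions. The sketch below is illustrative only: the 7-day and 30-day boundaries and the set of indexed levels are assumptions, and in practice these rules live in the configuration of the log pipeline and storage backend.

```python
from datetime import timedelta


def storage_tier(age, hot_days=7, warm_days=30):
    """Pick a storage tier from the age of a log segment."""
    if age < timedelta(days=hot_days):
        return "hot"
    if age < timedelta(days=warm_days):
        return "warm"
    return "cold"


def should_index(level, indexed_levels=("ERROR", "CRITICAL")):
    """Index only the levels that are searched during incidents."""
    return level.upper() in indexed_levels


print(storage_tier(timedelta(hours=3)))
print(storage_tier(timedelta(days=45)))
print(should_index("info"))
```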
14. Walk through your steps after a 3AM system outage alert.
Answer: Triage the alert, check dashboards, review recent changes, follow runbooks, analyze logs and metrics, mitigate, and run a postmortem.
15. How do you create custom metrics for business-level monitoring?
Answer: Collaborate with stakeholders, instrument code for KPIs, build dashboards, alert on anomalies, and refine via feedback.
☁️ Cloud Architecture & Migration
16. How would you migrate a Java + Oracle app with local file storage to AWS?
Answer: Use App2Container for the app, AWS DMS for the database, and S3 for files. Establish hybrid networking and CI/CD pipelines.
17. How do you manage environment-specific configurations securely?
Answer: Use GitOps, ConfigMaps/Secrets, Vault, Helm/Kustomize, schema validation, and config promotion strategies.
18. What’s your approach for managing secrets and configuration in a multi-environment setup?
Answer: Centralized secrets with Vault/KMS, environment overlays with Helm/Kustomize, validation tests, and documented workflows.
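Environment overlays boil down to a base configuration plus a small per-environment patch. The sketch below shows that merge, followed by a simple validation step, in plain Python. It only mimics the idea behind Helm values files and Kustomize overlays and is not how either tool is implemented; the function names and keys are hypothetical.

```python
def merge_overlay(base, overlay):
    """Return base with overlay applied; nested dicts are merged key by key."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_overlay(merged[key], value)
        else:
            merged[key] = value
    return merged


def missing_keys(config, required):
    """List required top-level keys that the merged config lacks."""
    return [key for key in required if key not in config]


base = {"replicas": 1, "db": {"host": "localhost", "port": 5432}}
prod = {"replicas": 3, "db": {"host": "db.prod.internal"}}

config = merge_overlay(base, prod)
print(config)
print(missing_keys(config, ["replicas", "db", "log_level"]))
```

Secret values are intentionally absent from both dictionaries; they would be resolved at deploy time from Vault or a KMS-backed store rather than stored alongside the overlays.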
🔹 Bonus: What Makes a Great DevOps Candidate?
- Demonstrates deep understanding of tooling
- Solves problems holistically (infra + app)
- Communicates clearly and proactively
- Implements security, automation, and observability from the start