- What Domain 3 Actually Covers
- Monitoring, Logging, and Observability
- Managing Compute Resources
- Managing Storage and Database Solutions
- Managing Networking Resources
- Deployment and Process Automation
- AI-Assisted Operations with Gemini Cloud Assist
- Scheduling Domain 3 Into Your Prep
- Registration, Format, and Timing
- FAQ
- Domain 3 tests day-two operations: monitoring, logging, managing compute/storage/network resources, and deployment automation.
- Cloud Monitoring, Cloud Logging, and alerting policies show up more than any other single topic in this domain.
- The exam has 50-60 questions in 2 hours, so Domain 3 scenario questions need fast, confident triage.
- Gemini Cloud Assist and Cloud Run functions now appear in the current exam guide's operational scope.
What Domain 3 Actually Covers
Domain 3, "Ensuring the successful operation of a cloud solution," is the operations-heavy section of the Associate Cloud Engineer exam. Where Domain 2 is about building a solution, Domain 3 is about keeping it running: watching dashboards, reading logs, managing running compute and storage resources, and rolling out changes without breaking production. If you have ever been paged at 2 a.m. for a full disk or a stuck deployment, this domain will feel familiar.
Google's exam guide groups Domain 3 tasks around four practical clusters: managing compute resources, managing storage and database solutions, managing networking resources, and monitoring/logging solutions. A newer thread running through all four is process automation and observability tooling, including AI-assisted operations. For the full breakdown of how this domain fits alongside the other three, see the complete guide to all four ACE content areas.
Monitoring, Logging, and Observability
This is the single densest topic inside Domain 3. You need working familiarity with:
- Cloud Monitoring: creating dashboards, uptime checks, alerting policies, and notification channels (email, SMS, Pub/Sub, PagerDuty-style integrations).
- Cloud Logging: log routing with sinks, log-based metrics, exclusion filters, and viewing logs across projects with the Logs Explorer.
- Error Reporting and Cloud Trace/Profiler: recognizing when each tool is the right answer for latency versus error-rate versus resource-consumption questions.
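- Audit logs: distinguishing Admin Activity, Data Access, System Event, and Policy Denied logs, and knowing which are on by default. Admin Activity and System Event logs are always written and cannot be disabled, while Data Access logs are off by default for most services (BigQuery is the exception).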
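To make the log routing bullet concrete, here is a minimal Python sketch that assembles the two strings a sink needs: a Logging query filter and a BigQuery destination. The helper names, the project ID `example-project`, and the dataset `ops_logs` are hypothetical and chosen only for illustration; the sketch builds strings and does not call any Google Cloud API.

```python
def build_log_filter(resource_type: str, min_severity: str, *extra_clauses: str) -> str:
    """Join Logging query clauses with AND (helper name is illustrative)."""
    clauses = [
        f'resource.type="{resource_type}"',
        f"severity>={min_severity}",
        *extra_clauses,
    ]
    return " AND ".join(clauses)


def bigquery_sink_destination(project_id: str, dataset_id: str) -> str:
    """Format a BigQuery dataset as a log sink destination string."""
    return f"bigquery.googleapis.com/projects/{project_id}/datasets/{dataset_id}"


# Hypothetical values: route ERROR-and-above Compute Engine logs to a dataset.
error_filter = build_log_filter("gce_instance", "ERROR")
destination = bigquery_sink_destination("example-project", "ops_logs")

print(error_filter)
print(destination)
```

The same filter text can be pasted into the Logs Explorer to preview what a sink would capture before you create it.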
High-Value Skill: Alerting Policy Design
You should be able to read a scenario and identify the correct combination of metric, threshold, and notification channel needed to alert an on-call team before an outage becomes customer-facing.
- Know the difference between a metric threshold condition and a log-based metric condition.
- Understand notification channel setup and how alerting policies tie to uptime checks.
- Be comfortable identifying when Cloud Monitoring, not Cloud Logging, is the right tool.
Expect scenario questions rather than pure definition questions. A typical prompt might describe a spike in 5xx errors on a Compute Engine-backed load balancer and ask which combination of Cloud Monitoring and Cloud Logging configuration will notify the team fastest with the least noise.
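One way to picture the 5xx scenario above is as an alerting policy document. The sketch below builds a dictionary shaped like a Cloud Monitoring alerting policy with a single metric threshold condition. Treat the specifics as assumptions: the function name, the threshold of 5 responses per second, the 5-minute duration, and the notification channel ID are illustrative, and the metric filter should be checked against the current metrics list before use. The duration is what cuts noise, because the condition has to hold for the whole window before anyone is notified.

```python
import json


def build_5xx_alert_policy(
    notification_channel: str,
    threshold_per_second: float = 5.0,
    duration_seconds: int = 300,
) -> dict:
    """Return a dict shaped like an alerting policy (illustrative values)."""
    return {
        "displayName": "Load balancer 5xx rate too high",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": "5xx responses per second above threshold",
                "conditionThreshold": {
                    # Assumed filter for an external HTTP(S) load balancer.
                    "filter": (
                        'resource.type="https_lb_rule" AND '
                        'metric.type="loadbalancing.googleapis.com/https/request_count" AND '
                        "metric.labels.response_code_class=500"
                    ),
                    "aggregations": [
                        {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_RATE"}
                    ],
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold_per_second,
                    # The condition must stay true this long before alerting.
                    "duration": f"{duration_seconds}s",
                },
            }
        ],
        "notificationChannels": [notification_channel],
    }


# Hypothetical channel resource name.
policy = build_5xx_alert_policy("projects/example-project/notificationChannels/1234")
print(json.dumps(policy, indent=2))
```

Reading the dictionary top to bottom mirrors the exam's framing: metric and filter, then threshold and duration, then notification channel.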
Managing Compute Resources
Domain 3 assumes you already deployed compute resources (that was Domain 2's job) and now asks you to keep them healthy. Core skills include:
- Managing instance groups: resizing, updating instance templates, and rolling updates versus rolling restarts.
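- Adjusting autoscaling policies for Compute Engine managed instance groups and understanding the initialization period (formerly called the cool down period), which tells the autoscaler how long a new VM takes to start serving.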
- Managing GKE clusters day-to-day: node pool upgrades, workload scaling, and reading `kubectl` output for troubleshooting.
- Understanding revisions, traffic splitting, and rollback behavior in Cloud Run and Cloud Run functions when a new revision misbehaves.
- SSH access, OS Login, and startup/shutdown script troubleshooting on Compute Engine.
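For the autoscaling bullet, it helps to see the basic arithmetic behind a target-utilization policy. The sketch below is a deliberately simplified model, not the actual Compute Engine autoscaler: it ignores the initialization period, stabilization behavior, and multiple signals. The function name and the sample numbers are hypothetical. Utilization is passed as whole percentages so the math stays in integers.

```python
def recommended_size(
    current_size: int,
    observed_percent: int,
    target_percent: int,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Estimate the group size that brings utilization back to the target."""
    if target_percent <= 0:
        raise ValueError("target_percent must be positive")
    total_load = current_size * observed_percent
    # Integer ceiling division: enough instances to carry the total load.
    needed = (total_load + target_percent - 1) // target_percent
    # Clamp to the configured minimum and maximum group size.
    return max(min_replicas, min(max_replicas, needed))


# 4 VMs at 90% CPU with a 60% target: scale out.
print(recommended_size(4, 90, 60, min_replicas=2, max_replicas=10))
# 4 VMs at 30% CPU with a 60% target: scale in.
print(recommended_size(4, 30, 60, min_replicas=2, max_replicas=10))
# 8 VMs at 95% CPU: the maximum caps the result.
print(recommended_size(8, 95, 60, min_replicas=2, max_replicas=10))
```

The clamp on the last line is worth remembering: a group that sits at its maximum while utilization stays high is a configuration limit, not an autoscaler failure.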
Managing Storage and Database Solutions
Operational storage questions focus less on choosing a storage product (that's planning) and more on keeping existing storage healthy and cost-controlled:
- Cloud Storage lifecycle management rules, retention policies, and object versioning cleanup.
- Monitoring Cloud SQL instances: read replica lag, automatic storage increase, and backup/restore procedures.
- Managing Cloud SQL and Spanner maintenance windows without causing unplanned downtime.
- Bigtable and Firestore capacity and performance monitoring basics.
- Import/export and migration jobs using `gcloud` and the console, including troubleshooting failed transfers.
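The lifecycle bullet above is easier to remember once you have seen a rule set written out. This sketch uses Python to generate a Cloud Storage lifecycle configuration as JSON. The specific ages and version counts are assumptions picked for illustration, not recommendations: move Standard objects to Nearline after 30 days, delete noncurrent versions once three newer versions exist, and delete anything older than a year.

```python
import json

# Illustrative lifecycle rules; adjust ages and classes to your own policy.
lifecycle_config = {
    "rule": [
        {
            "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
            "condition": {"age": 30, "matchesStorageClass": ["STANDARD"]},
        },
        {
            # Object versioning cleanup: applies only to noncurrent versions.
            "action": {"type": "Delete"},
            "condition": {"isLive": False, "numNewerVersions": 3},
        },
        {
            "action": {"type": "Delete"},
            "condition": {"age": 365},
        },
    ]
}

lifecycle_json = json.dumps(lifecycle_config, indent=2)
print(lifecycle_json)
```

Saved to a file, JSON in this shape is the kind of input `gcloud storage buckets update` accepts through its `--lifecycle-file` flag; confirm the exact invocation against the current `gcloud` reference before relying on it.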
Key Takeaway
If a question describes a database running out of storage or replicas falling behind, look for answers involving monitoring metrics, automatic storage increase, or backup settings, not architecture redesign - that's the operational mindset Domain 3 rewards.
Managing Networking Resources
Networking in Domain 3 is about maintaining connectivity and diagnosing breakage, not designing VPCs from scratch:
- Troubleshooting firewall rules that block expected traffic, including priority and direction conflicts.
- Verifying VPC peering, Cloud NAT, and Cloud VPN/Interconnect connectivity issues.
- Using Network Intelligence Center tools (Connectivity Tests, Firewall Insights) to diagnose problems.
- Managing DNS records in Cloud DNS and diagnosing propagation or misconfiguration issues.
- Reviewing load balancer health checks when backends are marked unhealthy.
This overlaps with the access-control concepts tested in Domain 4, so review both together rather than in isolation - a firewall troubleshooting question can straddle both domains.
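Priority and direction conflicts are easier to reason about with a small model. The sketch below is a simplified stand-in for VPC firewall evaluation: it matches only on direction and port, and ignores source ranges, tags, service accounts, and protocols. The class, function, and rule names are hypothetical. It encodes three facts: the lowest priority number wins, a deny beats an allow at equal priority, and the implied rules deny ingress and allow egress when nothing else matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirewallRule:
    name: str
    direction: str  # "INGRESS" or "EGRESS"
    priority: int   # 0-65535; a lower number means higher priority
    action: str     # "allow" or "deny"
    port: int


def evaluate(rules: list, direction: str, port: int) -> str:
    """Return the action of the winning rule for this direction and port."""
    matching = [r for r in rules if r.direction == direction and r.port == port]
    if not matching:
        # Implied rules: deny all ingress, allow all egress.
        return "deny" if direction == "INGRESS" else "allow"
    # Sort key: priority first, then deny ahead of allow on a tie.
    winner = min(matching, key=lambda r: (r.priority, r.action != "deny"))
    return winner.action


rules = [
    FirewallRule("allow-ssh", "INGRESS", 1000, "allow", 22),
    FirewallRule("deny-ssh-override", "INGRESS", 900, "deny", 22),
    FirewallRule("allow-https", "INGRESS", 1000, "allow", 443),
]

print(evaluate(rules, "INGRESS", 22))    # blocked by the priority-900 deny
print(evaluate(rules, "INGRESS", 443))   # allowed
print(evaluate(rules, "INGRESS", 8080))  # no match, implied deny
print(evaluate(rules, "EGRESS", 8080))   # no match, implied allow
```

The first call is the classic exam trap: an allow rule exists, but a deny with a lower priority number silently overrides it.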
Deployment and Process Automation
Domain 3 also covers keeping deployments repeatable and safe:
- Deployment Manager and Terraform basics: applying, updating, and rolling back infrastructure changes.
- Cloud Build triggers, build steps, and troubleshooting a failed CI/CD pipeline.
- Application Design Center's role in generating and maintaining deployable application blueprints.
- Using `gcloud` scripting and Cloud Scheduler for recurring operational tasks like backups or cleanup jobs.
| Operational Task | Primary Tool | What Exam Questions Test |
|---|---|---|
| Alerting on error rate spike | Cloud Monitoring | Correct metric + threshold + notification channel |
| Diagnosing failed traffic between VPCs | Network Intelligence Center | Choosing Connectivity Tests over manual log review |
| Rolling back a bad Cloud Run revision | Cloud Run traffic splitting | Understanding revision-based rollback, not redeployment |
| Repeated infrastructure updates | Terraform / Deployment Manager | Idempotent apply vs. manual console changes |
| Log routing to BigQuery for analysis | Cloud Logging sinks | Correct sink destination and filter syntax |
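The Cloud Run row in the table is worth a closer look, because revision-based rollback is a traffic change rather than a new deployment. The sketch below models a traffic split as a plain dictionary of revision names to percentages. The function names and revision names are hypothetical, and nothing here calls Cloud Run; it only illustrates the idea that the previous revision still exists and can take 100% of traffic again.

```python
def canary_split(stable: str, candidate: str, candidate_percent: int) -> dict:
    """Send a share of traffic to a new revision and the rest to the stable one."""
    if not 0 <= candidate_percent <= 100:
        raise ValueError("candidate_percent must be between 0 and 100")
    return {stable: 100 - candidate_percent, candidate: candidate_percent}


def roll_back(split: dict, stable: str) -> dict:
    """Return all traffic to the stable revision; other revisions drop to 0%."""
    if stable not in split:
        raise KeyError(f"unknown revision: {stable}")
    return {revision: (100 if revision == stable else 0) for revision in split}


# Hypothetical revision names.
split = canary_split("checkout-00041-abc", "checkout-00042-xyz", 10)
print(split)

# The new revision misbehaves, so shift everything back.
rolled_back = roll_back(split, "checkout-00041-abc")
print(rolled_back)
```

On a real service the equivalent move is a traffic update, for example with `gcloud run services update-traffic` and its `--to-revisions` flag, rather than rebuilding and redeploying the old code.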
AI-Assisted Operations with Gemini Cloud Assist
The current exam guide incorporates Gemini Cloud Assist and Gemini CLI as operational aids, alongside Agent Runtime on the Gemini Enterprise Agent Platform and Google Antigravity. For Domain 3 purposes, you don't need deep AI engineering knowledge - you need to recognize where these tools fit into day-to-day operations:
- Gemini Cloud Assist can summarize error patterns from Cloud Logging and suggest remediation steps inside the console.
- Gemini CLI can help generate or explain `gcloud` commands during troubleshooting sessions.
- These tools assist human operators; they don't replace the need to understand what Cloud Monitoring, Logging, and IAM are actually doing under the hood.
Scheduling Domain 3 Into Your Prep
Domain 3 rewards hands-on repetition more than reading. If you're following a multi-week plan like the one in the ACE study guide for passing on the first attempt, place Domain 3 after you've already provisioned resources in Domain 1 and 2 labs, so you have something real to monitor and troubleshoot.
Monitoring and Logging Deep Dive
- Build dashboards and alerting policies on resources from earlier weeks
- Practice Logs Explorer filters and log-based metrics
Compute, Storage, and Network Operations
- Resize instance groups, test rolling updates, and break/fix a Cloud Run revision
- Run Connectivity Tests against a deliberately misconfigured firewall rule
Deployment Automation Practice
- Apply and roll back a Terraform or Deployment Manager configuration
- Set up and troubleshoot a Cloud Build trigger
Two labs beat ten flashcards for this domain. If your time is limited, prioritize Cloud Monitoring and Cloud Logging labs above everything else in Domain 3.
Registration, Format, and Timing
Domain 3 questions are mixed throughout the exam rather than grouped in a section, so mechanics matter as much as content knowledge. The ACE exam runs 2 hours with 50-60 multiple-choice and multiple-select questions, delivered online-proctored or at a Pearson VUE test center. Registration happens through CM Connect/CertMetrics, and the standard exam fee is $125 USD plus tax. For a full cost breakdown including the renewal path, see the ACE certification cost guide.
There are no prerequisites, but Google recommends 6+ months of hands-on Google Cloud experience - and Domain 3's scenario-based troubleshooting questions are exactly where that experience shows. Candidates get up to 4 attempts in a 2-year period, with waiting periods between failed attempts, so it's worth building real operational familiarity rather than rushing a retake. If you're weighing how difficult this section is compared to the rest of the exam, the ACE difficulty guide and pass rate analysis both discuss where candidates commonly lose points.
Before exam day, run a few timed practice sets on the ACE practice test platform so you get comfortable pacing through mixed-domain scenario questions under the 2-hour clock. Repeating troubleshooting-style questions on a full-length practice exam is one of the fastest ways to convert lab experience into exam-ready recall.
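The pacing math behind that 2-hour clock is simple enough to check yourself. This short sketch uses only the figures stated above (2 hours, 50 to 60 questions); the function name is illustrative.

```python
EXAM_SECONDS = 2 * 60 * 60  # 2-hour exam


def seconds_per_question(question_count: int) -> float:
    """Average time available per question if none are skipped."""
    return EXAM_SECONDS / question_count


for count in (50, 60):
    print(f"{count} questions: {seconds_per_question(count):.0f} seconds each")
```

That works out to roughly two to two and a half minutes per question, with no allowance for review, which is why long scenario questions need a quick first pass.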
Frequently Asked Questions
Google's exam guide doesn't publish exact per-domain percentages, but Domain 3 is one of four roughly equal content areas alongside setup, planning/implementation, and access/security. Treat it as a major, not minor, share of the exam.
You do not need deep Kubernetes knowledge for Domain 3. You need working familiarity with managing GKE clusters operationally - node pools, scaling, and basic troubleshooting - not deep Kubernetes architecture knowledge, which is more of a Professional-level topic.
Cloud Monitoring and Cloud Logging both appear frequently and are often tested together in the same scenario, since alerting policies typically depend on metrics or log-based metrics. Study them as a pair rather than separately.
Gemini Cloud Assist and the other AI-assisted tools appear in the current exam guide, but as a smaller supplementary topic. Focus your study time on core monitoring, compute, storage, and networking operations first.
Domain 3 assumes resources already exist from Domain 1 and Domain 2 work, and it overlaps with Domain 4 on networking and access troubleshooting. Reviewing all four together, as outlined in the ACE exam domains guide, helps you spot these cross-domain scenario questions.