Prometheus Certified Associate Exam Questions and Answers
What is the minimum requirement for an application to expose Prometheus metrics?
Options:
A. It must be exposed to the Internet.
B. It must be compiled for 64-bit architectures.
C. It must be able to serve text over HTTP.
D. It must run on Linux.
Answer:
C
Explanation:
Prometheus collects metrics by scraping an HTTP endpoint exposed by the target application. Therefore, the only essential requirement for an application to expose metrics to Prometheus is that it serves metrics in the Prometheus text exposition format over HTTP.
This endpoint is conventionally available at /metrics and provides metrics in plain text format (e.g., Content-Type: text/plain; version=0.0.4). The application can run on any operating system, architecture, or network, as long as Prometheus can reach its endpoint.
It does not need to be Internet-accessible (it can be internal) and is not limited to Linux or any specific bitness.
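A minimal sketch of such an endpoint, using only the Python standard library. The metric name demo_requests_total, the default port, and the helper names are assumptions made for illustration; a real application would normally use an official client library rather than formatting the text by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter; a real app would update it while doing work.
REQUESTS_SERVED = {"value": 0}


def render_metrics() -> str:
    """Render one counter in the Prometheus text exposition format."""
    return (
        "# HELP demo_requests_total Requests served by the demo app.\n"
        "# TYPE demo_requests_total counter\n"
        f"demo_requests_total {REQUESTS_SERVED['value']}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, answering scrapes on the given port (assumed value)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint; nothing here depends on the operating system, the CPU architecture, or Internet reachability.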
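The following is a list of metrics exposed by an application:
http_requests_total{code="500"} 10
http_requests_total{code="200"} 20
http_requests_total{code="400"} 30
http_requests_total{verb="POST"} 30
http_requests_total{verb="GET"} 30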
What is the issue with the metric family?
Options:
A. Metric names are missing a prefix to indicate which application is exposing the query.
B. The value represents two different things across the dimensions: code and verb.
C. verb label content should be normalized to lowercase.
D. Unit is missing in the http_requests_total metric name.
Answer:
B
Explanation:
The series labeled only by code and the series labeled only by verb count requests along two different dimensions, so the value of http_requests_total does not mean one thing across the family, and summing the family would count requests twice. The fix is to put both labels on every series, for example http_requests_total{code="200", verb="GET"}.
Option D is not the issue. Prometheus naming best practices ask for a unit suffix in base units (for example _seconds or _bytes) where a unit applies, but a count of events has no unit. The _total suffix marks an accumulating counter, so http_requests_total is already well named.
What is api_http_requests_total in the following metric?
api_http_requests_total{method="POST", handler="/messages"}
Options:
A. "api_http_requests_total" is a metric label name.
B. "api_http_requests_total" is a metric type.
C. "api_http_requests_total" is a metric name.
D. "api_http_requests_total" is a metric field.
Answer:
C
Explanation:
In Prometheus, the part before the curly braces {} represents the metric name. Therefore, in the metric api_http_requests_total{method="POST", handler="/messages"}, the term api_http_requests_total is the metric name. Metric names describe the specific quantity being measured; in this example, the total number of HTTP requests received by an API.
The portion within the braces defines labels, which provide additional dimensions to the metric. Here, method="POST" and handler="/messages" are labels describing request attributes. The metric name should follow Prometheus conventions: lowercase letters, numbers, and underscores only, and ending in _total for counters.
This naming scheme ensures clarity and standardization across instrumented applications. The metric type (e.g., counter, gauge) is declared separately in the exposition format, not within the metric name itself.
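The split between name and labels can be illustrated with a small parser. This is a sketch with a hypothetical helper, split_series, and it assumes label values contain no escaped quotes.

```python
import re

# Matches label pairs such as method="POST"; escaped quotes are not handled.
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def split_series(series: str):
    """Split 'name{k="v", ...}' into (metric_name, labels_dict)."""
    name, _, rest = series.partition("{")
    labels = dict(LABEL_PAIR.findall(rest))
    return name.strip(), labels


name, labels = split_series('api_http_requests_total{method="POST", handler="/messages"}')
print(name)    # api_http_requests_total
print(labels)  # {'method': 'POST', 'handler': '/messages'}
```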
With the following metrics over the last 5 minutes:
up{instance="localhost"} 1 1 1 1 1
up{instance="server1"} 1 0 0 0 0
What does the following query return:
min_over_time(up[5m])
Options:
A. {instance="localhost"} 1 {instance="server1"} 0
B. {instance="server1"} 0
Answer:
A
Explanation:
The min_over_time() function in PromQL returns the minimum sample value observed within the specified time range for each time series.
In the given data:
For up{instance="localhost"}, all samples are 1. The minimum value over 5 minutes is therefore 1.
For up{instance="server1"}, the sequence is 1 0 0 0 0. The minimum observed value is 0.
Thus, the query min_over_time(up[5m]) returns two series, one per instance:
{instance="localhost"} 1
{instance="server1"} 0
This query is commonly used to check uptime consistency. If the minimum value over the time window is 0, it indicates at least one scrape failure (target down).
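The same per-series reduction can be reproduced in a few lines of Python. The dictionary layout and the function below are illustrative stand-ins, not how Prometheus stores samples.

```python
# Samples from the question, oldest first, one per scrape in the 5m window.
window = {
    'up{instance="localhost"}': [1, 1, 1, 1, 1],
    'up{instance="server1"}': [1, 0, 0, 0, 0],
}


def min_over_time(series_samples):
    """Return the minimum sample per series, keyed by label set only."""
    result = {}
    for series, samples in series_samples.items():
        labels = series[series.index("{"):]  # drop the metric name
        result[labels] = min(samples)
    return result


print(min_over_time(window))
# {'{instance="localhost"}': 1, '{instance="server1"}': 0}
```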
What does the increase() function do in PromQL?
Options:
A. Calculates the percentage increase of a counter over time.
B. Returns the absolute increase in a counter over a specified range.
C. Calculates the derivative of a gauge over time.
D. Returns the total sum of values in a vector.
Answer:
B
Explanation:
The increase() function computes the total increase in a counter metric over a specified range vector. It accounts for counter resets and only measures the net change in the counter’s value during the time window.
Example:
increase(http_requests_total[5m])
This query returns how many HTTP requests occurred in the last five minutes. Unlike rate(), which provides a per-second average rate, increase() gives the absolute number of increments. Because the result is extrapolated to cover the whole window, it can be a non-integer even when the counter only takes integer values.
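A simplified sketch of the reset handling, with hypothetical function names. Unlike the real increase(), it does not extrapolate to the window boundaries, so it only illustrates how a drop in the counter is treated.

```python
def simple_increase(samples):
    """Total counter increase over a list of samples, oldest first.

    A drop between consecutive samples is treated as a counter reset,
    so the post-reset value counts in full.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total


def simple_rate(samples, window_seconds):
    """Per-second average: the increase divided by the window length."""
    return simple_increase(samples) / window_seconds


samples = [100, 130, 160, 10, 40]  # the counter restarted after 160
print(simple_increase(samples))    # 100.0
print(simple_rate(samples, 300))   # about 0.33 requests per second
```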
Which function would you use to calculate the 95th percentile latency from histogram data?
Options:
A. quantile_over_time(0.95, http_request_duration_seconds[5m])
B. histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
C. percentile(http_request_duration_seconds, 0.95)
D. topk(0.95, http_request_duration_seconds)
Answer:
B
Explanation:
To calculate a percentile (e.g., 95th percentile) from histogram data in Prometheus, the correct function is histogram_quantile(). It estimates quantiles based on cumulative bucket counts.
Example:
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
This computes the 95th percentile request duration across all observed instances over the last 5 minutes.
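The estimate from cumulative buckets can be sketched as follows. This is a simplified version of the linear interpolation used for classic histograms, under stated assumptions: counts are cumulative, a +Inf bucket is present, and there is at least one finite bucket. The bucket values are made up for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative buckets given as {upper_bound: count}."""
    bounds = sorted(buckets)
    total = buckets[bounds[-1]]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for i, upper in enumerate(bounds):
        if buckets[upper] >= rank:
            break
    if math.isinf(upper):
        # Quantile falls in the +Inf bucket: report the highest finite bound.
        return bounds[-2]
    lower = bounds[i - 1] if i > 0 else 0.0
    below = buckets[bounds[i - 1]] if i > 0 else 0.0
    in_bucket = buckets[upper] - below
    if in_bucket == 0:
        return lower
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / in_bucket


buckets = {0.1: 50, 0.5: 90, 1.0: 100, math.inf: 100}
p95 = histogram_quantile(0.95, buckets)
print(round(p95, 3))  # 0.75
```

The result is an estimate: its accuracy depends on how closely the bucket boundaries bracket the true value.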
What is considered the best practice when working with alerting notifications?
Options:
A. Minor alerts are as important as major alerts and should be treated with equal care.
B. Have as few alerts as possible by alerting only when symptoms might become externally visible.
C. Have as many alerts as possible to catch minor problems before they become outages.
D. Make sure to generate alerts on every metric of every component of the stack.
Answer:
B
Explanation:
The Prometheus alerting philosophy emphasizes signal over noise, meaning alerts should focus only on actionable and user-impacting issues. The best practice is to alert on symptoms that indicate potential or actual user-visible problems, not on every internal metric anomaly.
This approach reduces alert fatigue, avoids desensitizing operators, and ensures high-priority alerts get the attention they deserve. For example, alerting on “service unavailable” or “latency exceeding SLO” is more effective than alerting on “CPU above 80%” or “disk usage increasing,” which may not directly affect users.
Option B correctly reflects this principle: keep alerts meaningful, few, and symptom-based. The other options contradict core best practices by promoting excessive or equal-weight alerting, which can overwhelm operations teams.
The following is a list of metrics exposed by an application:
http_requests_total{code="500"} 10
http_requests_total{code="200"} 20
http_requests_total{code="400"} 30
http_requests_total{verb="POST"} 30
http_requests_total{verb="GET"} 30
What is the issue with the metric family?
Options:
A. Metric names are missing a prefix to indicate which application is exposing the query.
B. The value represents two different things across the dimensions: code and verb.
Answer:
B
Explanation:
A single metric name should represent one well-defined thing, and all time series of that metric should share the same set of label keys so that the value means the same across dimensions. The Prometheus naming guidance gives a rule of thumb: either the sum() or the avg() over all dimensions of a given metric should be meaningful.
In the example, http_requests_total{code="…"} expresses per-status-code request counts, while http_requests_total{verb="…"} expresses per-HTTP-method request counts. Because some series have only code and others only verb, the value changes its meaning across label sets, and summing the family would count requests twice.
The correct approach is to expose one metric with both labels present on every series, e.g., http_requests_total{code="200", verb="GET"}, so that every time series has the same label keys and the value always means “count of requests,” sliced by the same dimensions. A missing application prefix is optional and not the core issue here.
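A small check for this kind of inconsistency can be sketched in Python. The helper names and the simplified line parser are assumptions for illustration; the input is the exposition text from the question.

```python
import re
from collections import defaultdict

EXPOSED = """\
http_requests_total{code="500"} 10
http_requests_total{code="200"} 20
http_requests_total{code="400"} 30
http_requests_total{verb="POST"} 30
http_requests_total{verb="GET"} 30
"""

# Simplified sample-line parser: name{labels} value, no escapes or timestamps.
SAMPLE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')
LABEL_KEY = re.compile(r'(\w+)="[^"]*"')


def label_schemas(text):
    """Map each metric name to the distinct sets of label keys it is exposed with."""
    schemas = defaultdict(set)
    for line in text.splitlines():
        match = SAMPLE.match(line)
        if match:
            name, labels, _value = match.groups()
            schemas[name].add(frozenset(LABEL_KEY.findall(labels)))
    return dict(schemas)


def family_sum(text):
    """Sum every sample value, as sum(http_requests_total) would."""
    matches = (SAMPLE.match(line) for line in text.splitlines())
    return sum(float(m.group(3)) for m in matches if m)


for name, key_sets in label_schemas(EXPOSED).items():
    if len(key_sets) > 1:
        print(f"{name}: inconsistent label keys {sorted(sorted(k) for k in key_sets)}")
print(family_sum(EXPOSED))  # 120.0
```

The code series add up to 60 and the verb series add up to 60, so the family total of 120 is not a meaningful request count.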
Which kind of metrics are associated with the function deriv()?
Options:
A. Counters
B. Gauges
C. Summaries
D. Histograms
Answer:
B
Explanation:
The deriv() function in PromQL calculates the per-second derivative of a time series using simple linear regression over the provided time range. It estimates the rate of change across that window for metrics that can both increase and decrease, which are typically gauges.
Because counters can only increase (except when reset), rate() or increase() functions are more appropriate for them. deriv() is used to identify trends in fluctuating metrics like CPU temperature, memory utilization, or queue depth, where values rise and fall continuously.
In contrast, summaries and histograms consist of multiple sub-metrics (e.g., _count, _sum, _bucket) and are not directly suited for derivative calculation without decomposition.
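The regression behind deriv() is an ordinary least-squares slope. The sketch below assumes samples arrive as (timestamp in seconds, value) pairs with distinct timestamps; the queue-depth numbers are invented for illustration.

```python
def deriv(points):
    """Least-squares slope of (timestamp_seconds, value) points, per second."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in points)
    var = sum((t - mean_t) ** 2 for t, _ in points)
    return cov / var


# Hypothetical queue depth scraped every 15 seconds.
growing = [(0, 10), (15, 13), (30, 16), (45, 19)]
draining = [(0, 19), (15, 16), (30, 13), (45, 10)]
print(round(deriv(growing), 3))   # 0.2
print(round(deriv(draining), 3))  # -0.2
```

A negative slope is a normal result for a gauge, which is why the function fits gauges rather than counters.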
What are Inhibition rules?
Options:
A. Inhibition rules mute a set of alerts when another matching alert is firing.
B. Inhibition rules repeat a set of alerts when another matching alert is firing.
C. Inhibition rules inject a new set of alerts when a matching alert is firing.
D. Inhibition rules inspect alerts when a matching set of alerts is firing.
Answer:
A
Explanation:
Inhibition rules in Prometheus’s Alertmanager are used to suppress (mute) alerts that would otherwise be redundant when a higher-priority or related alert is already active. This feature helps avoid alert noise and ensures that operators focus on the root cause rather than multiple cascading symptoms.
For example, if a “DatacenterDown” alert is firing, inhibition rules can mute all “InstanceDown” alerts that share the same datacenter label, preventing redundant notifications. Inhibition is configured in the Alertmanager configuration file under the inhibit_rules section.
Each rule defines:
A source match (the alert that triggers inhibition; source_matchers in current configuration),
A target match (the alert to mute; target_matchers), and
A match condition (labels that must be equal for inhibition to apply; equal).
The target alerts are muted only while the source alert is firing.
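The three parts of a rule can be modeled in a few lines. This is a toy model, not Alertmanager code: the dictionary keys source, target, and equal are stand-ins for the configuration fields, alerts are plain label dictionaries, and only exact-equality matchers are supported.

```python
def matches(alert, matchers):
    """True if every label in matchers has the same value on the alert."""
    return all(alert.get(key) == value for key, value in matchers.items())


def is_inhibited(target, firing, rule):
    """Check whether `target` is muted by any firing alert under one rule."""
    if not matches(target, rule["target"]):
        return False
    return any(
        source is not target
        and matches(source, rule["source"])
        and all(source.get(label) == target.get(label) for label in rule["equal"])
        for source in firing
    )


rule = {
    "source": {"alertname": "DatacenterDown"},
    "target": {"alertname": "InstanceDown"},
    "equal": ["datacenter"],
}
firing = [
    {"alertname": "DatacenterDown", "datacenter": "eu-1"},
    {"alertname": "InstanceDown", "datacenter": "eu-1", "instance": "a"},
    {"alertname": "InstanceDown", "datacenter": "us-1", "instance": "b"},
]
for alert in firing:
    print(alert["alertname"], alert["datacenter"], is_inhibited(alert, firing, rule))
# DatacenterDown eu-1 False
# InstanceDown eu-1 True
# InstanceDown us-1 False
```

Only the InstanceDown alert in the affected datacenter is muted; the one in us-1 still notifies because its datacenter label differs from the source alert's.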
Which of the following PromQL queries is invalid?
Options:
A. max by (instance) up
B. max on (instance) (up)
C. max without (instance) up
D. max without (instance, job) up
Answer:
B
Explanation:
The max operator in PromQL is an aggregation operator, not a binary operator with vector matching. Aggregations therefore take the by() or without() modifier, not on().
✅ max by (instance) up → Valid; aggregates maximum values per instance.
✅ max without (instance) up and max without (instance, job) up → Valid; aggregates over all labels except those listed.
❌ max on (instance) (up) → Invalid; the keyword on() is only valid in binary operations (e.g., +, -, and, or, unless), where two vectors are being matched on specific labels.
Hence, max on (instance) (up) is a syntax error in PromQL because on() cannot be used directly with aggregation operators. Note that the Prometheus parser also expects the aggregated expression in parentheses, so the valid options written out in full are max by (instance) (up), max without (instance) (up), and max without (instance, job) (up).
What is the name of the official *nix OS kernel metrics exporter?
Options:
A. Prometheus_exporter
B. node_exporter
C. metrics_exporter
D. os_exporter
Answer:
B
Explanation:
The official Prometheus exporter for collecting system-level and kernel-related metrics from Linux and other UNIX-like operating systems is the Node Exporter.
The Node Exporter exposes hardware and OS metrics including CPU load, memory usage, disk I/O, network traffic, and kernel statistics. It is designed to provide host-level observability and serves data at the default endpoint :9100/metrics in the standard Prometheus exposition text format.
This exporter is part of the official Prometheus ecosystem and is widely deployed for infrastructure monitoring. None of the other listed options (Prometheus_exporter, metrics_exporter, or os_exporter) are official components of the Prometheus project.
What are the four golden signals of monitoring as defined by Google’s SRE principles?
Options:
A. Traffic, Errors, Latency, Saturation
B. Requests, CPU, Memory, Latency
C. Availability, Logging, Errors, Throughput
D. Utilization, Load, Disk, Network
Answer:
A
Explanation:
The Four Golden Signals (Traffic, Errors, Latency, and Saturation) are key service-level indicators defined by Google’s Site Reliability Engineering (SRE) discipline.
Traffic: Demand placed on the system (e.g., requests per second).
Errors: Rate of failed requests.
Latency: Time taken to serve requests.
Saturation: How “full” the system resources are (CPU, memory, etc.).
Prometheus and its metrics-based model are well suited to capturing these signals.
What should you do with counters that have labels?
Options:
A. Investigate if you can move their label value inside their metric name to limit the number of labels.
B. Make sure every counter with labels has an extra counter, aggregated, without labels.
C. Instantiate them with their possible label values when creating them so they are exposed with a zero value.
D. Save their state between application runs so you can restore their last value on startup.
Answer:
C
Explanation:
Prometheus counters with labels can cause missing time series in queries if some label combinations have not yet been observed. To ensure visibility and continuity, the recommended best practice is to instantiate counters with all expected label values at application startup, even if their initial value is zero.
This ensures that every possible labeled time series is exported consistently, which helps when dashboards or alerting rules expect the presence of those series. For example, if a counter like http_requests_total{method="POST",status="200"} has not yet received a POST request, initializing it with a zero ensures it is still exposed.
Option A is incorrect — label values should never be encoded into metric names.
Option B adds redundancy and does not solve the initialization issue.
Option D is discouraged; counters should reset naturally upon restart, reflecting Prometheus’s ephemeral metric model.
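The effect of pre-instantiating label combinations can be shown with a toy counter. LabeledCounter is a hypothetical class written for this illustration, not the API of any client library, and the label values are assumed examples.

```python
from itertools import product


class LabeledCounter:
    """Toy counter keyed by label values; not the API of any client library."""

    def __init__(self, name, label_names, expected_values=None):
        self.name = name
        self.label_names = tuple(label_names)
        self.children = {}
        # Pre-create every expected combination so it is exposed as 0.
        if expected_values:
            for combo in product(*(expected_values[n] for n in self.label_names)):
                self.children[combo] = 0.0

    def inc(self, *label_values, amount=1.0):
        self.children[label_values] = self.children.get(label_values, 0.0) + amount

    def expose(self):
        lines = []
        for values, count in sorted(self.children.items()):
            labels = ",".join(f'{n}="{v}"' for n, v in zip(self.label_names, values))
            lines.append(f"{self.name}{{{labels}}} {count}")
        return "\n".join(lines)


requests = LabeledCounter(
    "http_requests_total",
    ["method", "status"],
    {"method": ["GET", "POST"], "status": ["200", "500"]},
)
requests.inc("GET", "200")
print(requests.expose())
# http_requests_total{method="GET",status="200"} 1.0
# http_requests_total{method="GET",status="500"} 0.0
# http_requests_total{method="POST",status="200"} 0.0
# http_requests_total{method="POST",status="500"} 0.0
```

In the official Python client, calling .labels(...) for a combination without incrementing it is generally enough to initialize that series in the same way.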
Which Prometheus component handles service discovery?
Options:
A. Alertmanager
B. Prometheus Server
C. Pushgateway
D. Node Exporter
Answer:
B
Explanation:
The Prometheus Server is responsible for service discovery, which identifies the list of targets to scrape. It integrates with multiple service discovery mechanisms such as Kubernetes, Consul, EC2, and static configurations.
This allows Prometheus to automatically adapt to dynamic environments without manual reconfiguration.
What is a difference between a counter and a gauge?
Options:
A. Counters change value on each scrape and gauges remain static.
B. Counters and gauges are different names for the same thing.
C. Counters have no labels while gauges can have many labels.
D. Counters are only incremented, while gauges can go up and down.
Answer:
D
Explanation:
The key difference between a counter and a gauge in Prometheus lies in how their values change over time. A counter is a cumulative metric that only increases; it resets to zero only when the process restarts. Counters are typically used for metrics like total requests served, bytes processed, or errors encountered. You can derive rates of change from counters using functions like rate() or increase() in PromQL.
A gauge, on the other hand, represents a metric that can go up and down. It measures values that fluctuate, such as CPU usage, memory consumption, temperature, or active session counts. Gauges provide a snapshot of current state rather than a cumulative total.
This distinction ensures proper interpretation of time-series trends and prevents misrepresentation of one-time or fluctuating values as cumulative metrics.
What popular open-source project is commonly used to visualize Prometheus data?
Options:
A. Kibana
B. Grafana
C. Thanos
D. Loki
Answer:
B
Explanation:
The most widely used open-source visualization and dashboarding platform for Prometheus data is Grafana. Grafana provides native integration with Prometheus as a data source, allowing users to create real-time, interactive dashboards using PromQL queries.
Grafana supports advanced visualization panels (graphs, heatmaps, gauges, tables, etc.) and enables users to design custom dashboards to monitor infrastructure, application performance, and service-level objectives (SLOs). It also provides alerting capabilities that can complement or extend Prometheus’s own alerting system.
While Kibana is part of the Elastic Stack and focuses on log analytics, Thanos extends Prometheus for long-term storage and high availability, and Loki is a log aggregation system. None of these tools serve as the primary dashboarding solution for Prometheus metrics the way Grafana does.
Grafana’s seamless Prometheus integration and templating support make it the de facto standard visualization tool in the Prometheus ecosystem.
What is the difference between client libraries and exporters?
Options:
A. Exporters are written in Go. Client libraries are written in many languages.
B. Exporters expose metrics for scraping. Client libraries push metrics via Remote Write.
C. Exporters run next to the services to monitor, and use client libraries internally.
D. Exporters and client libraries mean the same thing.
Answer:
C
Explanation:
The fundamental difference between Prometheus client libraries and exporters lies in how and where they are used.
Client libraries are integrated directly into the application’s codebase. They allow developers to instrument their own code to define and expose custom metrics. Prometheus provides official client libraries for multiple languages, including Go, Java, Python, and Ruby.
Exporters, on the other hand, are standalone processes that run alongside the applications or systems they monitor. They use client libraries internally to collect and expose metrics from software that cannot be instrumented directly (e.g., operating systems, databases, or third-party services). Examples include the Node Exporter (for system metrics) and MySQL Exporter (for database metrics).
Thus, exporters are typically used for external systems, while client libraries are used for self-instrumented applications.