feat: add middleware backend analytics#1933

Open
alanpeixinho wants to merge 2 commits into
kernelci:mainfrom
profusion:feat/user-metrics
@alanpeixinho alanpeixinho commented Jun 9, 2026

Contributor

What it is

  • Adds a middleware with custom metrics that record anonymous visitor statistics for endpoint access.

How to test

  • Start the docker compose dev stack.
  • Check the Prometheus metrics port (default 8081) to confirm that the metrics are being recorded (mainly unique visitors and endpoints_by_client).
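
The check above can be scripted. The following is a minimal sketch, assuming the port serves the standard Prometheus text exposition format; the sample text is hypothetical and only the metric name `dashboard_unique_visitors_total` comes from this PR.

```python
def find_samples(exposition_text: str, metric_prefix: str) -> list:
    """Return sample lines whose metric name starts with metric_prefix."""
    samples = []
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if line.startswith(metric_prefix):
            samples.append(line)
    return samples


# Hypothetical scrape output; real values come from the metrics port.
SAMPLE = """\
# HELP dashboard_unique_visitors_total Daily unique backend visitors
# TYPE dashboard_unique_visitors_total counter
dashboard_unique_visitors_total 3.0
"""

print(find_samples(SAMPLE, "dashboard_unique_visitors"))
```

Against a running dev stack, the text would come from an HTTP GET on the metrics port instead of `SAMPLE`; the exact host and path depend on the local setup.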

Closes #1928 #1929

@bhcopeland

Member

Not familiar with GoatCounter. Looking at Google and screenshots, it seems to do the job, but my worry is that it's another application layer we have to host. Would it make sense to export this to Prometheus instead? Or look into a module like GoatCounter that has a Prometheus endpoint we can scrape into a dashboard. A good use case for Prometheus is that it's a single endpoint for our graphs.

@alanpeixinho alanpeixinho marked this pull request as draft June 10, 2026 13:47
@alanpeixinho

Contributor Author

Hi @bhcopeland, I changed this to a draft because it is more of a discussion starter than a production-ready PR.
But yes, we are discussing the pros/cons of just trying to track this ourselves and send to Prometheus. The biggest problem we see on this is to make sure we are keeping .
But you made a good point: we might be able to capture analytics with a privacy-compliant library and store them in Prometheus, getting the best of both worlds. I will look into this.

@alanpeixinho

Contributor Author

I can see two paths we could take here:

  1. We could implement an exporter from the analytics SQLite metrics to Prometheus.
  2. Or, since we already have a Grafana dashboard, we could limit our analytics to the information available in the nginx logs. That might give us at least unique users (which we could derive in a similar fashion to GC), pages, browsers, OS and origin.

What do you think, @bhcopeland?
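
To make the second path concrete, here is a rough sketch of estimating unique users from nginx access logs. It assumes nginx's default "combined" log format and assumes that a salted hash of client IP plus user agent is an acceptable visitor identifier; the log lines and the `/api/` path are made up for illustration.

```python
import hashlib
import re
import secrets

# Captures the client IP (first field) and the user agent (last quoted
# field) of a line in nginx's default "combined" log format.
LOG_LINE = re.compile(r'^(?P<ip>\S+) .* "(?P<ua>[^"]*)"$')


def count_unique_visitors(log_lines, salt: bytes) -> int:
    """Count distinct salted (IP, user agent) hashes in the given lines."""
    seen = set()
    for line in log_lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # Not a combined-format line; ignore it.
        digest = hashlib.sha256(
            salt + match["ip"].encode() + b"\0" + match["ua"].encode()
        ).hexdigest()
        seen.add(digest)
    return len(seen)


# Hypothetical log lines: two requests from one client, one from another.
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Jun/2026:13:47:00 +0000] "GET /api/ HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Jun/2026:13:48:10 +0000] "GET /api/ HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '198.51.100.2 - - [10/Jun/2026:13:49:30 +0000] "GET /api/ HTTP/1.1" 200 128 "-" "curl/8.5.0"',
]

print(count_unique_visitors(SAMPLE_LOG, secrets.token_bytes(32)))
```

Only the hashes are kept in memory, and discarding the random salt afterwards prevents the hashes from being recomputed from a known IP and user agent.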

@alanpeixinho alanpeixinho changed the title feat: add self-hosted GoatCounter analytics feat: add middleware backend analytics Jun 11, 2026
@alanpeixinho

Contributor Author

I changed this to a simple custom metric on our existing Prometheus logger. We get less rich information than with a proper analytics tool, but it may be enough for our needs, and it is properly integrated with our Grafana metrics.

@alanpeixinho

alanpeixinho commented Jun 12, 2026

Contributor Author

A sample screenshot of the dashboard for the new metrics (extending the current Grafana dashboard).
@bhcopeland @tales-aparecida do you have any suggestions here?

[Image: metrics-improvement]

@tales-aparecida


I think this is doing exactly what I had in mind! Looking great

@mentonin mentonin left a comment

Contributor

Overall this looks pretty good and fairly comprehensive for backend visitor metrics. My comments are mostly nits or suggestions.

Comment on lines 91 to +92
"kernelCI_app.middleware.logServerErrorMiddleware.LogServerErrorMiddleware",
"kernelCI_app.middleware.backendRequestMetricsMiddleware.BackendRequestMetricsMiddleware",

Contributor

We should probably use a module and re-export the middleware classes to improve readability

Contributor Author

I prefer naming the class explicitly; module re-exporting in Python can lead to far more confusion.

Contributor

I don't see how kernelCI_app.middleware.backendRequestMetricsMiddleware.BackendRequestMetricsMiddleware is better than kernelCI_app.middleware.BackendRequestMetricsMiddleware, but not a blocker.

Contributor Author

Because it is explicit. Changing module exports at runtime introduces quite a lot of noise for a simple string. Or do you think differently?

Comment on lines +16 to +17
UNIQUE_VISITOR_TTL_SECONDS = 48 * 60 * 60
UNIQUE_VISITOR_SALT_BYTES = 32

Contributor

Why 2 days?

Contributor Author

I mostly overshot the necessary time (maybe by too much), just to be sure we don't lose cached values.
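
For context on how these two constants might interact, here is a minimal sketch of daily unique-visitor deduplication with a rotated salt. It is an illustration under stated assumptions, not the PR's implementation: `InMemoryStore`, `set_if_absent` and `is_new_visitor` are hypothetical names, and the in-memory dict stands in for a cache with per-key expiry such as Redis.

```python
import hashlib
import secrets
import time

UNIQUE_VISITOR_TTL_SECONDS = 48 * 60 * 60
UNIQUE_VISITOR_SALT_BYTES = 32


class InMemoryStore:
    """Hypothetical stand-in for a cache with per-key expiry."""

    def __init__(self):
        self._data = {}

    def set_if_absent(self, key, value, ttl, now):
        """Store value unless a live entry exists; return the live value."""
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]
        self._data[key] = (value, now + ttl)
        return value


def is_new_visitor(store, ip, user_agent, now=None):
    """Return True the first time a visitor is seen on a given UTC day."""
    now = time.time() if now is None else now
    day = time.strftime("%Y-%m-%d", time.gmtime(now))
    # One random salt per UTC day, created on first use.
    salt = store.set_if_absent(
        f"salt:{day}",
        secrets.token_bytes(UNIQUE_VISITOR_SALT_BYTES),
        UNIQUE_VISITOR_TTL_SECONDS,
        now,
    )
    digest = hashlib.sha256(salt + f"{ip}|{user_agent}".encode()).hexdigest()
    marker = object()
    # Only the first writer for this digest gets its own marker back.
    stored = store.set_if_absent(
        f"seen:{day}:{digest}", marker, UNIQUE_VISITOR_TTL_SECONDS, now
    )
    return stored is marker
```

In this sketch every key carries the day in its name, so any TTL of at least 24 hours keeps a key alive for the rest of its day; 48 hours simply leaves a margin.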

Comment on lines +24 to +32
[
    "endpoint",
    "method",
    "status_class",
    "browser",
    "os",
    "device",
    "referrer_domain",
],

Contributor

This may result in very high cardinality

Contributor Author

Do you have any suggestions?

@mentonin mentonin Jun 15, 2026

Contributor

I think we can test it and deal with it later when/if issues arise.
For sanity checks, we could look into the cardinality and relevance of each item and see if we should drop anything:

  • endpoint: around 30?, grows with API surface, very relevant, could be filtered or processed into smaller bins if needed
  • method: 9, I think only one value is valid for most endpoints, which endpoints need this? Are invalid requests filtered out before or after this point? Could bloat cardinality unnecessarily if we track invalid requests.
  • status_class: 5, relevant for tracking availability and server errors
  • browser: 8. Relevant for tracking build targets and separating browsers, bots and non-browser consumers (kci-dev). We could reduce browser granularity, or track specific tools (kci-dev sets a specific user-agent).
  • os: 7, I don't think it is very relevant besides device type
  • device: 3 (no "unknown"), very relevant
  • referrer_domain: virtually infinite, a handful in practice. I think we could reduce to internal, direct, external and maybe add specific tracking of other sources later?

Potential cardinality of over 2 million (30 × 9 × 5 × 8 × 7 × 3 × 10 = 2,268,000) with 10 referrers, or 75,600 (30 × 1 × 5 × 8 × 7 × 3 × 3) if we only have one method per endpoint and 3 referrers.
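
The estimate can be reproduced with a few lines of arithmetic. This sketch uses the per-label counts listed above; the upper bound is simply the product of the number of values each label can take.

```python
from math import prod

# Per-label value counts taken from the estimate above.
worst_case = {
    "endpoint": 30,
    "method": 9,
    "status_class": 5,
    "browser": 8,
    "os": 7,
    "device": 3,
    "referrer_domain": 10,
}

# One valid method per endpoint and three referrer buckets.
reduced = {**worst_case, "method": 1, "referrer_domain": 3}

print(prod(worst_case.values()))  # 2268000
print(prod(reduced.values()))     # 75600
```

These are upper bounds on the number of time series; only label combinations that are actually observed create a series.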

Contributor Author

I think the soundest thing to do here is to start collecting some information. That way we get a proper idea of which columns might show problems, and can then act on them.
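
If `referrer_domain` does turn out to be a problem, the reduction to internal, direct and external suggested earlier in this thread could look roughly like the sketch below. The `INTERNAL_HOSTS` set and the `referrer_bucket` name are hypothetical, and treating an unparseable referrer as "direct" is an assumption.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical hostnames that count as "internal".
INTERNAL_HOSTS = frozenset({"dashboard.example.org"})


def referrer_bucket(referrer: Optional[str]) -> str:
    """Map a Referer header value to one of three low-cardinality buckets."""
    if not referrer:
        return "direct"
    try:
        host = urlsplit(referrer).hostname
    except ValueError:
        host = None
    if host is None:
        # No usable host in the header; treated as direct (an assumption).
        return "direct"
    return "internal" if host in INTERNAL_HOSTS else "external"


print(referrer_bucket("https://dashboard.example.org/tree"))
print(referrer_bucket("https://news.example.com/post"))
print(referrer_bucket(None))
```

This caps the label at three values regardless of how many distinct sites link to the dashboard.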

Comment on lines +35 to +45
DASHBOARD_UNIQUE_VISITORS_TOTAL = Counter(
    "dashboard_unique_visitors_total",
    "Daily unique backend visitors",
)

DASHBOARD_UNIQUE_VISITORS_BY_ENDPOINT_TOTAL = Counter(
    "dashboard_unique_visitors_by_endpoint_total",
    "Daily unique backend visitors deduplicated per endpoint by rotated Redis salt",
    ["endpoint"],
)

Contributor

I think the name might be confusing. I associate "visitors" with website access, while this tracks api usage. I would rather have a name using "backend" or "api", or maybe replacing "visitors" with something like "consumers"

@alanpeixinho alanpeixinho Jun 15, 2026

Contributor Author

@tales-aparecida do you have any thoughts here? Do you think using "consumers" to address what we have been calling "visitors" would be clearer for the intended audience?

I'll say: copy the names from well-established frameworks

Contributor Author

From what I see, "visitors" is by far the most used term here, even for backend-only tracking.
"Client" is sometimes used as well, and could be an alternative.

Comment on lines +67 to +68
def record_client(**kwargs) -> None:
    DASHBOARD_BACKEND_REQUESTS_BY_CLIENT.labels(**kwargs).inc()

Contributor

I don't like kwargs here: the dict keys are known and must match the DASHBOARD_BACKEND_REQUESTS_BY_CLIENT labels.

Contributor Author

Done

Comment on lines +253 to +256
if "mobile" in normalized_user_agent or "iphone" in normalized_user_agent:
    return "mobile"
if "android" in normalized_user_agent:
    return "mobile"

Contributor

Suggested change
- if "mobile" in normalized_user_agent or "iphone" in normalized_user_agent:
-     return "mobile"
- if "android" in normalized_user_agent:
-     return "mobile"
+ if any(s in normalized_user_agent for s in ["mobile", "iphone", "android"]):
+     return "mobile"

Contributor

This is clearer in my opinion, and keeps each specific return value easily traceable

Contributor Author

I will disagree on this one. Although I like keeping the code more declarative, a nested loop might be less readable.

Contributor

I would still prefer a single return "mobile" exit point
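
For reference, a single-exit version of the check discussed in this thread could look like the sketch below. `get_device` is the function name mentioned in this review; `MOBILE_MARKERS` and the `"desktop"` fallback are assumptions made for the example, and the real function may return other values.

```python
# Substrings that mark a user agent as mobile (from the diff above).
MOBILE_MARKERS = ("mobile", "iphone", "android")


def get_device(user_agent: str) -> str:
    """Classify a user agent; 'desktop' is a hypothetical fallback."""
    normalized_user_agent = user_agent.lower()
    # Single exit point for the "mobile" value.
    if any(marker in normalized_user_agent for marker in MOBILE_MARKERS):
        return "mobile"
    return "desktop"


print(get_device("Mozilla/5.0 (Linux; Android 14) Mobile Safari/537.36"))
print(get_device("Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"))
```

Keeping the markers in one tuple means adding a new marker is a one-line change, and the `"mobile"` literal appears only once.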

Comment on lines +227 to +230
if "curl/" in normalized_user_agent or "wget/" in normalized_user_agent:
    return "HTTP Client"
if "python-requests/" in normalized_user_agent:
    return "HTTP Client"

Contributor

see comment in get_device

Contributor Author

ditto

Comment thread backend/kernelCI_app/middleware/backendRequestMetricsMiddleware.py
Comment thread backend/kernelCI_app/middleware/backendRequestMetricsMiddleware.py
proxy_send_timeout 240s;
send_timeout 240s;
}

Contributor

Formatting change in an otherwise unchanged file

Contributor Author

Done

@alanpeixinho alanpeixinho force-pushed the feat/user-metrics branch 2 times, most recently from 70a06a7 to ee0f4ce Compare June 15, 2026 15:35
@alanpeixinho alanpeixinho requested a review from mentonin June 15, 2026 15:40
@alanpeixinho alanpeixinho marked this pull request as ready for review June 15, 2026 17:22
Comment thread docs/monitoring.md Outdated
- **Metrics Path**: `/metrics/`
- **Scrape Interval**: 15 seconds

## Client Analytics & Privacy

Contributor

I believe we must move this into a proper Privacy Policy file and link to it from the frontend for full compliance. I suggest going through with these changes and tracking the Privacy Policy as a new issue.

Contributor Author

Done.

Contributor Author

Expanding on this, we might need to include contact information in the PRIVACY.md doc. Which contact should we include here, @bhcopeland?

There should be some tracker for this already. I vaguely recall that we had planned to use the same framework the Linux Foundation website is using.

  * Add custom metrics to estimate unique visitors.
  * Add custom metrics to estimate useful client information (browser,
    os, device)
  * Add PRIVACY.md file

Signed-off-by: Alan Peixinho <alan.peixinho@profusion.mobi>
@alanpeixinho alanpeixinho added Backend Most or all of the changes for this issue will be in the backend code. Metrics Related to open metrics, measurements or usage data labels Jun 16, 2026
  * Add custom metrics to estimate unique visitors.
  * Add custom metrics to estimate useful client information (browser,
    os, device)

Signed-off-by: Alan Peixinho <alan.peixinho@profusion.mobi>
Development

Successfully merging this pull request may close these issues.

Track dashboard user behavior

4 participants