feat: add middleware backend analytics#1933
Conversation
|
Not familiar with GoatCounter. Looking at Google and screenshots, it seems to do the job, but my worry is that it's another application layer we have to host. Would it make sense to export this to Prometheus instead? Or look into a module like GoatCounter that has a Prometheus endpoint we can scrape into a dashboard. A good use case for Prometheus is that it's a single endpoint for our graphs. |
|
Hi @bhcopeland , changed it to draft, because it is more of a discussion starter than a proper ready to production PR. |
I can see two paths we could take here:
What you think @bhcopeland . |
a1021c6 to
3f808e6
Compare
Made some changes to just a simple custom metric on our already existing prometheus logger, we have less rich information, than with a proper analytics, but we might have the necessary information to our needs, and properly integrated with our grafana metrics analytics. |
|
A screenshot sample of dashboard for the new metrics. (extending the current grafana dashboard)
|
|
I think this is doing exactly what I had in mind! Looking great |
mentonin
left a comment
There was a problem hiding this comment.
Overall looks pretty good and fairly comprehensive for backend visitor metrics, my comments are mostly nits or suggestions.
| "kernelCI_app.middleware.logServerErrorMiddleware.LogServerErrorMiddleware", | ||
| "kernelCI_app.middleware.backendRequestMetricsMiddleware.BackendRequestMetricsMiddleware", |
There was a problem hiding this comment.
We should probably use a module and re-export the middleware classes to improve readability
There was a problem hiding this comment.
I prefer explicit naming the class, module exporting in python can lead to far more confusion
There was a problem hiding this comment.
I don't see how kernelCI_app.middleware.backendRequestMetricsMiddleware.BackendRequestMetricsMiddleware is better than kernelCI_app.middleware.BackendRequestMetricsMiddleware, but not a blocker.
There was a problem hiding this comment.
Because it is explicit, changing module exports in runtime introduces quite a lot of noise for a simple string. Or you think differently?
| UNIQUE_VISITOR_TTL_SECONDS = 48 * 60 * 60 | ||
| UNIQUE_VISITOR_SALT_BYTES = 32 |
There was a problem hiding this comment.
I have mostly overshoot the necessary time (maybe too much), just to be sure we don't loose cached values.
| [ | ||
| "endpoint", | ||
| "method", | ||
| "status_class", | ||
| "browser", | ||
| "os", | ||
| "device", | ||
| "referrer_domain", | ||
| ], |
There was a problem hiding this comment.
This may result in very high cardinality
There was a problem hiding this comment.
Do you have any suggestions?
There was a problem hiding this comment.
I think we can test it and deal with it later when/if issues arise.
For sanity checks, we could look into the cardinality and relevance of each item and see if we should drop anything:
- endpoint: around 30?, grows with API surface, very relevant, could be filtered or processed into smaller bins if needed
- method: 9, I think only one value is valid for most endpoints, which endpoints need this? Are invalid requests filtered out before or after this point? Could bloat cardinality unnecessarily if we track invalid requests.
- status_class: 5, relevant for tracking availability and server errors
- browser: 8. Relevant for tracking build targets and separating browsers, bots and non-browser consumers (kci-dev). We could reduce browser granularity, or track specific tools (kci-dev sets a specific user-agent).
- os: 7, I don't think it is very relevant besides device type
- device: 3 (no "unknown"), very relevant
- referrer_domain: virtually infinite, a handful in practice. I think we could reduce to
internal,direct,externaland maybe add specific tracking of other sources later?
Potential cardinality of over 2 million with 10 referrers, 75600 if we only have one method per endpoint and 3 referrers
There was a problem hiding this comment.
I think the more sound thing to do here is to start collecting some information. This way we have a proper idea of which columns might shown problems, and then act on them.
| DASHBOARD_UNIQUE_VISITORS_TOTAL = Counter( | ||
| "dashboard_unique_visitors_total", | ||
| "Daily unique backend visitors", | ||
| ) | ||
|
|
||
| DASHBOARD_UNIQUE_VISITORS_BY_ENDPOINT_TOTAL = Counter( | ||
| "dashboard_unique_visitors_by_endpoint_total", | ||
| "Daily unique backend visitors deduplicated per endpoint by rotated Redis salt", | ||
| ["endpoint"], | ||
| ) | ||
|
|
There was a problem hiding this comment.
I think the name might be confusing. I associate "visitors" with website access, while this tracks api usage. I would rather have a name using "backend" or "api", or maybe replacing "visitors" with something like "consumers"
There was a problem hiding this comment.
@tales-aparecida do you have any thoughts here? Do you think using "consumers" to address what we have been calling "visitors" would be clearer for the intended audience?
There was a problem hiding this comment.
I'll say: copy the names from well-established frameworks
There was a problem hiding this comment.
For what I see, visitors is by far the most used term here, even for backend-only tracking.
Client is sometimes adopted as well, and could be an alternative.
| def record_client(**kwargs) -> None: | ||
| DASHBOARD_BACKEND_REQUESTS_BY_CLIENT.labels(**kwargs).inc() |
There was a problem hiding this comment.
I don't like kwargs here, the dict keys are known and must be the same as DASHBOARD_BACKEND_REQUESTS_BY_CLIENT labels.
| if "mobile" in normalized_user_agent or "iphone" in normalized_user_agent: | ||
| return "mobile" | ||
| if "android" in normalized_user_agent: | ||
| return "mobile" |
There was a problem hiding this comment.
| if "mobile" in normalized_user_agent or "iphone" in normalized_user_agent: | |
| return "mobile" | |
| if "android" in normalized_user_agent: | |
| return "mobile" | |
| if any(s in normalized_user_agent for s in ["mobile", "iphone", "android"]): | |
| return true |
There was a problem hiding this comment.
This is clearer in my opinion, and keeps each specific return value easily traceable
There was a problem hiding this comment.
I will disagree on this one, despite liking to keep the code more declarative, a nested loop might be less readable
There was a problem hiding this comment.
I would still prefer a single return "mobile" exit point
| if "curl/" in normalized_user_agent or "wget/" in normalized_user_agent: | ||
| return "HTTP Client" | ||
| if "python-requests/" in normalized_user_agent: | ||
| return "HTTP Client" |
There was a problem hiding this comment.
see comment in get_device
| proxy_send_timeout 240s; | ||
| send_timeout 240s; | ||
| } | ||
|
|
There was a problem hiding this comment.
Formatting change for an otherwise unchaged file
70a06a7 to
ee0f4ce
Compare
| - **Metrics Path**: `/metrics/` | ||
| - **Scrape Interval**: 15 seconds | ||
|
|
||
| ## Client Analytics & Privacy |
There was a problem hiding this comment.
I believe we must move this into a proper Privacy Policy file and link to it from the frontend for full compliance. I suggest going through with these changes and tracking the Privacy Policy as a new issue.
There was a problem hiding this comment.
Extending on this, we might need to include contact information on the PRIVACY.md doc, which contact should we include here @bhcopeland ?
There was a problem hiding this comment.
There's should be some tracker for this already. I vaguely recall that we had planned to use the same framework the Linux Foundation website is using
| [ | ||
| "endpoint", | ||
| "method", | ||
| "status_class", | ||
| "browser", | ||
| "os", | ||
| "device", | ||
| "referrer_domain", | ||
| ], |
There was a problem hiding this comment.
I think we can test it and deal with it later when/if issues arise.
For sanity checks, we could look into the cardinality and relevance of each item and see if we should drop anything:
- endpoint: around 30?, grows with API surface, very relevant, could be filtered or processed into smaller bins if needed
- method: 9, I think only one value is valid for most endpoints, which endpoints need this? Are invalid requests filtered out before or after this point? Could bloat cardinality unnecessarily if we track invalid requests.
- status_class: 5, relevant for tracking availability and server errors
- browser: 8. Relevant for tracking build targets and separating browsers, bots and non-browser consumers (kci-dev). We could reduce browser granularity, or track specific tools (kci-dev sets a specific user-agent).
- os: 7, I don't think it is very relevant besides device type
- device: 3 (no "unknown"), very relevant
- referrer_domain: virtually infinite, a handful in practice. I think we could reduce to
internal,direct,externaland maybe add specific tracking of other sources later?
Potential cardinality of over 2 million with 10 referrers, 75600 if we only have one method per endpoint and 3 referrers
| if "mobile" in normalized_user_agent or "iphone" in normalized_user_agent: | ||
| return "mobile" | ||
| if "android" in normalized_user_agent: | ||
| return "mobile" |
There was a problem hiding this comment.
I would still prefer a single return "mobile" exit point
* Add custom metrics to estimate unique visitors.
* Add custom metrics to estimate usefull client information (browser,
os, device)
* Add PRIVACY.md file
Signed-off-by: Alan Peixinho <alan.peixinho@profusion.mobi>
ee0f4ce to
477a759
Compare
* Add custom metrics to estimate unique visitors.
* Add custom metrics to estimate usefull client information (browser,
os, device)
Signed-off-by: Alan Peixinho <alan.peixinho@profusion.mobi>

What it is
How to test
Closes #1928 #1929