- 14 Jan, 2021 8 commits

Andrew Newdigate authored
Simplify service aggregated apdex calculation. See merge request gitlab-com/runbooks!3111

Andrew Newdigate authored

Igor Wiedler authored
Remove handler label from registry dashboards. See merge request gitlab-com/runbooks!3110

João Pereira authored

Craig Furman authored
Add kube resources to monitoring service. See merge request gitlab-com/runbooks!3096

Andrew Newdigate authored
refactor(metrics): preparation for aggregation sets. See merge request gitlab-com/runbooks!3106

Andrew Newdigate authored
Rearranges recording rules into aggregation sets to reduce churn in upcoming MR

Bob Van Landuyt authored
Add filters to stage group dashboards. Closes gitlab-com/gl-infra/scalability#740. See merge request gitlab-com/runbooks!3100

- 13 Jan, 2021 10 commits

Hendrik Meyer authored
Add diagnostic links for Cloudflare. See merge request gitlab-com/runbooks!3108

Quang-Minh Nguyen authored

Cameron McFarland authored
Corrective Action: gitlab-com/gl-infra/production#3317 - find project with pages domain. See merge request gitlab-com/runbooks!3104

Cameron McFarland authored

Hendrik Meyer authored

Sean McGivern authored
Document gitlab-org/grafana-dashboards as archive for deleted dashboards. See merge request gitlab-com/runbooks!3107

Igor Wiedler authored

Quang-Minh Nguyen authored

Quang-Minh Nguyen authored

Craig Furman authored
SLIs for thanos memcached. See merge request gitlab-com/runbooks!3097

- 12 Jan, 2021 10 commits

Anthony Sandoval authored

Andrew Newdigate authored
Fix registry logging links. See merge request gitlab-com/runbooks!3102

Andrew Newdigate authored

Craig Furman authored
Rethink chef-client staleness alerts. See merge request gitlab-com/runbooks!3101

Craig Furman authored
Only alert on stale chef-client runs on instances where chef-client is enabled.

Add an alert for when chef-client has been disabled for over 24h, much longer than the 5h staleness alert. This allows us to distinguish between unexpected stale runs and intentional pauses, while still alerting when a pause might have been accidentally left in place. For example, this scenario arises every time the CI runner manager fleet is updated: https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/runner/update-gitlab-runner-on-managers.md#procedure-description.

Remove the duplicate/alternative ChefClientStale alert for simplicity. That alert had an incident_project configured: move it to the now-only ChefClientStale alert, and change the incident_project from infrastructure to production. Infrastructure issue alerts are harder to discover than production issue alerts, because Woodhouse posts news of new production issue alerts to `#incident-management` but does not do so for infrastructure ones. This change effectively makes the alert slightly noisier, since on-calls often pay closer attention to incident-management than to the non-paging alerts channel.

Craig Furman authored
By error, then by staleness

Quang-Minh Nguyen authored

Quang-Minh Nguyen authored

Quang-Minh Nguyen authored

Quang-Minh Nguyen authored

- 11 Jan, 2021 3 commits

Craig Furman authored

Igor Wiedler authored
Update container registry HTTP metrics labels. See merge request gitlab-com/runbooks!3094

Craig Furman authored

- 07 Jan, 2021 8 commits

Craig Miskell authored
Corrects links to moved documentation for Redis. Closes #57. See merge request gitlab-com/runbooks!3098

John Skarbek authored

Jan Urbanc authored
Add schema for rails.api logs. See merge request gitlab-com/runbooks!2886

Craig Furman authored
Add feature_flag.json. Closes gitlab-com/gl-infra/production#3130. See merge request gitlab-com/runbooks!3012

Bob Van Landuyt authored
Sync grafana and kibana time ranges. Closes gitlab-com/gl-infra/scalability#769. See merge request gitlab-com/runbooks!3095

Shinya Maeda authored

Quang-Minh Nguyen authored

Quang-Minh Nguyen authored

- 06 Jan, 2021 1 commit

Henri Philipps authored
add script for finding duplicate repos. See merge request gitlab-com/runbooks!2784