mozilla :: #missioncontrol

13 Jul 2017
05:29 <cloudops-ansible> data/telemetry-streaming build #65: deployed to prod.
18:14 <digitarald> wlach: I am working on pulling the error_aggregates into the quantum dashboard. Did you try to calculate MTBF using the fields in the aggregate?
18:15 <wlach> digitarald: what do you mean by mtbf? mean time between failure?
18:16 <digitarald> wlach: yes, that is how some metrics are aggregated for quantum
18:17 <wlach> hmm I'm not sure how that would work with the error_aggregates schema, we only store aggregates of the various error measures per unique set of dimensions over the time interval
18:17 <digitarald> wlach: the data currently comes from https://bugzilla.mozilla.org/show_bug.cgi?id=1371813
18:18 <digitarald> https://s3-us-west-2.amazonaws.com/telemetry-public-analysis-2/bsmedberg/daily-latency-metrics/20170609.json
18:22 <frank> we can do MTBF currently using error_aggregates
18:22 <wlach> maybe mtbf is stored directly in the error_aggregates table? to be honest I am not that familiar with the measures aside from the crash aggregates
18:23 <frank> digitarald: it has both usage_hours and input_event_response_coalesced_ms_main_above_250, just SUM(usage_hours) / SUM(input_event_response_coalesced_ms_main_above_250)
18:24 <digitarald> frank: right, that is what I just started doing as well and it looks as expected
18:24 <frank> perfect, I'm eventually going to be adding client_counts so we can have the rest of the quantum RC in missioncontrol as well
18:25 <digitarald> thanks for confirming; my mind was baselined with our crash aggregate, crashes per 1000 usage hours
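frank's formula is a ratio of sums over the aggregate rows. A minimal Python sketch, assuming rows shaped loosely like error_aggregates output (one row per unique set of dimensions over the time interval); the `main_crashes` field and the sample numbers are hypothetical, used only to contrast MTBF with the crashes-per-1000-usage-hours style digitarald mentions.

```python
import math
from typing import Iterable, Mapping

# Hypothetical sample rows, loosely modelled on error_aggregates:
# one row per unique set of dimensions over the time interval.
SAMPLE_ROWS = [
    {"usage_hours": 120.0, "input_event_response_coalesced_ms_main_above_250": 30, "main_crashes": 2},
    {"usage_hours": 80.0, "input_event_response_coalesced_ms_main_above_250": 10, "main_crashes": 1},
]


def mtbf_hours(rows: Iterable[Mapping[str, float]], failure_field: str) -> float:
    """SUM(usage_hours) / SUM(failure_field): usage hours per failure event."""
    rows = list(rows)
    total_hours = sum(r["usage_hours"] for r in rows)
    total_failures = sum(r[failure_field] for r in rows)
    if total_failures == 0:
        return math.inf  # no failures observed in the interval
    return total_hours / total_failures


def rate_per_1000_hours(rows: Iterable[Mapping[str, float]], event_field: str) -> float:
    """Events per 1000 usage hours, the way the crash aggregates are reported."""
    rows = list(rows)
    total_hours = sum(r["usage_hours"] for r in rows)
    if total_hours == 0:
        raise ValueError("no usage hours in the selected rows")
    return 1000 * sum(r[event_field] for r in rows) / total_hours


if __name__ == "__main__":
    print(mtbf_hours(SAMPLE_ROWS, "input_event_response_coalesced_ms_main_above_250"))  # 5.0
    print(rate_per_1000_hours(SAMPLE_ROWS, "main_crashes"))  # 15.0
```

Summing numerator and denominator separately before dividing keeps low-usage rows from dominating: averaging the per-row ratios here would give (4 + 8) / 2 = 6.0 hours instead of 5.0.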
18:25 <digitarald> frank: so affected client I can't get yet? what's the timeline?
18:26 <frank> digitarald: well with Mauro out the rest of the month it's just me, and I haven't put any priority towards it
18:26 <frank> I don't even think there's a bug for it
18:27 <digitarald> running things in stmo again, gives me time to think
18:27 <frank> digitarald: is it quantum blocking to have those numbers in mission control?
18:28 <frank> digitarald: i.e. when the quantum_rc dataset isn't enough
18:37 <digitarald> frank: if there is a bug to have Mission Control reporting Quantum RC, then this should be a dependency
18:37 <frank> digitarald: I'll dig around, wlach do you know of any related bugs?
18:37 <digitarald> I don't want to switch only a subset of graphs over
18:37 <wlach> frank: hmm let me check
18:38 <frank> digitarald: can you send me a link to those plots?
18:38 <digitarald> frank: http://health.graphics/quantum
18:38 <frank> ah okay, those
18:39 <wlach> frank: all I have is https://bugzilla.mozilla.org/show_bug.cgi?id=1369775
18:39 <frank> wlach: Thanks!
18:39 <wlach> I feel like we are conflating "mission control" with "the soft real time error_aggregates" dataset
18:40 <frank> wlach: I agree
18:40 <frank> wlach: presumably the "mission control" dashboard should have all of the release criteria showing
18:40 <wlach> frank: yup that is the plan
18:40 <wlach> I guess part of the problem is that there is no mission control dashboard working yet, so people just imagine the project to be this vague amorphous thing that will solve all their problems
18:41 <frank> wlach: you mean it won't solve all my problems?!?!?!
18:41 <digitarald> oh, wow; stmo fell over with https://pastebin.mozilla.org/9027023
18:41 <wlach> lol, we will see
18:41 <frank> wlach: but yes, you are correct, though from what I saw already it should be great :)
18:41 <wlach> frank: I am pretty confident it will be useful
18:42 <frank> digitarald: that is an odd error, what is the query?
18:43 <frank> digitarald: all queries are failing
18:43 * frank pings robotblake
18:43 <digitarald> frank: next run worked
18:43 <robotblake> Node died :(
18:50 <frank> wlach: qq - is the missioncontrol-api already running?
18:51 <wlach> frank: sort of https://data-missioncontrol.dev.mozaws.net
18:51 <wlach> there are some endpoints which can fetch stuff, but they are slow and kind of broken
18:52 <frank> wlach: are you working on that?
18:53 <wlach> frank: yup, I'm working on a smaller set of somewhat restricted endpoints for an MVP version
18:54 <frank> wlach: okay, because we are probably going to need to hook in to that API for experiments viewer :)
18:54 <wlach> frank: ooh tell me more
18:54 <frank> wlach: basically there is a need for some real-time reporting in experiments viewer
18:55 <frank> e.g. clients and crashes, mainly
18:55 <wlach> ok, so "given this experiment + this set of other dimensions what is the crash rate?" sort of thing?
18:55 <frank> so my thought was have another real-time table just for our experiments stuff, and use mission-control API to query it from our frontend
18:55 <frank> wlach: right!
18:55 <frank> basically just "experiment + branch"
18:56 <frank> we can't add the experiment branch to error_aggregates though because it can double count :(
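frank's double-counting point can be illustrated with a toy example. It assumes (this is not stated in the log) that a single client can be enrolled in several experiments at once; if so, adding an experiment/branch dimension emits that client's usage once per enrollment, and totals summed across experiments are inflated. All records and names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-client records; client "a" is enrolled in two experiments.
CLIENTS = [
    {"client_id": "a", "usage_hours": 10.0, "crashes": 1,
     "experiments": {"exp-1": "control", "exp-2": "treatment"}},
    {"client_id": "b", "usage_hours": 5.0, "crashes": 0,
     "experiments": {"exp-1": "treatment"}},
]


def aggregate_by_branch(clients):
    """Sum usage and crashes per (experiment, branch): one contribution per enrollment."""
    totals = defaultdict(lambda: {"usage_hours": 0.0, "crashes": 0})
    for client in clients:
        for experiment, branch in client["experiments"].items():
            bucket = totals[(experiment, branch)]
            bucket["usage_hours"] += client["usage_hours"]
            bucket["crashes"] += client["crashes"]
    return dict(totals)


per_branch = aggregate_by_branch(CLIENTS)
true_usage = sum(c["usage_hours"] for c in CLIENTS)               # 15.0
naive_usage = sum(b["usage_hours"] for b in per_branch.values())  # 25.0: client "a" counted twice
# Sliced to a single experiment, the branches do not overlap:
exp1_usage = sum(b["usage_hours"] for (exp, _), b in per_branch.items() if exp == "exp-1")  # 15.0
```

Restricted to one experiment, the branches stay disjoint (again assuming each client is in exactly one branch per experiment), which fits the idea of a separate experiments table queried per experiment + branch.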
18:56 <wlach> makes sense, we could also consider hacking such a view into missioncontrol itself, then linking to it from the experiments viewer
18:56 <frank> wlach: that would be even better :)
18:56 <frank> well, maybe better
18:57 <frank> I think it's better, but PMs may disagree
18:57 <wlach> based on my experience with treeherder, having a bunch of other random people depending on your API is a bit of a pain
18:57 <frank> wlach: right, probably is. OTOH I think the API could become a central theme in making real-time powered dashboards
18:58 <frank> but that could be a longer-term goal, and near-term we could just link
18:58 <wlach> frank: yeah, I think longer term an API would be nice. just from a maintenance standpoint it would be easier for us to cross-link
18:59 <wlach> the frontend technology stack in missioncontrol is very similar to that of experiments-viewer, so it should be easy for a developer familiar with one to contribute to the other if they have a pet feature they want
18:59 <frank> wlach: okay, that is good to know. Once the MVP is up and running, does it seem like it may be difficult to power that dashboard using a separate dataset?
19:00 <wlach> you mean access another parquet/athena dataset from mc? I don't see why not, we're just using hive/presto, which you can point at anything
19:05 <frank> wlach: right yeah, just wanted to make sure it wouldn't be hard from mc-api perspective
19:06 <wlach> frank: tbh I'm not sure what the best long term plan for an mc api is. I emphatically don't want to turn it into a redash-replacement
19:07 <frank> okay okay :) I see your point
19:07 <frank> as the owner of the aggregates service I completely see what you mean
19:07 <wlach> frank: but other than that I'm open to suggestions on what it should do :)
19:08 <frank> wlach: well, other options are we use pyhive/botocore to query presto/athena directly
19:08 <frank> just kind of get rid of an api altogether, right?
19:09 <wlach> from experiments viewer?
19:09 <wlach> I imagine it does something like that already?
19:10 <frank> wlach: no, we have an ETL job that takes parquet data and spits it into postgres
19:11 <wlach> ah ok, I was planning to do basically exactly that for mc (well, except storing in redis instead of postgres at least initially)
19:11 <wlach> I almost wonder if these should be separate projects
19:11 <frank> wlach: so the etl job would run what, every 15-30 minutes?
19:12 <frank> wlach: i wonder that too
19:12 <frank> the frontends are not even so different
19:12 <frank> but I think joining them together would make it much harder to make progress for each
19:13 <wlach> frank: yeah something like that -- I was thinking of refreshing some set of predefined aggregations every 5 minutes, but only querying for the freshest data and updating the aggregates
19:14 <wlach> frank: yeah I think trying to join them together at this point would be a bit of a distraction
19:14 <wlach> robhudson has been reviewing some of my PRs, so at least he's getting familiar with it... maybe we can revisit this topic at the end of the quarter
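wlach's plan (refresh a predefined set of aggregations every 5 minutes, querying only the freshest data) is essentially incremental aggregation with a high-water mark. A minimal sketch under stated assumptions: each source row carries an increasing `timestamp`, a plain dict stands in for the Redis cache, and `fetch_rows_since` is a hypothetical callable standing in for the Presto/Athena query. Scheduling the refresh (for example as a periodic Celery task) is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Tuple


@dataclass
class IncrementalAggregator:
    # Hypothetical query hook: returns rows with timestamp > since.
    fetch_rows_since: Callable[[int], Iterable[dict]]
    dimensions: Tuple[str, ...]  # the predefined aggregation key, e.g. ("channel",)
    cache: Dict[Tuple, Dict[str, float]] = field(default_factory=dict)  # stand-in for Redis
    high_water_mark: int = 0  # newest timestamp already folded into the cache

    def refresh(self) -> int:
        """Fold only rows newer than the high-water mark into the cached sums."""
        new_rows = list(self.fetch_rows_since(self.high_water_mark))
        for row in new_rows:
            key = tuple(row[d] for d in self.dimensions)
            agg = self.cache.setdefault(key, {"usage_hours": 0.0, "crashes": 0.0})
            agg["usage_hours"] += row["usage_hours"]
            agg["crashes"] += row["crashes"]
            self.high_water_mark = max(self.high_water_mark, row["timestamp"])
        return len(new_rows)


if __name__ == "__main__":
    source = [
        {"timestamp": 1, "channel": "nightly", "usage_hours": 2.0, "crashes": 1},
        {"timestamp": 2, "channel": "beta", "usage_hours": 3.0, "crashes": 0},
    ]
    agg = IncrementalAggregator(lambda since: [r for r in source if r["timestamp"] > since], ("channel",))
    agg.refresh()  # first run folds in both rows
    source.append({"timestamp": 3, "channel": "nightly", "usage_hours": 1.0, "crashes": 1})
    agg.refresh()  # second run fetches only the new row
    print(agg.cache[("nightly",)])  # {'usage_hours': 3.0, 'crashes': 2.0}
```

One caveat of this design: a row that arrives late with a timestamp at or below the mark is never picked up, so a real job would likely need an overlap window or a partition-based watermark.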
19:16 <frank> wlach: that seems like it might be a good suggestion
19:18 <frank> okay, to sum up - for experiments, we can either use an API and draw the plots in the viewer, or link to missioncontrol
19:18 <frank> wlach: if we were to pursue the second path, what is the current timeline for the MVP?
19:19 <wlach> good question, I think by early next week I should have something end-to-end demoable
19:20 <frank> oh dang!!
19:20 <wlach> I thought I was blocked on getting an etl layer working with celery (which I've finished, but am waiting on devops to integrate), but I think I've figured out how to get something working without that for now
19:21 <frank> that's great. In that case linking sounds very plausible
19:22 <wlach> well you'd need to modify mc to allow querying for or displaying the data you want (or get me to do it)
19:22 <wlach> but yeah, it's not like you should have to wait weeks and weeks or anything
19:23 <frank> wlach: since basically it's only crashes + counts, the display should already be available, right? Just need to point it at the right data
19:24 <wlach> but you want the data faceted by experiment, right?
19:24 <frank> wlach: right, right
19:24 <frank> so it would be a few changes
19:31 <wlach> yeah probably nothing too crazy
19:33 <frank> wlach: do you know if there would be some way to just hot link those plots into the experiments viewer?
19:33 <wlach> frank: we could do an iframe or something
19:33 <wlach> but seems like more hassle than it's worth
19:34 <frank> agreed, probably is
22:26 <digitarald> wlach frank: I dropped the ball on the chat; was an issue created for the client_count?
22:27 <frank> digitarald: it was, see bug 1380753
22:27 <digitarald> nice. Just wanted to update @sphilp on having MI data in the dashboard.
22:27 <frank> there is no metabug for quantum RC in MC in general
22:28 <wlach> might be good to file one
22:28 <wlach> I would except I'm leaving quite shortly
22:28 <frank> wlach: let's handle it tomorrow then
22:28 <frank> we can coordinate which bugs need to be filed to get to that point
22:30 <digitarald> wlach: are you expecting?
22:30 <wlach> digitarald: no I am not pregnant :)
22:31 <wlach> (I'm guessing you meant something else)
22:31 <digitarald> I used the double meaning ;)
22:31 <wlach> leaving quite shortly => leaving quite shortly for the day
22:31 <wlach> I will be back tomorrow and have no plans for long pto before sept :)
22:31 <wlach> (at which point I will be in Italy for 2 weeks)
22:32 <digitarald> late summer is the best time to travel in Italy
22:35 <wlach> my first time there, looking forward to it
22:36 <wlach> anyway, should run. night all!
14 Jul 2017
No messages