mozilla :: #datapipeline

14 Jul 2017
01:28 <frank> su: that is an enumerated histogram, not a boolean
01:28 <frank> su: see https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/Histograms.json#2509
01:29 <frank> su: you are looking for HTTP_TRANSACTION_IS_SSL
01:32 <su> frank: oh my bad, yes, you're right, I meant to say HTTP_TRANSACTION_IS_SSL
01:33 <su> but is it normal for a boolean histogram to have 3 bins? with the 3rd bin always empty?
01:55 <frank> su: yes, that is an extra bin for overflow
01:55 <su> frank: awesome! that makes sense... and is the convention 0=False 1=True?
01:55 <frank> su: yup! you got it
02:11 <su> ^_^
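A minimal Python sketch of reading a boolean histogram under the convention confirmed above (bucket 0 = False, bucket 1 = True, plus a third bucket that should stay empty). The `values` dictionary with string bucket keys is an assumption modelled on the raw ping JSON, and `boolean_counts` plus the sample numbers are hypothetical.

```python
from typing import Dict, Tuple


def boolean_counts(values: Dict[str, int]) -> Tuple[int, int]:
    """Return (false_count, true_count) from a boolean histogram's buckets.

    Hypothetical helper assuming the layout discussed above: "0" counts False,
    "1" counts True, and "2" is the extra bucket that should stay empty.
    Missing buckets are treated as zero.
    """
    false_count = values.get("0", 0)
    true_count = values.get("1", 0)
    extra = values.get("2", 0)
    if extra:
        # A non-empty third bucket would contradict the boolean convention.
        raise ValueError(f"unexpected count in extra bucket: {extra}")
    return false_count, true_count


# Hypothetical HTTP_TRANSACTION_IS_SSL values from a single ping.
sample = {"0": 12, "1": 88, "2": 0}
false_n, true_n = boolean_counts(sample)
print(f"SSL share: {true_n / (false_n + true_n):.2f}")  # SSL share: 0.88
```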
18:54 <mreid> came across this recently http://statistics.zone/
18:56 <joy> would the jmespath for fx_migration_bookmarks_jank_m be histograms.fx_migration_bookmarks_jank_ms ?
18:56 <joy> or rather, payload.histograms.fx_migration_bookmarks_jank_ms
18:56 <joy> ?
19:03 <mreid> wlach: you were looking at the jmespath stuff at one time, right?
19:03 <wlach> mreid: yup
19:04 <mreid> do you know how it'd work in joy's case?
19:04 <wlach> I reviewed mdoglio's work and fixed a few things
19:04 <mreid> if I had to guess I'd say payload.histograms.<histogram name>
19:04 <joy> mreid: seems sensible
19:04 <wlach> mreid: joy: let me check
19:04 <joy> will try
19:04 <mreid> I'm hoping I don't need to guess :)
19:04 <joy> if my results are missing
19:04 <joy> would it be because that field is not recorded
19:04 <joy> or my jmespath spec is wrong
19:05 <wlach> mreid: joy: yes I think that should work http://python-moztelemetry.readthedocs.io/en/stable/api.html#dataset
19:05 <wlach> I think we should update the "hello world telemetry" example to use the new dataset api
19:06 <mreid> +1
19:09 <frank> wlach: I think there's a bug for that
19:10 <frank> wlach: it is contained in this one: bug 1373291
19:10 <firebot> https://bugzil.la/1373291 NEW, nobody@mozilla.org Update Custom Analysis with Spark
19:10 <frank> but definitely the `select` is something that trips people up all the time
19:11 <joy> wlach: in the API, it says payload.simpleMeasurements but i dont see the payload name in my raw json (when i go to about:telemetry and click on raw json)
19:11 <joy> not everything is prefixed with payload, e.g. environment
19:11 <joy> so how would i know when to prefix with payload?
19:12 <frank> joy: I'm not sure we have that documented anywhere
19:13 <joy> also if the jmespath spec matches to no field (coz of typo) will i get an error or Nones ?
19:13 <wlach> I think "payload" matches to the content of the ping
19:15 <frank> there's some weirdness with the payload
19:15 <frank> see main_summary: https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/views/MainSummaryView.scala#L336
19:15 <frank> I remember this tripped me up
19:15 <frank> joy: my best bet says you'll get None
19:16 <joy> so it should be payload.histograms.
19:16 <frank> joy: yes, it seems so
19:16 <frank> but child histograms are where things change
19:16 <joy> so if i say use the Dataset api *without* the select call, and then inspect the returned pings
19:17 <joy> could i answer all my jmespath spec questions :)
19:17 <joy> and not trouble anyone
19:19 <mreid> joy: yes, you should be able to
19:19 <mreid> if you just get a few example records, you should see the full json structure
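To make the "payload." prefix question and mreid's "inspect a few records" advice concrete, here is a stand-alone Python sketch. `lookup` is a hypothetical, simplified stand-in for a plain dotted jmespath expression (it is not the jmespath library or moztelemetry's `select`), and returning None on a missing field mirrors frank's best guess rather than confirmed library behaviour. `leaf_paths` lists every dotted path in a record, which is one way to discover the right prefix. The trimmed ping is hypothetical.

```python
from typing import Any, List, Optional


def lookup(record: dict, path: str) -> Optional[Any]:
    """Follow a dotted path such as "payload.histograms.X" through nested dicts.

    Hypothetical helper: any missing key (e.g. from a typo) yields None
    instead of raising.
    """
    current: Any = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def leaf_paths(record: dict, prefix: str = "") -> List[str]:
    """List every dotted path that ends at a non-dict (or empty dict) value."""
    paths: List[str] = []
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            paths.extend(leaf_paths(value, path))
        else:
            paths.append(path)
    return paths


# Hypothetical, heavily trimmed ping: histograms sit under "payload",
# while "environment" is at the top level.
ping = {
    "environment": {"system": {"os": {"name": "Linux"}}},
    "payload": {
        "histograms": {
            "FX_MIGRATION_BOOKMARKS_JANK_MS": {"values": {"0": 3, "50": 1}},
        },
    },
}

print(lookup(ping, "environment.system.os.name"))  # Linux
print(lookup(ping, "histograms.FX_MIGRATION_BOOKMARKS_JANK_MS"))  # None (no "payload." prefix)
jank = lookup(ping, "payload.histograms.FX_MIGRATION_BOOKMARKS_JANK_MS")
print(jank["values"])  # {'0': 3, '50': 1}
for path in leaf_paths(ping):
    print(path)
```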
19:59 <joy> is top level creationDate
19:59 <joy> profile creation date to the minute resolution?
20:50 <joy> is there an http://www.arewestableyet.com/ based on UT?
20:54 <frank> joy: pretty sure that's chutten's stuff on the right
20:54 <frank> which is from UT
20:54 <chutten> Of arewestableyet? Yes
20:54 <joy> but it says no on top?
20:54 <chutten> The stuff on the left is socorro normalized by blocklist volume
20:55 <joy> "These numbers are crash data only: they're not derived from Unified Telemetry."
20:55 <chutten> As for "is there an arewestableyet based on UT?" that's what telemetry.mozilla.org/crashes is supposed to be
20:58 <frank> is that not your stability dash on arewestableyet, chutten?
20:58 <chutten> It is indeed
20:58 <frank> okay, so that big scary orange sign is wrong
20:59 <chutten> It's valid on the LHS, which is what it's over
20:59 <frank> right, but it is still very confusing
20:59 <frank> given saptarshi and I were both confused
21:00 <chutten> Fair enough
21:01 <chutten> (Un)fortunately I don't have anything to do with arewestableyet.com
21:02 <frank> chutten: also fair :)
21:03 <frank> I'm going to see if there's a bug component, because I'm in a bug filing mood today
21:04 <chutten> frank: There's a github
21:05 <frank> I support Socorro :: Webapp isn't the right component then
21:05 <joy> thanks all
21:05 <frank> suppose*
21:05 <chutten> https://github.com/mozilla/magdalena/tree/master/static/dashboard
21:05 <chutten> frank: ^
21:14 <joy> is there a keyed histograms json file somewhere?
21:26 <frank> joy: they are in histograms.json
21:26 <frank> joy: they just have "keyed": "true"
21:26 <joy> aah
21:26 <joy> frank: thanks again ...
21:26 <frank> joy: happy to help :)
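Following frank's answer that keyed histograms are ordinary Histograms.json entries carrying a "keyed" flag, a short Python sketch can pick them out. The definitions below are hypothetical placeholders, not real Histograms.json content, and `keyed_histogram_names` accepts both a JSON boolean and the string "true" because the quoting used in the chat may differ from the file itself.

```python
import json
from typing import List

# Hypothetical excerpt shaped like Histograms.json: histogram name -> definition.
HISTOGRAMS_JSON = """
{
  "EXAMPLE_PLAIN_MS": {"kind": "exponential", "high": 10000, "n_buckets": 20},
  "EXAMPLE_KEYED_COUNT": {"kind": "count", "keyed": true},
  "EXAMPLE_KEYED_FLAG": {"kind": "boolean", "keyed": "true"}
}
"""


def keyed_histogram_names(definitions: dict) -> List[str]:
    """Return the sorted names of definitions marked as keyed.

    Treats either a JSON boolean true or the string "true" as keyed.
    """
    names = []
    for name, spec in definitions.items():
        flag = spec.get("keyed", False)
        if flag is True or flag == "true":
            names.append(name)
    return sorted(names)


definitions = json.loads(HISTOGRAMS_JSON)
print(keyed_histogram_names(definitions))  # ['EXAMPLE_KEYED_COUNT', 'EXAMPLE_KEYED_FLAG']
```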
21:31 <ilana> frank, i hear you have an example script for pulling shield experiment data
21:31 <frank> ilana: I do, let me dig it up
21:31 <ilana> woo hoo
21:33 <frank> ilana: here's what I used: https://gist.github.com/fbertsch/39dad170978669251184908f3a3ce051
21:33 <frank> the first part collects SHIELD pings
21:34 <ilana> great, thanks!
21:34 <frank> the second part finds those users' telemetry data
21:34 <ilana> that's the tricky part :)
21:34 <frank> ilana: if you can, use main_summary/hbase
21:34 <frank> we didn't for that because the fields weren't there yet
21:34 <ilana> I see
21:34 <ilana> ok
21:35 <ilana> frank, this was to append all scalars?
21:35 <frank> but we've added a lot since then, so maybe they are for you?
21:35 <frank> ilana: can you clarify?
21:35 <ilana> the output of this script
21:36 <ilana> gathered everything in the histograms path
21:36 <ilana> s
21:36 <ilana> etc
21:36 <ilana> plus some profile info
21:36 <frank> no, this was some specific columns
21:36 <ilana> basically i'm wondering why the specific fields are enumerated
21:36 <ilana> oh, ok.
21:36 <ilana> instead of getting the whole ping
21:36 <ilana> that's totally fine
21:36 <ilana> super helpful, thank you!
21:36 <frank> ilana: no problem, good luck!
21:36 <ilana> frank: sorry, one more
21:36 <ilana> q
21:36 <frank> oh no
21:36 <frank> you're cut off
21:37 <ilana> this script joined shield ids with their telemetry ids and output a new fancy file
21:37 <frank> yup
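The two-step approach frank describes (collect SHIELD pings, then find those clients' telemetry) amounts to an inner join on client id. This Python sketch shows that join with plain dictionaries; the field names `client_id`, `study`, `branch` and `subsession_hours` and all sample rows are hypothetical, and it is not the code from the linked gist.

```python
from typing import Dict, List

# Hypothetical SHIELD study pings: one row per enrolled client.
shield_pings = [
    {"client_id": "a", "study": "example-study", "branch": "control"},
    {"client_id": "b", "study": "example-study", "branch": "treatment"},
]

# Hypothetical per-client telemetry rows (a few main_summary-like columns).
telemetry_rows = [
    {"client_id": "a", "subsession_hours": 1.5},
    {"client_id": "b", "subsession_hours": 0.5},
    {"client_id": "b", "subsession_hours": 2.0},
    {"client_id": "c", "subsession_hours": 3.0},  # not enrolled: dropped by the join
]


def join_on_client(shield: List[dict], telemetry: List[dict]) -> List[dict]:
    """Inner join: attach each enrolled client's study and branch to its telemetry rows.

    If a client sent several SHIELD pings, the last one wins in this sketch.
    """
    enrollment: Dict[str, dict] = {row["client_id"]: row for row in shield}
    joined = []
    for row in telemetry:
        study_row = enrollment.get(row["client_id"])
        if study_row is None:
            continue
        joined.append({**row, "study": study_row["study"], "branch": study_row["branch"]})
    return joined


for row in join_on_client(shield_pings, telemetry_rows):
    print(row["client_id"], row["branch"], row["subsession_hours"])
```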
21:37 <ilana> but we also have a bucket of data somewhere that feeds the dashboards
21:37 <ilana> does that use your script as well?
21:38 <frank> ilana: I don't *think* so
21:38 <ilana> hm, ok
21:38 <ilana> who would know? sunahsuh?
21:38 <frank> afaik this just fueled a few custom analyses
21:38 <ilana> sure
21:38 <frank> ilana: which dashboards?
21:38 <ilana> the internal shield ones
21:38 <ilana> "experiment viewer"
21:38 <ilana> https://moz-experiments-viewer.herokuapp.com/?ds=46&metrics=ALL&next=%2F%3Fds%3D46%26metrics%3DALL%26next%3D%252F%253Fds%253D11%2526metrics%253DALL%2526pop%253DALL%2526scale%253Dlinear%2526showOutliers%253Dfalse%26pop%3Dcontrol%252Cscreenshots-enabled%26scale%3Dlinear%26showOutliers%3Dfalse&pop=control%2Cscreenshots-enabled&scale=linear&showOutliers=false
21:38 <frank> oh right, yeah no that is different
21:38 <ilana> oh
21:39 <ilana> where is that data magically hiding
21:39 <frank> ilana: in re:dash!
21:39 <frank> ilana: table is `experiments`
21:39 <ilana> WOAH
21:39 <frank> and it has the same schema as `main_summary`
21:39 <frank> but with `experiment_id` and `experiment_branch` columns
21:39 <sunahsuh> ilana: we also have the raw pings separated out
21:39 <ilana> and it's refueled every day?
21:39 <frank> ilana: yup
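Because the `experiments` table is described as `main_summary` plus `experiment_id` and `experiment_branch` columns, a natural first step is counting distinct clients per branch. The Python sketch below does that on in-memory rows; the `client_id` column and the sample rows are assumptions for illustration, and on re:dash the equivalent would be a SQL GROUP BY over the table.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical rows from the `experiments` table: main_summary-style columns
# plus experiment_id and experiment_branch.
rows = [
    {"client_id": "a", "experiment_id": "exp-1", "experiment_branch": "control"},
    {"client_id": "a", "experiment_id": "exp-1", "experiment_branch": "control"},
    {"client_id": "b", "experiment_id": "exp-1", "experiment_branch": "treatment"},
    {"client_id": "c", "experiment_id": "exp-2", "experiment_branch": "control"},
]


def clients_per_branch(table: Iterable[dict]) -> Dict[Tuple[str, str], int]:
    """Count distinct clients for each (experiment_id, experiment_branch) pair."""
    seen: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for row in table:
        seen[(row["experiment_id"], row["experiment_branch"])].add(row["client_id"])
    return {key: len(clients) for key, clients in sorted(seen.items())}


print(clients_per_branch(rows))
# {('exp-1', 'control'): 1, ('exp-1', 'treatment'): 1, ('exp-2', 'control'): 1}
```

Client "a" appears in two rows but is counted once, since a single client normally contributes many rows to a main_summary-shaped table.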
21:39 <ilana> sunahsuh, what do you mean
21:39 <ilana> they are all located somewhere i could grab them?
21:40 <ilana> raw
21:40 <frank> yup
21:40 <frank> Dataset.from_source("telemetry-cohorts")
21:41 <sunahsuh> yeah, instead of "Dataset.from_source("telemetry")" it's "Dataset.from_source("telemetry-cohorts")"
21:41 <ilana> WOW
21:41 <ilana> okay, and they're indexed by experiment?
21:41 <sunahsuh> yp
21:41 <sunahsuh> (yep
21:41 <sunahsuh> *yep
21:41 <frank> sunahsuh: we really should make an example notebook for this
21:42 <ilana> frank: I can do that if you'd like
21:42 <frank> ilana++
21:42 <ilana> i have to do it anyway
21:42 <sunahsuh> yesplz
21:42 <ilana> it seems like pulling from there is a lot easier than having to revamp frank's script, no?
21:42 <frank> absolutely yes
21:43 <ilana> ok, fantastic!
21:43 <frank> but my script is a bit different, it uses the SHIELD pings
21:43 <ilana> thanks tons, everyone
21:43 <frank> we don't touch those in the pipeline
21:43 <frank> well, at least for this experiments stuff so far
21:43 <ilana> sure
21:43 <ilana> it just seems like that join probably sucks
21:43 <sunahsuh> btw, that dataset includes all pings, not just main pings
21:43 <ilana> perfect
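Since sunahsuh notes that the cohorts dataset holds every ping type rather than only main pings, an analysis will usually narrow by experiment and ping type first. This stand-alone Python sketch mimics that filtering on hypothetical records; the `experiment_id` and `doc_type` keys and the "shield-study" value are placeholders, not the dataset's actual partition names, which a real notebook would apply through the Dataset API's own filtering.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional

# Hypothetical raw pings; the real dataset mixes main, SHIELD and other ping types.
pings = [
    {"experiment_id": "exp-1", "doc_type": "main", "client_id": "a"},
    {"experiment_id": "exp-1", "doc_type": "shield-study", "client_id": "a"},
    {"experiment_id": "exp-1", "doc_type": "main", "client_id": "b"},
    {"experiment_id": "exp-2", "doc_type": "main", "client_id": "c"},
]


def select_pings(
    records: Iterable[dict], experiment_id: str, doc_type: Optional[str] = None
) -> Iterator[dict]:
    """Yield pings for one experiment, optionally restricted to one ping type."""
    for ping in records:
        if ping["experiment_id"] != experiment_id:
            continue
        if doc_type is not None and ping["doc_type"] != doc_type:
            continue
        yield ping


# Count ping types for exp-1 to see what the dataset actually contains.
print(Counter(p["doc_type"] for p in select_pings(pings, "exp-1")))
main_only = list(select_pings(pings, "exp-1", doc_type="main"))
print(len(main_only))  # 2
```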
21:44 <frank> I'm not totally sure about the relationship between SHIELD pings and our experiment annotations
21:45 <frank> sunahsuh: do you know - if a user sends a SHIELD ping, will their main ping also include an experiment annotation?
21:45 <frank> are there cases where the former exists but not the latter?
21:45 <sunahsuh> my guess would be no, unless shield includes the experiment block
21:46 <sunahsuh> er, environment, i mean
21:46 <sunahsuh> so, it's possibly just main pings in that set
21:47 <frank> hmm, I'm a bit confused, but I'll confer with you more next week
21:47 <sunahsuh> huh, nope, i guess not
21:48 <ilana> keep me in the loop if there's something i should know :)
21:48 <sunahsuh> i see shield pings as well as main
21:48 <ilana> great
21:48 <sunahsuh> so, i wouldn't worry?
21:49 <frank> sunahsuh: I guess I'm thinking more from client-side
21:49 <frank> if I'm a client, and I send a SHIELD ping
21:49 <frank> am I guaranteed that my main ping will contain an experiment annotation for whichever SHIELD studies I sent the SHIELD ping for?
21:50 <frank> or are there SHIELD studies that are not experiment annotations?
21:50 <sunahsuh> afaik the shield add-on annotates all its experiments
21:50 <frank> wait, aren't there surveys?
21:51 <frank> i.e. if I'm part of a survey, not an experiment, then I'll send a SHIELD ping, but no experiment annotation
21:51 <frank> I'm going to read more about this next week, have a great weekend everyone :)
21:52 <sunahsuh> yeah, i'd check the shield add-on code :)
21:52 <sunahsuh> later!
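frank's open question (can a client send a SHIELD ping for a study without a matching experiment annotation in its main pings, e.g. for a survey?) could be checked empirically with a set difference. The Python sketch below assumes hypothetical inputs: (client_id, study) pairs taken from SHIELD pings, and the experiment ids found in each client's main-ping environment, with study names assumed to match experiment ids. It does not answer the question; it only shows how such a check could be phrased.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical (client_id, study) pairs seen in SHIELD pings.
shield_enrollments = [("a", "study-x"), ("b", "study-x"), ("b", "survey-y")]

# Hypothetical experiment annotations found in each client's main pings.
# Assumes SHIELD study names and experiment ids use the same identifiers.
main_annotations: Dict[str, Set[str]] = {
    "a": {"study-x"},
    "b": {"study-x"},
}


def unannotated(
    enrollments: Iterable[Tuple[str, str]], annotations: Dict[str, Set[str]]
) -> Set[Tuple[str, str]]:
    """Return SHIELD (client, study) pairs with no matching main-ping annotation."""
    return {
        (client, study)
        for client, study in enrollments
        if study not in annotations.get(client, set())
    }


print(unannotated(shield_enrollments, main_annotations))  # {('b', 'survey-y')}
```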
15 Jul 2017
No messages
   