mozilla :: #datapipeline

14 Jul 2017
01:28franksu: that is an enumerated histogram, not a boolean
01:28franksu: see
01:29franksu: you are looking for HTTP_TRANSACTION_IS_SSL
01:32sufrank: oh my bad, yes, you're right, I meant to say HTTP_TRANSACTION_IS_SSL
01:33subut is it normal for a boolean histogram to have 3 bins? with the 3rd bin always empty?
01:55franksu: yes, that is an extra bin for overflow
01:55sufrank: awesome! that makes sense... and is the convention 0=False 1=True?
01:55franksu: yup! you got it
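[Editor's sketch] To make the bin convention above concrete: bucket 0 holds False, bucket 1 holds True, and the third bucket is the empty overflow bin. The helper name and the counts below are hypothetical, and the sketch assumes the histogram's counts arrive as a dict keyed by bucket label.

```python
def boolean_histogram_share(values):
    """Return the fraction of True samples in a boolean histogram.

    `values` maps bucket labels to counts: bucket "0" is False,
    bucket "1" is True, and bucket "2" is the overflow bin that is
    expected to stay empty, so it is ignored here.
    """
    false_count = int(values.get("0", 0))
    true_count = int(values.get("1", 0))
    total = false_count + true_count
    if total == 0:
        return None  # no samples recorded
    return float(true_count) / total


# Hypothetical counts for HTTP_TRANSACTION_IS_SSL in a single ping.
sample = {"0": 25, "1": 75, "2": 0}
share = boolean_histogram_share(sample)
```

Here `share` is the proportion of transactions recorded as True, i.e. made over SSL.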
18:54mreidcame across this recently
18:56joywould the jmespath for fx_migration_bookmarks_jank_ms be histograms.fx_migration_bookmarks_jank_ms?
18:56joyor rather, payload.histograms.fx_migration_bookmarks_jank_ms
19:03mreidwlach: you were looking at the jmespath stuff at one time, right?
19:03wlachmreid: yup
19:04mreiddo you know how it'd work in joy's case?
19:04wlachI reviewed mdoglio's work and fixed a few things
19:04mreidif I had to guess I'd say payload.histograms.<histogram name>
19:04joymreid: seems sensible
19:04wlachmreid: joy: let me check
19:04joywill try
19:04mreidI'm hoping I don't need to guess :)
19:04joyif my results are missing
19:04joywould it be because that field is not recorded
19:04joyor my jmespath spec is wrong
19:05wlachmreid: joy: yes I think that should work
19:05wlachI think we should update the "hello world telemetry" example to use the new dataset api
19:09frankwlach: I think there's a bug for that
19:10frankwlach: it is contained in this one: bug 1373291
19:10firebot NEW, Update Custom Analysis with Spark
19:10frankbut definitely the `select` is something that trips people up all the time
19:11joywlach: in the API, it says payload.simpleMeasurements but i dont see the payload name in my raw json (when i go to about:telemetry and click on raw json)
19:11joynot everything is prefixed with payload, e.g. environment
19:11joyso how would i know when to prefix with payload?
19:12frankjoy: I'm not sure we have that documented anywhere
19:13joyalso if the jmespath spec matches to no field (coz of typo) will i get an error or Nones ?
19:13wlachI think "payload" matches to the content of the ping
19:15frankthere's some weirdness with the payload
19:15franksee main_summary:
19:15frankI remember this tripped me up
19:15frankjoy: my best bet says you'll get None
19:16joyso it should be payload.histograms.
19:16frankjoy: yes, it seems so
19:16frankbut child histograms are where things change
19:16joyso if i say use the Dataset api *without* the select call, and then inspect the returned pings
19:17joycould i answer all my jmespath spec questions :)
19:17joyand not trouble anyone
19:19mreidjoy: yes, you should be able to
19:19mreidif you just get a few example records, you should see the full json structure
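[Editor's sketch] The path questions above can be tried out on one example record without Spark. This is only a rough stand-in for simple dotted field access, not the jmespath library or the Dataset API itself; the `lookup` helper and the trimmed record are hypothetical, and the histogram key is spelled as joy wrote it (the casing in real pings may differ). It returns None for a path that matches nothing, in line with frank's best guess.

```python
def lookup(record, path):
    """Follow a dotted path through nested dicts.

    Returns None as soon as any step of the path is missing, so a
    typo in the path yields None rather than an exception.
    """
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Hypothetical, heavily trimmed record; real pings carry many more fields.
ping = {
    "environment": {"build": {"version": "54.0"}},
    "payload": {
        "histograms": {
            "fx_migration_bookmarks_jank_ms": {"sum": 12, "values": {"0": 1}},
        },
    },
}

found = lookup(ping, "payload.histograms.fx_migration_bookmarks_jank_ms")
typo = lookup(ping, "payload.histogram.fx_migration_bookmarks_jank_ms")
env = lookup(ping, "environment.build.version")
```

In this record `environment` sits at the top level next to `payload`, which is why only the histogram path carries the `payload.` prefix.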
19:59joyis top level creationDate
19:59joyprofile creation date to the minute resolution?
20:50joyis there an based on UT?
20:54frankjoy: pretty sure that's chutten's stuff on the right
20:54frankwhich is from UT
20:54chuttenOf arewestableyet? Yes
20:54joybut it says no on top?
20:54chuttenThe stuff on the left is socorro normalized by blocklist volume
20:55joy"These numbers are crash data only: they're not derived from Unified Telemetry."
20:55chuttenAs for "is there an arewestableyet based on UT?" that's what is supposed to be
20:58frankis that not your stability dash on arewestableyet, chutten?
20:58chuttenIt is indeed
20:58frankokay, so that big scary orange sign is wrong
20:59chuttenIt's valid on the LHS, which is what it's over
20:59frankright, but it is still very confusing
20:59frankgiven saptarshi and I were both confused
21:00chuttenFair enough
21:01chutten(Un)fortunately I don't have anything to do with
21:02frankchutten: also fair :)
21:03frankI'm going to see if there's a bug component, because I'm in a bug filing mood today
21:04chuttenfrank: There's a github
21:05frankI suppose Socorro :: Webapp isn't the right component then
21:05joythanks all
21:05chuttenfrank: ^
21:14joyis there a keyed histograms json file somewhere?
21:26frankjoy: they are in histograms.json
21:26frankjoy: they just have "keyed": "true"
21:26joyfrank: thanks again ...
21:26frankjoy: happy to help :)
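[Editor's sketch] A minimal way to list the keyed histograms from a file shaped like histograms.json. The excerpt and histogram names below are made up for illustration. frank quotes the flag as the string "true"; the check accepts either that or a JSON boolean, since the exact form is an assumption here.

```python
import json

# Hypothetical excerpt in the shape of histograms.json; the names and
# fields are invented for illustration only.
HISTOGRAMS_JSON = """
{
  "EXAMPLE_PLAIN_HISTOGRAM": {"kind": "boolean", "description": "..."},
  "EXAMPLE_KEYED_HISTOGRAM": {"kind": "count", "keyed": true, "description": "..."}
}
"""


def keyed_histogram_names(definitions):
    """Return the sorted names of definitions flagged as keyed."""
    return sorted(
        name
        for name, spec in definitions.items()
        if spec.get("keyed") in (True, "true")
    )


definitions = json.loads(HISTOGRAMS_JSON)
keyed = keyed_histogram_names(definitions)
```

To run it against a local copy of the real file, the inline string could be swapped for `json.load(open(path))`.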
21:31ilanafrank, i hear you have an example script for pulling shield experiment data
21:31frankilana: I do, let me dig it up
21:31ilanawoo hoo
21:33frankilana: here's what I used:
21:33frankthe first part collects SHIELD pings
21:34ilanagreat, thanks!
21:34frankthe second part finds those users' telemetry data
21:34ilanathat's the tricky part :)
21:34frankilana: if you can, use main_summary/hbase
21:34frankwe didn't for that because the fields weren't there yet
21:34ilanaI see
21:35ilanafrank, this was to append all scalars?
21:35frankbut we've added a lot since then, so maybe they are for you?
21:35frankilana: can you clarify?
21:35ilanathe output of this script
21:36ilanagathered everything in the histograms path
21:36ilanaplus some profile info
21:36frankno, this was some specific columns
21:36ilanabasically i'm wondering why the specific fields are enumerated
21:36ilanaoh, ok.
21:36ilanainstead of getting the whole ping
21:36ilanathat's totally fine
21:36ilanasuper helpful, thank you!
21:36frankilana: no problem, good luck!
21:36ilanafrank: sorry, one more
21:36frankoh no
21:36frankyou're cut off
21:37ilanathis script joined shield ids with their telemetry ids and output a new fancy file
21:37ilanabut we also have a bucket of data somewhere that feeds the dashboards
21:37ilanadoes that use your script as well?
21:38frankilana: I don't *think* so
21:38ilanahm, ok
21:38ilanawho would know? sunahsuh?
21:38frankafaik this just fueled a few custom analyses
21:38frankilana: which dashboards?
21:38ilanathe internal shield ones
21:38ilana"experiment viewer"
21:38frankoh right, yeah no that is different
21:39ilanawhere is that data magically hiding
21:39frankilana: in re:dash!
21:39frankilana: table is `experiments`
21:39frankand it has the same schema as `main_summary`
21:39frankbut with `experiment_id` and `experiment_branch` columns
21:39sunahsuhilana: we also have the raw pings separated out
21:39ilanaand it's refueled every day?
21:39frankilana: yup
21:39ilanasunahsuh, what do you mean
21:39ilanathey are all located somewhere i could grab them?
21:41sunahsuhyeah, instead of `Dataset.from_source("telemetry")` it's `Dataset.from_source("telemetry-cohorts")`
21:41ilanaokay, and they're indexed by experiment?
21:41franksunahsuh: we really should make an example notebook for this
21:42ilanafrank: I can do that if you'd like
21:42ilanai have to do it anyway
21:42ilanait seems like pulling from there is a lot easier than having to revamp frank's script, no?
21:42frankabsolutely yes
21:43ilanaok, fantastic!
21:43frankbut my script is a bit different, it uses the SHIELD pings
21:43ilanathanks tons, everyone
21:43frankwe don't touch those in the pipeline
21:43frankwell, at least for this experiments stuff so far
21:43ilanait just seems like that join probably sucks
21:43sunahsuhbtw, that dataset includes all pings, not just main pings
21:44frankI'm not totally sure about the relationship between SHIELD pings and our experiment annotations
21:45franksunahsuh: do you know - if a user sends a SHIELD ping, will their main ping also include an experiment annotation?
21:45frankare there cases where the former exists but not the latter?
21:45sunahsuhmy guess would be no, unless shield includes the experiment block
21:46sunahsuher, environment, i mean
21:46sunahsuhso, it's possibly just main pings in that set
21:47frankhmm, I'm a bit confused, but I'll confer with you more next week
21:47sunahsuhhuh, nope, i guess not
21:48ilanakeep me in the loop if there's something i should know :)
21:48sunahsuhi see shield pings as well as main
21:48sunahsuhso, i wouldn't worry?
21:49franksunahsuh: I guess I'm thinking more from client-side
21:49frankif I'm a client, and I send a SHIELD ping
21:49frankam I guaranteed that my main ping will contain an experiment annotation for whichever SHIELD studies I sent the SHIELD ping for?
21:50frankor are there SHIELD studies that are not experiment annotations?
21:50sunahsuhafaik the shield add-on annotates all its experiments
21:50frankwait, aren't there surveys?
21:51franki.e. if I'm part of a survey, not an experiment, then I'll send a SHIELD ping, but no experiment annotation
21:51frankI'm going to read more about this next week, have a great weekend everyone :)
21:52sunahsuhyeah, i'd check the shield add-on code :)
15 Jul 2017
No messages