mozilla :: #activity-stream

9 Jan 2017
14:41nanjursula: howdy
14:41ursulahey nanj
14:42nanjursula: lmk once #1926 grabs the r+, i'll take care of the uplift
14:42ursulasounds good
14:42ursulai'll bug jkerim about it
14:43nanjursula: awesome! i'll also do some versioning change here
14:43nanjursula: how was your weekend? i hope you feel better now :)
14:44ursulai'm feeling a little better. this cold weather isn't helping though
14:45nanjyes, this weather is annoying, dandan caught a very bad cold on the weekend
14:46nanji'm surprised that i survived somehow
15:44as-github-botk88hudson: Hey! Someone just assigned you a PR for review:
15:46k88hudsonnanj: uh oh, better stay away
15:46k88hudson(from dandan)
15:48k88hudsondamn, it's like everyone is getting this cold
15:51nanjk88hudson: heh, i will be fine, already battle tested back in hawaii :)
15:51k88hudsonthats good!
15:51k88hudsonmaybe we should go for ramen some time this week just in case
15:51nanjsounds like a plan
15:53* dmose had ramen yesterday. it helped.
15:53dmosejkerim: you mentioned last week that you had written some data scrubbers in python. can you link me to them?
15:54nanjdmose: nice, what's the weather like down there? is it cold?
15:55dmosetoday it's nice, 62' F
15:55dmosewill be the high
15:56dmosethough it's raining all this week
16:01nanjwow, it's quite nice there
16:03nanjwe've gotta some tough days recently, quite a few folks got cold in this warm-inside chill-outside weather
16:03as-github-botursula: Hey! Someone just assigned you a PR for review:
16:04dmosenanj: plenty of folks around here got sick too; i think a lot is just the differential from what one is used to
16:08as-github-botrlr: Hey! Someone just assigned you a PR for review:
17:05ursulananj: patch was merged, uplift can happen. is it possible to test that release with that tag in it before it goes out to see if the pings are going through?
17:05nanjursula: got it
17:06nanjyup, we can install the pre-release addon, and use it a bit to see if the new metrics flow in
17:11ursulasounds good
17:19nanjursula: uplifted, it's building the pre-release version now
17:20nanjursula: so that's the only commit to uplift, correct?
17:21nanjursula: cool
17:21ursulathanks nanj
17:22nanjde nada
17:29nanjursula: the build is done, i am using the pre-release one now
17:29nanjyou might need to re-install the pre-release addon here
17:29ursulaahh just realized actually, we might not see pings for 24h
17:29ursulathe pings only send every 24h
17:30nanjursula: oh, and the 24h isn't configurable?
17:31ursulaerr i don't think so
17:34nanji see. that means we need one more day to verify it
17:35nanjunless we can figure out some other workaround
18:56jkerimtspurway: i cant make it to triage, someone will need to take notes in my place
18:56nchapmanthere's a template at the bottom
18:56tspurwayok jkerim
18:56nchapmanit's probably worth checking to see if it's been kept up to date
18:56tspurwayi cant make it either
18:56tspurwayso k88hudson is running it
18:58k88hudsondmose: what room is everyone in
18:58k88hudsonfor triage
20:08_6a68nanj | emtwo: hi! should we chat in this channel?
20:09emtwo_6a68: yep!
20:09nanj_6a68: yup, let's do this right here
20:09_6a68ok, nanj I think you were asking about the timestamp on the envelope/wrapper vs the timestamp in the inner event/payload
20:10_6a68is that right?
20:10nanj_6a68: yes, i assume that we'll use the inner one, right?
20:10nanjas both timestamps are required
20:11_6a68yup, the inner time is the time of the measured event, while the outer time is the time at which test pilot sends the ping
20:12nanjexactly, so we'll use the inner one here unless it's otherwise specified
20:12_6a68yeah, that way we can correlate GA pings with telemetry/ping-centre pings
20:13nanj_6a68: yeah, makes sense. so my seconed question is, in terms of the "object" filed in the "testpilotEvent
20:14* _6a68 nods
20:14nanjaccording to the schema, it's a string field, do we want some size limit on it?
20:15nanjalso the case for other string fields
20:15_6a68the 'event' and 'object' fields should be relatively short. the idea is that 'object' follows English grammatical usage, where the 'object' is the thing receiving the user's action
20:16_6a68So it might be some kind of identifier for a particular button or link
20:16nanjjust wanted to confirm this, if they're open ended, we can use unlimited string type accordingly
20:16_6a68right, let me see what the existing redshift schema looks like, one sec
20:19nanjwe usually use the limited size types if possible, as ping-centre is a public service, taking arbitrary input might lead to some security risks
20:21_6a68right. I don't see a redshift schema for the new unified object/event format, so it might take me a little bit to investigate that question. how about just make it 255 chars for now? that seems to be the default used in the existing testpilot redshift schema
20:21nanjsounds good
20:23nanjyeah, we usually do 255 chars or like for the potential extensions
20:25nanj_6a68: once you figure out the proper object sizes, could you update the schema so that i can reference it?
20:25_6a68nanj: yeah, I'll follow up as soon as I get more details
20:25_6a68marina: you mentioned an issue with the common schema, would you mind explaining that again?
20:28_6a68er, emtwo ^^
20:28nanj_6a68: awesome! ty!
20:30emtwo_6a68: it's a minor issue but basically the event_type in the common schema here:
20:30emtwois the same as what you've got here:
20:30emtwobecause the common schema assumed one ping contained only one event.
20:31emtwoso basically I'll probably just change how the common schema works and maybe have a common schema for events and a different one thats more like a "header" and it includes a topic and client id
20:31nanjemtwo: i think as we can handle the multiple events payload, we might want to change that assumption
20:31emtwoyes exactly
20:31emtwoso ill make that change and probably I'll ask _6a68 for review to make sure it looks ok
20:32emtwoshould be a very minor change
20:32_6a68emtwo: I wasn't totally clear on the usage of those common schema keys, so I kind of hacked around them in the schema tests. what's the intended use of the event_type field?
20:32nanjjust like the timestamp, the inner can overwrite the the outer one
20:34emtwo_6a68: event_type is basically what you called "event" so for example it could be "click" or "search" and so on
20:34emtwobut I just wanted to make sure it's required in the schema because that's what we would be looking at to auto-generate dashboards
20:35nanj_6a68: the "event_type" in the common field is just like the "event" you've defined for testpilotEvent, we assumed that was a required field in each ping-centre payload
20:35_6a68aha, right
20:35_6a68something to note is that we are reusing the 'testpilottest' topic for all experiments
20:35nanjso that we could auto generate some dashboard on that premise
20:36_6a68right. so if you're planning to just have one table per topic, you'd need to flatten the whole tuple to get uniqueness (a click for experiment A has little to do with a click for experiment B)
20:37nanj_6a68: yup, makes sense. i've also noticed the "test" field, looks like you guys wanted to use it as the experiment identifier :)
20:37emtwonanj: _6a68 perhaps in that case an experiment identifier should be required too?
20:37as-github-botjkerim: Hey! Someone just assigned you a PR for review:
20:38_6a68Yup, I liked the idea of a generic testpilottest ping because it means we don't need to land new ping-centre schemas with each new experiment update, but I think we may have some conflicting assumptions about how the dashboard auto-generator will work
20:44_6a68emtwo | nanj: could you explain a little bit how you're planning to auto-generate dashboards from the common schema fields? maybe I can update the testpilot schemas to work more with the grain of what you have in mind
20:44nanjemtwo: that's right. we need an experiment id for each testpilot test(including itself). _6a68: so is the "test" field in the schema defined for this?
20:45emtwo_6a68: so in theory the dashboards would have some assumptions about what the data looks like so it can, for example, perform queries grouping by those required fields to show number of events per day for a given experiment, let's say
20:45_6a68nanj: careful, there is a 'testpilot' schema just for testpilot itself, and a 'testpilottest' schema used by all the experiments. in the 'testpilottest' case, the 'test' field is the experiment's ID
20:47emtwothat would be the "experiment id" I suppose? ^
20:48emtwoand it's required so that's good :D
20:48_6a68emtwo: yes, the 'test' field is the experiment ID. and what you said about assumptions on the data format makes sense, but maybe we should both write down some examples to ground the discussion a bit?
20:48nanj_6a68: gotcha, those are two different schemas
20:52nanj_6a68: so the "payload" and "variants" are defined as the generic objects in the "testpilottest" schema
20:52emtwo_6a68: I think the schemas you have for testpilot and testpilottest are good. I don't want to change them. I just want to change the common schema so that, for example, you aren't forced to have event_type like here: when you've already got "event"
20:53_6a68emtwo: sure, the question is whether the ping-centre common schema should bend to accommodate test pilot, or maybe test pilot should transform its format to fit ping-centre? I am leaning towards the latter
20:54_6a68emtwo: as an example, check out these object / method pairs for minvid:
20:55nanj_6a68: one more question here, can we define the schema for the "payload" and "variants"? like what you've done for the "testpilotEvent"
20:56_6a68nanj: so, 'payload' and 'variants' are optional and will generally vary per-experiment. I think it would be fair for the auto-dashboard code to ignore them. how does redshift handle denormalized blobs? it would be ideal if individual experiment teams could optionally extend dashboards to use that data, but the structure will vary
20:56nanjbecause redshift isn't a doc-oriented store, it needs to know the table schema beforehand
20:56emtwo_6a68: hmmm I like that for minvid.... In my mind event_type would capture either object or method and then further detail can be optional
20:56_6a68nanj: for instance, 'payload' for minvid contains a bunch of data points about the location of the player window on the screen. and 'variants' is used to report on which of several A/B experiments might be running for a given user at a given time
20:57emtwoI kind of like how minvid is doing it too
20:58_6a68nanj: in that case, maybe the auto-generated dashboards can just treat payload and variants as opaque blobs (for now)? that way we get the basic stuff that generically works for any experiment (like, "button A was clicked at time X"), while still enabling very motivated teams to do more
20:58as-github-botk88hudson: Hey! Someone just assigned you a PR for review:
20:59_6a68nanj: the whole thing will be stringified when it comes down the wire, I'm not sure if you've been planning on _not_ parsing (or parsing, then re-stringifying) parts of the packet. would that work?
21:01_6a68emtwo: I think we're in agreement. it seems like basic dashboards could be auto-constructed from the addon name / event name / object name tuple, and it would be fine with me to change the testpilot code to accommodate ping-centre's schema
21:03nanj_6a68: can we treat them as the blob/text objects? co's i noticed in the schema test, they are actual objects
21:03_6a68emtwo: for instance, if the common schema were extended to have both event_type and event_target, where event_target corresponds to the 'object' key in the testpilottest schema, then it'd be easy to change the testpilot format to accommodate ping-centre
21:03_6a68nanj: yeah, for you to treat them as text objects, you'd need to selectively deserialize (or reserialize) fields
21:04_6a68nanj: to take a step back, the primary data point for experiment-submitted 'testpilottest' events is the experiment name, event, object tuple: "minvid / button A / clicked"
21:05_6a68the idea of the 'payload' is to capture extra data that an experiment might want to measure
21:05nanj_6a68: yup, those fields are fine
21:05_6a68the idea of the 'variants' is the capture extra info about a multivariate or A/B test an expeirment might be running
21:05emtwo_6a68: you mean this one should be "event_target" ?
21:06_6a68emtwo: yup, or whatever term you'd prefer instead of 'event_target' or 'object'
21:07emtwo_6a68: I like event_target, though I imagine it may not make sense for all types of events
21:07emtwobut that's ok it can be empty
21:08_6a68emtwo: fwiw I think event / object came out of the unified ping discussion, but I can't find a good link to a document that defines that convention
21:09emtwo_6a68: oh good to know it's be thought through! we actually also do something similar in activity stream. we've got "event" and "source"
21:09emtwoso yea I am fine with that
21:10nanj_6a68: can we do another quick video chat to clarify that?
21:10_6a68emtwo: I have to follow up with sunah suh about some telemetry redshift changes, I can ask about that unified ping convention as well
21:10nanj_6a68: if you're free now :)
21:10emtwo_6a68: ok great!
21:11_6a68nanj: sure
21:11_6a68in marina's room again?
21:11nanj_6a68: yup, let's borrow her room :)
21:11_6a68ok. emtwo, you are welcome to jump on the call as well :-)
21:12emtwook I will join but I may have to leave in a few min
21:14as-github-botursula: Hey! Someone just assigned you a PR for review:
21:37nanj_6a68: actually, redshift does support the jSON type
21:38nanji'll put this in the github issue as the reference
21:39_6a68nanj: oh, that's interesting! good find
21:40nanjyeah, we haven't tried that yet. It'll be more flexible if we can leverage that
21:47nanjursula: hey, i'm back. so how did it go?
21:48nanjursula: if everything is good to go, i'd like to cut this release today :)
21:49ursulago for it nanj
21:49nanjlet's do this
22:00as-github-botncloudioj: Hey! Someone just assigned you a PR for review:
23:41dmoseanyone around?
10 Jan 2017
No messages
Last message: 221 days and 21 hours ago