mozilla :: #datapipeline

15 May 2017
15:14 <trink> fyi, the May 01 sprint real-time data platform packages are available: https://people-mozilla.org/~mtrinkala/packages/
16:00 * spenrose is face-muted to preserve precious Balkan bandwidth
16:12 <Dexter> trink, good news is that we have a theory about the root cause of the duplicates
16:13 <Dexter> (Firefox being open when shutting down the OS -> 1363345)
16:14 <trink> yeah, if there are only, say, two, due to our current data partitioning we would probably miss them anyway (even with the larger 24 hour window)
16:15 <Dexter> mh, I see, good that you mentioned that. Most of them are just two, with a few of them being 3
16:15 <Dexter> (2 as the original + one duplicate)
18:06 <joy> harter: what is the location of the gitbook?
19:00 <harter> joy: rendered docs: https://mozilla.github.io/firefox-data-docs/
19:00 <harter> joy: repo: https://github.com/mozilla/firefox-data-docs
19:00 <harter> let me know if you have a particular need / project in mind!
19:00 <joy> harter: thanks much
19:11 <robotblake> frank: Got a few?
19:11 <robotblake> I had a quick question about mobile_clients / android_events
19:11 <frank> robotblake: I do
19:12 <robotblake> My room
19:12 <frank> kk give me 2
19:29 <trink> mreid: whd: Bug 1365012
19:29 <firebot> https://bugzil.la/1365012 NEW, nobody@mozilla.org Add direct to parquet output for telemetry.duplicate messages
19:31 <mreid> trink: you beat me to it
19:32 <trink> yeah, that has the needed cfg too
19:34 <whd> excellent
19:37 <mreid> cool
19:37 <mreid> added a quick comment about the date format
19:38 <trink> yeah, that is just an example; whd will also adjust the max_file_age and batch_dir too
19:38 <whd> this is true
19:39 <whd> although the date format one I haven't seen
19:39 <whd> I will adjust it per mreid's comment
19:39 <trink> yeah, so we don't have to embed it in a field (or we can convert an existing field to the format we want)
19:40 <whd> that is useful
19:40 <mreid> +1
19:41 <whd> mreid: about to redeploy stage, can we throw in https://github.com/mozilla-services/mozilla-pipeline-schemas/pull/49 ?
19:47 <mreid> whd: it'd be nice to get that onto stage
19:47 <mreid> whd: safe to assume we'll be monitoring for any increase in validation failures on stage?
19:47 <whd> no
19:48 <mreid> heh, ok
19:48 <mreid> then we should test it first :)
19:48 <whd> I did test it in stage
19:48 <mreid> oh
19:48 <mreid> wait
19:48 <mreid> I thought you were deploying it to stage now?
19:48 <whd> well
19:48 <whd> yes
19:49 <whd> but I developed it by hacking the schema on the stage dwl box
19:49 <mreid> ah ha
19:49 <trink> I am running a test on the schema box
19:49 <whd> +1
19:49 <whd> should be in the vicinity of 0.00072%
19:50 <whd> I only tested that -1 was thrown out and non-negative was not
19:50 <whd> although I also got hit by the lovely new doc id de-duplication, so I can confirm that works too ;)
19:50 <mreid> haha cool
19:53 <trink> yeah, the error rates look fine
19:54 <mreid> PR merged
19:54 <whd> k
16 May 2017
No messages
   
Last message: 131 days and 4 hours ago