mozilla :: #datapipeline

16 May 2017
13:59 trink: fyi, we now store metadata about the duplicates removed during ingestion in s3://net-mozaws-prod-us-west-2-pipeline-data/telemetry-duplicates-parquet
13:59 mreid: chutten: ^^
14:00 mreid: this should significantly reduce the "that one client" problem
14:00 frank: trink, mreid: Did we change the dupe window or partition strategy?
14:01 trink: not yet
14:01 mreid: frank: something > nothing
14:01 frank: okay cool. Makes sense to start sending what we have over, since we're sure it's not giving many false positives
14:01 frank: mreid: agreed
14:01 trink: there should be bugs filed
14:02 mreid: ya, we've verified that there's no significant risk of data loss
14:02 trink: I can easily expand the window to handle 24+ hours
14:02 frank: trink: what kind of memory footprint will that have?
14:02 frank: compared to the current 4 hour window
14:02 trink: 6x
14:03 frank: scales linearly, eh :)
14:03 trink: yeah, it will be optimized for our topology also
14:04 trink: so sets of small partitioned filters
14:05 trink: 'smaller' ;)
14:05 trink: not small
14:05 frank: it's all relative anyways
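trink's "6x" for going from the current 4-hour window to 24 hours is just 24 / 4, which holds if the filter keeps every document ID seen in the window and traffic is roughly steady, so memory grows linearly with window length. A minimal Python sketch under those assumptions follows; `WindowedDedupFilter`, `ids_held`, and the hourly-bucket/hash-partition layout are hypothetical illustrations of "sets of small partitioned filters", not the pipeline's actual implementation, whose data structure isn't described here.

```python
import hashlib


class WindowedDedupFilter:
    """Hypothetical exact duplicate filter over a sliding window of whole hours.

    Document IDs are sharded into a few partitions by hash; inside each
    partition, IDs are grouped into one set per hour so that a whole hour
    can be dropped once it falls out of the window.
    """

    def __init__(self, window_hours: int, partitions: int = 4):
        self.window_hours = window_hours
        self.partitions = partitions
        # One dict per partition, mapping hour -> set of document IDs seen in that hour.
        self._buckets = [{} for _ in range(partitions)]

    def _partition(self, doc_id: str) -> int:
        digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % self.partitions

    def _expire(self, buckets: dict, hour: int) -> None:
        # Drop every hour bucket that is window_hours or more behind `hour`.
        for old in [h for h in buckets if h <= hour - self.window_hours]:
            del buckets[old]

    def is_duplicate(self, doc_id: str, hour: int) -> bool:
        """Return True if doc_id was already seen inside the window; otherwise record it."""
        buckets = self._buckets[self._partition(doc_id)]
        self._expire(buckets, hour)
        if any(doc_id in ids for ids in buckets.values()):
            return True
        buckets.setdefault(hour, set()).add(doc_id)
        return False

    def live_ids(self, hour: int) -> int:
        """Count IDs still held at `hour`, expiring stale buckets in every partition first."""
        for buckets in self._buckets:
            self._expire(buckets, hour)
        return sum(len(ids) for buckets in self._buckets for ids in buckets.values())


def ids_held(window_hours: int, hours: int = 48, per_hour: int = 10) -> int:
    # Feed a steady stream of unique IDs and report how many the filter holds at the end.
    f = WindowedDedupFilter(window_hours)
    for hour in range(hours):
        for i in range(per_hour):
            f.is_duplicate(f"doc-{hour}-{i}", hour)
    return f.live_ids(hours - 1)


if __name__ == "__main__":
    print(ids_held(24) / ids_held(4))  # 240 / 40 -> 6.0
```

Keeping one set per hour makes expiry cheap (a whole hour is dropped at once), and hash partitioning makes each individual structure smaller without changing the total, which is consistent with the "smaller, not small" remark.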
14:09 gfritzsche: does anyone have time to help investigate this: https://bugzilla.mozilla.org/show_bug.cgi?id=1364243#c6
14:09 firebot: Bug 1364243 FIXED, ehsan@mozilla.com We may be blowing up our telemetry ping size due to increased BHR submissions
14:10 gfritzsche: the budget dashboard looks a bit noisy right now, so i can't tell right away (https://metrics.services.mozilla.com/telemetry-budget-dashboard/)
14:11 gfritzsche: this is quantum flow related
14:12 chutten: trink: mreid: Excellent. When should I start seeing a drop in main ping volumes?
14:15 trink: chutten: it went in yesterday before UTC midnight, so today's dataset should be 100%
14:16 trink: 100% == as good as it will get for the moment
14:17 trink: about a 30% reduction... there are plans to bring it up to about 80-90%
14:22 chutten: Noice
14:31 gfritzsche: mreid: see a few lines above
14:49 Dexter: robotblake, kudos, Presto is (much more) usable now :)
14:51 robotblake: \o/
14:52 frank: robotblake++
15:42 sunahsuh: bleh, i think our S3 version of steps/airflow.sh is out of date with the github version
15:43 sunahsuh: whd: i think we need to add that to our deployment flow
15:46 sunahsuh: oh, nm, there's one in telemetry-test-bucket that we're apparently using in dev
15:46 sunahsuh: updating that now
15:50 chutten: I will be unable to make Data Club today, having been pulled into an e10s-multi meeting
15:53 rvitillo: gfritzsche: could you redirect feedback for 1328230 to harter please?
16:05 frank: can someone post a link to this crash graph?
16:07 mreid: frank: https://people-mozilla.org/~sguha/mozilla/crashgraphs/
16:55 mreid: gfritzsche: I will try to find someone to help out with that BHR-related bug
17:24 wlach: chutten: I believe it was recorded. probably worth watching. I suspect you would find it relevant to your interests :)
17:25 trink: sunahsuh: Bug 1353110: any clarification on how you want to spec the dimensions? Do you want them to show up in the current dimension files in addition to the experiment dimensions, or will you be happy with wygiwyg?
17:25 firebot: https://bugzil.la/1353110 NEW, mtrinkala@mozilla.com Land pings with telemetry experiment annotations into new source
17:27 trink: the current dimensions spec looks like this https://github.com/mozilla-services/puppet-config/blob/e0bec9e8880062952dae98c14a6f2a372da55548/pipeline/modules/pipeline/files/schema/schema.telemetry.json
18:15 sunahsuh: heh, i don't have access to that file :)
18:15 sunahsuh: trink:
18:15 trink: https://irccloud.mozilla.com/pastebin/zBS86RXa/
18:21 trink: also, what type of error handling would you like (say, if one of the many writes fails)? There is no concept of a partial failure on output, since it is generally a 1-to-1 relationship
18:21 mreid: trink: sunahsuh: I think we want something like this for experiments for data layout:
18:21 mreid: https://irccloud.mozilla.com/pastebin/RUeivf9j/
18:26 sunahsuh: yeah, what mreid said -- that's totally sufficient
18:27 sunahsuh: *most* of these experiments will be pretty small, i think
18:27 trink: interesting, why do we bother with the field restriction and numerous partitions on the standard data?
18:27 mreid: because more garbage is likely to appear in genpop :)
18:28 trink: experiments are also untrusted user data ;)
18:28 mreid: true
18:30 trink: your call, I just needed to know if a second spec was required or if it was just an extension of the original (like appending the 2 fields)
18:32 sunahsuh: sorry, what do you mean by second spec/original?
18:33 trink: the two specs above
18:33 sunahsuh: oh, yeah, i'd probably prefer the second
18:33 trink: it is not one or the other; it is both
18:34 trink: the first will continue to write the current s3 data
18:34 trink: the second will write the new experiment-partitioned data
18:35 trink: so there will be 1 + #experiments outputs
18:39 mreid: yeah, so existing "telemetry" output won't be affected
18:39 mreid: new "experiments" output should use the simplified spec above
18:41 trink: yes (with the implementation detail that it will all happen in the same plugin, encoding once and writing N times)
18:42 mreid: yep, that makes sense
18:42 sunahsuh: yes
18:42 trink: ok, I should be good (testing this afternoon and will have a PR for you tomorrow)
18:43 mreid:
18:43 sunahsuh: awesome, thanks trink :)
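The plan agreed above is one plugin that encodes each message once and writes it to 1 + #experiments outputs, and trink's open question was what to do when only some of those writes fail. A minimal sketch, assuming JSON encoding, a caller-supplied `write` callback, and output names like `experiments/<id>` (all hypothetical; the real data layout lives in the pastebins, which aren't reproduced here). It shows one possible error policy, collecting per-output failures instead of aborting, not the one that was actually chosen.

```python
import json
from typing import Callable, Dict, List


def fan_out(message: dict, experiments: List[str],
            write: Callable[[str, bytes], None]) -> Dict[str, str]:
    """Encode `message` once and write it to 1 + len(experiments) outputs.

    Returns a mapping of output name -> error text for every write that failed,
    so the caller can decide whether to retry, alert, or drop (hypothetical policy).
    """
    # Encode a single time; every output receives the same bytes object.
    payload = json.dumps(message, sort_keys=True).encode("utf-8")
    # The existing "telemetry" output plus one hypothetical path per experiment.
    targets = ["telemetry"] + [f"experiments/{exp_id}" for exp_id in experiments]
    failures: Dict[str, str] = {}
    for target in targets:
        try:
            write(target, payload)
        except OSError as err:
            # Record the failure and keep going, so one bad output doesn't block the others.
            failures[target] = str(err)
    return failures


if __name__ == "__main__":
    written: Dict[str, bytes] = {}

    def fake_write(target: str, data: bytes) -> None:
        # In-memory stand-in for a real output; one target always fails.
        if target == "experiments/broken":
            raise OSError("disk full")
        written[target] = data

    print(fan_out({"clientId": "c1"}, ["exp-a", "broken"], fake_write))
    print(sorted(written))
```

Returning the failures rather than raising keeps the existing "telemetry" output unaffected by a problem in any single experiment output, which is the property mreid asked for; a stricter policy could raise as soon as the dictionary is non-empty.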
18:55 amiyaguchi: harter: I noticed that the daily search rollup failed again last night because of intricacies with gists
18:56 amiyaguchi: harter: do you think porting over the notebook as is to mozetl would be appropriate, at least for making the rollup code more accessible?
19:28 harter: Let me reference the case law on the topic :)
19:28 harter: amiyaguchi: ^
19:29 harter: If I remember correctly, I don't have any issues with that, so long as it's annotated properly
19:29 frank: amiyaguchi: that is pretty much what I did with the tab_spinner job
19:32 harter: amiyaguchi: any thoughts on https://github.com/mozilla/python_mozetl/issues/34?
19:34 amiyaguchi: harter: it can probably be fixed by pinning dependencies, it worked for me before
19:34 harter: cool.
19:37 harter: amiyaguchi: here's the previous discussion https://github.com/mozilla/mozilla-reports/pull/23
19:38 amiyaguchi: harter: thanks, do you also want to be added to the airflow alerts for the job?
19:38 harter: amiyaguchi: yes please
19:39 amiyaguchi: will do
19:40 amiyaguchi: frank: do you have a link to the bug for porting the search roll-up to mozetl?
19:41 frank: amiyaguchi: I do, let me find it
19:41 frank: amiyaguchi: bug 1364530
19:41 firebot: https://bugzil.la/1364530 NEW, nobody@mozilla.org Migrate Search ETL job to python_mozetl
19:42 frank: also, amiyaguchi++ for doing it
19:42 amiyaguchi: a mailbox full of alerts will do that to ya
19:53 sunahsuh: whd: any chance we can get an airflow deploy today?
19:53 whd: we can
19:54 sunahsuh: awesome, let me know when you get to it, thanks!
19:55 whd: I have pressed button 1
19:55 whd: see #datatools for irc notifications
19:55 sunahsuh: oh fab, thanks
19:55 whd: np
20:25 robotblake: harter: It looks like there may have been a long-running EMR cluster of yours that got nuked about an hour ago
20:26 harter: how long?
20:26 harter: robotblake: ^
20:26 robotblake: I'm not sure, just wanted to give you a heads up that if you had a cluster running that you were still using, it may have gotten nuked
20:27 harter: cool, everything seems fine on my end, any need for me to investigate further?
20:27 harter: i.e. my fault or random fluke?
20:27 harter: robotblake: ^
20:28 robotblake: We've got some tooling (called reaper) that nukes neglected instances in dev; you may have gotten an email first saying it was going to kill stuff, though
20:28 robotblake: Nothing else you need to do though :)
20:30 whd: I sent an email to fx-data-platform about reaper
21:27 frank: hope it doesn't kill the aggregates service
21:27 frank: I have no idea where that machine lives, I was just given access to it o_0
21:28 * frank goes to peek behind the curtain
22:20 jgaunt: frank: I'm about to try to use the notebook you shared with kamyar today - will it be pointless trying with a 3 machine cluster?
22:26 frank: jgaunt: yes, that thing is a monster. I used a 30 node cluster
22:26 jgaunt: could I do the same? if so, what duration would you advise?
22:27 frank: jgaunt: it took roughly 1 hour per day of raw data I processed
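frank's figure (roughly 1 hour per day of raw data on a 30-node cluster) gives a quick way to size a run. A back-of-the-envelope sketch, assuming runtime scales inversely with node count, which real Spark jobs only approximate; `estimate_hours` is a hypothetical helper, not part of any Mozilla tooling.

```python
def estimate_hours(days_of_data: float, nodes: int,
                   ref_nodes: int = 30, ref_hours_per_day: float = 1.0) -> float:
    """Rough wall-clock hours, scaling the 30-node / 1 h-per-day reference inversely with nodes."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return days_of_data * ref_hours_per_day * ref_nodes / nodes


if __name__ == "__main__":
    print(estimate_hours(1, nodes=3))   # about 10 hours for a single day on 3 nodes
    print(estimate_hours(7, nodes=30))  # about 7 hours for a week on 30 nodes
```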
22:35 jgaunt: frank: ty! question - do I have an authentication issue, and do you know how to solve it? I got this when I tried to store_main_pings()
22:35 jgaunt: Caused by: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: FBCDD6E670E60E77), S3 Extended Request ID: faR3oY4G4TE8oYrGSMxJ7ZF7IbL8S/bQGLN1E8zJ+RsnbnLbRUcVD0nzWrso71DSNMORRzPW4bc=
22:36 frank: jgaunt: you don't have access to telemetry-test-bucket, perhaps? are you running on an ATMO machine?
22:37 jgaunt: I can aws s3 ls telemetry-test-bucket/ and see what's there
22:37 jgaunt: yes, ATMO
22:38 jgaunt: or would access also be denied if I asked for a bad date?
22:39 jgaunt: I am going up to 20170516
22:39 jgaunt: ^frank
22:44 frank: jgaunt: you shouldn't really be getting access denied
22:44 frank: jgaunt: why don't you just try a different location
22:45 frank: e.g. net-mozaws-prod-us-west-2-pipeline-analysis/jgaunt
22:45 jgaunt: I've already changed the path after getting a more specific error
22:45 jgaunt: but I will try that one
22:45 jgaunt: no idea how aws permissions work
22:51 kamyar: do i need to find my s3 bucket id to be able to output my atmo results?
22:52 kamyar: like into 's3://telemetry-private-analysis-2/some/path'
23:05 jgaunt: kamyar: if we're working on the same notebook, my notes could be relevant:
23:05 jgaunt: I got an error with the existing path
23:06 jgaunt: when I tried s3://mozilla-si/user/jgaunt/shield-analyses/etc. I was getting access errors
23:06 jgaunt: so I made a /jgaunt/ folder under telemetry-private-analysis-2/
23:06 jgaunt: and haven't had the script choke on the same error yet
23:07 kamyar: jgaunt: ah! I see. I honestly have no idea how to access my s3 bucket. Do I have one already?
23:07 jgaunt: they are not made in advance; I have been using the web console, but I bet the CLI has a way as well
23:07 jgaunt: if you want one quick I can add it
23:09 kamyar: jgaunt: that'd be great
23:10 jgaunt: kamyar: would you like s3://telemetry-private-analysis-2/kamyar ? lmk otherwise
23:11 kamyar: jgaunt: yes, that's good
23:11 jgaunt: kamyar: it's there now
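On kamyar's question: an `s3://` URL already names the bucket (the part right after `s3://`), and everything after the next slash is just a key prefix. S3 has a flat key namespace, so a "folder" like `/jgaunt/` is simply a shared key prefix. A minimal sketch using the standard library's `urllib.parse.urlparse`; `split_s3_url` is a hypothetical helper, and the whitespace check is an added guard because valid bucket names cannot contain spaces.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_url(url: str) -> Tuple[str, str]:
    """Split an s3:// URL into (bucket, key_prefix)."""
    parsed = urlparse(url.strip())
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {url!r}")
    if any(ch.isspace() for ch in parsed.netloc):
        # Bucket names never contain whitespace, so this is almost certainly a typo.
        raise ValueError(f"bucket name contains whitespace: {parsed.netloc!r}")
    # Everything after the bucket is an ordinary key prefix; S3 has no real directories.
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    print(split_s3_url("s3://telemetry-private-analysis-2/some/path"))
```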
17 May 2017
No messages