mozilla :: #datapipeline

19 Apr 2017
00:44 <amiyaguchi> fyi, airflow jobs are failing because of bootstrap errors
01:00 <robotblake> amiyaguchi: :(
01:03 <amiyaguchi> robotblake: looks like bintray is still causing issues
01:07 <robotblake> amiyaguchi: I think I know what the issue is
01:07 <robotblake> Gimme a moment
01:09 <amiyaguchi> cool, I think it might be affecting atmo too
01:10 <robotblake> Yeah, it would affect both
01:11 <robotblake> I think it should be fixed now
01:11 <frank> bintray sux
01:17 <robotblake> frank: How long until those airflow jobs retry?
01:19 <amiyaguchi> robotblake: 30 minute retry delay
01:19 <robotblake> Cool, I'll keep an eye on them then
01:36 <robotblake> Looks like they're running now
01:48 <amiyaguchi> robotblake: thanks!
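The 30-minute retry delay amiyaguchi mentions is the kind of thing Airflow takes from a DAG's default arguments. A minimal sketch of what that configuration might look like (the `owner` value and retry count are made up for illustration; in a real DAG this dict is passed to the `DAG(...)` constructor):

```python
from datetime import timedelta

# Hypothetical Airflow default_args: retry each failed task,
# waiting 30 minutes between attempts (the delay quoted above).
default_args = {
    "owner": "telemetry",
    "retries": 3,
    "retry_delay": timedelta(minutes=30),
}
```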
13:15 <mreid> sunahsuh: ping
13:15 <mreid> I'm looking at telemetry-airflow #103
13:16 <mreid> I think it's time to factor out a common "run view" script
13:16 <sunahsuh> heh yeah...
13:16 <mreid> do you have time to tackle that?
13:16 <mreid> or should we punt to a follow up
13:17 <mreid> we can just stick the class name into the environment
13:18 <sunahsuh> yeah, it's a bit silly (and also, we can add that "run p2h" stuff on all the relevant jobs then)
13:18 <sunahsuh> but, i am personally going to punt until after i'm done with experiment analysis, which is probably not until some time in may :)
13:22 <mreid> I'll file an issue for follow-up
13:23 <sunahsuh> oh look:
13:23 <firebot> Bug 1290140 NEW, Refactor Airflow's telemetry-batch-view runners
13:23 <sunahsuh> mreid: ^
13:23 <sunahsuh> before you file another issue
13:24 <sunahsuh> proposal to spend a day in SF tackling 1-point bugs :)
13:24 <mreid> we should also have one Scala class for running these things
13:29 <frank> we should have a scoreboard for 1-point bugs sunahsuh
13:29 <sunahsuh> in sf?
13:29 <sunahsuh> or just overall?
13:30 <frank> haha in sf
13:30 * frank is not being realistic
13:30 <mreid> how 'bout a SF bug-squashin-party metabug?
13:30 <mreid> then we can track candidate bugs in the meantime
13:31 <frank> mreid: I like that one
13:31 <sunahsuh> <-- do bugzilla titles support emoji?
13:32 <mreid> one way to find out
13:32 <frank> sunahsuh: I almost did the gun with the bug but it was a bit morbid
13:35 <mreid> morbid enough? :)
13:35 <frank> almost there
13:40 <sunahsuh> i'm gonna deploy those airflow changes
13:42 <mreid> sounds good. Looks like nothing is running right now
13:54 <sunahsuh> emojis get stripped out of bugzilla titles :(
13:54 <firebot> Bug 1357749 NEW, [meta] Data Platform SF Bug Squash Party
14:01 <frank> aww man
14:01 <frank> put a bug into bugzilla
14:01 <frank> "emojis not displaying properly in titles"
14:03 <gfritzsche> what is the status of the pre-release longitudinal?
14:03 <gfritzsche> "planned, QX"?
14:04 <firebot> Bug 1328659 NEW, Prepare MySQL database for emoji by performing utf8mb4 conversion
14:05 <gfritzsche> i found bug 1318709, which seems stuck?
14:05 <firebot> NEW, Create longitudinal dataset with 100% of pre-release data
14:05 <sunahsuh> frank: ^ actual work chatter :P
14:05 <gfritzsche> sorry! :D
14:06 <frank> gfritzsche: it is not going to happen
14:06 <frank> we've been trying to decide what to do about it
14:06 <gfritzsche> ok, we need something to fill that gap
14:06 <frank> basically I've put in a PR with the requisite changes, but Spark won't actually run for the full prerelease longitudinal
14:07 <frank> gfritzsche: yeah, hard to say when it's going to be available. Unless we put eng resources into fixing spark
14:07 <gfritzsche> i see :-/
14:08 <gfritzsche> "Thoughts on this?", ~3 months ago, on the spark issue
14:08 <frank> ha, right
14:09 <frank> gfritzsche: what's the use case you know about for it?
14:09 <gfritzsche> frank: bug 1356181, which was the current trigger behind the bug 1356232 discussion
14:09 <firebot> ASSIGNED, hsivonen Gather telemetry for isindex usage
14:09 <firebot> NEW, Introduce a default_value property for scalars
14:10 <frank> haha, it's all coming together now
14:10 <gfritzsche> currently we can't do this in re:dash with pre-release data
14:10 <frank> I am now the blocker for both ends of it :(
14:10 <gfritzsche> yeah, trying to untangle this today :)
14:10 <gfritzsche> i think he is fine for now, but checking back with him
14:10 <frank> gfritzsche: "can't do this in re:dash with pre-release data" - what is "this"?
14:11 <gfritzsche> frank: we don't have opt-in/pre-release data in there anymore, no?
14:11 <frank> gfritzsche: no, we don't. You mean running a query on that data in STMO
14:11 <gfritzsche> and if we do, there was a sampling issue?
14:11 <frank> gfritzsche: okay, I'm adding all scalars to main_summary
14:12 <frank> that could be the fix
14:12 <frank> gfritzsche: bug 1353105
14:12 <gfritzsche> isn't that a little expensive to query?
14:12 <firebot> NEW, Automatically Add All Scalars to main_summary
14:12 <frank> gfritzsche: not bad if you're just picking one column
14:12 <frank> and choose a sample_id or two and it's actually fast
14:12 <gfritzsche> ah, ok
14:13 <gfritzsche> cheers then, that would solve some future issues :)
14:13 <frank> okay, the scalars should be available there sometime next week
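Frank's cost argument is: select only the column you need and restrict the scan to one or two sample_id buckets. A hedged sketch of what such a query string might look like (the scalar column name, sample_id values, and date are invented for illustration):

```python
# Hypothetical STMO / Spark SQL query against main_summary: one scalar
# column, limited to two sample_id buckets so the scan stays cheap.
QUERY = """
SELECT submission_date_s3,
       AVG(scalar_parent_some_probe) AS avg_value
FROM main_summary
WHERE sample_id IN ('42', '57')
  AND submission_date_s3 >= '20170401'
GROUP BY submission_date_s3
ORDER BY submission_date_s3
"""
```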
14:14 <gfritzsche> to be fair, the conversation for this particular issue went a bit weird... there was no real "how can i solve my problem?" request, instead some random commenting
14:14 <gfritzsche> but the scalar querying seems to need some addressing :)
14:14 <frank> gfritzsche: yeah, it did feel a bit odd
14:16 <frank> gfritzsche: are you going to close 1356232 in favor of 1353105 then?
14:16 <frank> s/are you/should we
14:17 <gfritzsche> frank: that sounds good to me
14:17 <gfritzsche> we can file a more concrete, separate bug on the TMO model or just follow up separately
14:17 <gfritzsche> shall i file that one, you close this one?
14:17 <frank> gfritzsche: sounds like a plan :)
14:22 <frank> sunahsuh: sadly, the emoji bug moved from p1 to p3 :(
14:35 <Dexter> hey frank, qq: how does adding scalars to the main summary help with bug 1356232? Wouldn't that require people to still write SQL queries?
14:35 <firebot> WONTFIX, Introduce a default_value property for scalars
14:35 <frank> Dexter: yes, you are correct. It's the interim solution while we eventually build something into TMO. Probably the pseudo-probe I mentioned
14:36 <Dexter> cool, ok! I just wanted to make extra-sure about that
14:36 <Dexter> FYI, gfritzsche ^
14:36 <Dexter> (also thanks )
14:40 <frank> anytime :)
14:53 <gfritzsche> filed bug 1357771
14:54 <firebot> NEW, Enable probe comparisons to total number of sessions etc.
15:42 <joy> in pyspark sql do i query main_summary or main_summary_v3
15:44 <mdoglio> joy: use with the right dataset prefix
15:45 <joy> so i shouldn't do spark.sql("select ... from main_summary") ? returns a df
15:48 <joy> mdoglio: true, but if i wanted to do things like spark.sql("..") i need to register the 'df' as an sql table
15:48 <joy> spark.sql("select ... from main_summary") would have sidestepped that and i can query main_summary immediately
15:48 <mdoglio> sure why not
15:48 <joy> but there are two: main_summary and main_summary_v3
15:48 <joy> which ought i use?
15:49 <mdoglio> 301 to mreid or vitillo ^
15:50 <mdoglio> joy: I *think* you can also use spark.sql specifying the source parquet file/partition
15:55 <frank> joy: those are both the same
15:55 <frank> main_summary == main_summary_v3
15:59 <mdoglio> joy: btw you can register the df as a temp table
15:59 <mdoglio> and then use .sql()
16:00 <frank> mdoglio: all tables are available in sqlContext
16:00 <frank> because we are using the centralized metastore
16:00 <mreid> joy: I think you can access all the same tables as s.tmo these days
16:00 <mreid> as such, please use "main_summary"*
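The two approaches discussed here can be sketched side by side: tables registered in the shared Hive metastore (like main_summary) are queryable by name via spark.sql directly, while a DataFrame you build yourself has to be registered as a temp view first. A minimal sketch, assuming a PySpark session is passed in (the sample query and view name are made up):

```python
# Sketch (PySpark assumed): metastore-backed tables vs. a locally-built
# DataFrame that must be registered before .sql() can see it.
def query_examples(spark):
    # main_summary lives in the central metastore: no registration needed
    summary = spark.sql("SELECT client_id FROM main_summary LIMIT 10")

    # a DataFrame you construct yourself is invisible to .sql() until
    # it's registered as a temporary view
    local = spark.createDataFrame([("release",), ("nightly",)], ["channel"])
    local.createOrReplaceTempView("channels")
    counts = spark.sql("SELECT count(*) AS n FROM channels")

    return summary, counts
```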
16:01 * mdoglio should go on paternity leave more often
16:01 <firebot> Bug 1355790 FIXED, remove or rename the "main_summary_20161030" table in hive
16:01 <mreid> which I wasn't sure if it was fixed or not
16:01 <mreid> so frank++ too :)
16:01 <mreid> frank: did you ever find out where that table came from?
16:02 <frank> mreid: no, I grepped bash_history and all logs on the metastore
16:02 <frank> and nothing came up
16:02 <frank> it's a phantom
16:02 <frank> I suppose someone could have used beeline from another machine
16:03 <mreid> it was probably something being used for testing
16:03 <frank> definitely was, but why did it suddenly reappear?
16:03 <mreid> is it back?
16:04 <frank> no, why did it suddenly reappear when it did
16:04 <frank> it best not be back
16:04 <frank> who we gonna call?
16:05 <mreid> there is still a weirdo table in there
16:05 * mreid reopens
16:06 <frank> I just saw that
16:07 <mreid> so violent in here today :-/
16:07 <frank> everything is going wrong
16:08 <frank> main_summary is returning nothing atm
16:08 <mreid> it's because "main_summary" means "default.main_summary_20161030_v3" right now
16:08 <mreid> if you select from main_summary_v3 there should be data
16:09 <mreid> I see 0 for main_summary_v3 too
16:10 <frank> wtf is going on, the metastore logs say that should be pointing at s3://telemetry-parquet/main_summary/v3
16:11 <mreid> data is there
16:11 <mreid> robotblake: yt?
16:14 <frank> table info looks good?
16:17 <robotblake> Wait, so is it pointing at 20161030 or is it empty?
16:17 <robotblake> Looks like the last cron failed
16:18 <robotblake> I'm running it again right now
16:20 <frank> robotblake: where are you seeing that the cron failed?
16:21 <mreid> robotblake: "main_summary_v3" is empty (which is new)
16:21 <mreid> I'm not sure if "main_summary" is pointing at main_summary_20161030_v3 or not
16:21 <robotblake> The mail spool, I hadn't looked at the logs in a while, looks like we need to route stderr to stdout to get those into the .log files
16:22 <robotblake> It picks what file to use based on most recent uploaded I believe, sound correct frank?
16:22 <frank> yes, that's right
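robotblake's fix for errors landing in the mail spool instead of the log files is standard cron hygiene: redirect stderr into the same log as stdout with `2>&1`. A hypothetical crontab entry (the script path, schedule, and log path are invented for illustration):

```
# Hypothetical crontab line: hourly parquet2hive sync, with stderr
# folded into the .log file so failures show up there, not in mail.
0 * * * * /opt/p2h/sync_tables.sh >> /var/log/p2h/sync.log 2>&1
```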
16:23 <mreid> robotblake: do you know where the main_summary_20161030_v3 one came from?
16:31 <robotblake> I don't find that version of main_summary in the logs
16:33 <robotblake> And I can't get into the metastore master anymore :|
16:35 <mreid> weird x2
16:36 <frank> I have searched and searched in those logs
16:36 <frank> it is not in there
16:36 <frank> not in bash_history
16:36 <frank> I've been trying to find hive logs but can't find anything
16:36 <mreid> spontaneous generation of datasets
16:40 <frank> I like to think that Hive has evolved into a sentient being that thought we needed the dataset
16:42 <robotblake> Apparently we've solved disaster recovery for that dataset and didn't even know it!
16:42 <frank> haha SLA of 10 9's if you don't want it
16:43 <sunahsuh> y'all, i think it might have been me -- the dates match up for a PR I merged into main summary at least
16:43 <sunahsuh> which was the one that kicked off that schema change discussion
16:43 <frank> sunahsuh: but how would that have updated the metastore?
16:43 <mreid> ah ha!
16:44 * mreid is relieved
16:44 <sunahsuh> frank: i was loading it in, with data from before the schema change and data from after
16:44 <frank> sunahsuh: manually?
16:44 <sunahsuh> to see if it would work the way we'd expect
16:44 <sunahsuh> i guess
16:44 <sunahsuh> although we'd expect that to show up in bash_history
16:45 <frank> yeah that is super duper weird
16:45 <frank> robotblake: we need hive logging
16:45 <frank> especially since we are now loading datasets from other machines using beeline
16:57 <robotblake> bash history is a terrible indicator
16:58 <robotblake> If I was logged in at the same time as sunahsuh, and I closed my terminal after her, it'd blow away her history
16:58 <sunahsuh> huh, TIL
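The history-clobbering robotblake describes is bash's default behaviour of overwriting the history file when a shell exits, so the last session to close wins. A common ~/.bashrc mitigation (assuming interactive bash; this is a sketch, not the team's actual config):

```
# ~/.bashrc fragment: append to the history file instead of
# overwriting it on exit, so concurrent sessions don't clobber
# each other's history...
shopt -s histappend
# ...and flush each command to disk as soon as it runs.
PROMPT_COMMAND='history -a'
```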
17:24 <amit> Hi All, I am trying to run some code. After launching the cluster and trying to run the code, it just says "Kernel Starting, please wait" and nothing happens after that. Would be great if someone can help with this.
17:25 <frank> amit: try 1. Restarting firefox
17:31 <amit> frank: Did not work. Can I try something else?
17:32 <frank> amit: yes, but first check in the js console to see if there are any errors
17:32 <frank> do you know how to do that? ^
17:34 <amit> frank: I would need your help here
17:41 <mreid> amit: also, please try in chrome
17:45 <amit> mreid: Will try that as well. Thanks
17:54 <frank> harter: ping
18:15 <frank> mreid: amit is having the same issue again, and it didn't work on chrome either
18:20 <mreid> how long have you waited for the kernel to respond?
18:27 <frank> mreid: 8889 works, 8888 does not
18:34 <robotblake> amit: Are you on Windows?
18:41 <amit> robotblake: I am on mac
18:41 <robotblake> You're the second person this has happened to that had it fixed by using a different port
18:42 <amit> hopefully it will keep working
18:44 <frank> robotblake: It is super weird. I wonder if a local jupyter/ipython config is messing with it?
18:44 <frank> 8888 is the default
18:44 <robotblake> Hadn't thought of that, but that'd do it
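The symptom (8889 works, 8888 doesn't) is consistent with something locally already listening on Jupyter's default port 8888, e.g. another notebook server or a stale process, which would also explain why a different port fixes it. A quick stdlib check one could run before picking a tunnel port (the helper name is made up):

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    # Try to connect to the port; a successful connection means
    # something is already listening there.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0
```

For example, if `port_in_use(8888)` returns True locally, forwarding the notebook over 8889 instead sidesteps the conflict.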
19:13 <mreid> robotblake: frank: looks like 'main_summary' still points at that 20161030 table :-/
19:13 <frank> robotblake: did you rerun p2h?
19:13 <mreid> the main_summary_v3 table is back to working now though
19:14 <robotblake> It's running right now
19:14 <robotblake> It takes a while :(
19:16 <frank> how many hours has it been?
19:16 <frank> we really should fix this up
19:22 <robotblake> main_summary_v3 took 10496.5 seconds
19:25 <robotblake> Looks like main_summary started loading about 10 minutes ago
19:26 <mreid> ah ok
19:51 <frank> whd: I don't seem to have access to, can you add me?
19:51 <whd> frank: github user?
19:51 <frank> whd: fbertsch
19:54 <whd> frank: I added you to cloudservices-engineers, which I think will give you access
19:54 <frank> whd: works :) thanks!
21:50 <joy> whats the spark sql way to do something like "where channel like '%foo%'" which matches any channel with 'foo' in it
21:54 <amiyaguchi> joy: df.where("channel like '%foo%'") will work
21:54 <joy> amiyaguchi: thanks
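amiyaguchi's `df.where("channel like '%foo%'")` pushes a SQL LIKE predicate into Spark. For intuition about what that predicate matches, SQL LIKE maps directly onto regular expressions: `%` is any run of characters and `_` is exactly one. A plain-Python sketch of that translation (the helper name and sample channel values are made up; this is not Spark's implementation):

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into an anchored regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts) + "$")

# Matches the same rows that "channel like '%foo%'" would keep:
channels = ["nightly-foo", "beta", "foo-release", "release"]
matches = [c for c in channels if like_to_regex("%foo%").match(c)]
```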
23:09 <harter> frank: pong - though I imagine that was related to the l10l for pre-release?
23:22 <amiyaguchi> woo, I have spark writing to a local minio instance
20 Apr 2017
No messages