mozilla :: #taskcluster

19 May 2017
00:00garndtthis was more exploratory at this point than anything, not as much about actually switching focus to implement something
00:00garndtI like to know options first
00:22KWiersogarndt: finally got your patch merged to m-c
00:22garndtthanks sir
13:03catleehmm...nightly decision tasks failed
13:04catleeah, that's the first signing task it tried to submit
13:05catleemaybe we should have all tasks depend on an initial dummy task first, and then resolve the dummy after we're done submitting the entire task graph
13:10catleeso we don't end up with a partially submitted graph that begins executing
13:22garndthrm, I'm checking that out catlee
13:24garndtI think a scope was missed when introducing the priority support
13:27garndtcatlee: is there a way to retrigger it? I think I fixed the role there
13:27catleegarndt: I'm not sure
13:28catleeI could try retriggering the decision task
13:28garndtI think there is a button in the hook to run it now (I think at least)
13:28catleeis there a way to re-submit the old graph?
13:29catleewe will duplicate a lot of jobs
13:29garndtthere is not a way to resubmit the previous decision task without rescheduling the duplicate jobs
13:59catleegarndt: can you think of a way to avoid this kind of problem in the future?
14:00garndtI'm working on drafting up a process for change management (which includes notification, review, etc) for things like this. Would you mind reviewing it when I finish it up? we can post it up on mana/wiki
14:01garndtI certainly do not want to repeat previous mistakes
14:02catleewell, I'm thinking on a technical level
14:02catleewas also thinking about releases
14:02catleewe have large complex graphs
14:02catleeand I think ideally nothing starts until the entire graph is submitted
14:02garndtah, well it should be easy enough for all tasks to be dependent on the decision task completing
14:03garndtfor some reason I thought we were doing that, but maybe not
14:03catleehm, that's true
14:04catleelooks like we're not doing that for nightlies
14:05catlee only depends on the docker image task
14:05garndtwe're not doing that anywhere from what I can see, and I think I know why
14:06garndtit's how the dependency relationship is defined.... it's not that Task A defines that it depends on Task B, but rather Task B defines that task A depends on it I think
14:06garndtso what this would look like is the decision task having a dependency list of every task in the graph (a very large list)
14:06garndtwhich would almost certainly break things
14:07garndtso what I was thinking is probably not a solution here...
14:09garndtso there are multiple failure modes here, this particular one being a scope issue. I hope we can find a way to address all of them in a nice way. As far as the scope issue, I wonder if some scope validation can be done ahead of time...but that would just be a one-off solution to a very particular problem
14:09garndtI agree that having partially submitted decision tasks is terrible....something we haven't figured out how to solve completely yet
14:10garndtI would love to work through some ideas of how we can solve this in a nice way
14:14catleeI thought the dependencies went the other direction
14:15catleeso the decision task D defines nightly task N that depends on D
14:18garndthrm, you are right. I was reading it wrong
14:18garndtI think I'm reimagining a previous debate about how that should work
14:18garndtso yea...I don't see why these all can't depend on the decision task
14:35dustincatlee: the graph is rooted at the decision task
14:35dustinso that image task should depend on the decision task
14:36dustinoh, that logic needs to be smarter, and add a dep on the decision task if all of the existing deps are to already-completed tasks
14:36dustinfile a bug? that shouldn't be too hard to add
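A minimal sketch of the rule dustin describes above, assuming a task definition is a plain dict with a `dependencies` list of task IDs. The helper name `add_decision_dependency` and the `completed_task_ids` set are hypothetical, used only for illustration; this is not the in-tree taskgraph code.

```python
from typing import Dict, List, Set


def add_decision_dependency(
    task: Dict,
    decision_task_id: str,
    completed_task_ids: Set[str],
) -> Dict:
    """Return a copy of `task` that depends on the decision task when needed.

    Hypothetical helper: if every existing dependency points at an
    already-completed task (or there are none), nothing would stop the task
    from starting before the whole graph is submitted, so a dependency on
    the decision task is added.
    """
    deps: List[str] = list(task.get("dependencies", []))
    all_deps_completed = all(dep in completed_task_ids for dep in deps)
    if all_deps_completed and decision_task_id not in deps:
        deps.append(decision_task_id)
    return {**task, "dependencies": deps}


# Example: a nightly task that only depends on a finished docker image task.
nightly = {"label": "nightly-build", "dependencies": ["docker-image-task"]}
rooted = add_decision_dependency(nightly, "decision-task", {"docker-image-task"})

# Example: a task that already waits on a pending task is left alone.
signing = {"label": "signing", "dependencies": ["nightly-build-task"]}
unchanged = add_decision_dependency(signing, "decision-task", {"docker-image-task"})
```

Tasks that already wait on a not-yet-completed task are transitively rooted at the decision task, so only the "all dependencies already completed" case needs the extra edge.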
14:47jhfordok, the sun is a little too nice to be still inside. See you all monday!
14:51dustinhave a good weekend
14:55jmahergarndt: pmoore: grenade: does the win10-gpu type use gpu instances specifically? I am seeing |Refused to create WebGL2 context because of blacklist entry: FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR| in the list; if these are the same gpu instances as we use for win7 we can ask the graphics team to see why this might be blacklisted for win10 specifically
14:56jmaherfor win7, we use c3 and g2 instances in aws; I want to make sure that is the same for win10
14:57grenadejmaher: the top of the build log should contain a line like this: [taskcluster 2017-05-09T18:11:36.988Z] "instance-type": "g2.2xlarge",
14:57pmoorejmaher: gecko-t-win10-64-gpu uses g2.2xlarge
14:57jmaheroh, I see "g2.2xlarge",
14:57grenadeif the log says the instance type was g2.2xlarge, then it has a gpu
14:57jmaherok, thanks!
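A small sketch of the check grenade describes, assuming the log contains a line shaped like the one quoted above. The function names are hypothetical, and treating the `g2.` family as "has a GPU" is taken from this conversation only.

```python
import re
from typing import Optional

# Matches a log fragment such as: "instance-type": "g2.2xlarge",
INSTANCE_TYPE_RE = re.compile(r'"instance-type":\s*"([^"]+)"')


def instance_type_from_log(log_text: str) -> Optional[str]:
    """Return the first instance type mentioned in the log, or None."""
    match = INSTANCE_TYPE_RE.search(log_text)
    return match.group(1) if match else None


def looks_like_gpu_instance(instance_type: str) -> bool:
    """Assumption from the chat: g2.* instances have a GPU."""
    return instance_type.startswith("g2.")


sample = '[taskcluster 2017-05-09T18:11:36.988Z] "instance-type": "g2.2xlarge",'
found = instance_type_from_log(sample)
```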
14:57jmaherso this could be a driver issue in the os; or the hardware isn't supported fully
14:58jmaherlet me get a bug on file; this might fix a lot of webgl;reftest issues:
14:58grenadethe instances have a recent working nvidia driver installed
15:00grenadenote: it was added to the manifests recently, but the base ami has always had the driver installed
15:00jmahergrenade: this is great; is this what is on the win7 instance?
15:10jmahergrenade: pmoore: can you confirm or help me figure out the driver used for win7 in OCC?
15:11jmaheroh, wait, you already did say it was the same
15:11grenadeit's the same one
15:11grenadepossibly an earlier version
15:11grenadeit's baked into the ami only
15:11jmaherdo we not use OCC for win7-tc tests?
15:14jmaherok, bug filed:
15:14firebotBug 1366288 NEW, win10 vm in AWS fails webgl tests with: WebGL2 context because of blacklist entry: FEATURE_FAILURE_U
15:47camdEli: Hey buddy, you cool if I merge now? :)
15:48Elicamd: go for it
15:56camdEli: thanks man. :)
16:37bstackgarndt: the one thing I've noticed so far is that jobs that are themselves retriggered, lose the reason field that we were talking about
16:37bstackalthough perhaps we can make all retriggers work like this too and that won't be the case in the future as they'll get bbb reasons
16:39garndtperhaps this is just an edge case that if people know about it (like joel) that we can work around for the sake of getting backfilling going again?
16:39garndtthe retrigger happens through buildbot, and I'm not sure how to adjust things to pass that reason back to TH again
16:39bstackyeah, sounds good enough for now
16:40bstacklet's just do it all through tc anyway
16:40bstackI _think_ that will fix this and it is already an edge case
16:40bstackalthough relying on this reason field seems a bit flaky :/
16:41myki pushed a branch to a repo that taskcluster tests, and the branch's tip commit didn't get annotated with a comment from taskcluster, but taskcluster did end up testing that commit; is it still supposed to annotate such commits with comments?
16:43bstackmyk: it sets commit statuses rather than commenting unless something goes wrong these days I believe
16:43bstackgarndt: ok, let's do this the brittle way now and schedule work soon to do this the right (aka jonas) way
16:43garndtbstack: well you can do it your way of always looking at the decision task and finding the buildername there
16:43garndtI'm not sure if I know what jonas' way is, but cool!
16:44bstackmy way is more work
16:44bstackjonas' way is moving everything to action.json and doing backfill from a push perspective rather than a job perspective
16:45mykbstack: hmm, does github expose those anywhere? i only found out by opening a pull request for the branch, which showed the taskcluster status of the commit; but there isn't any indication in the page showing the commit itself (which is where the comment used to show up)
16:46bstackthey show up on the branch page with the little checkmarks and such
16:46bstackI've asked them to add that info to the commit page itself and they've said "sure, we'll get around to it!"
16:46bstackwhich I assume means some time after pigs fly
16:47bstacko.chameau is actually working on some stuff right now that can probably be updated to allow you to comment on your commits again
16:47bstackmost people didn't want it afaict
16:47bstackif we had a taskcluster dashboard for your repo would you like to use that? Or would the comment be best?
16:49bstackI don't think we've commented in about 6 months now
16:51mykbstack: ah, the branch commits page! i see it now
16:52mykbstack: i actually don't love the comment; it's disruptive and overkill (relative to a small icon indicating status)
16:52bstackok, great :)
16:52bstacksounds like this works ok then
16:52mykbstack: it was just the only way i used to be able to get from a commit to its taskcluster status (without opening a pull request)
16:53mykbstack: yes, it works great, i just needed to look for it in the right place; thanks!
16:53mykbstack: to your question, i would love a taskcluster dashboard for my repo; after i couldn't find a comment on the commit page, i went looking for such a dashboard, clicking around all over taskcluster; and commented here only when i couldn't find one :-)
16:57bstackok, good to know. I think we'd like to get one cooked up eventually
16:58bstackthe code is pretty much all there, just need a frontend at some point
16:58* bstack scans room for any contributors looking for a fun project
17:17dustinbstack: round tuit?
17:17dustinor RFC?
17:17dustinright, round tuits are gone :)
17:18bstackyeah, that's a good idea
17:18bstackI'll try to do that after this whole backfill thing is patched up
17:19bstackthe b in bstack is for backfilling
17:19bstacklittle known fact
17:58davehunt^ on running ./mach taskgraph tasks --json -p parameters.yml for the first time...
17:58davehunt649,331 lines of JSON
17:59bstackhaha, yep
17:59bstackthere's a lot of tasks
18:00* davehunt has been reading docs all day
18:00bstackare the docs pretty useful? We're trying to improve them all of the time
18:05davehuntbstack: I must admit, I found the docs at a little hard to get into, but I backed up to and that helped
18:05bstackah, nice
18:05bstackok, we'll keep that in mind
18:05davehuntI have a small patch of typo fixes that I'll submit once I'm done, too
18:07davehuntalso, more examples in would be useful.. perhaps a walkthrough of adding a task/type.. I'm not done yet, and I've been reading the docs in sequence, rather than starting with the how-tos (that's where I am now)
18:07davehuntoverall, I'm impressed and appreciate the level of documentation
18:11bstackpatches would be awesome
18:12bstackalso just keep notes on what can be better and send it to us. d.ustin is doing a lot of docs update stuff now
18:30dustindavehunt: updates to that
18:33davehuntwhat's the term for taking a Firefox build and repacking it with an addon bundled?
18:33* davehunt is searching on MDN
18:35catleepartner repack
18:35davehuntcatlee: thanks, is there docs on that?
18:36catlee maybe?
18:37catleetooling is in here:
18:37davehuntlatter is 404 for me, guessing no permissions
18:38catleeah, could be
18:39davehuntk, I'll pick this up next week.. have a great weekend all!
18:39catleedavehunt: mkaply knows a lot more about that stuff
18:40bstackgarndt: ok, is up for review. I'm going to lunch and then we can figure out if it makes sense when I'm back!
18:40mkaplydavehunt: Feel free to ping me
18:40davehuntmkaply: awesome, I certainly will! :D
19:39jmahergarndt: just looked over some logs on my hardware loaner and the specific resize failing tests in bug 1326425 run ok on the win10-ix machine
19:39firebot NEW, browser_ext_browserAction_popup_resize.js fails to run on windows 10
19:39jmaherso the problem is not as dire- we just need to figure out what needs to get fixed
19:42jmaherwith little exception I think that resize issue and printer issue will solve all browser-chrome and devtools tests for win10
19:56bstackgarndt: ok, just waiting on tests to run on treeherder patch now before I ask for review, but then I think we're good
19:56bstackshould we think about shipping today or wait till monday?
19:57garndtmy only concern is that there really isn't a good way to test backfilling without a few pushes going through that have SETA optimize things out, right?
19:57garndtmaybe jmaher has opinions
19:58bstackyeah... this is one of the primary reasons I don't like changing anything to do with this stuff
19:58bstackat some point you just have to push it all and hope
19:59garndtthere isn't a way to run `mach taskgraph action` locally to find out if the final graphs that are generated look ok based on a job id for a BB job scheduled by BBB?
20:00bstackoh, I mean I do that stuff
20:00bstackand it looks good
20:00garndtoh ok
20:01bstackat least, afaict
20:01garndtso it's just the piece connecting it together
20:01bstackthat's the other half of the issue is that I'm not super in-the-know about what is correct and not :p
20:01garndtI don't think there are rules there :) break them all!
20:02garndtso if you tested locally and it appears to be ok, and it's a matter of connecting the dots between TH and the action task...hrm, I don't see anything too crazy being changed that could go wrong...
20:02garndtbut hey, I've always been wrong before!
20:02bstackI do find that I wish we formalized the definition of a "task graph" and had a service we could query intelligently rather than download the whole thing and operate on it
20:02bstackit is a big file to download at a coffeeshop
20:03garndtI hope the coffee at least keeps you awake during the transfer!
20:03bstackI've been very wrong with all of this backfill stuff before is the thing
20:03bstackgarndt: do we have a recent example job to test it on?
20:03bstackI want to generate a graph for the review so we can review it
20:04garndthrm, did jmaher link to one in the bug he opened about not being able to backfill?
20:05* bstack peeks
20:06bstackfound one
20:10bstackoh, there aren't any fixes in here for add task
20:10bstackjust backfill
20:10jmaherhey, done with my meeting, can I help find anything?
20:11bstackI think I can fix that
20:11garndtoh right! glad you caught that
20:11garndtbecause add task needs to really add the BBB task
20:11bstackjmaher: I might have something for you to sanity check in a moment
20:12bstackoh and there's retrigger :/
20:12garndtnah, I don't think you need to worry about retrigger...I think it takes the details of that buildbot job and reruns it just how it was
20:12garndtbut that's in buildbot land, and I'm not exactly sure
20:12garndtwe can test it :)
20:12garndtone second
20:14garndthrm.... I don't know
20:15garndtit might not be working
20:15garndtthis probably just follows the same logic that backfills do with pulse_actions/mozci
20:15jmaherI am around for at least an hour, probably 1.5 hours- so happy to sanity check
20:15garndtoh no
20:16garndtit looks like it just took some time
20:16jmaherthis is a good afternoon for me to hack/randomize
20:16garndtthose cpp grey ones are my retriggers
20:17garndtso I don't think retriggers are an issue
20:18bstackjmaher: is one of the task definitions generated by running the new backfill stuff for a bbb job (the job is
20:18bstackdoes that task make sense for that job?
20:21garndthrm, we should probably try to retrigger a BBB task to see if that BB job shows up in TH...I don't know if we've tried running a BBB task again
20:21* garndt goes to find one
20:21jmaherbstack: is that add new job or backfill?
20:21jmahergarndt: that works
20:22jmaherbstack: did it fill other revisions, or just the one; overall that looks like accurate data for one revision
20:22jmaherbstack: there should be 4 revisions total with jobs scheduled by doing backfill on that
20:22bstackyep, there are five other tasks in the directory
20:22bstackor rather 4
20:22bstackoh, there are 3 others. hmmm
20:23jmaherok, that sounds right
20:23bstackah good
20:23jmaher4 total; 1 you gave me + 3 others
20:25jmahergarndt: I think we can get quite far with a few small root cause fixes for the win10 stuff; there are some stragglers out there- maybe they are intermittent or will be fixed; possibly one or two other fixes to do
20:26jmaherthe webgl stuff seems to be moving along with
20:27garndtthat's great to hear!
20:28jmaherstill, it will be a bit of work to figure out the printer and resize/windowing differences
20:28garndtyea :\
20:28garndtthose seem to be tricky
20:29jmaherpossibly comparing win10-ix vs win10-vm would be a good next step
20:29jmaherI can give instructions for running the tests as I did in win10-ix
20:34bstackochameau: sorry backfill stuff is taking a bit longer than I hoped. Forgot about one part of it. I'll get to your stuff soon!
20:50ochameaubstack: sure, no urgency. I won't get back to it seriously until monday now
20:51dustinmaybe we should call it bstackfilling
21:03garndtI like it
21:04bstack+1 I hate it
21:07myki have taskcluster's github integration configured for mozilla/qbrt, and i tried enabling it for mykmelez/qbrt to test changes before pushing them upstream; but my tasks fail to submit because i don't have a create-task scope for aws-provisioner-v1/win2012r2, and qbrt runs tests on windows:
21:08mykwould it be possible to gain such a scope? i'm ok with a low-priority one, such as queue:create-task:lowest:aws-provisioner-v1/win2012r2
21:08myk(i would also need queue:scheduler-id:taskcluster-github, i believe)
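A rough sketch of checking which of the scopes myk lists are missing, assuming the usual Taskcluster rule that a granted scope satisfies a required one if they are equal, or if the granted scope ends in `*` and the rest is a prefix of the required scope. Role expansion (`assume:` scopes) is not handled, and the helper names and the example grant list are hypothetical.

```python
from typing import Iterable, List


def scope_satisfies(granted: str, required: str) -> bool:
    """True if `granted` covers `required` (exact match or trailing-* prefix)."""
    if granted.endswith("*"):
        return required.startswith(granted[:-1])
    return granted == required


def missing_scopes(granted: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required scopes that no granted scope satisfies."""
    granted = list(granted)
    return [
        req for req in required
        if not any(scope_satisfies(g, req) for g in granted)
    ]


required = [
    "queue:create-task:lowest:aws-provisioner-v1/win2012r2",
    "queue:scheduler-id:taskcluster-github",
]

# Hypothetical grant list for a fork before the role was updated.
before = ["queue:scheduler-id:taskcluster-github"]
# Hypothetical grant list afterwards, using a trailing-* scope.
after = before + ["queue:create-task:lowest:aws-provisioner-v1/*"]

still_missing = missing_scopes(before, required)
none_missing = missing_scopes(after, required)
```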
21:22garndtmyk: think I gave that repo the right scopes now
21:27mykgarndt: it works, thanks!
21:28mykgarndt: in the future, if other contributors want similar scopes, should i direct them to this irc channel?
21:29garndthrm, people that have forked this repo and want ci on their fork?
21:38mykgarndt: yes, exactly
21:40garndtso we're not in a position yet to be offering CI support to too many things outside of mozilla (meaning contributors that fork repos and want CI in a travis like way). I was obvious enough to add it for your fork, but it's an exception rather than rule right now
21:41garndtso I think it'll be on a case by case basis if we can do something there, of course always opting to try to do what helps, but there are limits
21:42garndtideally for CI, they should open up a PR on the mozilla repo to get tests run
22:01mykgarndt: understood about not being able to support forks at will; i'm more thinking about core contributors like cvan (who is another mozilla employee, although they might not all be), not people who fork and aren't contributing back to the core
22:03mykmyk: in any case, i appreciate the access, and i'll be conservative about suggesting it to anyone else!
22:03garndtyup, we're here to do what we can, just need to set the expectations early :)
22:04garndtI love seeing people use it!
22:05mykgarndt: out of curiosity, you mentioned not being in a position *yet*; does that mean you expect that to change in the future?
22:05garndtno expectations, hopes really
22:06garndtI think things will be clearer later this year into next once we are done migrating from buildbot to taskcluster and finish supporting the needs of firefox completely
22:06garndtit's what we're working towards
22:21mykgarndt: understand, and i'll plan accordingly; i too look forward to seeing what else taskcluster can do!
22:23garndtsky's the limit
23:38sfinkis there something special with retriggered jobs wrt pulse? I'm watching pulse messages for completed jobs, and I'm getting the initial job but not the retriggered ones
20 May 2017
No messages