mozilla :: #releng

17 May 2017
00:02arrRyanVM: are there a few instances I can terminate?
00:04arrKWierso: ^^
00:04arrafter another go round the logic error carousel, we think we have something new to test, but we're not spinning up more instances
00:05RyanVMi'm still terminating as I hit them
00:05RyanVMgod these windows l10n jobs take forever
00:05arrRyanVM: right looking to see if there are any that aren't doing anything that I can kill
00:06RyanVMhard for me to tell
00:06RyanVMhttps://secure.pub.build.mozilla.org/builddata/reports/slave_health/slavetype.html?class=build&type=b-2008-spot
00:06RyanVMbeyond what's there, which isn't entirely real-time IIRC
00:07arrRyanVM: how many jobs do you have running right now?
00:07arrhrm, there's a whole lot there under 5m
00:08RyanVM~30 or so across different branches
00:08arrRyanVM: or, conversely, want to try landing some more stuff?
00:08arrKWierso: was talking about opening up autoland
00:09RyanVMI can merge inbound and autoland around
00:09arrmy concern is that if this doesn't work, we burn a bunch
00:09RyanVMthat'll spawn more jobs without opening the floodgates
00:09arrokay
00:11RyanVMpushed, we'll see what happens
00:11RyanVMarr: I take it you'll want to know the instance # of any new failures?
00:11arryes, please
00:11arrfirst we'll see if they even spin up and take jobs
00:12RyanVMbiab
00:17arrRyanVM: thanks, that was enough to spin up some new instances
00:23arrRyanVM: well, good news, it looks like these are spinning up and have the files
00:24arrRyanVM: so let us know if anything craps out
00:24RyanVMsweeet
00:24RyanVMgoing back and forth at the moment, but nothign burning so far
00:25RyanVMlots of windows l10n jobs finally finishing too
00:25RyanVMnthomas: i assume you're on point for re-enabling windows nightly updates at some point - i'll let you know when they're all finished
00:28* arr wonders how we have an 8 hour backlog with the trees closed
00:31arrer, and it just jumped an hour
00:33KWiersoarr: try's got a handful
00:33arrKWierso: a handful of?
00:33arrwe're only testing b-2008 not y-2008 right now
00:34arrif we don't see failures on b-2008, we'll also roll out y-2008
00:34arror would you prefer we just try it now?
00:34KWiersothere are talos jobs on try that have been waiting for 9 hours, 45 minutes
00:35arrKWierso: ah, gottcha
00:35arrKWierso: weird, since we haven't touched try at all
00:35arrdo we just have a high try load?
00:36KWiersonot high at all, afaict
00:36arrthere's only 46 instances up
00:36KWiersobut the talos jobs in https://treeherder.mozilla.org/#/jobs?repo=try&revision=0d4f70b1f14ac09bb935f2ad545f954bc44aeba9&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=running&filter-resultStatus=pending&filter-resultStatus=runnable&group_state=expanded&selectedJob=99462909 have apparently been running for 500+ minutes
00:36arrtalos doesn't run in AWS
00:37KWiersoafaict, these are what are giving you the 8/9 hour backlog
00:38arrKWierso: it's not expected to have jobs running for 500+ minutes, is it?
00:38arrthat seems... broken
00:38KWiersoyeah, I'm not sure what happened there
00:38arr(FYI, we haven't touched anything talos related)
00:38RyanVMarr: no burning build jobs so far :)
00:38KWiersomaybe we killed some machine and whatever tracks the jobs lost track of them?
00:38arrI remember philor saying he was rebooting a lot of something... don't remember if that was t-w732-ix, though
00:43RyanVMit was\
00:47arrRyanVM: are we confident enough to open some of the trees?
00:48RyanVMhah, right as Wes takes off for the day
00:48RyanVMyolo, let's open autoland
00:48arr\m/
00:49KWierso|afkTomcat|afk: I am so sorry :)
00:51RyanVMit's open
00:52RyanVMseeing a lot of burning Windows tc(S) jobs
00:53RyanVMnot this issue, of course
00:53RyanVMbut concerning nonetheless
00:53RyanVMhttps://treeherder.mozilla.org/logviewer.html#?job_id=99544866&repo=mozilla-inbound https://treeherder.mozilla.org/logviewer.html#?job_id=99544888&repo=autoland https://treeherder.mozilla.org/logviewer.html#?job_id=99544887&repo=autoland
00:54RyanVMarr: we'll let the autoland queue drain for a bit and see where things are at, then we can make a decision on inbound
00:54arrRyanVM: sounds like a plan!
01:06RyanVMarr: i assume you're seeing a crap ton of new instances spinning up
01:07RyanVM12 pushes to autoland so far @ 4 builds a piece
01:10travis-cibuild-puppet#1350 (master - a19e86d : Mark Cornmesser): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/233057204)
01:20arrRyanVM: yeah, we've kicked off quite a few
01:22philorrebooting t-w732-ix was about bug 1365008, pending was about orphans from shutting off non-e10s talos
01:23RyanVMarr: i want to go at least another 15-20min before considering reopening anything else
01:23RyanVMsaw some windows builds take as long as 35min to die
01:23arrRyanVM: wfm
01:33RyanVMgood lord, autoland queue is still draining
01:35philoryeah, but at least we're totally incapable of dealing with it reopening after a closure, instead of only partially incapable
01:35philorbe embarrassing if TC could deal but buildbot couldn't, or the reverse
01:40arrnthomas: neither Q nor markco have write access to build-cloud-tools... if something goes catastrophically wrong, can you help them land or back stuff out?
01:40markcowe are regenerated all golden windows AMIs this evening. new and improved. If there is any issues we will rollback.
01:40arr(nthomas: also, can we get them access to push?)
01:41RyanVMi'm not seeing any obvious reasons to keep inbound closed at this point
01:42arrRyanVM: \o/
01:42philorbacklog of 58 windows build jobs?
01:43philordunno how big the taskcluster backlog is, but it's a half hour long
01:43RyanVMphilor: I'm not in a particular rush either if you want to wait
01:44philorRyanVM: nah, screw it, nobody pushes there anyway, we can look good by opening it without taking much risk
01:44RyanVMdone
01:45philortc's catching up, despite fun things like 45 minute backlogs of android tests
01:45RyanVMsurprisingly little build bustage on autoland so far, though I suppose I just jinxed that
01:45arrRyanVM: you and your big mouth :D
01:45RyanVMalways getting me in trouble!
01:46philormaybe our third system will be cunning enough to know that when it has spun down everything, and it sees demand for a couple of something, it should start a couple hundred rather than a couple
01:46RyanVMphilor: who do I ping about updating reftest-stylo expectations?
01:46RyanVMmbrubeck fixed one of them, but I see others
01:47RyanVMxidorn this time of day?
01:47xidornRyanVM: ?
01:48nthomasarr: I'm around, will have a look at access now
01:48RyanVMxidorn: sorry, this is a little outside my usual area of expertise - are you the person to ping about update stylo expectations on autoland?
01:48xidornRyanVM: yep
01:48RyanVMgreat, in that case https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=6a25c340d83401d253b04b9c15c40ed5039c4132&filter-classifiedState=unclassified&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=pending&filter-resultStatus=running
01:48arrnthomas++
01:49xidornRyanVM: ok, will fix
01:49RyanVMthx
01:50arrI'm going to go turn into a pumpkin now
01:50xidornRyanVM: in general, me and heycam cover asia/pacific timezone, manishearth and bholley cover the u.s. and emilio covers europe timezone
01:51nthomasq should be good already
01:51RyanVMgood to know, thanks
01:51xidornRyanVM: but since this is usually simple, so I (and also probably others) may have extended coverage for that :)
01:51nthomaswe'd need to get an admin to add mark
01:52markconthomas: should i open a bug?
01:52nthomassounds like a good plan
01:53xidornRyanVM: it seems there are still tests running, and new unexpected result may show up. I'd like to hold on and wait the tests to finish first
01:53RyanVMok
01:54* xidorn wonders why autoland is so much behind at the moment
01:54nthomasarr++ for the great comment on the bug
01:55nthomasand everyone who worked on it today
01:55RyanVMxidorn: because it's catching up to being closed all day
02:27RyanVMnthomas: windows l10n repacks are done
02:27RyanVMbunch of funsize jobs still going
02:28nthomashmm, wonder how long that takes. better to give people a partial if you can
02:28* nthomas needs a waterfall display
02:30nthomasRyanVM: once they're done am I clear to update balrog to point to that ?
02:30RyanVMyes
02:31nthomasok, I'll keep an eye out
03:14* nthomas feeds the hamsters
03:39nthomasRyanVM: updates are back on
03:39RyanVMsweet :)
03:40nthomasthere's something odd with nl on win64 2017-05-12, and possibly some other locales that follow that in the chunk, but meh
03:40nthomaspeople should have updated past that already
05:17gerard-majaxstill no new build ? :(
06:24nthomasif you're not getting an update then please flip app.update.log to true and catch the aus5.mozilla.org request
06:24nthomasbbl
07:05gerard-majaxhm nice
07:05gerard-majaxmp4 playback on twitter broken due to CSP: https://pastebin.mozilla.org/9021961
07:06pascalcgerard-majax: Not getting an update either, I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1365508
07:07gerard-majaxpascalc, check my pastebin, it's fun :)
07:07pascalcI saw it
07:13jcristaurule 590 in balrog i think
07:13jcristaukeeps linux at 20170514100406
07:13gerard-majaxpascalc, https://bugzilla.mozilla.org/show_bug.cgi?id=1365512
07:14pascalcgerard-majax, wfm but let's move the discussion to the #nightly channel :)
07:18jcristaucommented on pascalc's bug. maybe nthomas is still around?
07:38nthomashrm
07:40nthomasI think you're right, just wondering wth we have a linux specific rule there
07:40nthomasoh, because Linux comes from tc and has a different buildid for the previous build
07:41nthomasjust removed it
07:43jcristauthanks!
07:44nthomassorry about that
08:12gerard-majaxnthomas, it's downloading new update now
08:26pascalcI got the new update and another bug that was reported in the nightly channel yesterday is fixed, it probably was the same cause as bug 1364878
09:14travis-cibuild-buildbot-configs#2621 (master - d885525 : Alin Selagea): The build passed. (https://travis-ci.org/mozilla-releng/build-buildbot-configs/builds/233157701)
09:23travis-cibuild-buildbot-configs#2622 (production - 3513a62 : Alin Selagea): The build passed. (https://travis-ci.org/mozilla-releng/build-buildbot-configs/builds/233160314)
10:50arrhm, is callek still out on leave?
10:52mtabarahe returns on the 22nd
10:52arrmtabara: you know who would have knowledge of slaveapi-dev in his absence?
10:53mtabarano idea. maybe ping catl.ee for a good redirect?
12:43jlorenzobhearsum: hi! fyi, I left the details on how to gradually test APK publishing on Google Play in this PR https://github.com/mozilla-releng/pushapkscript/pull/19
12:43bhearsumthank you!!
12:57203A9AVIVbuild-puppet#1351 (master - c8a0f14 : Rail Aliiev): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/233224285)
12:57557A7JHD0build-puppet#1352 (production - 7f6fba2 : Rail Aliiev): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/233224289)
13:02catleeRyanVM: yay nightly rendering problems
13:02AutomatedTesteraobreja|buildduty: I didn't raise a bug about my issue with win8, are the new AMIs rolled out?
13:02bhearsumjust let me know if we need to freeze updates
13:03AutomatedTesterI tried a new build this morning on that try push and got the same issue
13:03AutomatedTestershould I raise a bug now?
13:04catleeAutomatedTester: win8?
13:04AutomatedTestercatlee: yea, getting an error that it can't remove a file (which isn't part my patch)
13:04AutomatedTestercatlee: https://treeherder.mozilla.org/#/jobs?repo=try&revision=73819d0d2a85
13:05AutomatedTestercatlee: pmoore suggested I come here
13:06pmooreit is true, i did :)
13:07AutomatedTesterpmoore: while you're here... I think the build succeeded by mozharness failed
13:07AutomatedTester05:44:28 FATAL - superExc=rerr
13:07AutomatedTester05:44:28 FATAL - TaskclusterRestFailure: EF7FLr6pTrixoS0rR60mnA does not correspond to a task that exists.
13:07AutomatedTester05:44:28 FATAL - Are you sure this task exists?
13:10RyanVMcatlee: define "rendering problems" :)
13:12catleeRyanVM: the url bar goes grey when you mouse over it
13:12catleemaybe that's on purpose
13:12RyanVMthat's a known issue
13:12RyanVMone sec
13:12catleeok
13:13RyanVMfallout from bug 1352366
13:13RyanVMbug 1365275 covers a specific issue w/ LWT, but I know Linux had grey location bar issues too from it
13:14catleegreat, thanks
13:15catleehad me worried for a sec
13:15RyanVMcatlee: looks like the patch in bug 1365275 covers the Linux issue too
13:15* catlee filed https://bugzilla.mozilla.org/show_bug.cgi?id=1363853 a few days ago, and has been testing patches on try
13:15RyanVMnice :)
13:16pmooreAutomatedTester: my best guess is this is an issue with the mozharness step that publishes taskcluster artifacts
13:16pmoorefrom https://archive.mozilla.org/pub/firefox/try-builds/dburns@mozilla.com-73819d0d2a8501da57b90b7f145c100597f3c627/try-win64-debug/try-win64-debug-bm76-try1-build778.txt.gz i see
13:16pmoorehttps://irccloud.mozilla.com/pastebin/wnxjUBwC/
13:16pmooreso it looks like mozharness has generated this taskId
13:16pmoorehowever, later it complains that it doesn't exist (which it doesn't)
13:17pmoorebut catlee might know how this is meant to work
13:18pmoorehttps://irccloud.mozilla.com/pastebin/g1zRGmVQ/
13:19pmooreAutomatedTester: looks like this comes from here: https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/transforms/job/mozharness.py#270
13:19wcostaaselagea|buildduty: ping
13:20catleepmoore: yes, it's supposed to create / upload artifacts to that task
13:21catleecan you tell if it succeeded in creating the task?
13:21catleeI wonder if it lost its TC creds
13:22pmoorecatlee: ah, as in, it might have used temporary creds, that expired?
13:22catleepmoore: no, as it the creds might have disappeared from disk
13:22aselagea|builddutywcosta: hi there@
13:22garndtthere is a bug that wcosta has patched yesterday related to BBB jobs and uploading of artifacts. but this is windows 8, so not bbb right?
13:22catleewe've been having problems with other stuff going away
13:22pmooregarndt: this is BBB i believe - this is win8
13:22catleethat's not bbb
13:23catleeyet
13:23catleealso, not win8
13:23catleewin2k8
13:23wcostaaselagea|buildduty: regarding bug 1364072, I am confused, I just tested it in a loaner machine and it works. The first version of this patch had exactly this bug, but the second version (supposedly) fixes it
13:24garndtright sorry, win2k8
13:24wcostawhere's firebot, my friend?
13:24pmoorethere he is!!!
13:24wcostathere you are
13:24catleebug 1364072
13:24wcostafirebot: bug 1364072
13:24firebothttps://bugzil.la/1364072 REOPENED, wcosta@mozilla.com Install mercurial robustcheckout in the build machines
13:24firebothttps://bugzil.la/1364072 REOPENED, wcosta@mozilla.com Install mercurial robustcheckout in the build machines
13:25wcostafirebot: good boy
13:25* firebot smiles
13:25pmoorefirebot: naughty boy! where were you?
13:25firebotpmoore: Sorry, I've no idea what 'naughty boy! where were you' might be.
13:25pmooretsssk
13:25aselagea|buildduty:))
13:26AutomatedTesterpmoore: catlee what do you want me to do?
13:26aselagea|builddutywcosta: have you tested your patch on a builder or a tester?
13:27wcostaaselagea|buildduty: on a builder
13:27aselagea|builddutyI backed that patch out yesterday since arr mentioned puppet errors
13:28aselagea|builddutywhen I looked at the e-mails, I noticed that puppet was indeed failing on the yosemite pool
13:28aselagea|builddutyso on testers
13:28wcostaaselagea|buildduty: wasn't due to bug 1365428?
13:28firebothttps://bugzil.la/1365428 ASSIGNED, dcrisan@mozilla.com virtualenv.pp missing case for Darwin
13:28pmoorecatlee: who is the expert on mozharness steps that publish taskcluster artifacts? can we point AutomatedTester at someone that knows this stuff?
13:28catleehm
13:28catleeyou sure this wasn't triggered from bbb?
13:28catleepmoore: wcosta is now :)
13:28aselagea|builddutywcosta: the error seems to be related to robustcheckout, so your patch
13:29catleeI see '04:38:54 INFO - "taskId": "OWSPT7f0SLqPNXzSQqQg5A", ' and '04:38:54 INFO - "upload_to_task_id": "IxUqPRU1R2u1AAXFia11Bg", '
13:29catleewhere did those come from...
13:30* wcosta <wtf meme here>
13:30pmooreAutomatedTester: i think wcosta / catlee will be able to help you get to the bottom of this
13:30wcostaaselagea|buildduty: the bustage was in builder machines, right?
13:31* pmoore steps away from the cookie jar
13:31catleeAutomatedTester: did you do anything to trigger the 64-bit builds?
13:31catleeI see -p win32 in your try syntax
13:31AutomatedTestercatlee: I used Treeherder to add jobs
13:32catleeAutomatedTester: ok, I think that's probably the source of the problem
13:32wcostaaselagea|buildduty: could you please re-image bld-lion-r5-090.build.releng.scl3.mozilla.com, I want to test it in a fresh machine to see if I can reproduce the bug
13:32aselagea|builddutywcosta: the puppet errors coming from your patch are for the tester machines
13:32aselagea|builddutynote that we use the same puppet environment for all the machines managed by puppet
13:33wcostaaselagea|buildduty: ahh, let me try it in tester machine
13:33aselagea|builddutyso even if you got no errors when testing on your builder, that could still result in errors on other platforms
13:33aselagea|builddutywcosta: yup, that would be the next step
13:33catleeAutomatedTester: you could push again with -p win64 for now
13:33AutomatedTestercatlee: ok
13:33catleeI don't know how TH triggers these
13:34AutomatedTesterneither do I :D
13:38travis-cibuild-puppet#1353 (master - da42a70 : Dragos Crisan): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/233238906)
13:43garndtmy guess is that TH publishes a pulse message saying that the job should run for that push, and something is listening for this that can communicate this to buildbot...such as pulse_actions perhaps
13:43catleewhere/why does it come up with the task ids?
13:44catleethe win32 builds don't have those properties set
14:06wcostaaselagea|buildduty: just tested in a tester machine and it worked :/
14:08aselagea|builddutywcosta: sounds good
14:09wcostaaselagea|buildduty: no idea why it doesn't work on prod, however
14:11aselagea|builddutyarr: do you see any reason not to re-land wcosta's patch that I backed out yesterday?
14:12arrasarih: yes, because it breaks everything
14:12arrplease do not land
14:12arrthere were tons of puppet error messages and gps said bundleclone broke a bunch of stuff
14:13wcostaarr: my patch leaves bundleclone untouched, it just adds robustcheckout
14:13arrsorry, robustcheckout
14:13arryes, that's what broke a bunch of things
14:14arrwcosta: go take a look at the puppet error messages for robustcheckout
14:18arrwcosta: https://groups.google.com/a/mozilla.com/forum/#!searchin/releng-puppet-mail/robustcheckout%7Csort:date
14:19travis-cibuild-puppet#1354 (production - e5d1937 : Amy Rich): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/233253776)
14:21wcostaarr: just requested to join the group to see the link
14:23arrwcosta: added
14:24arrwcosta: in addition to the puppet errors, gps also said that it broke other things, so I'd consult with him before trying to land, even if you fix the puppet errors
14:26wcostagps: what errors robustcheckout introduced to prod? besides puppet errors?
15:00bhearsumspacurar|buildduty: do you think we'll be able to get devedition tests on mozilla-beta by early next week?
15:02spacurar|builddutybhearsum: I think devedition tests for mac and win are done. Working on linux tests now. I would say they will be done until then
15:03bhearsumthat's awesome
15:03bhearsumif you have time, it might be good to enable the mac and win tests on mozilla-beta even before the linux tests are done on jamun
15:03philorwin isn't done, since it's running most tests three times
15:04philors/win/win7/
15:04spacurar|builddutybhearsum: I don't think there is something else to be done on the mac/win bug. If there still is please tell me so I can be aware of it!
15:05bhearsumspacurar|buildduty: it sounds like there's extra triggering of tests on win7 to deal with
15:05bhearsumbut we also need them enabled on mozilla-beta -- jamun is just our test branch for DevEdition stuff
15:06philorsomewhere, there's horribly ugly code that splits suites out into ones that run on g-w732-spot and ones that run on t-w732-spot and ones that run on t-w732-ix, that you must need to copy-paste, or make sure devedition uses (or, maybe, just make sure jamun uses and beta already does, dunno)
15:07spacurar|builddutyby extra triggering you mean that there are still duplicates?
15:08philorat least as of last night there were, tip of jamun hasn't finished building win yet
15:10bhearsumi see https://hg.mozilla.org/build/buildbot-configs/rev/04fefd582166 landed recently, maybe that fixed it
15:35tjrIs the winx64 buildbot bug still causing fallout? I couldn't get jobs to work last night (they timed out) and this morning I'm seeing "20 mins overdue, typically takes ~12 mins"
15:45philortjr: I don't think arr thought the bug was about try, so if it was then probably it isn't fixed there
15:45arrit's now fixed in try as well... wouldn't have been fixed till the new amis rolled out this morning
15:45philorthose are a completely separate pool of instances from the non-try builders
15:45arrif you're still having problems today, please comment
15:46arrtjr: and the links you gave weren't the same issue, as far as I could tell
15:47philorspacurar|buildduty: much better looking set of win7 jobs on jamun now, other than the way they aren't actually running :/
15:50spacurar|builddutyphilor: Does this take some time to be effective since the push ?
16:43fox2mikebhearsum: hello. 90 days to the aus5.m.o thawte cert expiry.
16:43fox2mikebhearsum: should I file a bug somewhere so we can track?
16:44bhearsumi think that's https://bugzilla.mozilla.org/show_bug.cgi?id=1340880 ?
16:44firebotBug 1340880 NEW, dthorn@mozilla.com Move aus3/4 certificate from SHA1 to SHA256
16:44bhearsumi don't think we have a thawte cert for aus5
16:44bhearsumshould be digi
16:53Caspy7got someone (in #firefox) asking if we know if/when 53.0.3 will be released
16:53Caspy7anyone know?
16:54catleelizzard: ^^
16:54Caspy7specifically pertinent to bug 1360574
16:54lizzardah, what is the issue they're hoping will be fixed?
16:54firebothttps://bugzil.la/1360574 FIXED, hurley@mozilla.com Firefox stops working after 900 connections when using NTLM proxy
16:54lizzardoh. olooking
16:55lizzardMaybe tomrrow or friday
16:56Caspy7lizzard: Thanks!
16:57bhearsumphilor, spacurar|afk: looks like the double triggering is gone, but there's a bunch of permared
17:02* philor doesn't know what is supposed to set --cfg unittests/win_unittest.py, but apparently something is
17:04akimach_commands.py line 45?
17:05philorhttps://dxr.mozilla.org/build-central/source/buildbot-configs/mozilla-tests/config.py#2002 copy-pasted from the ones above it?
18:03tjrarr: I made this push this morning and it is stuck so something seems screwy. https://treeherder.mozilla.org/#/jobs?repo=try&revision=3bb78cf9520a2d64235421ab10954e802c05d96d&selectedJob=99789850
18:03arrtjr: hm, no one else is reporting issues
18:03arrmaybe one of the sheriffs can help you out?
18:04tjrI assumed my try errors were related because the last actual error I got (not a timeout) was "403 Client Error: FORBIDDEN for url... Failed to download vs2015u3.zip"
18:04arrtjr: where did you see that?
18:04arrtoday?
18:04tjrLast night: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fb26c9b90f8a5e3cc1dc5bb46512dc29d900cbc5&selectedJob=99538045
18:05RyanVMi don't trust that 12min typical number one bit, but I've gone ahead and retriggered that build for you
18:05tjrRight before the comment saying "we think this is fixed" (26 minutes before)
18:06arrtjr: we didn't kill off old spot instances with the problem, so they rolled out on their own
18:06arrso I don't think your issues today are related
18:07arrif you're seeing download or missing token errors today, that's related
18:09RyanVMtjr: if it had been the issue from yesterday, your build would have long-ago died. The longest one I saw all day went 35min before failing with most in the 10-15min range
19:07tjrRyanVM: arr: Okay I'm pretty sure my timeouts are from my own thing and not a systematic bug
19:07tjrBut hoping you or someone can help me fix them...
19:07RyanVM#build is probably your best bet
19:08tjrWell, it's a tooltool setup script...
19:08tjrStill build?
19:08RyanVMhrm, maybe not
19:08RyanVManyway, count me out :P
19:09tjrI have a setup.sh script for a tooltool item. The installer is a gui installer that (on my machine) completes in ~25 seconds then sits waiting for the user to click "Ok". I wrote the below script:
19:09tjrhttps://irccloud.mozilla.com/pastebin/Lfex2SCI/
19:09tjrOn TaskCluster it seems to work: 14:51:21 INFO - 0:41.21 unzipping "z:\build\build\src\wlsetup-idcrl.zip" 14:51:22 INFO - 0:42.36 unzipping "z:\build\build\src\MetadataExchange.zip"
19:10tjrbuildbot not so much: 08:30:03 INFO - 0:45.20 unzipping "c:\builds\moz2_slave\try-w64-d-00000000000000000000\build\src\wlsetup-idcrl.zip" // command timed out: 10800 seconds without output running ['c:/mozilla-build/python27/python', '-u', 'scripts/scripts/fx_desktop_build.py', '--config', 'builds/releng_base_windows_64_builds.py',
19:10tjr'--custom-build-variant-cfg', 'debug', '--config', 'balrog/production.py', '--branch', 'try', '--build-pool', 'production'], attempting to kill
19:13tjrAlthough the timestamps on the taskcluster "unzip" seem odd...
20:44tjrSo debugging something that causes timeouts is pretty annoying. Can I adjust the 10800 second timeout to something that lets me get more than one attempt to run in a workday?
20:51catleetjr: not easily. you could try changing mozharness to unzip verbosely
20:51catleeor tooltool
20:51catleefor some reason it's taking 3h to unzip that file
20:52tjrThe problem is I can't see the logs from buildbot until the timeout occurs
20:53tjrIf I could stream the logs that'd be fine
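
[Editor's sketch] The "10800 seconds without output" message above describes an idle-output watchdog: the command is killed when it stays silent for too long, not when it exceeds a total runtime. The following is a minimal Python illustration of that idea, not buildbot's actual implementation; the function name `run_with_idle_timeout` and its parameters are hypothetical. It also echoes each line as it arrives, which is the kind of log streaming tjr is asking for.

```python
import queue
import subprocess
import sys
import threading


def run_with_idle_timeout(cmd, idle_timeout):
    """Run cmd, echoing its output; kill it if it is silent for idle_timeout seconds.

    Hypothetical helper for illustration only. Returns (exit_code, timed_out).
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines = queue.Queue()

    def pump():
        # Forward every output line to the queue as soon as it is produced.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)  # sentinel: the child closed its output

    threading.Thread(target=pump, daemon=True).start()

    timed_out = False
    while True:
        try:
            # The timer restarts on every line, so only silence counts.
            line = lines.get(timeout=idle_timeout)
        except queue.Empty:
            timed_out = True
            proc.kill()
            break
        if line is None:
            break
        sys.stdout.write(line)  # stream the log instead of buffering it
    return proc.wait(), timed_out
```

A chatty command that prints progress regularly would never trip this watchdog, which is why catlee suggests unzipping verbosely.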
20:53nthomasdo you have one running at the moment ? I could take a peek at the machine to see what's going on
20:53tjrnthomas: https://treeherder.mozilla.org/#/jobs?repo=try&revision=152e795621ad20f951eedd3a2212d8006ac20184
20:54nthomasthx
20:54tjrI strongly suspect that the machine has a setup GUI sitting open waiting for the user to click "Ok"
20:54tjrBut my confusion is why, on buildbot, the setup.sh script did not function like taskcluster
20:56nthomasscreenshot https://irccloud.mozilla.com/file/29Bm4x9j/Screen%20Shot%202017-05-18%20at%208.56.08%20AM.png
20:56tjrYup.
20:57nthomaswhat does setup.sh look like ?
20:57nthomasit's quite possible that msys translates a /foo arg into something else
20:58tjrhttps://irccloud.mozilla.com/pastebin/Lfex2SCI/
20:58tjrI've tested the syntax decently well on a Windows machine I have locally
20:58tjrI think my next attempt is to add a & after the wlsetup and see if that makes it work...
20:59nthomasnot sure that helps if the installer has completed
21:00nthomasI'd suspect the taskkill
21:00tjrWell the installer won't ever complete, it waits for user action
21:00tjrthat's why I have the 'wait 60 seconds then taskkil'
21:00catleecan you not just zip up the resulting install?
21:00tjrbut I'm wondering if, for whatever reason, the setup.exe is blocking in this environment, so the wait 60 seconds/taskkil doesn't work. So I'll try running the setup in the background
21:01tjrcatlee: I don't know. That will be my next investigation. Have to see if it does registry stuff...
21:01catleeew
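
[Editor's sketch] The approach tjr describes (start the installer without blocking, wait a fixed grace period, then kill it) can be sketched in Python as below. This is an illustration under assumptions, not the contents of the pastebin'd setup.sh, which is not reproduced in this log; `run_then_kill` and its parameters are hypothetical names.

```python
import subprocess
import sys


def run_then_kill(cmd, grace_seconds):
    """Start cmd without blocking, give it grace_seconds to finish, then kill it.

    Hypothetical helper for illustration only.
    Returns True if the process had to be killed, False if it exited by itself.
    """
    proc = subprocess.Popen(cmd)  # returns immediately; the child runs in the background
    try:
        proc.wait(timeout=grace_seconds)
        return False
    except subprocess.TimeoutExpired:
        proc.kill()  # kills only the direct child, not processes it spawned
        proc.wait()  # reap the child so no zombie is left behind
        return True
```

Because `kill()` only targets the direct child, an installer that launches helper processes may leave windows open afterwards; on Windows, `taskkill /T /F` terminates a whole process tree, which may be closer to what is needed here.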
21:02* nthomas wonders what we need windows live essentials for
21:02nthomasand if it impacts on subsequent runs on instances before they are recycled
21:03tjrWelcome to the rabbit hole ;)
21:03tjrLive Essentials provides a login library that is necessary for Metadata Exchange
21:04tjrWe aren't going to _login_ to anything, but Metadata Exchange won't install without it present
21:04tjrMetadata Exchange provides a powershell library that we need to run a powershell script
21:05nthomasfun
21:06nthomascan't just grab the powershell library ?
21:07tjrnthomas: Not sure. I'll investigate that. I assume tooltool's setup runs as admin, so I can manually copy it to whatever Program Files/ directory I need to?
21:08catleeit probably runs unprivileged
21:44tjrnthomas: Could screenshot https://treeherder.mozilla.org/#/jobs?repo=try&revision=6a3cfae8382f12083340ec7df5749ae455805a47&selectedJob=99892947 as well and see if it shows the same thing?
21:47nthomasone sec
21:48nthomasgonna click Close on the first one, fwiw
21:49nthomasthe terminal for the script didn't close
21:49tjrWait 60 seconds
21:49tjrAnd then the terminal will (probably) close)
21:50nthomasyep, you're right
21:50nthomashttps://irccloud.mozilla.com/file/u2ZPVxcM/Screen%20Shot%202017-05-18%20at%209.49.41%20AM.png
21:50nthomasthat's the more recent job ^^
21:51tjrInteresting! The taskkil ran successfully on this one
21:51tjrJust didn't kill it enough
21:51tjrI'm currently trying to get a powershell script running without the installs and without privledged script installation
21:52tjrThanks!
21:52nthomasnp
21:58arrnthomas: do you know the maximum amount of time an AWS instance is supposed to be up before terminating?
21:58glandiumso
21:58glandiumwhere does buildbot decide to trigger SM builds?
21:58arrwe have a LOT of instances running right now, and I'm wondering if we somehow borked runner so that it's not illing things off
21:59arrand that might also be contributing to the load on the log aggregators in use1
22:00arrI'm wondering if all of those hosts are doing jobs or if they're stuck or... what
22:00arrmarkco: ^^
22:11nthomasarr: I think we don't reliably clean up on windows. I've seen t-w732-spot get stuck for example
22:13nthomasglandium: this is just on windows ?
22:13nthomasseems to be tc otherwise
22:14glandiumnthomas: yeah, windows
22:15nthomashttps://hg.mozilla.org/build/buildbot-configs/file/default/mozilla/config.py#l1939 is part of it, there must be something looking at the set of changes in a push too. Is that what you're getting at >
22:16glandiumnthomas: I'm looking at what makes them happen on changes to js/ only
22:17nthomastry https://dxr.mozilla.org/build-central/source/buildbotcustom/misc.py#2504
22:17glandiumnthomas: thanks
22:18nthomasfor some reason it doesn't use the other way we do that, ie https://dxr.mozilla.org/build-central/source/buildbotcustom/misc.py#79
22:18glandiumnthomas: because it's an opt-in, not an opt-out
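
[Editor's sketch] The opt-in versus opt-out distinction glandium draws can be illustrated with two small Python predicates over the files changed in a push. This is a hedged sketch, not the code in buildbotcustom/misc.py; the function names and the example paths are hypothetical.

```python
def triggers_opt_in(changed_files, watched_prefixes):
    """Opt-in: build only if at least one changed file is under a watched prefix."""
    watched = tuple(watched_prefixes)
    return any(f.startswith(watched) for f in changed_files)


def triggers_opt_out(changed_files, ignored_prefixes):
    """Opt-out: build unless every changed file is under an ignored prefix."""
    ignored = tuple(ignored_prefixes)
    return any(not f.startswith(ignored) for f in changed_files)
```

With an opt-in list such as `["js/"]`, a push touching only `browser/` files triggers nothing, while with an opt-out list the same push would trigger a build unless `browser/` were explicitly ignored. Under this reading, the two mechanisms are not interchangeable, which would explain why the SpiderMonkey trigger does not reuse the other code path.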
22:22glandiumhow far are we from having those builds handled by tc? it feels terrible to add the same list as in taskcluster/ci/spidermonkey/kind.yml (with the patch from 1365763)
22:26nthomaskmoir-afk would know more about that
22:26markconthomas: just a heads up. I am going to terminate a bunch of 2008 instances that got spuned up yesterday before the everything was good.
22:29nthomasack
22:38sfinkglandium: sorry, I got distracted from applying your review comments over in bug 1325936 and never landed it. Just land your patch.
22:38firebothttps://bugzil.la/1325936 ASSIGNED, sphink@gmail.com Trigger SpiderMonkey builds when there are changes to libs which SM builds depend on
22:38markconthomas: i screwed up and terminated too many in usw2
22:38markcojust a heads up i guess
22:38nthomasKWierso: fyi ^^
22:39KWiersonoted
22:41glandiumsfink: aaaah I knew I had seen something touching that thing
23:08fox2mikegps: you in SF?
23:08gpsfox2mike: yes
23:09fox2mikecan you come to a conference room on 7
23:10fox2mikered devil lounge
23:10fox2mikeplease
23:10gpsfox2mike: sure
23:10gpsbe there in like 60s
23:27glandiumnthomas: buildbotcustom landings still need a reconfigure, don't they?
23:27nthomasYUP
23:27nthomas-shouting
23:33glandiumnthomas: when is next?
23:34nthomasthis is an elaborate way of saying dude, you have a review request :-)
23:37glandiumnthomas: well, if the next planned reconfigure is not until in a few days, I don't need to bug you for a review, was my point
23:38nthomasok. they are done on an ad-hoc basis. I'll check the review and shepherd it in
23:38glandiumnthomas: thanks
23:46nthomaslanded on default, just waiting for travis to run tests
23:49travis-cibuild-buildbotcustom#1044 (master - 01a0d70 : Mike Hommey): The build passed. (https://travis-ci.org/mozilla-releng/build-buildbotcustom/builds/233442010)
23:52travis-cibuild-buildbotcustom#1045 (production-0.8 - 2149550 : Nick Thomas): The build passed. (https://travis-ci.org/mozilla-releng/build-buildbotcustom/builds/233442684)
18 May 2017
No messages
   