mozilla :: #releng

15 May 2017
13:14travis-cibuild-buildbot-configs#2610 (master - 86f48df : Kim Moir): The build passed. (https://travis-ci.org/mozilla-releng/build-buildbot-configs/builds/232422652)
13:17travis-cibuild-tools#1780 (master - 0709a28 : Kim Moir): The build passed. (https://travis-ci.org/mozilla/build-tools/builds/232422691)
13:21spacurar|builddutykmoir: Good morning! I have a small problem with the devedition tests on windows 32. It seems I should be eliminating the duplicate tests which run on both ix and vms by splitting them into those which run only on ix, only on vm, and only on vm gfx. I have a way to do this, but it would mean hardcoding. The other way is modifying the config.py script somewhere
13:21spacurar|builddutycloser to the beginning, in order to let the older patches take effect for the devedition platforms. But that means I would have to modify every conditional to add "win32-devedition" and the other devedition platforms, so they work along with the win32 ones. What would be the best approach here?
13:22travis-cibuild-buildbot-configs#2611 (production - d57807f : Kim Moir): The build passed. (https://travis-ci.org/mozilla-releng/build-buildbot-configs/builds/232425191)
13:23kmoirspacurar: good morning to you too! How is this done for the regular (non-devedition) builds? Aren't they implemented the same way?
13:25spacurar|builddutyWell yes. They are implemented the same way. But then there are patches which add new tests to a slave platform and remove others from the same or different slave platforms
13:27spacurar|builddutyI was thinking of adding this part: https://irccloud.mozilla.com/pastebin/KiHs3J1h/
13:27catleesfraser: I started digging into the release-stats data, and it looks like we lose dependency information somewhere
13:28kmoirspacurar: can you copy the final list of tests for the regular edition to the dev-edition at the end of the file once they are defined? instead of adding everything to all those loops?
13:28spacurar|builddutyabove this line https://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla-tests/config.py#l3573 and then will have to add the devedition platforms and slave platforms in all conditionals regarding what's right after that line
13:28sfrasercatlee: the task dependencies are in there, which are missing?
13:28catleesfraser: they're all self-dependencies AFAICT
13:29* catlee looks at another graph
13:31sfraserhmm, so they are.
13:32spacurar|builddutykmoir: ok got it
13:33kmoirspacurar: okay let me know if you need more help
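A rough sketch of kmoir's suggestion, assuming the test suites end up keyed by platform and slave platform; the dict name and shape below are made up, not the real mozilla-tests/config.py structure:

    # Once the regular win32 test lists are final, clone them for the
    # devedition platform at the end of the file instead of adding
    # "win32-devedition" to every conditional and loop above.
    import copy

    # hypothetical structure standing in for the real config dicts
    TESTS_BY_PLATFORM = {
        "win32": {"ix": ["reftest"], "vm": ["mochitest"], "vm-gfx": ["reftest-gfx"]},
    }

    TESTS_BY_PLATFORM["win32-devedition"] = copy.deepcopy(TESTS_BY_PLATFORM["win32"])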
13:39sfrasercatlee: it seems to be an accurate report of taskcluster's data: For example, check the dependencies for https://tools.taskcluster.net/task-inspector/#rMNmP21NTwy53iAC29AO-A
13:39catleeI think that's lies
13:40catleewe should be looking at the scheduler service for this data
13:40catleee.g.
13:40catleehttps://scheduler.taskcluster.net/v1/task-graph/WNGYY6nMTliD_CcUCV_Uag/inspect
13:40catleehttps://scheduler.taskcluster.net/v1/task-graph/dMnCiwCrRqucv7DS-bPOqQ/inspect for the graph for your task
13:41sfraseryes, that looks right
13:41catleewe're still using the old scheduler service for releases
13:41sfraserso the dependencies might work in the future, in task-inspector?
13:42catleeyeah
13:42catleerail has plans to do that I think
13:42sfrasershould be easy enough to check. Look at the graph we've got, does it refer to itself? If yes, go to the scheduler
13:42Tomcat|sheriffdutyrail: catlee pascalc has notified me of issues with the nightly
13:43Tomcat|sheriffdutylike bug 1364878
13:43Tomcat|sheriffdutyrail: catlee pascal has asked to stop nightly updates
13:43pascalchi, there is a patch today creating problems with users' profiles and addons https://bugzilla.mozilla.org/show_bug.cgi?id=1364878
13:43catleeok
13:44Tomcat|sheriffdutypascalc: so it's confirmed to be bug 1361900?
13:44pascalcI am getting pinged about it by several people and also experienced the bug
13:44Tomcat|sheriffdutyjust wondering if i should backout and retrigger nightly
13:45pascalcKris is needinfoed, I don't have the knowledge to understand the patch and say that it is indeed the cause
13:45pascalcthanks catlee :)
13:46catleenightly seems to be CPU hungry today
13:48Tomcat|sheriffdutypascalc: ok let us know when we can help
13:49pascalcThanks Tomcat|sheriffduty once we have identified for sure the patch responsible for the regression, let's back it out and regenerate nightlies
13:51Pikeshould we halt updates for now?
13:51catleewe're working on it
13:53catleepascalc: fennec too?
13:53Tomcat|sheriffdutyok
13:53catleeor just desktop
13:54pascalccatlee, desktop only
13:54catleesfraser: yeah, dustin confirmed that tasks created with the old scheduler service will get self-dependencies injected
13:55catleepascalc: do you know what the last good build was?
13:55sfrasercatlee: ok. I'm putting together a script to populate the right values from the scheduler
13:55catleesfraser: brilliant
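A minimal sketch of what that script could look like, assuming the inspect endpoint above returns JSON with a "tasks" list whose entries carry "taskId" and "requires" fields (those field names are a guess, not confirmed here):

    import requests

    SCHEDULER = "https://scheduler.taskcluster.net/v1/task-graph/{}/inspect"

    def graph_dependencies(graph_id):
        # dependencies as recorded by the old scheduler service
        graph = requests.get(SCHEDULER.format(graph_id)).json()
        return {t["taskId"]: t.get("requires", []) for t in graph.get("tasks", [])}

    def real_dependencies(task_id, queue_deps, graph_id):
        # sfraser's check: if the queue only lists the task itself as a
        # dependency, fall back to the scheduler graph for the real ones
        if queue_deps == [task_id]:
            return graph_dependencies(graph_id).get(task_id, [])
        return queue_deps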
13:56pascalccatlee, the patch that seems to have caused the bug was 21 hours ago https://bugzilla.mozilla.org/show_bug.cgi?id=1361900#c69
13:56catleeso yesterday's nightly should be ok?
13:56pascalcyesterday's nightly was fine
13:56pascalcyes
14:47katsaselagea|buildduty: ping
14:50Tomcat|sheriffdutyrail things like https://treeherder.mozilla.org/logviewer.html#?job_id=98621474&repo=mozilla-beta is known right ?
15:04aselagea|builddutykats: pong
15:05katsaselagea|buildduty: the win8 qr reftest jobs were failing, i posted another patch to bug 1362397 that i think should fix it
15:05* aselagea|buildduty looks
15:29travis-cibuild-buildbot-configs#2612 (master - 8873717 : Alin Selagea): The build passed. (https://travis-ci.org/mozilla-releng/build-buildbot-configs/builds/232478123)
15:33travis-cibuild-buildbot-configs#2613 (production - da0d212 : Alin Selagea): The build passed. (https://travis-ci.org/mozilla-releng/build-buildbot-configs/builds/232479618)
15:42markcoarr: I am going to dive into the 2008 ssh issue this morning
15:42arrmarco: cool, please pull in whoever you need to help since this is impacting production /cc catlee
15:43catleearr: s/marco/markco/
15:43catleenot confusing at all
15:43arrargh, yes, markco stupid tab complete
15:44catleemarkco: I think the sheriffs could help find recent instances of this failure
15:44catleepascalc: who's working on backing out patches from central and re-triggering nightlies?
15:45aselagea|builddutymarkco: I wonder if you've seen the puppet error I mentioned in #buildduty today
15:46markcoaselagea|buildduty: I did not see it. (my irc client was offline)
15:46aselagea|builddutymarkco: oh, I see
15:47aselagea|builddutymarkco: I spotted some errors on the releng-puppet mailing list
15:47aselagea|builddutyhttps://irccloud.mozilla.com/pastebin/KoEgAGYb/
15:48aselagea|builddutyhowever, the AMI creation for both instance types seemed to work fine
15:49aselagea|builddutykats: r+-ed and landed the patch
15:49markcoaselagea|buildduty: for some reason some w732 instances are trying to puppetize
15:49katsaselagea|buildduty: thanks!
15:49aselagea|builddutykats: I think we'll also need a TH deploy to include the latest changes
15:49markcoI will open a bug up on it
15:49aselagea|builddutyfrom what I can see, we don't have the proper symbol for this new job
15:50katsaselagea|buildduty: yeah i noticed the TH patch from the bug isn't yet in the production branch
15:50aselagea|builddutymarkco: great, thanks!
16:12pascalccatlee, I don't know, I am also not sure we have already identified precisely the patches that broke nightly, I needinfoed Kris about it
16:19RyanVMmarkco: https://treeherder.mozilla.org/logviewer.html#?job_id=99173910&repo=mozilla-inbound
16:19RyanVMhttps://treeherder.mozilla.org/logviewer.html#?job_id=99170809&repo=mozilla-inbound
16:19RyanVMhttps://treeherder.mozilla.org/logviewer.html#?job_id=99170814&repo=mozilla-inbound
16:20RyanVMhttps://treeherder.mozilla.org/logviewer.html#?job_id=99165275&repo=mozilla-inbound
16:20RyanVMhttps://treeherder.mozilla.org/logviewer.html#?job_id=99168198&repo=mozilla-inbound
16:20RyanVMhttps://treeherder.mozilla.org/logviewer.html#?job_id=99155084&repo=mozilla-inbound
16:20RyanVMhttps://treeherder.mozilla.org/logviewer.html#?job_id=99165838&repo=autoland
16:20RyanVMhttps://treeherder.mozilla.org/logviewer.html#?job_id=99165839&repo=autoland
16:20RyanVMhttps://treeherder.mozilla.org/logviewer.html#?job_id=99169664&repo=autoland
16:21RyanVMhttps://treeherder.mozilla.org/logviewer.html#?job_id=99169666&repo=autoland
16:21RyanVMFYI, filter TH on "win build" and you can find them pretty quickly
16:23arrI suspect being able to find one in the act is going to be the issue
16:23catleeand dig into its history
16:23catleeI would like to know if it failed the first build
16:24markcoRyanVM: ty
16:24RyanVMyeah, hopefully one of those will still be alive
16:36catleemarkco: which host are you looking at?
16:37markcocatlee: b-2008-spot-045
16:37catleehttps://secure.pub.build.mozilla.org/buildapi/recent/b-2008-spot-045
16:38catleelaunched at 4:55 PT
16:38catleeso that build it just did was the first build
16:39catleemarkco: do we use runner there?
16:41markcocatlee: yes
16:41catleemarkco: are there runner logs somewhere?
16:41markcoI was looking at those and did not see any reference to the ssh dir
16:42markcocatlee: from the launch time it looks like it had 2 successful builds before it failed
16:43catleemarkco: yeah, you're right
16:43* catlee failed TZ maths
16:43catleeok, so we can look at the logs for those other jobs for clues too
16:44arrso that sounds like something is blowing away/modifying things after the deployment/a successful job
16:44catleestill could be runner
16:44arrcatlee: is there someone still supporting runner in releng now that morgan is no longer there?
16:48catleearr: what do you need?
16:49arrjust wondering how to find/fix issues if it is runner
16:49arriirc runner doesn't log except to the localhost, so that makes it more difficult to correlate
16:49arryeah, no runner logs for windows in centralized logging
16:50catleelooks like some of the EC2 config stuff is still running per boot
16:51catleehuh
16:51catleehttps://papertrailapp.com/groups/1141234/events?focus=800765024883871799&selected=800765024883871799 is interesting
16:52catleesomeone triggered a clobber of win64 debug at 6:21
16:52arrhm. you think clobber builds might be zorching ssh?
16:52catleeoh, no, that's just it requesting clobber times
16:55catleeis it expected that EC2 config stuff runs each reboot?
16:56arrcatlee: yeah
16:56arror at least my understanding is yes. I'll let markco weigh in on that for a definitive answer :}
16:56markcoyes it is expected
16:57catleeok
16:57catleeanything in there that could be at play here?
16:57catleeand can you send whatever runner logs you found?
16:57arrcatlee: where's the best place to drop runner logs?
16:58markcocatlee: https://pastebin.mozilla.org/9021755
16:58markconot much there
17:01catleeyeah
17:03catleemarkco: what's left in ~/.ssh ?
17:03markcoenvironment file
17:05catleeweird
17:05arrmarkco: is there anything at all about ssh in the ec2config files (either what's on disk or what's on aws-manager2)
17:06catleewow, we still have CVS_RSH=ssh in the environment :P
17:06catleewe use ssh agent on windows?
17:06markcoarr: I am going to check that next.
17:08catleehttps://archive.mozilla.org/pub/firefox/tinderbox-builds/autoland-win64-debug/1494854424/autoland-win64-debug-bm70-build1-build1521.txt.gz is the log from the previous good build
17:08catleeit has SSH_AUTH_SOCK set
17:08catleewhich is bizarre
17:09markcoyeah, that makes no sense to me
17:13markcocatlee: arr: I am not seeing anything in the ec2 userdata scripts that would affect the ssh dir
17:16arrI wonder if ec2config is deleting stuff (since at instantiation it's responsible for adding ssh keys, I think)
17:18catleearr: I thought they were part of the AMI?
17:18arrthey are
17:18arrbut I was wondering if ec2config might be doing some sort of reset
17:19* arr is just tossing out ideas, not looking at logs
17:19catleecan we make those files r/o?
17:19catleemaybe the thing that's deleting them will error out
17:22markcocatlee: yes
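A quick sketch of that read-only idea; the builder home path and file handling are guesses, not the actual deployment code:

    import os
    import stat

    SSH_DIR = r"C:\Users\cltbld\.ssh"  # hypothetical builder home

    for name in os.listdir(SSH_DIR):
        path = os.path.join(SSH_DIR, name)
        # clearing the write bit sets the Windows read-only attribute, so
        # whatever is deleting these files should now fail loudly
        os.chmod(path, stat.S_IREAD)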
17:24arrcatlee: any guesses as to why we've seen this for quite some time, but it seems to get worse/better (from what I gleaned from the bug)
17:24catleeno :(
17:25catleemore awareness?
17:25catleenow we fail fast instead of waiting forever to time out
17:25catleeRyanVM may have more info
17:26catleeso it looks like we run the clobber, purge_builds and check_ami runner tasks on windows
17:26RyanVMI do not, but I believe it spiked to current levels around two weeks ago
17:27arrRyanVM: is there any way to tell exactly when by looking at historical data?
17:27RyanVMsure, orangefactor
17:27RyanVMone sec
17:27arrin particular, I would like to know if it happened may 4th/5th
17:28RyanVMhttps://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1147271&endday=2017-05-15&startday=2017-05-01&tree=trunk
17:28RyanVMlooks believable
17:28RyanVMbug 1311861 shows a similar spike around that time (was eventually duped to bug 1147271) https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1311861&endday=2017-05-15&startday=2017-04-15&tree=trunk
17:29RyanVMarr: may 4/5 looks exactly right
17:29arrare may 6th and 7th just missing data?
17:29RyanVMweekend
17:29RyanVMlower volume
17:30RyanVM(going off when jobs actually get starred)
17:30arrmarkco: so maybe whatever broke ami generation ALSO is causing more issues with ssh
17:30* arr wonders if there were build system changes or something made at that time
17:30arrRyanVM: for reference, AMI generation worked on may 4th and mysteriously broke on may 5th
17:30RyanVMinteresting!
17:30arrwe didn't generate new amis until late last week, though
17:31arrbecause we've been trying (unsuccessfully) to find what the heck changed
17:31arrwe put a hack in place to get ami generation going again at the end of last week
17:31arrRyanVM: https://bugzilla.mozilla.org/show_bug.cgi?id=1362356
17:32arrso... what changed on the 4th
17:32arrit wasn't anything in build-cloud-tools or puppet
17:32catleerelated to the chocolatey stuff?
17:33markcowe removed chocolatey
17:33arrcatlee: that was an effect of AMI generation breaking, I think
17:33arrbut, yeah, chocolatey got ripped out entirely since it wasn't being used
17:36arrwe didn't rip chocolatey out till after the 5th, though
17:37catleeany patches on teh puppet masters around that time?
17:37arrno
17:37arrnor build-cloud-tools
17:37catleeI mean the actual hosts
17:37catleelike IT puppet
17:37catleekernel upgrades, etc.
17:37arrno
17:37catleedarn
17:38arrI wonder if that's when proxxy1 broke... though I don't see what that would have to do with any of this
17:38travis-cibuild-puppet#1329 (master - 410c066 : Jake Watkins): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/232523044)
17:38catleehm
17:39arrnone of this would have touched proxxy, would it?
17:39gcoxonly IT puppetmaster change lately was putting MFA on the boxes, but I have to find the date when that went in.
17:39catleedustin filed that on may 10th
17:39catleeI don't think so
17:39arrgcox: wouldn't have mattered, these don't use IT puppet
17:39catleeand that's in scl3
17:39arrcatlee: it had already been broken for days when dustin filed the bug
17:39catleedo we have the old AMIs around?
17:39catleepre-May 4th?
17:39gcoxarr: Okeydoke, back to ignoring. :)
17:40catleeif so, can we revert back to that for build instances for now (are there changes we need since then?), and second, can we use it to compare before/after?
17:41arrwe didn't change the ami till the end of last week, so I don't think that will solve your problem
17:43arrbut, yes, we have the old AMIs and can certainly do that if you want
17:44arrhttps://us-west-2.console.aws.amazon.com/ec2/v2/home?region=us-west-2#Images:visibility=owned-by-me;creationDate=%3C2017-05-04T00:00-04:00;sort=desc:creationDate is the AMIs that were generated on the 3rd
17:44arr(the top ones, if you sort by date)
17:47arrhm, based on https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1311861&endday=2017-05-15&startday=2017-04-15&tree=trunk I'm wondering if this is related to https://bugzilla.mozilla.org/show_bug.cgi?id=1321168
17:48arrcatlee: I think that should be rolled back, anyway, since it's making builds significantly slower ^^ I recommended as much in the bug
17:49arrI would try that first, since that seems to correlate with the times in https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1311861&endday=2017-05-15&startday=2017-04-15&tree=trunk where it looks like things started to go bad on may 2nd
17:50arrand that would also correlate with the slow AWS disk issue, as well, I think
17:50arrcatlee: can we roll that back?
17:51catleeSure
17:53arrI presume aobreja|afk is already gone for the day
17:53travis-cibuild-puppet#1330 (master - eaf566f : Aki Sasaki): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/232528756)
17:53arrcatlee: any of the buildduty folks around to back that out?
17:54catleearr: I backed it out
17:54arrcatlee++
17:55arrmarkco: do you want to try generating new AMIs?
17:55arror maybe the problem will just fix itself
17:55catleeno need to for that change
17:55arrI'm not sure if there are discrepancies between snapping a c4 and deploying on a c3
17:55catleehm, maybe
17:55catleenew spot instances anyway will use c3
17:56arrI'm also interested to see if this would fix the issue we saw starting on the 5th... but the timing seems off for that
17:57travis-cibuild-puppet#1331 (production - 8ee6a63 : Aki Sasaki): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/232530060)
17:57markcoI will trigger new ami generation
17:58arrmarkco: I think hold off for the time being... but when we try a new AMI generation, we should try backing out our regex hack and seeing if that changes things, too
17:58arrI'm wondering if the c4s don't have the hostname length restriction that the c3s do... but I would think that would be all OS-based, not hardware-based
17:59arrcatlee: should we force kill a bunch of builds happening on c4 and retrigger them on c3?
18:00arrspeaking of puppet... looks like someone just checked in some broken stuff
18:00akii just pushed; what do you see broken?
18:01arraki: Puppet (err): Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class mercurial::ext::robustcheckout for t-yosemite-r7-0375.test.releng.scl3.mozilla.com on node t-yosemite-r7-0375.test.releng.scl3.mozilla.com
18:01akiah, wcosta's checkin
18:01akiwcosta looks like i should back out? ^
18:02arrnot limited to yosemite, either
18:02akigoing to guess robustcheckout doesn't exist in older hg's
18:05tjrSo I'm trying to add some installers to https://dxr.mozilla.org/mozilla-central/source/browser/config/tooltool-manifests/linux64/releng.manifest - I've uploaded a zip to tooltool that contains the exe and a setup.sh file.
18:05tjrI've experienced a few hiccups getting the zip downloaded (due to file size/hash mismatches) but those errors are gone and now I just get "The system cannot find the path specified."
18:05tjrSearch for "wlsetup-idcrl.zip" in https://public-artifacts.taskcluster.net/Mc1qGqaaS4eN1oUUD3RRfA/0/public/logs/live_backing.log to see it
18:05tjrThe file is in the latest upload batch on https://api.pub.build.mozilla.org/tooltool/ if you search for "Ritter"
18:06tjrDoes anyone know what might be causing the "cannot find path specified" error?
18:09travis-cibuild-buildbot-configs#2614 (master - 38eadaa : Kim Moir): The build was broken. (https://travis-ci.org/mozilla-releng/build-buildbot-configs/builds/232533636)
18:09travis-cibuild-puppet#1333 (production - 5f42ba0 : Aki Sasaki): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/232533660)
18:10travis-cibuild-puppet#1332 (master - 457d974 : Aki Sasaki): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/232533658)
18:11kmoirlanded fix
18:17travis-cibuild-buildbot-configs#2615 (master - 9cc6f58 : Kim Moir): The build was fixed. (https://travis-ci.org/mozilla-releng/build-buildbot-configs/builds/232536615)
18:25travis-cibuild-buildbot-configs#2616 (master - d261d0e : Joel Maher): The build passed. (https://travis-ci.org/mozilla-releng/build-buildbot-configs/builds/232539068)
18:28catleeRyanVM: do you know who's tracking the fix for the busted nightlies?
18:28RyanVMi don't know if anybody is - kmag landed a fix on inbound
18:29travis-cibuild-buildbot-configs#2617 (production - 9f30f29 : Kim Moir): The build passed. (https://travis-ci.org/mozilla-releng/build-buildbot-configs/builds/232540245)
18:30arrRyanVM: so, is there a way to tell if our error rates are dropping for ssh stuff as things deploy on c3s?
18:31RyanVMother than watching orangefactor?
18:31RyanVMKWierso|afk, philor, Aryx: ^
18:31arrI presume orangefactor is going to take a while to catch up?
18:31RyanVMit should trend with starring data
18:31* Aryx would use orangefactor
18:33RyanVMarr: as long as trees are being actively watched by the sheriffs (and are therefore staying on top of starring new failures), the OrangeFactor graph for bug 1147271 should be reasonably current
18:35catleeRyanVM: ok, we'll need to be told when it's safe to unblock updates
18:35catleeI guess I could watch the bug
18:36RyanVMcatlee: I guess it comes down to whether new nightlies get triggered after the next inbound merge including that fix or if we just wait for tomorrow's
18:36catleecould set up a scheduled change for tomorrow :)
18:36WG9s_catlee, RyanVM: are you thinking of maybe either landing the fix on central or doing a merge from inbound to central and respinning nightlies?
18:36catleeWG9s_: I'm in no rush
18:36RyanVMWG9s_: i'm not thinking either, not the sheriff on duty here :)
18:37catleeI just don't want to forget to re-enable updates
18:38WG9s_I was just thinking it's easier to remember to re-enable lager today than to try to remember to do it tomorrow.
18:39WG9s_s/lager/later/
18:40arris there something with finer granularity for https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1147271&endday=2017-05-15&startday=2017-05-01&tree=trunk (like per minute or hour)?
18:41FloreHi everyone, I have an issue with mozregression involving taskcluster
18:41jmaherarr: orangefactor is per day really- there are timestamps there, but timestamps are when the job ran, which is misleading
18:42FloreWhen it's on bisection everything's OK, but when it switches to taskcluster, it stops working
18:42KWiersocatlee: I just triggered pgo builds on that patch on inbound. I'll either cherry-pick it or include it in a merge at some point today. if that happens quick enough, we could trigger new nightlies. if it gets to be near the end of the day, it might make more sense to wait for tomorrow's normal nightlies
18:42Flore 4:18.64 INFO: Switching bisection method to taskcluster
18:42Flore4:18.64 INFO: Getting mozilla-central builds between 838652a84b76c273e084d0705f3f4f3be89520a8 and 8a7d0b15595f9916123848ca906f29c62d4914c9
18:42Flore4:22.49 ERROR: [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:590)
18:43FloreCan someone help me understand what the problem is. I'm on MacOSX
18:44akiFlore: #taskcluster ?
18:44Floreaki: OK, I can try there ;)
18:45WG9s_Flore: that might be an issue with no intersection between the ciphers you accept and the ciphers taskcluster accepts for an ssl handshake.
18:45WG9s_or perhaps a sha-1 signed certificate issue.
18:47FloreI have no idea, that's why I ask, I would like to go deeper in regressions
18:50catleearr: I see new c3.4xl instances coming up
18:58travis-cibuild-puppet#1334 (master - 1f6984d : Aki Sasaki): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/232549440)
18:59philoras usual, I was asking the wrong question about the way we're having tons of disconnects while running try reftest jobs on t-w732-ix which then leave the slave broken until it is rebooted
19:00philorwhy the hell are we running reftest on t-w732-ix on try?
19:02travis-cibuild-puppet#1335 (production - e1881c5 : Aki Sasaki): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/232550845)
19:02philorthe only place we're still running it on hardware is esr45, which is probably past its sell-by date now and we've just forgotten to turn it off
19:03RyanVMdo eeet
19:04WG9s_philor: well you see, that is the difference between the sell by date and the use by date! ;-)
19:05* WG9s_ thinks we really need a don't use after date.
19:05arrcatlee: so I see a build failure on a c3, but I'm not sure it's the same thing... this is a forbidden URL code
19:05arrhttps://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=99217700&lineNumber=1785
19:06* WG9s_ was trying to say that was for a date stamp on food, too. That is the date we really need, and none of the existing or proposed food labeling dates seem to include it.
19:06arrcatlee: should that be starred in with these other failures?
19:06catleeno, that's different
19:07arrhm, not sure why it's showing up in https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1147271&endday=2017-05-15&startday=2017-05-01&tree=trunk then
19:08philorarr: because that shows what *was* starred as that bug, not what *should* be starred as it
19:08arrso we have junk data?
19:08arrfor that particular thing?
19:08philorit's as good as the humans creating it
19:08arris there a way to disassociate it since it's not the same issue?
19:09WG9s_philor: so are you complaining about things not starred that should have been, or things starred that should not have been? Or perhaps both?
19:09philornope, you can unstar it as far as treeherder is concerned, but you can't unstar as far as orange factor is concerned, nor can you usefully file bugs about orange factor because it has been walking dead, gonna replace, for years now
19:10arrhrm. okay. Guess I'll just hand inspect, then :/
19:10WG9s_philor: OIC, an issue with the orange factor system: you can decide later "oh, I should not have starred that" and unstar it, but that does not affect the orange factor system.
19:11catleephilor: can you please file a bug on that t-w732-ix reftest issue?
19:14philorhttps://bugzilla.mozilla.org/show_bug.cgi?id=1365008
19:15catleethanks
19:32arrcatlee: RyanVM: I went in and killed off some of the c4.4xlarge offenders that keep burning builds at the moment
19:32arr(145, 102, 125, and 160). They were all in usw2, which may just be a coincidence /cc markco
19:34arrthe only thing I've seen fail on a c3.4xl is starred with this bug but is actually unrelated
20:13travis-cibuild-puppet#1336 (master - 6e39729 : Aki Sasaki): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/232574651)
20:17travis-cibuild-puppet#1337 (production - 0453ecb : Aki Sasaki): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/232576236)
20:54travis-cibuild-puppet#1338 (master - 284b3ff : Wander Lairson Costa): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/232587578)
21:01nthomasare we doing better on c2 ?
21:02nthomaser, c3
21:06catleenthomas: we still have a bunch of c4s running
21:06catleedidn't get around to killing them all off yet
21:06nthomasah, ok
21:13coopi am disappointed that armenzg_afk gets out of alligator wrestling this time around due to paternity leave
21:13RyanVMclearly we just need to send an alligator to him
21:29arrcatlee: should we just kill off all the c4s?
21:29catleearr: why not wait for them to die?
21:29arrthat's easy enough to do
21:29arrcatlee: because they keep failing
21:29catleeif you find ones that are failing, then kill them by all means
21:29arrthat's what I'm doing
21:29catleeok
21:30arrbut I'm wondering if we should just kill them ALL so that any errors that get reported are real at this point
21:30arrotherwise they won't die off till tomorrow
21:51tjrDoes anyone have a suggestion for where to put code to run post-build on buildbot? I was poking around PostScriptRun in mozharness, but of the examples I find in-tree, i can't match up their log messages to actually being executed.
21:52tjrI can try writing my own script, but I'm not sure how to 'add' it to get picked up and included...
21:52akiPostScriptRun is probably the best solution for that
21:53akishould be targeted to specific scripts
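A hedged sketch of what aki is describing; the class and method names here are made up, only the decorator itself comes from mozharness:

    from mozharness.base.script import BaseScript, PostScriptRun

    class ExampleBuildScript(BaseScript):
        @PostScriptRun
        def _after_everything(self):
            # runs once, after all of this script's actions have finished
            self.info("post-run hook fired")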
21:59arrcatlee: nthomas: will builds get automatically restarted if their instance dies?
21:59nthomasyep
21:59arrthere's 20 left
21:59arrI think we should just kill them off and start clean with c3
22:00nthomasseen any failures on c3 yet ?
22:00arrnthomas: none that are *actually* ssh errors, no
22:00arrthere was at least one that was listed in https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1147271&endday=2017-05-15&startday=2017-05-01&tree=trunk that was not at all related
22:01arrc4 instances do continue to fail, though
22:01nthomassounds promising, if the c3s have been up long enough to finish some work
22:02arrI'm wondering if there's some timeout on the c4s (we know they can take MUCH longer) that winds up cleaning up the ssh config dir
22:02arrdunno if runner would do that or what
22:02nthomashuh, c4 should be faster but turns out not to be ?
22:03arrit's slower because the disk is slower (builds are on the boot disk)
22:03arrI warned against this in the bug before we switched over to c4
22:03nthomasah, ok. I remember g.ps talking about that
22:03arrnthomas: https://treeherder.mozilla.org/perf.html#/graphs?timerange=2592000&series=%5Bmozilla-inbound,99aa49aa36f11a56ffb54f9ce5720bb82a7a6f8d,1,2%5D&series=%5Bmozilla-inbound,8ddd7cf44dad0352a7c715dcb6df1776fb2d3df0,1,2%5D&series=%5Bmozilla-inbound,18851c0bb8875e85890ba4b1aed2439f8c682743,1,2%5D&series=%5Bmozilla-inbound,7e2973e7febe97736736e1373db56d25161d45ce,1,2%5D
22:03arrapparently reporting wasn't working during the couple of days where it looks like 0 time
22:04arrbut that's when the switch happened
22:04arryou can see that builds are all over the place after that
22:04arrso, yes, some are faster, but not all
22:04nthomasfun fun
22:04arrand trying to get 2008 to see a second disk reliably was the difficulty when we tried in the first place
22:05arr(and why we opted not to switch to c4 back when we tried this the first time)
22:05nthomassounds like its worth any rebuild hit to get rid of the c4s
22:06arrokay, I'll kill them off
22:06arreverything should be c3s from now on
22:06nthomasboth b-2008 and y-2008 ?
22:07arryep. I filtered by instance type and state
22:07nthomascool
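A sketch of that "filter by instance type and state" step, assuming boto3 and us-west-2; any extra name filtering on b-2008/y-2008 is left out:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-west-2")
    resp = ec2.describe_instances(Filters=[
        {"Name": "instance-type", "Values": ["c4.4xlarge"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ])
    ids = [i["InstanceId"]
           for r in resp["Reservations"]
           for i in r["Instances"]]
    if ids:
        # builds running on these will be retriggered and come back on c3s
        ec2.terminate_instances(InstanceIds=ids)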
22:08nthomasonce you're done there it'd be great to chat about ntp
22:08arrthere are two left running in use1 which I assume are markco's (b-2008-spot-039(markco) and y-2008-ec2-golden)
22:09arrnthomas: I was just about to head off for the night. I know there's a NI on the bug for me for tomorrow
22:09nthomasok, no worries
22:09arrnthomas: what did you want to chat about?
22:09nthomasjust the turn up the alert threshold question
22:10arryeah, I'm not really keen on that, honestly. It's already waiting a long time before alerting
22:10arrwhat's your take?
22:10markconthomas: in regard to 2008, if you need anything i will still be around for a couple of hours
22:10nthomasarr: me neither, feels a bit like papering over a problem
22:10arrexactly
22:10nthomasmarkco: thanks
22:11nthomasmarkco: I did wonder about the motivation for removing chocolatey last week. Is that stuff all in the golden image now, or no longer needed?
22:11arrnthomas: we don't use it
22:11markconthomas: it is not needed.
22:12arrthere was a grand plan to use chocolatey when we were still going to use puppet, but since we're moving to tc instead, we're not going to implement that
22:12nthomasok. It also removed some log aggregator pref setting, same for that ?
22:12arrnthomas: which pref setting?
22:13nthomasthere's a call to Configure-NxLog -aggregator $aggregator in https://github.com/mozilla-releng/build-cloud-tools/commit/f029f17a73e0f3a6fa2d00fb54cc2b5a95dfc504
22:13nthomasjust double checking, not hugely worried
22:14arrmarcia: ^^ that shouldn't have been deleted
22:14markcoarr: i assume that was meant for me
22:15arrargh, yes
22:15markcoi will add it back on the next pr
22:15arrnor should the error reporting stuff, I think
22:15arrunless those are being installed elsewhere
22:15nthomasthanks mark
22:16arrsublimetext3 should have been deleted
22:16arrI think everything else in there probably should have stayed
22:17arrmarkco: was Install-MozillaBuildAndPrerequisites not being used, either?
22:17* arr isn't sure if this stuff is all handled elsewhere
22:17markcoarr: no and it had caused issues previously
22:17arrokay
22:18arrI wonder if Install-RelOpsPrerequisites wasn't being used at all... I would think it would error out if it was
22:18arrsince it doesn't exist anymore
22:18arrbut we should verify that logging is still being set up correctly along with windows error reporting
23:16travis-cibuild-tools#1782 (master - c0fd319 : ffxbld): The build passed. (https://travis-ci.org/mozilla/build-tools/builds/232627822)
16 May 2017
No messages
   