mozilla :: #releng

16 May 2017
00:10arrnthomas: checked back in and haven't seen any failures yet. Of course I'm not sure there are a lot of builds happening now, either
00:10nthomasokie doke, thanks for checking
00:11* nthomas awaits sheriff displeasure if there is mass issues
00:11nthomasnon-trivial amount of tooltool failures
00:12arrthat shouldn't be related to instance type, though...
00:12nthomasmaybe to lots of new instances
00:12arrthat could be
00:13nthomasI hope we hand off a signed url to s3 so we don't have to carry the traffic ourselves
00:14arr82 running in usw2, 0 in use1
00:15nthomaslooks like we do do the signed url thing, but a 403 before that is kinda blocking
00:17arrnthomas: is the 403 expected..?
00:17arr(this is a different issue, right?)
00:18nthomasnot that I know of
00:18nthomas b-2008-spot-162 seems to be several (but not all) of the errors on autoland
00:18arrrecently?
00:18arror were those errors from the c4 instance?
00:19nthomasyes, from after your cull
00:19nthomaseg http://buildbot-master91.bb.releng.usw2.mozilla.com:8001/builders/WINNT%206.1%20x86-64%20autoland%20leak%20test%20build/builds/1579
00:19arrnthomas: where are you seeing the errors?
00:19nthomashttps://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=windows
00:20nthomasthe red B's
00:23nthomaslooks like vs2015u3.zip is the only internal file, and it's just barfing on that
00:23arrI presume we were seeing those before?
00:23nthomasperhaps the token has gone missing
00:24arrmarkco: ^^ just to keep you in the loop
00:24nthomasI'm not sure, investigating
00:24nthomasoverlapping bustage, many clicks, sadness
00:25arryes :{
00:25arra maze of twisty broken things, all alike
00:25nthomasI thought there were supposed to be several tokens in c:\build\. I don't see any on b-2008-spot-162
00:26arrwe didn't change the ami any
00:26nthomasok, who enabled the chaos monkey
00:27nthomasdid we do something like swap to c4, rejigger the drives, swap back to c3 ?
00:28catleerevert AMIs to before may 4th?
00:28arrI don't think buildduty did anything with the drives
00:28arrwe can
00:28nthomasI checked b-2008-spot-190 (which has been running a job for 80 minutes), and it has 7 tokens in c:\builds
00:28arris everything failing or just a few jobs?
00:29nthomaseverything would be too simple
00:29arrmarkco: ^^
00:30arrmarkco: I wonder if some are being instantiated as loaners?? I'm so at a loss here
00:30arrthat would kill ssh and tokens, yeah?
00:30nthomashostnames are currently correct on 162 and 190, but maybe something like that
00:30catleehmm, that's an interesting idea
00:30catleeit would wipe ssh too
00:31arrnthomas: I'm wondering if it's going through multiple ec2config runs and it's pattern matching on a loaner name because renaming is broken
00:31arror passing the name
00:31nthomasdefinitely worth looking at
00:31arrthis is what broke on the 4th (things were showing up with the hostname regex failing)
00:32nthomaswhat updates https://dxr.mozilla.org/build-central/source/cloud-tools/configs/b-2008.user-data#103 ?
00:32nthomashah, it's a 404
00:33nthomashow is this supposed to work ?
00:34catleewhere's that pattern matching code?
00:34nthomasbrain_nthomas: module not found
00:34nthomascatlee: https://dxr.mozilla.org/build-central/source/cloud-tools/configs/b-2008.user-data#120 ?
00:35catleeprep-loaner
00:35arrnthomas: that's... not what's on aws-manager2
00:35nthomasI'm wondering if we're running a copy of Ec2UserdataUtils.psm1 stashed on the AMI, rather than latest, because line 103 is bogus
00:35arrDownload-Module -url 'https://raw.githubusercontent.com/mozilla-releng/build-cloud-tools/master/configs/Ec2UserdataUtils.psm1'
00:35nthomasoh sorry, dxr is stale again.
00:35catleehttps://github.com/mozilla-releng/build-cloud-tools/blob/master/configs/b-2008.user-data#L120
00:35* nthomas goes back 3 steps
00:36nthomassry about that
00:37catleewhat's https://github.com/mozilla-releng/build-cloud-tools/commit/b142bfb0582ed1887a37e5e42d14ec7e1c34331b ?
00:37arrcatlee: I was talking to markco about this earlier today... I asked him to put in some debugging statements to see if the data is being passed correctly from the user data
00:37catleeok
00:37arrbecause the difference I saw when things broke was that it couldn't find its domain
00:37arrand the hostname wasn't matching
00:37catleethis is the most promising lead I think
00:37arrthe hack that got put into place was to match golden as well as gold
00:38arrbut I'm wondering if that's matching EVERYTHING now
00:38arrwhich is what mark was looking into today
00:38catleethis code removes secrets https://github.com/mozilla-releng/build-cloud-tools/blob/master/configs/Ec2UserdataUtils.psm1#L775
00:38arrI think the ssh issues are separate and are due to the c4
00:38catleessh and tokens
00:39arrthere weren't any changes to this file when things broke on the 4th/5th
00:39arrso I'm wondering if AWS changed something
00:39catleealthough why weren't tooltool downloads failing before?
00:39arror something on aws-manager2 broke...
00:39nthomasme too
00:39catleesigh
00:40catleethe hamsters are going on strike
00:40KWiersocatlee: so, I merged the fix for bug 1364878 to m-c
00:40KWiersodunno if it'd be worth triggering nightlies at this point
00:41KWiersobut updates can be unfrozen after tonight's nightly comes out
00:41catleeKWierso: ok, thanks
00:41catleewe need a way to reset the latest blobs
00:43markcocatlee: that code only removes secrets when the prep loaner function is called
00:43catleeyeah
00:43catleewe're wondering if that function is being called in the wrong cases
00:44catleeit's dependent on this: https://github.com/mozilla-releng/build-cloud-tools/blob/master/configs/b-2008.user-data#L115
00:44catleewhich is around the code that's been busted lately iirc?
00:44catleeIs-HostnameSetCorrectly
00:44arrmarkco: https://bugzilla.mozilla.org/show_bug.cgi?id=1362356#c25
00:45arrmarkco: I'm not sure if $hostname.Contains -gold would save us here
00:46arrI still want to know why it's not getting the correct hostname passed to it
00:46arrbut I'm betting money that's why it's deleting the secrets
00:47arrbecause that string will match b-2008*
00:47catleeyeah...
00:47catleebut why wasn't it deleting tooltool tokens before?
00:47arrdo we know it wasn't?
00:47nthomasfwiw, the tooltool token error shows up before the ssh one in the build
00:48nthomasanyone got a quick tip on how to get a zip file off of b-2008 ?
00:48arrscp?
00:48arrsftp?
00:48nthomasscp to upload host wins
00:48nthomasta!
00:49nthomascue slave doing disappearing act :/
00:51markcoarr: catlee: if the prep-loaner function was being called, the passwords would change and the machine would no longer auto login.
00:53arrI don't see any prep-loaner log messages in the userdata reports since 2017-01-25
00:54arrand nothing with "flushed secrets" since 2017-01-06
00:54arr(looking at puppet reports)
00:54arrbut it sure *seems* like that's what's happening
00:55catleedo we log all the function calls?
00:55arrthe things that say Write-Log
00:55arrhttps://github.com/mozilla-releng/build-cloud-tools/blob/master/configs/Ec2UserdataUtils.psm1#L791
00:56catleedo we see other INFO level logs?
00:56arryeah, and DEBUG
00:57catleetwisty maze of busted passages indeed
00:57arrcatlee: https://groups.google.com/a/mozilla.com/forum/#!searchin/releng-puppet-mail/%22purged%22$20%22cltbld%22%7Csort:date/releng-puppet-mail/sxZDiQ_J1nA/JkaF6yVjEQAJ is an actual loaner
00:58nthomasI've got a set of logs from 190 if anyone wants them
00:58catleedo we get those from non-golden or non-loaners?
00:58arrmarkco: any chance there's a psm file on disk that's getting run instead that has old data in it?
00:59arrcatlee: just from loaners
00:59arrhttps://groups.google.com/a/mozilla.com/forum/#!searchin/releng-puppet-mail/%22purged%22$20%22cltbld%22%7Csort:date
00:59nthomaswell, I'm seeing 2017-05-15 02:10:00: Ec2HandleUserData: Message: The errors from user scripts: Install-RelOpsPrerequisites : The term 'Install-RelOpsPrerequisites' is not
00:59nthomasrecognized as the name of a cmdlet, function, script file, or operable
00:59nthomasso it's not seriously out of date
01:00arrI'm wondering if we should just roll back all the psm changes we made since the 4th and try to debug this from scratch
01:01arrand roll back to the old amis
01:01catleeyeah
01:01catleethat will hopefully tell us if it's us or AWS
01:01arrthings were failing before we made changes
01:01arrso I think it's definitely AWS
01:02arrbut at this point, I think we're also shooting ourselves in the foot somehow
01:02arrand trying to figure out where things are failing will be easier if we stop running into our own footguns
01:03arrmarkco: what do you think?
01:04markcowell if we roll all the way back it will fail because chocolatey is blocking us
01:04nthomasthey removed their block now
01:04nthomasbut we don't want to hit them every time we boot up
01:04arrwe can roll back and take out JUST the chocolatey stuff
01:04arrwithout changing the hostname regexes
01:05markcowe can roll this back https://github.com/mozilla-releng/build-cloud-tools/commit/b9c1aed49990b4a7e7ad28a90e030219d2634f5f#diff-c23f7ab876ac167e05468ef79f046cb7
01:05arrmarkco: but we're also getting errors on the init function missing
01:05markcoand by tomorrow i should have some changes that will add additional logging
01:05arrbecause all of it got removed instead of just the chocolatey bits
01:05arrwhich is why I was suggesting rolling all the way back then applying new changes
01:06arrbecause if we start adding things back in piecemeal, I'm wondering if we miss stuff
01:06markcowell that would include chocolatey because i think portions of those others that were removed will fail with it
01:07arryes, I mean roll back to when chocolatey was still there, then create new patches that just remove the chocolatey bits (instead of all the other stuff that got removed along with it)
01:11arrand also add in more logging
01:11markcohttps://github.com/mozilla-releng/build-cloud-tools/pull/298 this and reverting this https://github.com/mozilla-releng/build-cloud-tools/commit/b9c1aed49990b4a7e7ad28a90e030219d2634f5f will pretty much have us at that point
01:11arrso we can tell where the hostname and domain name settings are failing
01:11gpsbug 1364072 needs reverted
01:12gpsspecifically https://hg.mozilla.org/build/puppet/rev/fad7b4f445a2fd99e6730197288c00a064d15ae5
01:12gpsor whatever triggered robustcheckout to start being used
01:12catleewhat's happening?
01:13markcoarr: I have to run out for a family thing, but i can get a PR together tonight that includes what i mentioned above and increase logging.
01:13gpsbecause a try push failed with a failure similar to what happened in bug 1352209
01:13arrgps: hm, I thought aki backed that out earlier because it broke all the things
01:13akii did
01:13markcoMy first go around at increasing the logging broke stuff when i tested it
01:13gpsok. maybe this bug was just filed a few hours too late
01:13gpsfirst i have heard of anything
01:13akiit's back on default, but not on production
01:13arrmarkco: okay
01:14catleemarkco: thank you for digging into this today
01:14arrmarkco: hopefully we'll get to the bottom of this with more logging :}
01:15arrand figure out why it looks like info isn't being passed correctly (which may solve our other problems)
01:16arrnthomas: okay if markco flag you for review?
01:16nthomassure
01:17arrmarkco: if you don't get to it till tomorrow, I can take a look, too
01:18arr(not that I'm a powershell wizard, but...)
01:19arrI'm also wondering if something about the c3 -> c4 change is what messed up the parameter passing for ec2config
01:19arrbut it was a couple days after, and it affected w7 as well
01:20arrso, yeah, a whole bunch of stuff broke within a few days of each other
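The concern raised above is that the hostname check was loosened to accept "golden" as well as "gold", and may now be matching every instance. Below is a minimal sketch of how that can happen. The patterns and the golden hostnames are hypothetical illustrations, not the actual contents of b-2008.user-data or Ec2UserdataUtils.psm1, and Python stands in for the PowerShell used there.

```python
import re

# Hypothetical patterns for illustration only; the real checks live in
# b-2008.user-data / Ec2UserdataUtils.psm1 and may look quite different.
LOOSE_GOLDEN = re.compile(r"^b-2008")                   # unanchored prefix: matches every b-2008 host
STRICT_GOLDEN = re.compile(r"^b-2008-ec2-gold(en)?$")   # anchored: only "gold" or "golden"


def classify(hostname, golden_pattern):
    """Return 'golden' if the hostname matches the golden pattern, else 'worker'."""
    return "golden" if golden_pattern.search(hostname) else "worker"


# The two spot names come from the log; the golden names are assumed.
hosts = [
    "b-2008-ec2-golden",
    "b-2008-ec2-gold",
    "b-2008-spot-162",
    "b-2008-spot-190",
]

loose = {h: classify(h, LOOSE_GOLDEN) for h in hosts}
strict = {h: classify(h, STRICT_GOLDEN) for h in hosts}

for h in hosts:
    print(f"{h:20} loose={loose[h]:7} strict={strict[h]}")
```

With the loose pattern every spot instance is treated as a golden image, which would send it down the wrong setup path. Anchoring the pattern at both ends keeps the gold/golden workaround without widening the match.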
01:59catleearr: I'm wondering if something is confused about loaner/not-loaner status, then maybe the userdata logging is also confused
01:59catleeso we may not get email logs from regular instances?
02:04nthomasI don't think we do, but I think they still get pushed into c:\logs
02:05nthomashttps://github.com/mozilla-releng/build-cloud-tools/blob/master/configs/b-2008.user-data#L137
02:08nthomasnot sure I grok them properly, but on b-2008-spot-190 there's no mention of flushing secrets
02:09catleeAnd it's busted?
02:09catleeLog would get overwritten per boot, no?
02:10nthomaswas when I pulled the log
02:10nthomas(s)
02:10nthomasthe filenames are datestamped, eg userdata-run-20170515-1515.log
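Because the userdata logs are datestamped rather than overwritten, every run on an instance can be checked for a given message. The sketch below assumes the filename layout is userdata-run-YYYYMMDD-HHMM.log (inferred from the single example above) and that the logs sit in one directory such as c:\logs. The search phrase is whatever message is of interest, for example the "flushed secrets" text mentioned earlier.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumed layout, inferred from the one example filename in the log:
# userdata-run-YYYYMMDD-HHMM.log
LOG_NAME = re.compile(r"^userdata-run-(\d{8})-(\d{4})\.log$")


def parse_log_timestamp(name):
    """Return the datetime encoded in a userdata log filename, or None if it does not match."""
    match = LOG_NAME.match(name)
    if match is None:
        return None
    return datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M")


def logs_mentioning(log_dir, phrase):
    """Return (timestamp, filename) pairs, oldest first, for logs containing the phrase."""
    hits = []
    for path in Path(log_dir).iterdir():
        stamp = parse_log_timestamp(path.name)
        if stamp is None:
            continue  # not a userdata run log
        text = path.read_text(errors="replace")
        if phrase.lower() in text.lower():
            hits.append((stamp, path.name))
    return sorted(hits)
```

Calling logs_mentioning(r"c:\logs", "flushed secrets") on an affected instance would list which runs, if any, logged a secrets flush. An empty result across all runs supports the observation that the tokens are disappearing some other way.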
02:10philorwhere are we with the "stays busted" "recovers" thing? should I be killing these dozens of instances I don't like?
02:10catleeThey don't recover
02:12catleeI don't think
02:13philorwell, other than by the usual AWS method, move to a different city where nobody knows what a mess you've made of your life, and start over fresh
02:13philorwe should just use new slavenames for new instances, anybody with any sense who is moving away from their past changes their name
02:42* philor begins to lose faith in "kill things until morale improves"
02:46nthomashow so ?
02:59philorI seem to just be getting new bad eggs to replace the ones I killed
03:01philorpossibly eventually it would work, but killing the awful https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?name=b-2008-spot-144 just got me https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?name=b-2008-spot-116 which by the timing got killed for size and respawned as bad-again
03:16* philor keeps killing anyway
03:16philorsometimes, it's its own reward
06:30travis-cibuild-puppet#1339 (master - 3173d2d : Dragos Crisan): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/232703779)
06:50travis-cibuild-puppet#1340 (production - 7223871 : Alin Selagea): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/232708180)
07:02jcristaudid we not respin nightlies after fixing the addons issue?
07:09KWierso|afkjcristau: I did not. By the time the fix got merged around to m-c, we were close enough to the normal nightlies getting built that I didn't think it was worth manually triggering new ones
07:09jcristauokay
07:09gerard-majaxwhen is the nightlies coming?
07:09KWierso|afkonce normal nightlies wrap up, we'll need to unfreeze updates (unsure if cat.lee did something to make it happen automatically)
07:11KWierso|afkgerard-majax: they start building around 3 hours from now
07:12jcristaugerard-majax pointed out that https://www.mozilla.org/en-US/firefox/channel/desktop/#nightly points at the broken builds. so going back to a working version is kinda hard.
07:14gerard-majaxKWierso|afk, I got tricked because the update applied silently today and my firefox had to restart this morning
07:14gerard-majaxso basically broke my system without me being able to stop it
07:23gerard-majaxKWierso|afk, so overall 12 hours after the fix is merged an update shipping it will be available
09:46spacurar|builddutyface
09:46spacurar|builddutynot here sorry
09:58travis-cibuild-puppet#1341 (master - 9938356 : Amy Rich): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/232760403)
10:01travis-cibuild-puppet#1342 (production - 8721e76 : Amy Rich): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/232761297)
11:20arraselagea|buildduty: it looks like the bad robustcheckout stuff got relanded in https://hg.mozilla.org/build/puppet/rev/85781bf4eeac
11:20arrcan you back that out (and out of default, too) please?
11:21arrpuppet is broken because of that
11:21aselagea|builddutyarr: gah, sorry about that
11:21aselagea|builddutyI just wanted to do a merge for dragrom
11:21arraselagea|buildduty: ticking time bomb for someone to hit in default
11:22* aselagea|buildduty takes care of that
11:22arraselagea|buildduty: if you hadn't hit it, I would have :}
11:22arraselagea|buildduty: which is why we should back it out of default as well
11:26aselagea|builddutyarr: so it was backed out first, then pushed to default again: https://hg.mozilla.org/build/puppet/rev/fad7b4f445a2
11:26aselagea|builddutyI wonder why we did that
11:26aselagea|builddutygiven it was causing bustage
11:26arraselagea|buildduty: not sure, but we should definitely keep it out of default for now
11:27arrit's all kinds of broken
11:27aselagea|builddutyyeah, I'll back it out
11:34travis-cibuild-puppet#1343 (master - 4663de7 : Alin Selagea): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/232788596)
11:37travis-cibuild-puppet#1344 (master - 10585cb : Alin Selagea): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/232789474)
11:40arraselagea|buildduty: thanks!
11:41aselagea|builddutynp!
11:41travis-cibuild-puppet#1345 (production - a44a877 : Alin Selagea): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/232790465)
11:45arrmarkco: I'm trying to use the golden ami creation process to test out some stuff, but not sure if it's stuck. When you're around, can you show me how to RDP/VNC in?
11:47arr(I just get popups telling me that the security chain isn't valid when I try to connect with Remote Desktop Connection for Mac)
13:10bhearsummtabara|food: i'm totally lacking context other than e-mail, but is there a reason you set Firefox nightlies to "No-Update" in balrog instead of locking them to a previous one? i'm presuming they had to be shut off for some reason.
13:13mtabara|foodbhearsum: we didn't know at the time which was the last good build and we preferred shutting them off instead. Additionally, setting them to last good build would have meant adding an extra-rule in balrog for Linux (as build ids differ). eventually we kept the former solution.
13:15bhearsum1ah, ok
13:15bhearsumwe've been eating the cost of the extra rule when doing this lately
13:25spacurar|builddutyDoes anyone happen to know in exactly what set of tests do spidermonkey tests belong?
13:34arrmarkco: so I turned on emailing the logs of the spot instances while I spun up some new b-2008 nodes... As I suspected, they're all running Prep-Golden
13:41travis-cibuild-buildbotcustom#1042 (master - fdbd98a : Rail Aliiev): The build passed. (https://travis-ci.org/mozilla-releng/build-buildbotcustom/builds/232827822)
13:46travis-cibuild-buildbot-configs#2618 (master - 84ce01b : Rail Aliiev): The build passed. (https://travis-ci.org/mozilla-releng/build-buildbot-configs/builds/232829631)
13:53travis-cibuild-tools#1783 (master - 1e6141c : Rail Aliiev): The build passed. (https://travis-ci.org/mozilla/build-tools/builds/232831142)
14:10catleebhearsum, mtabara: speaking of which, we can re-enable updates now
14:10catleeassuming the nightlies are done
14:11catleebhearsum: I was thinking that we need an easy way to reset the latest blob
14:11catleethen re-enabling updates is safer
14:11mtabarathey should, it's been 4h I think since they started
14:11mtabarayeah, indeed - http://archive.mozilla.org/pub/firefox/nightly/latest-mozilla-central/firefox-55.0a1.en-US.linux-i686_info.txt
14:12jmaherkmoir: I landed some changes yesterday to buildbot-configs, when will a reconfig take place?
14:13mtabarajmaher: if f51dfdef83c3 is your change, I think kmoir already did a reconfig judging by https://hg.mozilla.org/build/buildbot-configs/
14:13bhearsumcatlee: i think we can just delete it while nightlies are shut off
14:14bhearsumthe first reporting respun one should recreate it
14:14catleebhearsum: oh nice
14:15jmahermtabara: oh, I see!
14:15catleearr: need any help?
14:15jmaherhmm, maybe my patch didn't do what I thought it would
14:16arrcatlee: things are still broken after last night, so I was trying a few things... have lots of data now
14:16jmaherthe buildbot builder differences look good
14:16bhearsumcatlee: i guess it doesn't help if we forget to do it before respinning...but maybe in that case we can delete and recreate by hand with one of the dated nightlies
14:16mtabarabhearsum: r? https://aus4-admin.mozilla.org/rules/scheduled_changes
14:17jmahermtabara: is there a chance that the reconfig didn't pick up my change, maybe it got the qr-sequential changes- I am not in a rush, just wanting to get an idea
14:17bhearsummtabara: looks good to me
14:17arrcatlee: going to try to get ahold of markco and/or Q to actually help make changes, since they have a better idea what things *should* look like
14:18catleearr: what repo are you referring to in https://bugzilla.mozilla.org/show_bug.cgi?id=1362356#c29 ?
14:19arrcatlee: build-cloud-tools.. the change that got made last week to try and work around the golden ami generation hostname issues
14:19catleeah, ok
14:20mtabarajmaher: to be honest, I doubt so but am no expert in that field. I'll defer to kmoir or buildduty folks to avoid confusion
14:21jmahermtabara: ok! thanks for the help so far, it got me a bit further
14:21mtabaranp
14:22catleemtabara: did the windows builds finish?
14:23mtabaracatlee: good catch, it didn't - http://archive.mozilla.org/pub/firefox/nightly/latest-mozilla-central/firefox-55.0a1.en-US.win64_info.txt
14:23catleeI think they broke
14:24Tomcat|sheriffdutyyeah
14:25Tomcat|sheriffdutymtabara: catlee bug 1365219
14:25mtabaraI'll cancel the scheduled changes pending further investigation
14:25mtabaraTomcat|sheriffduty: thanks
14:27travis-cibuild-buildbot-configs#2619 (master - f67a756 : Joel Maher): The build passed. (https://travis-ci.org/mozilla-releng/build-buildbot-configs/builds/232844704)
14:29catleeTomcat|sheriffduty: which nightlies are done?
14:29catleeTomcat|sheriffduty: did both win32/win64 break?
14:30mtabarajudging by artifacts under firefox/nightly/latest-mozilla-central/ I'd say both win32 and win64 are broken
14:30Tomcat|sheriffdutyyes
14:30mtabaralinux/mac is fine
14:30Tomcat|sheriffdutycatlee: Windows XP opt and Windows 8 x64 opt is broken
14:30Tomcat|sheriffdutynightly wise
14:30Tomcat|sheriffdutyall others non win is fine
14:30Tomcat|sheriffdutyincluding android
14:31catleemtabara: maybe we can make a new latest blob from the union of the osx and linux dated blobs
14:31catleewe're going to be waiting a while for windows
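The idea of building a new latest blob from the union of the OSX and Linux dated blobs can be sketched as a per-platform merge. This is only an illustration under stated assumptions: the blob layout is reduced to a top-level "platforms" mapping, and the blob name, platform keys, and build IDs below are made up. It does not talk to Balrog or use its API.

```python
import copy


def union_platforms(base_blob, *other_blobs):
    """Return a copy of base_blob with platform entries from other blobs added.

    Platforms already present keep their existing data; inputs are not modified.
    """
    merged = copy.deepcopy(base_blob)
    platforms = merged.setdefault("platforms", {})
    for blob in other_blobs:
        for platform, data in blob.get("platforms", {}).items():
            if platform not in platforms:
                platforms[platform] = copy.deepcopy(data)
    return merged


# Hypothetical, heavily simplified dated blobs (real blobs carry far more data).
osx_blob = {
    "name": "example-nightly-osx",
    "platforms": {"Darwin_x86_64-gcc3": {"buildID": "20170516030206"}},
}
linux_blob = {
    "name": "example-nightly-linux",
    "platforms": {"Linux_x86_64-gcc3": {"buildID": "20170516100346"}},
}

latest = union_platforms(osx_blob, linux_blob)
print(sorted(latest["platforms"]))
```

Windows platforms are simply absent from the merged result, so under this approach Windows clients would see no update until a good Windows nightly is added to the blob.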
14:34markcoarr: i am on now looking into things
14:35arrmarkco: I noticed one error off the bat: https://groups.google.com/a/mozilla.com/forum/#!searchin/releng-puppet-mail/Install-RelOpsPrerequisites%7Csort:relevance/releng-puppet-mail/iYeG1-hfUt0/FUFtOH9QBQAJ
14:35arrwell, a few things
14:36arrfirst, it's still missing the Install-RelOpsPrerequisites function
14:36arrthere's an issue with the path to Mercurial.ini (not sure if that's relevant)
14:36catlee Notice: /Stage[main]/Users::Builder::Setup/Ssh::Userconfig[cltbld]/File[C:/Users/cltbld/.ssh/authorized_keys]/ensure: removed
14:37arryep
14:37arrthere's also a paren missing from the loaner string match
14:38catleealthough it does set up known_hosts
14:38catlee Notice: /Stage[main]/Users::Builder::Setup/Ssh::Userconfig[cltbld]/File[C:/Users/cltbld/.ssh/known_hosts]/content: content changed '{md5}323c01ab1a4b98ad8a56f104d264cb7b' to '{md5}cd041d380b247c4bfe72d48dde977c6b'
14:38arrit's running puppet when it shouldn't be
14:38arrbecause it thinks it's the golden image
14:39catleeah
14:39arreverything is
14:40arr(t- g- y- and b-)
14:40markcoarr: https://github.com/mozilla-releng/build-cloud-tools/pull/298 this will clear up the initial errors
14:40arrmarkco: I did the revert for https://github.com/mozilla-releng/build-cloud-tools/pull/298/commits/3239e8bf9c5bbc955a63dd3e57dd1fbf8bd1579e
14:40arrmarkco: I did not land the patch to put back Install-RelOpsPrerequisites
14:40arrwe should start with that
14:42travis-cibuild-buildbot-configs#2620 (production - 4f348ae : Rail Aliiev): The build passed. (https://travis-ci.org/mozilla-releng/build-buildbot-configs/builds/232850665)
14:42markcoarr: that PR will add it back in without what depends on chocolatey being installed in the script. Could you merge it?
14:45arrmarkco: what is https://github.com/mozilla-releng/build-cloud-tools/pull/298/commits/8ee8044603d4c8fdea2ee30a160bda6a2f787eaf for?
14:46travis-cibuild-buildbotcustom#1043 (production-0.8 - 18a9616 : Rail Aliiev): The build passed. (https://travis-ci.org/mozilla-releng/build-buildbotcustom/builds/232852270)
14:46arr(why are we disabling CloneBundle?)
14:46markcoarr: that is what is causing the "Remove-Item : Cannot find path 'C:\mozilla-build\hg\Mercurial.ini' because it does not exist.Remove-Item : Cannot find path 'C:\mozilla-build\hg\Mercurial.ini"
14:47markcoall of the hg bits is in Puppet
14:47markcoare
14:53mtabarajmaher: fyi, rail just did another reconfig, if that helps
14:55jmahermtabara: yes, I see comments in my bugs about being deployed! thanks mtabara and rail
14:55railnp
14:55mtabaranp
14:57mtabaracatlee: I don't think we can re-enable the nightly updates, not even partially for linux/mac, due to https://bugzilla.mozilla.org/show_bug.cgi?id=1365256#c1, marco just pointed it in #releaseduty
14:58catleeok
14:58arrmarkco: I added those and fixed the regex syntax error as well as adding another debugging line
14:58pascalcif we can't have a working fix for the regression caused by 1358846, I would advocate that we should back it out
15:33AutomatedTesterbhearsum: Callek: pmoore|mtg has directed me to you about the win8 issues in https://treeherder.mozilla.org/#/jobs?repo=try&revision=73819d0d2a85
15:33AutomatedTesterthey appear to be infra issues
15:35bhearsumaselagea|buildduty, aobreja|buildduty: ^
15:35Tomcat|sheriffdutymarkco: arr have to head out but i guess KWierso|afk will be around soon
15:36arrTomcat|afk: rgr, thanks
15:36bhearsumspacurar|buildduty: did you get an answer to your earlier question?
15:36RyanVMarr: if there's anything urgent, I can maybe help until KWierso arrives
15:36spacurar|builddutybhearsum: No but I found out the question was pretty irrelevant
15:36arrryanc: right now it's just us combing through logs
15:37bhearsumspacurar|buildduty: ah, ok
15:38aselagea|builddutybhearsum: hmm, I wonder if that's related to bug 1362356 we are trying to fix
15:39aselagea|builddutythe error seems related to mercurial anyway
15:39RyanVMi wouldn't be shocked if it was
15:40bhearsumyeah, i'm not sure
15:42AutomatedTesteraselagea|buildduty: aobreja|afk: bhearsum should I raise a bug?
15:45aselagea|builddutyAutomatedTester: it's hard to tell at this point what's the root cause for that
15:45aselagea|builddutywe're trying to fix the Windows AMI issue at this point since the builders are mostly failing
15:47aselagea|builddutyI'll need to head off soon, but filing a new bug for this might be worthy I think
15:47aselagea|builddutyI'll catch up tomorrow morning
15:48catleesfraser: did we get OSX en-US updates?
15:49sfrasercatlee: balrog knows about them, so I think yes
15:50RyanVMyeah, we could at least un-throttle linux/osx nightlies I'd think
15:51catleesfraser: huh, ok. I wonder where that's coming from
15:51catleethe en-US build didn't generate balrog props
15:51RyanVMi see the last attempt at Windows ones also burned and I'll assume we're holding off on those retriggers until we have a reason to believe there'll be a different outcome
15:52catleeand it looks like the l10n repacks submit directly to balrog
15:52catleehttps://archive.mozilla.org/pub/firefox/nightly/2017/05/2017-05-16-03-02-06-mozilla-central-l10n/mozilla-central-macosx64-l10n-nightly-1-unknown-bm85-build1-build15.txt.gz
15:52catleeso close!
15:52catleeRyanVM: there was some doubt whether it was actually fixed today at all
15:53catleehttps://bugzilla.mozilla.org/show_bug.cgi?id=1365256#c1
15:53RyanVMcatlee: should I be concerned that the OSX l10n jobs all show as green on TH?
15:53sfraserso the changes to buildbase.py just pushed it down a step?
15:54catleeah, I think maybe funsize is doing it after all..
15:54catleehttps://public-artifacts.taskcluster.net/URMl1QqxRhK-QkcHrCLhGA/0/public/logs/live_backing.log
15:54sfraserit does
15:54catleeRyanVM: why?
15:54RyanVMsorry, I think I misunderstood your comment above
15:54catleesfraser: ah, I thought it needed balrog props to work
15:55sfraserI'm confused now. I remember that, too.
15:55sfraserI thought we were still generating the balrog_props artifact, just not uploading it to balrog, now?
15:57catleeyeah, I thought we should too...
15:58catleedoes funsize use buildprops if it can?
15:59RyanVMcatlee: probably time to consider a backout then
15:59catleeRyanVM: of what?
15:59RyanVMbug 1358846
15:59RyanVMassuming there was an irreversible migration
15:59RyanVMwasn't*
15:59catleeI don't know
16:00catleewho's working on it?
16:00RyanVMkmag
16:00* catlee thinks relman should be making that call
16:00RyanVMi just asked in the bug anyway, he's already got a NI on him
16:00RyanVMI highly doubt RelMan will be opposed assuming there's no technical reason why it can't be done
16:00RyanVMI just know there was a migration involved, so I'm not 100% certain it'd be safe to actually do so
16:01John-GaltA backout for what?
16:02RyanVMJohn-Galt: bug 1358846
16:02John-GaltBut why?
16:02akiis the windows spot puppet email going to quiet down at some point?
16:03RyanVMJohn-Galt: bug 1365256 indicating that there's still issues even after the fix that was landed
16:03RyanVMand nightly updates being frozen as a result
16:03John-GaltOh, I hadn't seen that.
16:03John-GaltNo, a backout probably isn't feasible. I'll look into that bug.
16:04RyanVMok, thanks
16:06sfrasercatlee: which mac step didn't produce balrog_props.json?
16:06catleesfraser: https://archive.mozilla.org/pub/firefox/nightly/2017/05/2017-05-16-03-02-06-mozilla-central/mozilla-central-macosx64-nightly-bm86-build1-build12.txt.gz
16:10sfrasercatlee: ok, so the things still on buildbot get enough data from the pulse message to construct the info they need. It's the TaskCluster tasks that don't get that, and so need to raid balrog_props.json
16:11arrcatlee: where is the tooltool token stored?
16:12catleearr: in puppet
16:12arrcatlee: got a URL?
16:12catleeit's in hiera iirc
16:12arrcatlee: sorry, I mean the module that writes it out
16:12catleeah
16:13catleehttps://hg.mozilla.org/build/puppet/file/tip/modules/slave_secrets/manifests/relengapi_token.pp
16:13markcois the releng api token that gets used for tooltool?
16:13catleepretty sure
16:14arrcatlee: do clobber builds delete /builds?
16:15catleethey shouldn't
16:15catleethey should delete individual dirs under /builds
16:16catleeyou think something is busted with clobbering?
16:16catleesfraser: ok, that makes sense. so then that leaves l10n repacks
16:16arrcatlee: just trying to figure out where to look for when things are busted
16:20marcoJohn-Galt: why is a backout not possible?
16:21John-Galtmarco: Because other changes already depend on those, and we don't have a reverse migration path for the data changes.
16:21marcocatlee: ^
16:21John-GaltI already have a fix. Just waiting for review.
16:22marcoJohn-Galt: can you trigger a try build and ask Alice or FlorinMezei to verify?
16:23marcoit seems like this should be verified to avoid further regressions
16:23FlorinMezeimarco: no chance for me to verify... can talk to Liz and see if Las Vegas people can help
16:23John-Galtmarco: Not much point. It'll be landed before a try build would finish.
16:24John-GaltOh. Trees are closed.
16:25marcoJohn-Galt: as long as it's verified
16:25marcoJohn-Galt: if you have an inbound/central build instead of a try build it's the same
16:25marcolet's just make sure the bug is actually fixed
16:25John-GaltSure
16:25John-GaltRyanVM: Can I have an a= for landing on a closed tree?
16:26travis-cibuild-mozharness#1021 (master - f1bd83f : Dave House): The build passed. (https://travis-ci.org/mozilla/build-mozharness/builds/232894785)
16:26lizzardmaybe when we land giant db/profile changes land it preffed off so we can test (even on nightly)
16:27lizzardif that makes any sense. Anyway. sounds like we want to land this
16:27RyanVMJohn-Galt: a=me, please land directly on m-c
16:27John-GaltThanks
16:27RyanVMnote that windows builds will likely burn due to infra issues currently closing trees
16:27RyanVMso if we can get someone looking at linux/osx in the mean time, that'd be nice
16:27RyanVMcatlee: and we'll just trigger new nightlies off ^ if testing goes well
16:28catleeRyanVM: sounds good, thanks
16:28jmaherRyanVM: the trees have been closed all day so far
16:28catleewe're still working on the windows issues
16:29RyanVMJohn-Galt: I submitted a PI request for more smoketesting around addon installation/disabling/enabling too
16:30travis-cibuild-mozharness#1022 (production - 1879894 : Dave House): The build passed. (https://travis-ci.org/mozilla/build-mozharness/builds/232896199)
16:30lizzardryan: so did you do a pi-request for emergency testing for this?
16:30John-GaltRyanVM: That would probably explain why krupa just pinged me about it
16:30lizzardthe problem with this pi system is i cant see whether someone else has filed the thing
16:30lizzardaha, wait, there is a giant spreadsheet where i can see it
16:31RyanVMlizzard: https://docs.google.com/spreadsheets/d/1EzOQrVsPOZUW2m98B3yxiF48rFdkisV5fnn96Wdo8Bw/edit#gid=0
16:31RyanVMhah, yes
16:31lizzardwhat the. why. why not use bugzilla?
16:31lizzardok
16:31lizzardi hate a spreadsheet
16:31gerard-majaxKWierso|afk, still no new build ?
16:32lizzardI think what we need is testing for updating from an old build, and from a broken build
16:32lizzardor do we just assume the folks with broken builds need to re-install nightly?
16:32RyanVMgerard-majax: Windows infra issues blocking things at the moment. One will go out as soon as possible.
16:33gerard-majaxRyanVM, yeah I guess you are not doing that just for fun :)
16:35lizzardHmm. how do they test this if updates are turned off for nightly?
16:35lizzardcan people still manually update?
16:35akinightlytest channel works
16:36lizzardok! thanks
16:38arrcatlee: can we force a build on a specific client?
16:38arrcatlee: we've rebooted a host multiple times, and it's not deleting the token. But on systems we found where the token was deleted, we can't find any of our code that would do that
16:39catleearr: not easily
16:39catleewe could hook it up to staging
16:39arrin this case, it's deleting the tokens in c:\builds but leaving the ssh keys
16:39catleea dev master
16:39arrI'm not sure that would mirror prod, anyway
16:45arrcatlee: RyanVM: can we retrigger a build with the trees closed so we can pick up one and see if it fails?
16:45RyanVMarr: push to m-c from 17min ago just had 3 windows builds fail
16:45RyanVMi can retrigger those if you like :)
16:45arrRyanVM: yes, please
16:45RyanVMdone
16:46arrRyanVM: Hm, how are people pushing, I thought the trees were closed?
16:46RyanVMi gave him permission to do so due to extenuating circumstances :)
16:47RyanVM(there's discussion about it above in this channel)
16:47arrRyanVM: sorry, missed that. We're still debugging the token/keys issues
16:47RyanVMno worries!
16:55_6a68hey all! trying to run some talos tests on my local machine, is this the best, up-to-date docs for trying to reproduce Try results? https://wiki.mozilla.org/ReleaseEngineering/Mozharness/How_to_run_tests_as_a_developer
16:55_6a68I'm also wondering what the specs are of the machines in the Try cluster that run Talos tests. Is that documented somewhere?
16:56jmaher_6a68: https://wiki.mozilla.org/Buildbot/Talos/Misc#Hardware_Profile_of_machines_used_in_automation
16:57_6a68jmaher: thanks!
16:57_6a68Any advice on getting local Talos numbers to resemble Try Talos numbers? Is that mdn page reliable?
16:57jmaher_6a68: to run tests it is typically just |mach talos-test -a <name>|
16:57_6a68Ah, ok. So if the numbers don't quite match up with Try, I guess the _relative differences_ should still be meaningful?
16:57jmaher_6a68: the local numbers will not come close to try; in many cases you can do an A/B test with a patch locally and see the difference
16:57* _6a68 is not sure what the guidance is for local builds
17:00arrRyanVM: again, please?
17:01RyanVMarr: oddly, the ones I requested are still showing as pending
17:02arrRyanVM: okay, cool
17:02* RyanVM hopes that's due to lag in spinning up instances or something
17:03arrworking instances, yes :}
17:13arrRyanVM: still show as pending?
17:13RyanVMJohn-Galt: we actually have finished Windows TC builds on m-c tip we could use now for verification if someone who can reproduce the problem is around to test
17:13RyanVMyes
17:13arrgood good
17:22travis-cibuild-puppet#1346 (master - e92e6b3 : Aki Sasaki): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/232913823)
17:22RyanVMarr: just started
17:24arrRyanVM: I'm expecting this will fail. I think we found some of the inconsistencies... it looks like the instances are reading an outdated file on disk *sometimes*
17:25John-GaltRyanVM: Not sure who that would be at this point. I can ping krupa, b