mozilla :: #releng

13 Sep 2017
00:34Calleksfink: did you see my ping yesterday?
00:48arrpip <2 won't work, either, since it was 1.9 that broke things
00:51aki>=1.5,<9.0.1 will probably work ;-)
00:51akiuntil someone uploads some other version
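A minimal sketch of what a range like `>=1.5,<9.0.1` accepts and rejects, assuming purely numeric version strings. `parse_version` and `in_range` are hypothetical helpers written for illustration, not pip APIs; real pip follows the PEP 440 rules, which cover more cases than this.

```python
# Sketch only: hand-rolled check of a ">=1.5,<9.0.1" style range.
# parse_version and in_range are hypothetical helpers, not pip APIs.

def parse_version(text):
    """Turn '9.0.1' into (9, 0, 1); only handles purely numeric versions."""
    return tuple(int(part) for part in text.split("."))


def in_range(version, lower, upper):
    """True if lower <= version < upper, comparing zero-padded tuples."""
    parsed = [parse_version(v) for v in (version, lower, upper)]
    width = max(len(p) for p in parsed)
    ver, low, high = (p + (0,) * (width - len(p)) for p in parsed)
    return low <= ver < high


# Candidate versions are made up for illustration.
for candidate in ("1.4.9", "1.5", "8.1.2", "9.0.1"):
    print(candidate, in_range(candidate, "1.5", "9.0.1"))
```

The upper bound is exclusive, so 9.0.1 itself is rejected while everything from 1.5 up to it is allowed, which is the "big gap" mentioned in the channel.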
01:07arroh, yeah, you're right... I completely dysleixaed that
01:08arrthat's... a big gap
01:08arrI think the bigger problem is that it assumes that it's going to be the same version across all platforms
01:08arrthe whole reason the new version got uploaded was a request for OS X updates
01:17bholleyglob: is the VCS sync bot stuck? I don't see
01:18bholleyglob: (autoland was closed for a lot of today)
01:21bholleyglob: ah, there it goes!
01:29akiyeah. i'm still inclined to leave pypi.p.b.m.o alone for now. pinning in mh code is problematic because it affects everyone and everything running those tests, unless we get a good config story going
02:20philornot sure where g-w732-spot-* get their pypi, but apparently it's somewhere still not-good
02:28nthomasapparently those two proxy instances still have 9.0.1. (eg
02:31philormmm, my hundreds of retriggers on esr52 aren't going to go well
02:32philorwell, 172, a smallish "hundreds"
02:34nthomashow did we get from a psutil install error to too new pip ?
02:39nthomasseem to have pip 0.8.2 in that test log
02:39* nthomas is confused
02:40* nthomas tries reading bug 1399151
02:40firebot NEW, Windows buildbot test fail: Could not install python package: C:\slave\test\build\venv\Scripts\pip i
02:40philormaybe "new pip wanted to build new psutil, without a build environment"?
02:44nthomasI blew away the nginx proxy and pip 9.0.1 isn't present in the dir listing any more
02:49nthomashope we can retrigger to success now
02:50philortrying to
03:32Mardakanyone familiar with updating from ? and where that source .zip exists now?
03:36nthomasmight be in tooltool ?
03:37Mardakah got it via ./mach awsy-test --quick as bc suggested
04:30philornthomas: green retriggers, thanks
04:30philornothing like a 92 minute test suite
04:40whimbooarr, fox2mike I wonder why we have such a clear documentation on Mana which describes everything about this host. :)
07:46travis-cibuild-buildbot-configs#2876 (production - 9c7fc23 : Andrei Obreja): The build passed. (
13:00catleegrenade: is still valid?
13:00firebotBug 1359264 NEW, Remove support for bundleclone from build-cloud-tools
13:00catleecan we remove those references from cloud-tools now?
13:16travis-cibuild-buildbotcustom#1091 (master - a27e287 : Xidorn Quan): The build passed. (
13:46grenadecatlee: probably, considering it isn't supported server side. I guess safest answer is we need to run some tests...
13:48catleewho's a good person to review branding changes?
13:58Callekcatlee: maybe ted?
13:59Callekprobably depends on the scope/goal of said bug and what about it you're changing
14:05Pikethat one, any firefox peer should be fine
14:05Callekcatlee: so yea I think relman is probably good enough for that, if they don't know trust the review I can give it a "this does what you say it does" r+
14:21travis-cibuild-puppet#1960 (master - aac975f : Peter Moore): The build passed. (
14:24PikeCallek: do you remember what l10n_tree does in buildbot-configs?
14:24Callekoffhand, no; let me look
14:27CallekPike: looks like it sets (so a buildbot property) -- I don't see anything offhand using it, fwiw
14:29Pike'k, thanks
14:30Pikeshould we push the mozharness config changes to autoland, or would you rather wait for the buildbot-config patches to be ready to land in sync
14:30CallekPike: *maybe* it was related to old Tinderbox code?
14:31CallekPike: I'd say push to autoland, no reason to hold off
14:31Pike'k, thanks
14:33travis-cibuild-puppet#1961 (production - 8d60c67 : Alin Selagea): The build passed. (
16:20pmooreCallek: is there any way to see when a puppet change has completely propagated?
16:20pmoorethis is for mac workers on generic-worker
16:22pmoorei need to land a gecko change, but want to be sure all workers have been updated first, so they don't start resolving tasks as malformed-payload because they don't understand the new task payload fields
16:22pmooreor catlee?
16:23pmoore(basically one worker that didn't upgrade could sabotage a whole swath of tasks, if there was just one out there not upgraded)
16:26Callekarr: do you have any insight there, now that we don't run puppet on every boot?
16:26* Callek isn't sure
16:26arrCallek: they'll run puppet if the hg checksum has changed
16:27Callekooo, ok. now to the "how do we tell that all workers have run puppet" (which I never had a good answer for)
16:27Callekpmoore: in the past my rule of thumb was +24 hours, and if any stragglers by then we star and reboot/kill/whatever
16:27Callekbut I don't know if there is a better way
16:28pmoorethanks Callek
16:29arrall of the OS X workers should update within the hour at most, assuming they've rebooted
16:29pmoorearr: awesome, thanks!
16:30arrpmoore: if a host isn't taking jobs and doesn't reboot, it will lag behind (but should only burn one job before it reboots again)
16:30pmoorearr: ah, ok, great
16:30arrso if you look at the last time a host did a job, that should give you a good idea
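A rough sketch of the "look at the last time a host did a job" heuristic, assuming you already have a mapping of host name to last job completion time from somewhere. The host names, times, and the `stragglers` helper are all made up for illustration; nothing here queries puppet or the workers.

```python
from datetime import datetime


def stragglers(last_job_times, change_landed):
    """Return hosts whose most recent job finished before the change landed.

    Hypothetical helper: such hosts may not have rebooted (and so may not
    have re-run puppet) since the change, so they are only candidates.
    """
    return sorted(
        host
        for host, finished in last_job_times.items()
        if finished < change_landed
    )


# Made-up data for illustration only.
change_landed = datetime(2017, 9, 13, 15, 0)
last_job_times = {
    "worker-a": datetime(2017, 9, 13, 16, 10),
    "worker-b": datetime(2017, 9, 13, 14, 20),
    "worker-c": datetime(2017, 9, 13, 15, 45),
}

print(stragglers(last_job_times, change_landed))
```

With this sample data only `worker-b` is flagged. Per the discussion, a flagged host should burn at most one job before it reboots and picks up the change.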
17:17catleeoremj: hey, if I give you a bucket/prefix, would you be able to copy all the files into the product delivery bucket?
17:19oremjcatlee: yeah, if i have access to the bucket. I think your releng api keys should give access to do that as well
17:19catleeoremj: I don't think I have the keys to post to product delivery
17:20catleethey're locked away
17:20oremjah, gotcha. I think nick may have a set, but perhaps not
17:20catleeah, ok
17:21catleeit's mozilla-releng-test ATM
17:21catleethe bucket
17:21oremjif you need it before he is online, I can take a look
17:22catleeok, thanks
17:22catleeshould be ready to go in an hour or so
17:30sfinkCallek: sorry, I've been kind of buried in other things. I hope to look at that later today. My plan is to skim through some old logs to see how long the timing out tests are taking, to see which ones are simply slow and can be skipped.
17:30sfinkseparately, I'll be switching those jobs at some point to structured logging so activedata will have the data from then on
17:31sfinkand I also need to fix the gcMaxBytes failure
17:31catleehwine: FYI - we need to repack updates for 56.0 release. they'll need to be copied into the releases dir
17:31catleeI guess we'll need to update checksums
17:31firebotBug 1393789 NEW, Repack updates for 56.0 RCs
17:32catleewill that break any of your checks?
17:32hwinecatlee: thanks - it should not break checks unless you break signatures (which I'll happily catch for you)
17:32catleeyeah, I hope so too
17:35hwinecatlee: it will be interesting if anyone notices or complains. Good to have the docs in the bug for future curiosity
17:35catleewe're going to do it for b11 first to test
18:01Calleksfink: sooo, a bunch of the tests from my try running, had runs, where it either timed out or had 0.5 or 0.6s as the baseline in passing runs
18:01Callekone or two had as high as 0.9s
18:02sfinkbut that's not what I wanted to hear
18:02Calleksfink: but that was where it would be 1 to 3 out of 10 on the failure count per [failing] test, but never the same set of tests (it seemed)
18:02sfinktell me something I want to hear
18:02CallekI had basic\orNanTest1.js failing with timeout as well, so not even all complicated test suites
18:03sfinkso I guess it's all down to Windows and multiprocess not getting along
18:03sfinkbut what's different about running under taskcluster?
18:03sfinkdidn't the buildbot jobs do the same d*mn thing?
18:03Calleksfink: sooo, to add to that discussion the BB machines running these in AWS that pass, are on a different instance type than the TC machines (and of course setup by different mechanics)
18:04sfinkhm, ok
18:04Calleksfink: the BB ones are Compute Optimized: c3.4xlarge, the TC ones are c4.4xlarge aiui
18:05Callekbeyond that I don't have a clue what *actually* is different between machine states...
18:06sfinkis it possible to do test runs using c3.4xlarge? (/me knows nothing about what this would require)
18:06sfinktest, as in, just to see if the instance type makes a difference
18:06catleeCallek: maybe due to EBS warming up?
18:06sfinkit's timing out in the middle of the run
18:06Callekgarndt: ping -- offhand, is it possible, without too much trouble, to schedule windows Buildbot jobs in TC (build, instances) using c3.4xlarge instead of c4.4xlarge....
18:07sfinker, wait, which is which?
18:07Calleksfink: tc uses c4.4xlarge
18:07sfinkoh, sorry
18:07sfinkI thought you were saying to do buildbot jobs on c3.4xlarge
18:07Callekcatlee: yea, its the middle of the run, so EBS warm up seems unlikely (but possible I guess)
18:07Calleksfink: yea buildbot is on c3.4xlarge
18:08garndtCallek: we do not control what machines buildbot uses
18:08Callekdoing buildbot on c4 would be harder I think
18:08Callekgarndt: yea I'm asking about TC here
18:09Callekgarndt: as in, hard to setup a new (temp) worker type that uses the same windows ami as the existing win workers and has access to the same secrets (I'll presume try-is-ok here)
18:09Callekgarndt: but uses the different AWS instance type
18:09sfinkCallek: how would you field about making these tier 1 with -j 1 (--cpu-count 1 or whatever it's named)? They'd take longer and cost more, but less than running buildbot in parallel
18:09sfinkI canna type today
18:11Calleksfink: I'm ok with it, I'm not sure however how to make them -j1 for windows only, since I don't want to slow down all SM jobs on other platforms.
18:11sfinkgood point, I&#39;ll do that part.
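One way the "-j1 on Windows only" idea could look, as a sketch under assumptions. `job_count` is a hypothetical helper, not the actual SpiderMonkey harness or mozharness code, and the platform strings are made up for illustration. As noted in the channel, the real flag name is not certain, so none is shown here.

```python
import multiprocessing


def job_count(platform, cpu_count=None):
    """Pick how many parallel test jobs to run for a platform.

    Hypothetical helper: Windows is forced to a single job to sidestep the
    timeouts discussed above; other platforms keep using every core.
    """
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    if platform.lower().startswith("win"):
        return 1
    return cpu_count


# Platform names and the core count are assumptions for illustration.
for name in ("win64", "linux64", "macosx64"):
    print(name, job_count(name, cpu_count=16))
```

Keeping the decision in one place would make it easy to raise the Windows count again later if the cause turns out to be system configuration rather than something in-tree.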
18:13Calleksfink: I feel if we can get buildbot off for 57 (which is within a week for merge day aiui) we should be happy here, since then we don't need to worry as much. And we can worry about job-time and -jN N>1 at a later date
18:13sfinksounds good
18:13Calleksince if its a system config issue then we can easily bump the job-count on uplifts, if its an in-tree issue we can decide if an uplift is too risky or not at that point ;-)
18:15Calleksfink: thanks for the assist here
18:15sfinksorry I've been so slow about this
18:16Callekmeh, SM isn't your only priority, and I ended up being away ~3 weeks so not a big deal
18:17hwinecatlee: holler next time you're in MoTo -- have bundle for safe
18:25sfinkCallek: I think should do it. Can you give it a try?
18:31CallekI'll send it to try soon....
18:33sfinkI guess I may as well do the initial try push
18:36firebotBug 1393789 NEW, Repack updates for 56.0 RCs
18:36catleehwine: ok, thanks
19:17garndtCallek: sorry for the delayed reply, I think when it comes to the windows stuff, it might be simple? I'm not sure since those are provisioned with OCC. We would need to sync up with grenad.e
19:17garndtCallek: but in theory, yes, it should be easy enough
19:25Calleksfink: fwiw, `./mach try fuzzy` will be your friend now :-)
19:25Calleksfink: `./mach try fuzzy` .. `spidermonkey` .. `ctrl+a` :-)
19:25sfinkyeah, I kinda failed at that push, didn't I
19:26sfinkwell, other than it crashing when I try it :(
19:27sfinkIOError: [Errno 2] No such file or directory: u'/home/sfink/src/mozilla2/js/src/devtools/rootAnalysis/taskcluster/docs/parameters.rst'
19:27sfinkah, I guess it wants to run from the toplevel
19:29Callekyea (though run from toplevel, IMHO, shouldn't be forced)
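The error above comes from a path being resolved relative to the current directory instead of the top of the checkout. A minimal sketch of how a tool could avoid forcing a toplevel run, assuming the checkout root can be recognised by a `mach` file sitting in it. `find_toplevel` is a hypothetical helper, not part of mach.

```python
import os


def find_toplevel(start, marker="mach"):
    """Walk upward from `start` until a directory containing `marker` is found.

    Hypothetical helper: returns the directory path, or None if the
    filesystem root is reached without finding the marker file.
    """
    current = os.path.abspath(start)
    while True:
        if os.path.isfile(os.path.join(current, marker)):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent


# Prints the checkout root when run anywhere inside one, otherwise None.
print(find_toplevel(os.getcwd()))
```

Paths such as `taskcluster/docs/parameters.rst` could then be joined onto the returned directory rather than onto whatever directory the command was started from.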
20:00Calleksfink: to be clear, your try push sadly didn't seem to run any BB spidermonkey, I'm not certain why, but I can't seem to even trigger it via treeherder right now (I suspect something in that side of things regressed)
20:03Calleksfink: yea thats the one I'm looking at, only the SM-tc jobs
20:03Callekwhich admittedly are the jobs we care most about right now, but still just being sure :-)
20:03sfinkoh, I see
20:26catleeoremj: that bucket is in our account
20:26catleeI think it's public...
20:35oremjcatlee: I don't have access to your account. I tried to list the bucket anonymously, but I got access denied.
20:38catleeoremj: did you try since I updated the bug?
20:45oremjdidn't see the update. I am able to list it now
14 Sep 2017
No messages