mozilla :: #releng

7 Sep 2017
02:48nthomasthis is one of those sucky report something then run away comments but ...
02:48nthomashttps://secure.pub.build.mozilla.org/builddata/reports/slave_health/slavetype.html?class=test&type=talos-linux64-ix shows a bunch of machines non-functional and a 1500+ pending count
02:50nthomastalos-linux64-ix-018, talos-linux64-ix-029, and talos-linux64-ix-017 have had a reboot
04:41travis-cibuild-puppet#1943 (master - 94298df : Dave House): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/272755760)
04:53travis-cibuild-puppet#1944 (production - dd1fd5d : Dave House): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/272757348)
08:18PorkepixHi, little question: regarding nightly and especially photon changes, which OS X versions are internally tested?
08:23nthomas|awayPorkepix: not the best channel to ask that, assuming you mean manual testing
08:23nthomas|awaythere's a #photon-visual which might be useful
08:23nthomas|awayaobreja|buildduty: spacurar|buildduty - https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slavetype.html?class=test&type=talos-linux64-ix is looking very sad
08:23nthomas|awaysomething is stopping those machines, probably after they reboot and run puppet
08:24nthomas|awayon talos-linux64-ix-018 I saw some firewall errors
08:24Porkepixnthomas|away: Well, someone pointed me here to ask that. I was thinking about both manual and automatic. But I guess automatic only detects breakages, not visual glitches and issues
08:24nthomas|awayhah, who/what sent you here ?
08:24Porkepixnthomas|away: And not always easy to get answer there :(
08:24Porkepixnthomas|away: It was pascalc
08:24nthomas|awayPorkepix: fwiw, automated testing is 10.10
08:25aobreja|builddutynthomas|away: just saw that a couple of minutes ago, I will have a look and solve the problem, I must investigate what the issue is there
08:25aobreja|builddutythank you :)
08:26nthomas|awayeg https://foreman.pub.build.mozilla.org/reports/28645015
08:26Porkepixnthomas|away: Well, I'm getting some little visual things on Mavericks that make me think it's not really tested on this version. Which OS X versions does Mozilla choose to officially support?
08:27nthomas|awayPorkepix: https://www.mozilla.org/en-US/firefox/55.0/system-requirements/ says 10.9 and up, and I don't think it's changing in 56 or 57
08:28PorkepixMmmmh, so I'm on the very last supported one
08:28PorkepixAnd so you think it's #photon-visual to know about the manual testing?
08:29nthomas|awaythat was just a guess really, based on a whois
08:30PorkepixOk, thanks for the information :)
08:32nthomas|awaygood luck with your quest
08:42nthomas|awaythe firewall stuff doesn't appear to be new, it existed before the workers started dropping like flies about 13 hours ago
08:42nthomas|awayI wonder why runner isn't ... running
08:44nthomas|awayX might not be starting ?
08:45* nthomas|away starts to suspect https://hg.mozilla.org/build/puppet/rev/f69996fdd2d3
08:45nthomas|awaybased on https://irccloud.mozilla.com/pastebin/WS52jaxK/
08:49aobreja|builddutycould be https://hg.mozilla.org/build/puppet/rev/f69996fdd2d3
08:50nthomas|awaythe modules/puppet/manifests/atboot.pp part, maybe ?
08:51nthomas|awayhm, no
08:51jmaher|afknthomas|away: is this related to the linux machine backlog? that looks to be osx specific
08:51nthomas|awayyeah, I know. There's nothing much else in puppet landings to blame though
08:51jmaher|afkok
08:52jmaher|afkcould there be something that we ran on firefox (or on try) which caused the os to go bonkers ?
08:53nthomas|awayso bonkers they don't come back after a reboot ?
08:53nthomas|awaythat would be top work
08:54jmaher|afkI admit it is far fetched
08:55nthomas|awayanyone see anything weird in https://hg.mozilla.org/build/puppet/rev/f69996fdd2d3#l4.1 ? that's shared
08:55aobreja|builddutystill strange that runner is not working on talos
08:57nthomas|away[root@talos-linux64-ix-087 ~]# ls -l /var/lib/puppet/last-good-run
08:57nthomas|away-rw-r--r-- 1 root root 0 Sep 6 16:13 /var/lib/puppet/last-good-run
08:57nthomas|awaypuppet is failing
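[editor's note] The manual check above compares the timestamp of /var/lib/puppet/last-good-run with the current time: a file last touched on Sep 6 16:13 suggests puppet has not completed successfully since then. A minimal sketch of the same check, assuming a 12-hour staleness threshold and a hypothetical helper name (neither comes from the releng tooling):

```python
import os
import time


def last_good_run_is_stale(path, max_age_hours=12.0, now=None):
    """Return True if `path` is missing or was last modified more than
    `max_age_hours` ago. The threshold is an illustrative assumption."""
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # No marker file at all: treat as "no good run recorded".
        return True
    return (now - mtime) > max_age_hours * 3600
```

On an affected machine this might be called as last_good_run_is_stale("/var/lib/puppet/last-good-run").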
08:58jmaher|afknthomas|away: we do not remove the reboot_flag_file anymore, do we get stuck in a reboot?
08:58nthomas|awaymaaaaybe. it still gets removed at https://hg.mozilla.org/build/puppet/file/production/modules/puppet/templates/puppet-atboot-common.erb#l40
08:58nthomas|awaybeen a long time since I looked at this code/process
08:59jmaher|afkok, probably not an issue
08:59nthomas|awayaobreja|buildduty: could you try running puppet manually on one of these ? perhaps you get some useful logs then
09:00nthomas|awayin foreman it just looks like notice and warning, but maybe that redirect change is bogus on ubuntu or something
09:00aobreja|builddutyok,sure
09:00aobreja|builddutyalso for some machine puppet seems to work:
09:01aobreja|builddutysorry my mistake i was checking on yosemite
09:02nthomas|awayat any rate I don't think backing this out would be the end of the world
09:02Porkepix$/buffer 58
09:02PorkepixWops, sorry
09:02nthomas|awaynp
09:04nthomas|awayf69996fdd2d3 lines up really nicely with https://www.hostedgraphite.com/da5c920d/86a8384e-d9cf-4208-989b-9538a1a53e4b/grafana/dashboard/db/running filtered to talos-linux64-ix
09:06* nthomas|away files a bug for this
09:09aobreja|builddutynthomas|away: nothing suspicious after running puppet: https://papertrailapp.com/systems/talos-linux64-ix-046/events?focus=842374655783366661
09:09nthomas|awayhttps://bugzilla.mozilla.org/show_bug.cgi?id=1397674
09:10firebotBug 1397674 NEW, nobody@mozilla.org talos-ix-linux64 pool is not doing any work
09:10aobreja|buildduty++nthomas
09:11nthomas|awayurgh, would be nice if it gave the exit stats
09:11nthomas|away*status
09:13aobreja|builddutyNotice: Finished catalog run in 16.11 seconds
09:15nthomas|awayin the absence of anything else that's still smoking, shall we try backing that out, or back it out in a puppet env ?
09:15aobreja|builddutyI could test that in a puppet env
09:15aobreja|builddutyit's strange because I ran puppet but still:
09:15aobreja|builddutyhttps://irccloud.mozilla.com/pastebin/B5nNIjrF/
09:16nthomas|awayah, you probably called puppet directly though ?
09:19aobreja|builddutyyes i ran puppet agent --test on machine
09:28nthomas|awayI'm going to go afk now, good luck with this
09:30aobreja|builddutythank you Nick and sorry for keeping you so late :-)
09:50pmooreaobreja|buildduty: good morning! have you decided if you will back out bug 1393524?
09:50firebothttps://bugzil.la/1393524 NEW, dhouse@mozilla.com Change mac-mini slaves to not always run puppet at boot.
09:51aobreja|builddutypmoore: no, I'm testing to see if that causes the problem
09:51aobreja|builddutyon my environment
09:51aobreja|buildduty8now
09:51aobreja|buildduty*now
09:51pmooreok :)
10:18aobreja|builddutybug 1393524 doesn't seem to be the cause, tested on my environment and the issue persists after backout
10:18firebothttps://bugzil.la/1393524 NEW, dhouse@mozilla.com Change mac-mini slaves to not always run puppet at boot.
10:24aobreja|builddutyso a backout will not help
10:36mtabaraaobreja|buildduty: I wonder if testing on a fresh loaner would help
10:38aobreja|builddutypuppet should have changed the status (last-good-run) but it didn't; still, I'm not sure if my test could reveal anything since it's touching the reboot flag
11:41travis-cibuild-puppet#1945 (production - cb28bc5 : Amy Rich): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/272860728)
12:17travis-cibuild-puppet#1947 (master - ab2e356 : Andrei Obreja): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/272871772)
12:25travis-cibuild-puppet#1948 (master - 1493fcd : Amy Rich): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/272874256)
12:26travis-cibuild-puppet#1949 (production - 0668f14 : Amy Rich): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/272874273)
12:42travis-cibuild-tools#1958 (master - e059f26 : Johan Lorenzo): The build passed. (https://travis-ci.org/mozilla/build-tools/builds/272878457)
13:58catleemtabara: re https://bugzilla.mozilla.org/show_bug.cgi?id=1396337, what we'd like is for nightly artifacts to have a short cache max-age, and have release artifacts have a longer expiry
13:58firebotBug 1396337 UNCONFIRMED, nobody@mozilla.org CDN Max Age causes frequent sha512 mismatches on firefox nightly
13:58catleeI wonder if we could override the default max age in the beetmover template
13:59catleekmoir-mtg: thanks for getting that buildbot try syntax patch landed!
13:59kmoir-mtgyw
14:00kmoirglad to see less red on try
14:03catleeyeah
14:03catleeless confusing too
14:06sfraserhmm, wish you could filter by 'compatible with firefox 57+' on addons.m.o rather than just have it as a label you visually scan for when looking through the list
14:44mtabaracatlee: release artifacts as in ... fennec artifacts candidates so far, right? We have a different beetmover for releases. just wanted to make sure we're talking about the same thing
14:45mtabarawould you prefer the hackish short-term plan or a nice rewriting of the templates? depends how urgent this is I suppose
14:51catleemtabara: firefox artifacts
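[editor's note] The request in this thread is a short cache max-age for nightly artifacts and a longer expiry for release artifacts, possibly by overriding a default. A minimal sketch of that idea, assuming illustrative values (5 minutes and 1 week) and a hypothetical helper; this is not the beetmover template or its API:

```python
# Illustrative defaults only; the real values were not stated in the channel.
DEFAULT_MAX_AGE_SECONDS = {
    "nightly": 5 * 60,            # short: nightly files are replaced often
    "release": 7 * 24 * 60 * 60,  # long: release files do not change
}


def cache_control_header(artifact_kind, overrides=None):
    """Build a Cache-Control value for an artifact kind.

    `overrides` lets a caller replace the default max-age per kind.
    """
    max_ages = dict(DEFAULT_MAX_AGE_SECONDS)
    if overrides:
        max_ages.update(overrides)
    if artifact_kind not in max_ages:
        raise ValueError("unknown artifact kind: %r" % (artifact_kind,))
    return "public, max-age=%d" % max_ages[artifact_kind]
```

A shorter max-age on nightlies limits how long a CDN can serve a stale file next to a fresh checksum, which is the kind of sha512 mismatch bug 1396337 describes.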
16:04armenzgcatlee: does shipit-uplift have CORS headers?
16:04armenzgwho do I ask while garbas is away?
16:05jmaherjryans: I have noticed that the stylo-disabled test jobs which run on hardware do not run; we only run on stylo. I suspect this is because there are no stylo-disabled buildernames for the tests
16:06jryansjmaher: aha! that would need a BB change?
16:07jmaherjryans: yeah; I noticed this while moving test jobs from windows 8 to windows 10 on the hardware and getting double runs
16:08jmaheralthough, possibly we don't need all these tests for windows non-stylo? :)
16:10jryansjmaher: well, i believe we'd like to keep the tests alive until stylo hits release. hopefully with SETA limiting, it's not too much load to take on, but you would know better.
16:10ahalso mochitest-chrome stylo-disabled et al wasn't running on windows10 before this change
16:10ahalonly win7
16:11jmaherwell, right now it isn't running in -stylo-disabled; so I guess if it is desired, we should file a bug and get the buildbot buildernames
16:11jryansi guess it's mochitest-chrome, mochitest-clipboard? anything else?
16:11ahalyeah, I'm just saying that turning them back off for now would be the same coverage we had before
16:12ahalif it's desirable to run them there, we should get the buildername added
16:12jryansahal: jmaher: ah, so i guess they were never running before we flip for stylo-enabled mode either?
16:12jmaherjryans: for windows 8/10 it is browser-chrome,mochitest-gpu,mochitest-webgl, mochitest-media, reftest, reftest-no-accel
16:15jmaherwe could get by with just windows10 buildernames and ignore windows 7 buildernames as that is a much shorter list
16:15ahalbut actually.. looking at the taskcluster configs, turning them off would mean creating yet another stylo-disabled test-set
16:15ahalmaybe just adding the buildername is cleaner
16:15jryansjmaher: ah, that's more than i thought.
16:15jmaheryeah, the list is sort of long
16:16jryansjmaher: assuming we can accept the load, i do think we want to run them properly, esp. since windows is our top platform.
16:16jmaherI need to run an errand real quick, happy to hack on this when I get back; possibly similar to the hacking done in this patch: https://bugzilla.mozilla.org/show_bug.cgi?id=1393198
16:16firebotBug 1393198 FIXED, jmaher@mozilla.com add buildbot configs for running remaining windows 8 tests on windows 10
16:16jmaherjryans: yeah, we can figure out the load
16:16jryansjmaher: thanks for the note and investigation!
16:17jmaherok, back in a bit
16:17catleearmenzg: I'm not sure. jlorenzo may know
16:20jlorenzoI haven't worked on ship-it uplift, but I can look
16:35selenamarieif orangefactor hasn't seen an issue reoccur in 2 months, is it safe to close?
16:40Aryxselenamarie: i'd regard "not occurred for a year" as good to close, just reopened some bugs closed during the mass intermittent closure on 2017-07-09
16:49armenzgjlorenzo: hey! thanks
16:50armenzgjlorenzo: marco tells me that Access-Control-Allow-Origin is set on https://shipit.staging.mozilla-releng.net
16:52jlorenzoarmenzg: there's that extension that is loaded no matter what the environment is https://github.com/mozilla-releng/services/blob/master/src/shipit_uplift/shipit_uplift/__init__.py#L21
16:54jlorenzoarmenzg: hmmm, maybe not, after all https://github.com/mozilla-releng/services/blob/master/lib/backend_common/backend_common/__init__.py#L47
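[editor's note] Whether a service "has CORS headers" for a given front end comes down to the Access-Control-Allow-Origin response header: it must be "*" or match the requesting origin exactly. A minimal sketch of that check on an already-fetched set of response headers, using a hypothetical helper name and ignoring credentialed requests:

```python
def allows_origin(response_headers, origin):
    """Return True if the response headers permit a simple cross-origin
    read from `origin`. Header names are matched case-insensitively."""
    for name, value in response_headers.items():
        if name.lower() == "access-control-allow-origin":
            allowed = value.strip()
            return allowed == "*" or allowed == origin
    # Header absent: the browser would block the cross-origin read.
    return False
```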
16:56armenzgjlorenzo: how hard is it to add a new app that is mainly a React UI?
16:57jlorenzoarmenzg: I'm sorry, I've never added a new app in services. Marco, Andy and Calixte from relman may have an opinion
16:59armenzgOK thanks
19:10jmahercatlee: https://bugzilla.mozilla.org/show_bug.cgi?id=1397829
19:10firebotBug 1397829 ASSIGNED, jmaher@mozilla.com add buildernames for stylo-disabled tests which run on hardware, win7 and win10
19:11catleejmaher: https://bugzilla.mozilla.org/show_bug.cgi?id=1397521
19:11firebotBug 1397521 NEW, nobody@mozilla.org TC trying to start jobs buildbot doesn't have configured
19:12catleejmaher: https://bugzilla.mozilla.org/show_bug.cgi?id=1381597
19:12firebotBug 1381597 NEW, nobody@mozilla.org create a taskcluster task that tests the decision graph for invalid buildbot-bridge builder names
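[editor's note] Bug 1381597 asks for a task that tests the decision graph for invalid buildbot-bridge builder names, which is the failure mode behind the missing stylo-disabled jobs discussed earlier. A minimal sketch of such a check, assuming each task carries its builder name at payload.buildername and that the list of configured builders is available as a plain collection (both are assumptions, and the helper name is hypothetical):

```python
def invalid_buildernames(tasks, known_builders):
    """Map task label -> buildername for every task whose buildername
    is not in `known_builders`. Tasks without a buildername are skipped."""
    known = set(known_builders)
    bad = {}
    for label, task in tasks.items():
        name = task.get("payload", {}).get("buildername")
        if name is not None and name not in known:
            bad[label] = name
    return bad
```

A decision-graph test could fail whenever this returns a non-empty mapping, so a missing buildername is caught before a job is scheduled rather than after it silently fails to run.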
19:17jmahercatlee: we lost you
19:17catleeI lost you!
19:18catleevidyoooo
19:32tedwcosta: you're probably not around now, but yay: https://bugzilla.mozilla.org/show_bug.cgi?id=1331049#c43
19:32firebotBug 1331049 NEW, nobody@mozilla.org DeadlockDetector death test block for 90s each and cause gtest failure due to timeout on osx debug T
19:35wcostated: I am :)
19:36wcosta\o/
19:36tedalways nice to have your theories confirmed
19:36wcostathat bug deserves a presentation at the all hands, it would be a nerdy stand-up comedy show
19:37wcosta"We wanted to change home directory, and in the end we had to patch valgrind to achieve that"
19:38tedseriously
19:38catleewow
19:38tedactually it would be cool to do a post-mortem on it
19:38catleeyeah
19:38catleeI'd love to see the full story presented
19:38tedi'd be up for spending some time working up a presentation with you and presenting in cancun
19:39teddunno what the schedule is going to be like there, but i'm sure we can shoehorn in 30 mins for interested folks
19:43catlee"the things they don't teach you in school"
19:45travis-cibuild-puppet#1950 (master - 8c8ebe9 : Dave House): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/273040450)
19:45wcostated: I am totally up for it. Busy right now with some docker-worker stuff, but I was thinking of writing a blog post with the full story, then it would be easier to convert to a presentation
19:46wcostaI am only afraid Internet might not have enough memes for this presentation
19:47aki:)
19:48catleeI think it's up for the challenge
19:57travis-cibuild-puppet#1951 (production - 38b7fde : Dave House): The build passed. (https://travis-ci.org/mozilla/build-puppet/builds/273044912)
21:58RyanVMcatlee: on Beta, we just went from pushing to tests mostly complete and ready for gtb in ~2hr flat
21:58RyanVMthat's....fan-frigging-tastic
21:59catleeRyanVM: let's see how long it takes to ship!
21:59RyanVM:D
21:59catleeI think it's down to ~3hr
21:59RyanVMreally amazing to think how far we've come in the last couple years, but even the last 6-12mo really
21:59RyanVMawesome work by y'all :)
22:00catleethank you!
23:02travis-cibuild-tools#1959 (master - 35d4ef7 : ffxbld): The build passed. (https://travis-ci.org/mozilla/build-tools/builds/273096155)