mozilla :: #ateam

16 Mar 2017
07:43whimbooglob: so I upgraded mercurial from 3.9.1 to 4.1.1 now
07:44whimbooi don't have anything to push in the next hours but will keep an eye on it
07:44globwhimboo: thanks. current theory is this is a result of three different things falling into perfect alignment
07:45globone's easy to fix (client upgrade), one is harder (server work), one is impossible (load balancer bug)
07:52whimbooi see
07:52whimbootricky
08:08whimbooTomcat|sheriffduty: looks like we had no linux nightlies on central yesterday
08:08whimboowas there a known taskcluster fallout?
08:10whimboohttps://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=8dd496fd015a2b6e99573070279d9d1593836ea9&filter-searchStr=nightly&filter-tier=1&filter-tier=2&filter-tier=3
08:10whimbooall fine for other platforms
08:12whimboooh. the decision tasks were failing
08:12whimboowill file a bug
08:17Tomcat|sheriffdutyyeah there was a bug
08:17Tomcat|sheriffdutythats now fixed
08:17Tomcat|sheriffdutybug 1347569
08:17bugbotBug https://bugzilla.mozilla.org/show_bug.cgi?id=1347569 General, blocker, dustin, RESOLVED FIXED, Decision Task for Nightly Desktop + Android failed with KeyError: u'NS7jKig_R8-1F_7DWSTQ-Q'
08:17whimboooh
08:17whimbooso i filed a dupe. hurray
08:17whimboowe should have starred it in treeherder
08:18Tomcat|sheriffduty;-)
08:18Tomcat|sheriffdutyyeah at the time i filed the bug it was not fixed
08:18Tomcat|sheriffdutyso wanted to get attention to it
08:18Tomcat|sheriffdutybut will still star it
08:19Tomcat|sheriffdutywhimboo: should be fixed soon
08:20Tomcat|sheriffdutythe US already did the time switch so nightlies start an hour earlier for us
08:26whimbooyepp
08:26whimbook, heading out for a run. bbl
09:10nishu-tryinghardjaws, i think i can work on this https://bugzilla.mozilla.org/show_bug.cgi?id=940882#c32 how should i check if the changes i made are correct and test them while i work on it.
09:10bugbotBug 940882: Mochitest, normal, nobody, NEW , Consolidate waitForCondition implementations and switch to using BrowserTestUtils.waitForCondition
09:23nishu-tryinghardnvm i have gone through the html and see that script tags are used to load the required scripts.
09:24nishu-tryinghardAnd what about the implementation of the 2nd function, waitForConditionPromise, which returns a promise
09:25nishu-tryinghardThe link for that function implement is dead
09:25nishu-tryinghardimplementation*
09:43jmaher|afkmarco: as a note, all of the tests should be enabled for code coverage now (the patch got merged last night (at least in my tz) to mozilla-central)
09:43marcojmaher|afk: awesome
09:43marcojmaher|afk: bug #?
09:44jmaher|afkbrowser hung right now
09:44jmaher|afkone sec
09:44jmaher|afkmarco: https://bugzilla.mozilla.org/show_bug.cgi?id=1347241
09:44bugbotBug 1347241: Code Coverage, normal, madeleinechercover, RESOLVED FIXED, Enable linux64-ccov coverage collection for common tests set
09:45jmaher|afkit didn't run on the cron scheduler, so we will get results in ~12 hours on the next cron cycle
09:54Ms2gerStill mochitest-only?
09:55jmaher|afkMs2ger: no, everything- coverage had wpt/xpcshell a long time ago; now we have marionette, gtest, cpp, etc.
09:55Ms2gerHuh
09:55jmaher|afkprior to that bug: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=8c89d1991786625a64d868798281610872a2bc26&filter-searchStr=ccov
09:55Ms2gerTIL
09:56jmaher|afkafter the bug: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3fb43f3b51ec24e235e3b52b99653275da08c089
10:09marcojmaher|afk: now only e10s is missing, right?
10:09jmaher|afkmarco: intentionally
10:10jmaher|afkmarco: we need to ensure that we actually get coverage from >1 process
10:17marcoyep, I remember the discussion
10:52jgrahamjmaher|afk: So how does one get access to the coverage data from those builds?
10:53jgrahamDownload code-covereage.gcda.zip and then?
10:53AutomatedTesterjgraham: marco might be able to help?
10:53jgrahamAutomatedTester: Yeah I was about to say
10:53jgrahammarco: ^
11:04jmaher|afkjgraham: there are the artifacts that you see, but we don't have anything hooked up to display the data
11:05jmaher|afkjgraham: if you stay tuned in a few weeks we might have a solution for seeing the data
11:12Tomcat|sheriffdutywhimboo: the nightly task failed again ..i'm on it
11:17jgrahamjmaher|afk: But presumably I can do something locally
11:17jmaher|afkjgraham: that I do not know
11:18jgrahamLike presumably something something lcov. Although maybe these days something something marco's program; idk
11:18jmaher|afkjgraham: I assume if you get the gcno file from the corresponding build and the gcda file from the test job you could run that through lcov
11:23jgrahamHonestly our python packaging setup has more hacks than the Daily Mail
11:23* jgraham wonders how he has managed to make pip hit the recursion limit
11:45jmaher|afkjgraham: ha
12:11jlastHi - I'm running into some problems setting up a docker image for m-c. Is there a docker image for m-c that I could pull down?
12:14jmaher|afkjlast: yeah, linux64 I assume?
12:14jlastyep
12:15jmaher|afk16.04 or 12.04? I believe you are working on devtools stuff, right?
12:16jmaher|afkdevtools runs on 1204 right now
12:16jlast12.04 is good
12:17jmaher|afkjlast: https://queue.taskcluster.net/v1/task/CmYINgRlT_SicRWBGetVOQ/runs/0/artifacts/public/image.tar.zst
12:17jmaher|afkit was about 8 steps to find that- I am happy to share if you want
12:18jlastthanks - i'll give this a try
12:18jlasthow often is it updated
12:18jmaher|afkjlast: in this case 7 days ago, I would say rarely- maybe once a release cycle
12:18jlastactually - if we want to update it, is there an easy way to get a new url
12:19jlastthat's probably fine
12:19jmaher|afkupdating is fairly easy
12:19jmaher|afkgive me a minute to show you a link, prior art, etc.
12:19jlastsure
12:20jmaher|afkjlast: I would ask that any updates to 12.04 are done to 16.04, we are working on getting everything on 16.04
12:20jlastokay. that should be okay
12:20jmaher|afkjlast: the definition for 1204 is here: https://dxr.mozilla.org/mozilla-central/source/taskcluster/docker/recipes/ubuntu1204-test-system-setup.sh
12:20jmaher|afk1604 is in the same directory
12:21jmaher|afkjlast: if you push to try and that file changes it will *automatically build a new docker image and use it magically in your test jobs*
12:21jmaher|afkthe * is to indicate that it might need a retrigger or one try push for the docker image and a second for the tests, ymmv :)
12:22jlastokay. We use the image to run mochitests in Circle CI by the way
12:22jmaher|afkand here is an example of a recent bug that updated the docker images: https://bugzilla.mozilla.org/show_bug.cgi?id=1334641
12:22bugbotBug 1334641: General, normal, botond, RESOLVED FIXED, Patch libxcb on Ubuntu 12.04 testers to include the fix for "xcb_conn.c:186: write_vec: Assertion `!c->out.queue_len' failed"
12:23jmaher|afkjlast: oh?!? is that the devtools development toolchain prior to m-c integration?
12:23jlastyes - we do our development in github with a separate CI process and update M-C semi-regularly
12:24jmaher|afkjlast: cool; I have meant to follow up on that and learn what works great or how any small changes could make it smoother if possible
12:24jlastoh great. I'm on a work week this week, but we could schedule a time to talk next week?
12:25jlastwhat's a good way to load the image? `docker load -i image.tar.zst` is failing with `archive/tar: invalid tar header`
12:26jmaher|afkoh, there is a bit more magic
12:26jmaher|afkwe compress it for speed
12:27jmaher|afkjlast: can you jump to #taskcluster, I have reached my common knowledge
12:28jlastsure
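A possible explanation for the `invalid tar header` error above: the `.zst` extension suggests the tarball is Zstandard-compressed, so `docker load` would not see a plain tar stream. The sketch below only checks for the Zstandard frame magic bytes; the helper name `looks_like_zstd` is hypothetical, and whether this image is really compressed that way is an assumption not confirmed in the log.

```python
# Zstandard frames start with the magic number 0xFD2FB528, stored little-endian.
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def looks_like_zstd(path):
    """Return True if the file at `path` begins with the Zstandard magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(4) == ZSTD_MAGIC
```

If the check returns True, decompressing the file first (for example with the `zstd` command-line tool) and loading the resulting `.tar` is one thing to try.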
12:29jmaher|afkjlast: and to answer your question, I think we should chat next week- what timezone are you in?
12:29jlastEST
12:34marcojgraham: I was in the process of writing a small python tool to get all the coverage artifacts given a treeherder build and generate a report
12:35marcojgraham: doing it manually is kind of a pain, as you have to download several artifacts (one for each test suite chunk) and run lcov merging all the results
12:35jmaher|afkmarco++
12:35jmaher|afkmarco: I know treeherder api stuff if you need help there
12:35jgrahammarco: I'll be your guinea pig if you like :)
12:35marcojgraham: also, lcov takes 10 minutes on my machine for each artifact; so if you have 90 artifacts it's going to take 900 minutes
12:35jgrahammarco: What's an artifact here?
12:36marcoa zip file containing gcda files
12:36marcojgraham: thanks, I'll ping you once I have something working :)
12:36jgrahamOK, so if I only care about wpt then I have 14 odd artifacts
12:36jgrahamSo 140 minutes sounds fine (dunno how machine performance compares)
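The estimates above are plain serial arithmetic: number of artifacts times minutes per artifact. A small sketch, assuming every artifact takes about the same time and that nothing runs in parallel (the function name is hypothetical):

```python
def estimated_lcov_minutes(artifact_count, minutes_per_artifact=10):
    """Estimate total lcov time when artifacts are processed one after another."""
    return artifact_count * minutes_per_artifact


print(estimated_lcov_minutes(90))  # all suites, as in marco's example
print(estimated_lcov_minutes(14))  # wpt only, as in jgraham's example
```

The 10-minute default is the figure marco quotes for his own machine; other machines may differ.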
12:38marcojmaher|afk: I would indeed need some help; I'm querying taskcluster directly ('https://index.taskcluster.net/v1/task/gecko.v2.try.revision.' + revision + '.firefox.linux64-ccov-opt'); I suppose there's a better way to get a taskcluster task given a treeherder build
12:39jmaher|afkmarco: hmm, let me think a bit
12:40jgrahammarco: Why do you need the taskcluster task? If you're just downloading artifacts you want something like testing/web-platform/update/fetchlogs.py
12:41jgrahamThat downloads raw log files, but it's easy to change to download some other artifact type from some other set of jobs
12:41jmaher|afkjgraham: that is more of my familiarity- I assume we could iterate the artifacts and look for *gcda*
12:42* jmaher|afk typically loads up treeherder in firefox + devtools and looks at the network requests to find the correct api
12:43jmaher|afkmarco: https://treeherder.mozilla.org/api/jobdetail/?job_guid=741b67e1-2233-43d5-9465-2f05887113b6/0
12:44jmaher|afkmarco: look at each result, if value matches *gcda*, download the url
12:45jmaher|afkmarco: I can put a script together quickly if you wish, I have some prior art to get all jobs; although it is fun to solve this on your own the first time
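The filtering step jmaher describes ("look at each result, if value matches *gcda*, download the url") could look roughly like this. It is a sketch only: it assumes the job detail response has already been fetched and parsed into a list of dicts with `value` and `url` keys, which is inferred from the description above rather than checked against the Treeherder API. The sample data and the `gcda_urls` name are made up for illustration.

```python
from fnmatch import fnmatch


def gcda_urls(results):
    """Return the url of every result whose value matches the *gcda* pattern."""
    return [
        result["url"]
        for result in results
        if fnmatch(result.get("value") or "", "*gcda*")
    ]


# Hypothetical sample data; real entries would come from the jobdetail response.
sample = [
    {"value": "code-coverage-gcda.zip", "url": "https://example.invalid/gcda.zip"},
    {"value": "live_backing.log", "url": "https://example.invalid/log"},
]
print(gcda_urls(sample))
```

Downloading each returned URL is left out so the sketch stays free of network access.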
12:45marcoI've used taskcluster because I initially wrote the code to get the latest scheduled linux-ccov build
12:46marcoI decided later to make it generic
12:46jmaher|afkok, I haven't used taskcluster api much
12:47marcogetting the latest task was as simple as requesting https://index.taskcluster.net/v1/task/gecko.v2.mozilla-central.latest.firefox.linux64-ccov-opt
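The two index URLs quoted in this discussion follow the same shape, so they can be built by one helper. This is a sketch under assumptions: the log only shows the `revision` form for `try` and the `latest` form for `mozilla-central`, so using either form with another branch is a guess, and `ccov_index_url` is a hypothetical name.

```python
INDEX_ROOT = "https://index.taskcluster.net/v1/task/"
CCOV_SUFFIX = "firefox.linux64-ccov-opt"


def ccov_index_url(branch, revision=None):
    """Build the index URL for a linux64-ccov build: latest, or a given revision."""
    if revision is None:
        namespace = "gecko.v2.{}.latest.{}".format(branch, CCOV_SUFFIX)
    else:
        namespace = "gecko.v2.{}.revision.{}.{}".format(branch, revision, CCOV_SUFFIX)
    return INDEX_ROOT + namespace


print(ccov_index_url("mozilla-central"))
```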
12:47marcojmaher|afk: is there a way to go from a revision to a job guid?
12:47jmaher|afkoh, that is cool
12:48marcojmaher|afk: I was thinking the CLI tool should ask the user for a revision
12:48marcoas that's the most user-visible thing I suppose, right?
12:48jmaher|afkmarco: that sounds reasonable
12:50jmaher|afkmarco: so revision is something more treeherder specific
12:52jmaher|afktypically I have to take revision and translate to pushid via treeherder, then use pushid to get all jobs ran
12:52jmaher|afkmaybe in #taskcluster we could find more info on taskcluster api
12:54jgrahammarco: My code I pointed to does that :)
12:54marcojgraham: indeed, I've just read it :P
12:55Ms2gerAutomatedTester, https://dxr.mozilla.org/mozilla-central/source/testing/marionette/harness/marionette_harness/tests/unit/test_window_position.py#32-37 lolwat
12:57Ms2gerAutomatedTester, okay, looking at blame makes it funnier
12:59* jmaher|afk prepares to afk for 2 hours
13:13jgrahamSo, seriously, does anyone understand pip?
13:13jgrahamhttps://treeherder.mozilla.org/logviewer.html#?job_id=84261122&repo=try&lineNumber=2073
13:14jgrahamIt tries to install six, but on the Linux images (not on OSX or Windows) it's already installed, so it proceeds to try and uninstall the existing version and then dies horribly
13:17AutomatedTesterjgraham: emorley does
13:32emorleyjgraham: at a guess I'd say we're still using an ancient pip/setuptools version
13:33emorleylast time I saw we were still using 1.x or 6.x in places, when 9 is the latest (and 8 is required for hashes support)
13:33emorley<insert usual sorry state of packaging on linux story here>
13:37emorleythough I see higher up the logs: "pip 9.0.1 from /home/worker/workspace/build/venv/local/lib/python2.7/site-packages (python 2.7)"
13:37emorleyunless it's a different pip being used later
13:37emorleythere's also lots of "error resolving pypi.pvt.build.mozilla.org (ignoring): "
14:01AutomatedTesterdecamping to coffee shop before fetching my kids
14:02jgrahamemorley: It appears to be a bug? if you try to install a package that's already installed from a specific path
14:03jgrahamI'm hoping that https://treeherder.mozilla.org/#/jobs?repo=try&revision=01e2c8a22a579af3609f806212dd630c3db8dc81&selectedJob=84302535 (specifying a normal dep rather than a path in marionette_requirements.txt) works, even though I don't know where six comes from on those machines
14:05emorleyah
14:06jgraham(maybe a bug in pip, but there's no many layers of awfulness here that it's hard to tell)
14:07jgraham*so
14:15whimboojgraham: wptserve?at least that is listed in line 3 of marionette_requirements.txt
14:15jgrahamwhimboo: wptserve now depends on six
14:16jgrahamThat wasn't getting installed which was causing bustage in Mn tests, and, amusingly, reftests (reftests don't use wptserve)
14:16jgraham(afaict)
14:16whimboojgraham: maybe pick a one-click loaner and check without running any tests
14:17jgrahamwhimboo: Yeah, I have
14:17whimbooso it's globally installed?
14:18whimboobut we run virtualenv --no-site-packages --distribute /home/worker/workspace/build/venv
14:18whimbooso we should not get those packages into the venv
14:18jgrahamI think it's available somewhere because mochitest also uses it
14:19jgraham(and it clearly gets installed into the virtualenv already on linux because you can see it in the loaner, and that's why the subsequent install fails)
14:20jgrahamSo I hope just supplying a dependency string rather than a path which has to be installed will be enough to use the existing version if it's already installed, or install it from wherever if it's not. But we shall see.
14:22whimboojgraham: i think it comes with requests
14:23whimboonot sure what those obj dirs are in the tree but I can see it here
14:23whimboohttps://dxr.mozilla.org/mozilla-central/source/obj-x86_64-pc-linux-gnu/_virtualenv/lib/python2.7/site-packages/pip/_vendor
14:27jgrahamted: Say what you like about entitled linux users, but I'm in favour of unsupporting all platforms that we can't run in a VM. Pretty sure that costs us way more productivity than dealing with some "but what about my obscure home-rolled distro" users
14:34whimboojgraham: what is the benefit of the failure classification panel? I wonder why i cannot use it to star failures
14:34whimbooit has a kinda good prediction
14:35jgrahamUsing it for classification is only enabled for sheriffs at the moment
14:36jgrahamThe benefit is per-line classification which feeds back into the autoclassification backend
14:37jgrahamBut it's a little more fiddly than normal classification so it makes sense to limit the users at least at first to maximise the data quality
14:37jgraham(happy you like the prediction though)
14:53davehuntAutomatedTester, jgraham: given people often pass the path to Firefox.app on macOS when specifying a binary, do you think it's a reasonable enhancement request to accept this for rust_mozrunner and just look for the predictable binary path within?
14:54whimboojgraham: the only problem for me now is that i'm forced to switch to the other panel all the time when I want to star a result
14:55jgrahamwhimboo: Maybe we should add a pinboard icon so that non-sheriff users can still do something
14:55AutomatedTesterdavehunt: how often is often
14:55whimboojgraham: ++
14:56davehuntAutomatedTester: *shrugs* enough?
14:56jgrahamAnd make it available to sheriffs for bustage
14:56jgrahamwhimboo: Mind filing a bug?
14:56davehuntAutomatedTester: I guess the question is, would you accept a patch? :)
14:56AutomatedTesterdavehunt: we always accept patches :)
14:56whimboojgraham: any special component?
14:56AutomatedTesterdavehunt: with tests
14:56AutomatedTester:D
14:57davehuntAutomatedTester: there are no tests for this area, other than a manual one.. I've already patched this area
14:57whimboodavehunt: we had this in the past for mozrunner too
14:57davehuntbut I'll direct jimboslice to the repo
14:57whimboobut then it got removed
14:57davehuntwhimboo: why was it removed?
14:57jgrahamwhimboo: treeherder:general or whatever it's called
14:57AutomatedTesterdavehunt: https://twitter.com/codestandards/status/835951177288146945
14:57AutomatedTester:P
14:58davehuntAutomatedTester: the 'test' is here: https://github.com/jgraham/rust_mozrunner/blob/master/src/bin/firefox-default-path.rs
14:58davehuntthe package has no unit tests
14:58AutomatedTesterlets just blame jgraham then :P
14:58davehuntheh
14:58whimboodavehunt: i cannot fully remember that. but i thought code complexity
14:59davehuntwhimboo: k, I think it's simple enough, and handy
14:59davehuntanyway, he may not be interested in working on it
15:01whimboojgraham: its bug 1347946 now
15:01bugbotBug https://bugzilla.mozilla.org/show_bug.cgi?id=1347946 Treeherder, normal, nobody, NEW , Add pinboard icon to the new failure classification panel
15:02jgrahamdavehunt: I'm happy to take a patch for that
15:03whimboodavehunt: when we had this implemented we were parsing the plist file
15:04davehuntwhimboo: I don't think we need that, just look in {path}/Contents/MacOS for firefox-bin
15:04whimboodavehunt: no
15:04whimbooor do we only support firefox?
15:04whimbooi don't think so for the future
15:04davehuntwhimboo: in mozrunner?
15:04jgrahamWe only support firefox at the moment
15:04davehuntthere's already firefox-bin searching in there
15:05jgrahamwhimboo: (note that this is the rust version not the python version)
15:05whimbooi know
15:05whimboodavehunt: https://github.com/mozilla/mozbase-deprecated/pull/9/files
15:09davehuntwhimboo: thanks
15:09whimboobut rust might not have such a package yet
15:12jgrahamhttps://crates.io/search?q=plist
15:12jgrahamDunno about quality ofc
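The simpler approach davehunt proposes (no plist parsing, just look in `{path}/Contents/MacOS` for `firefox-bin`) can be sketched as follows. This is written in Python to illustrate the logic only; it is not the rust_mozrunner code, the function name `resolve_binary` is hypothetical, and it assumes Firefox is the only supported application, as jgraham notes above.

```python
import os


def resolve_binary(path):
    """If `path` is a .app bundle containing firefox-bin, return that binary.

    Any other path is returned unchanged, so callers can keep passing a
    direct path to the executable.
    """
    if path.rstrip("/").endswith(".app"):
        candidate = os.path.join(path, "Contents", "MacOS", "firefox-bin")
        if os.path.isfile(candidate):
            return candidate
    return path
```

Returning the input unchanged when nothing is found leaves error reporting to whatever launches the binary; supporting other applications would likely need the plist lookup whimboo mentions.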
15:34gbrownwlach: "gbrown paranoia" - ha!
15:36wlachgbrown: oh I share the paranoia :)
15:37gbrownsometimes mental issues come in handy around here
15:37gbrownthanks for the reviews
15:47wlachgbrown: btw, I was talking to jmaher|afk about this yesterday, but I may as well ask you too -- are there particular things you think you would want from job retriggers before you consider it viable? I am just getting it working with reftest now, then am going to look at xpcshell and android support
15:52gbrownwlach: if I can set an env var or a pref, that's a huge win there. I think the most common thing I would be interested in would be collecting NSPR logs in a mochitest. I would normally do that by setting MOZ_LOG in runtests.py...can your retrigger get those logs, just with env vars?
15:53wlachgbrown: we should be able to
15:53gbrowngood enough for me then!
15:53wlachgbrown: could you give me the exact env variable and a test to try it with?
15:54wlachgrabbing some lunch biab
15:55gbrownwlach: I would normally update https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#104, with something like MOZ_LOG = "nsThread:5,timestamp"
15:55gbrownnot sure exactly how that translates into env vars, but probably straight forward
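Translating that setting into an environment variable could look like the sketch below. It only builds the environment mapping; how the retrigger tooling or the harness actually passes it along is not shown in this log, so the helper name and its use are assumptions.

```python
import os


def env_with_moz_log(modules, base_env=None):
    """Return a copy of the environment with MOZ_LOG set to `modules`."""
    env = dict(os.environ if base_env is None else base_env)
    env["MOZ_LOG"] = modules
    return env


# The module string is the one gbrown quotes above.
env = env_with_moz_log("nsThread:5,timestamp", base_env={})
print(env["MOZ_LOG"])
```

The returned mapping can be handed to `subprocess.Popen(..., env=env)` or similar when launching the test process.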
16:21jgrahamwhimboo: I don't think that will work because we install with --no-deps
16:22jgrahamDoesn't mean we shouldn't do it anyway, but this is blocking a wpt update, so I'd prefer to land something that works sooner rather than later
16:22whimboojgraham: so when I install wptserve from pypi in a new local venv, how is it going to work if the package has no dep specified for six?
16:23whimboohttps://github.com/w3c/wptserve/blob/master/setup.py#L4
16:23whimboothat would be the right place
16:23whimbooif six is not included as a package in wptserve
16:23jgrahamwhimboo: Yes, we should do that too
16:23jgrahamBut I don't think it's necessary or sufficient to fix this bug
16:24whimboodeps for packages we install via requirements.txt in config should not include those things
16:24whimboowhere do we install with no-deps?
16:26whimboomarionette_requirements.txt is not broken
16:26jgrahamwhimboo: mozharness calls pip with --no-deps
16:27jgraham/home/worker/workspace/build/venv/bin/pip install --no-deps --timeout 120 -r /home/worker/workspace/build/tests/config/marionette_requirements.txt --no-index --find-links http://pypi.pub.build.mozilla.org/pub --trusted-host pypi.pub.build.mozilla.org
16:29whimboojgraham: where is this from?
16:30whimboofound it now
16:35jgrahamwhimboo: I pushed https://treeherder.mozilla.org/#/jobs?repo=try&revision=e816d916f47f31e2ef999f13553dadffb67deb0c to verify that it doesn't work
16:36whimboojgraham: yes, just looking in mozharness why we are using --no-deps here
16:37jgrahamwhimboo: I really really don't want to block my wpt update on changing mozharness. Happy to try that in a followup, but experience suggests that randomly changing mozharness is a way to pain and breakage
16:38whimbooproblem is here https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/base/python.py#479
16:38jgraham(frankly marionette_requirements.txt is already a giant hack; it hardcodes paths from the test package zip files, so it isn't reusable in any context other than CI, so I don't feel at all bad about adding something CI specific in there)
16:39whimboojgraham: its all installing local packages so far, so we don't force downloads from other package indexes
16:39mconleyjmaher: ping
16:40jgrahamwhimboo: True, but I can't see why that's a huge moral victory
16:41jmahermconley: hi!
16:41mconleyjmaher: hi! :)
16:42mconleyjmaher: kind of got a situation here in bug 1345904, don't we?
16:42bugbotBug https://bugzilla.mozilla.org/show_bug.cgi?id=1345904 General, normal, nobody, NEW , Frequent failures in windows 7 vm pgo tests with Automation Error: Received unexpected exception while running application
16:42jmahermconley: yes, that bug
16:42jmaherI know Tomcat|Afk and RyanVM have been working on that as well
16:42* mconley nods
16:42jmahermconley: it is sad that we have so many intermittents in the last 1.5 weeks
16:42* RyanVM ducks
16:42RyanVMjmaher: btw, you're also getting next to no win7 talos thanks to it
16:42mconleyI'm having trouble extracting actionable information from the logs in that bug, and in the builds that OF links to
16:42jmaherRyanVM: I noticed
16:43jgrahamwhimboo: (and removing --no-deps will force downloads from other places, which might be why it's there in the first place, to avoid downloading deps that should come from the local package)
16:43jmahermconley: I have a quick meeting I need to hop into, can you ping me in 15 minutes or so- I am eager to help out on this as it affects a lot
16:43mconleyso I'm wondering if we could pool our information here. I mean, worst case scenario, we just back the guilty patch out of Aurora, but this'll bite us again in a few weeks
16:43mconleyjmaher: sure thing
16:43whimboojgraham: well, and in this case we break out of this requirement
16:44jmahermconley: that is a concern- we need to debug this...wlach...can your retrigger with options help here maybe?
16:44whimboojgraham: i would suggest that you better ask a build peer for review. I'm not sure about the requirements here
16:45wlachmconley: jmaher: yeah I could even show mconley how to do it :)
16:45mconleyI like learning
16:45RyanVMwlach: also annoyingly, we're not getting screenshots on these failures for some reason
16:45wlachI&#39;ll come over
16:45wlachmconley: is now a good time?
16:45RyanVMi half-wonder if there's a system dialog up or something
16:45jgrahamwhimboo: This isn't part of the build system really
16:46mconleywlach: sure
16:46jgrahamI would ask ahal for review but he's away
16:54wlachRyanVM: mconley showed me an error about mozscreenshot not being installed, that might have something to do with it
16:54wlachunfortunately my retriggering stuff won't help here as this problem is happening in a buildbot environment :(
16:55RyanVMwlach: I'm wondering if we'll need someone from releng to help remote into a box during a run
16:55RyanVMwe can certainly trigger more jobs on aurora to help facilitate that
16:56wlachRyanVM: yeah I recommended that he talk to rail (who is also in our office)
16:56RyanVMseems like having a better understanding of WTF is actually going on here would be a good start :)
16:56wlachyep
16:56mconley"mozprofile.addons WARNING | Could not install c:\slave\test\build\tests\mochitest\extensions\mozscreenshots: [Errno 2] No such file or directory: 'c:\\slave\\test\\build\\tests\\mochitest\\extensions\\mozscreenshots\\install.rdf'" usually shows up a few lines before the automation error
16:56RyanVMso nice having an office!
16:56wlachthese problems will be a lot easier to debug in a taskcluster world
16:56RyanVMwith real people in it and everything!
16:56wlachof course that doesn't help right now
16:56mconleyis that a common thing? The mozscreenshots complaint?
16:57wlachhmm, I wonder if that could even be related
16:57RyanVMmight be informative to look at a green run
16:57mconleytrue
16:58mconleyyeah, I see it in green runs too
16:58mconleyred herring, I think.
16:58wlachyeah, I'm seeing it everywhere
16:58RyanVMso....is that broken in general?
16:58wlachlinux and windows, tc and not
16:58* RyanVM goes looking for orange runs
16:59RyanVMno, there's a screenshot on a win8 bc5 orange
17:03wlachRyanVM: do you still see that log message?
17:04RyanVMyeah
17:04RyanVMi actually wonder if that's related to the actual mochitest-screenshot job
17:05RyanVMthe thing MattN wrote once upon a time
17:05mconleyjmaher: ~15 minute ping
17:06jmahermconley: good timing, meeting done!
17:07RyanVMwlach: yeah, 99% sure that's what it's for
17:07RyanVMnot sure if that gets packaged outside trunk or not
17:08mconleyjmaher: hey - so, because this is buildbot and not taskcluster, the "help options" stuff that wlach has been working on isn't really applicable
17:08jmahermconley: got it
17:09jmahermconley: ok, so we have to do this old fashioned
17:09jmaherRyanVM: do you know if we can reproduce this with opt builds on try using aurora code/configs?
17:09mconleyjmaher: what's old fashioned?
17:09mconleyas in
17:09jmahermconley: just old school debugging :)
17:09RyanVMjmaher: yes
17:09mconleyget a loaner machine?
17:09RyanVMjmaher: it showed up on try simulations before the uplift
17:09RyanVMit's how we narrowed it down to mconley in the first place
17:10jmaheryes, we should get a loaner machine, or two
17:11jmaherRyanVM: does this show up on the VM stuff as well, or only hardware
17:11RyanVMboth
17:11RyanVMmostly VM by nature of which suites run where these days
17:11jmaherRyanVM: so if we can prove this on opt, that reduces the complexity
17:12mconleyI've got the go-ahead to back this out on Aurora
17:12RyanVMare you distinguishing between opt and pgo here? If so, I'm not sure if anybody's pushed aurora to Try to run regular non-pgo builds to see if it reproduces or not
17:12RyanVMmconley: I think that makes sense for now - certainly no reason it can't be considered for uplift later again once this is sorted
17:12RyanVMbut at least it stops the bleeding for now
17:12mconleyRyanVM: agreed. Care to do the honours?
17:12RyanVMcan do
17:12mconleyor is that somebody elses job now?
17:16RyanVMpushed
17:16RyanVMwe can still certainly retrigger jobs on the older revs as needed if you want to try catching one in the act
17:16wlachgbrown: experimenting with that MOZ_LOG setting now
17:16RyanVMotherwise, rev 77fdb54e3df6 has a build in flight that hasn't started running tests yet
17:22mconleyjmaher: so, my first question is:
17:22mconleyhow come this problem isn't occurring on mozilla-central / Nightly?
17:22mconleyWhat's different about Aurora?
17:23jmahermconley: that is a great question- I assume it is the configs we use; although I am not sure exactly- RyanVM for Aurora simulations is there a patch to apply to get the configs to match up?
17:24RyanVMthe main difference is that the NIGHTLY_BUILD ifdef goes away
17:25RyanVMkeying off the version number
17:25RyanVMthere's some other differences in config too, but that's generally what ends up being the cause more often than not
17:25jmaherthat isn't a lot here: https://dxr.mozilla.org/mozilla-central/search?q=NIGHTLY_BUILD&redirect=true
17:25RyanVM(likewise, RELEASE_OR_BETA once things go from aurora -> beta)
17:25jmaherbut 127 results, I suspect a handful of them might lead to problems
17:26RyanVMjmaher: can't ignore that a decent number of those are affecting prefs too
17:26RyanVMbut yes, this is why uplift simulations are done
17:27jmaherok, so that is the place to look most likely
17:28mconleywell, I guess it's a place to start
17:28marcojgraham: do you want to test my script?
17:28jmahermconley: so taking that data point- what other questions do we have to sort out
17:29mconleyWhat'd be ideal is to understand what state Firefox is in at the time of the problem
17:29mconleyright now, it's not clear. We get no screenshot, no stack, nothing
17:29mconleyis it hung? My guess is "probably".
17:29RyanVMmconley: https://hg.mozilla.org/releases/mozilla-aurora/rev/dca7b42e6c67 gives a general sense of what changes happen when trunk goes to aurora. The uplift simulation patches basically mimic that for Try pushing
17:29mconleyCan we connect windbg or VS to it and dump the stacks of each thread?
17:29jmahermconley: maybe being on a loaner and watching it happen
17:30RyanVMhttps://hg.mozilla.org/releases/mozilla-aurora/diff/dca7b42e6c67/browser/config/mozconfigs/win32/nightly
17:30mconleyRyanVM: I see
17:30RyanVMmconley: --enable-profiling
17:30RyanVMhow relevant might that end up being?
17:30mconleyOh, it might be very relevant
17:30mconleyRyanVM: that's a good thought right there.
17:30mconley--enable-profiling isn't set on DevEdition?
17:30RyanVMno
17:31RyanVMjmaher: (and this could be one of the rare times when NIGHTLY_BUILD isn't at fault :P)
17:31jgrahammarco: Sure
17:31jmaherRyanVM: true!
17:32mconleywelll well well
17:32mconleythat's a theory we should definitely test.
17:33RyanVMmconley: want me to do the honors?
17:33mconleyRyanVM: if quite convenient
17:33mconleyif not, I can do it
17:33mconleythat'll certainly tell us something if it fixes the issue
17:33marcojgraham: after "git clone https://github.com/marco-c/grcov && cd grcov && cargo install", you can run this python script: https://pastebin.mozilla.org/8982235. The first argument is the path to your source directory, the second argument is the branch and the third argument the commit hash. It will create a "report" directory containing the HTML report.
17:37RyanVMmconley: showtime - https://treeherder.mozilla.org/#/jobs?repo=try&author=ryanvm@gmail.com&fromchange=ab2612e5b3fa935099d14488bd60d43c79b8cadf&group_state=expanded&tochange=675aa1504fa7461d08a60dc569321a1cf5cb7786
17:38RyanVMfirst push is the last rev on aurora before the backout just to confirm no issues reproducing the original problem
17:38RyanVMsecond has the --enable-profiling change in the try commit
17:38mconleyI eagerly await. :)
17:43jgrahammarco: Does my source directory have to be at the same rev as the push?
17:44marcojgraham: yes, otherwise some lines might have changed in the meantime. It will not fail if it isn't at the same rev though
17:45marcoso, the answer is that it doesn't have to be
17:45jgrahammarco: OK :)
17:45jgrahamIsn't proc_macro stable now?
17:54marcojgraham: it wasn't a couple of weeks ago
17:55jgrahamSince early February I think
17:55jgrahamAnyway it compiles for me on stable with it removed
17:55marcooh, great
17:56marcoperhaps it was my stable version of rust which was out of date
18:02RyanVMmconley: for kicks, I'm running another Try push with the version number set back to 54.0a1 (which would in turn cause NIGHTLY_BUILD to be set again)
18:02mconleyokie dokie
18:16marcojgraham: let me know if it works
18:33jgrahammarco: OK, I've hacked it a little to just get wpt tests and not force me to cargo install the binary
18:33jgrahamBut we'll see how it goes
18:33marcojgraham: ok
18:33marcojgraham: what did you do regarding the second part?
18:34marcojust copied your compiled grcov somewhere?
18:34jgrahamWell just hardcoded the path that I put it at. But I would add a --binary option if I could make a PR :)
18:35wlachgbrown: holy moley those options lead to a large log https://treeherder.allizom.org/#/jobs?repo=try&revision=e774e23f581dde726c435df94ec835d53622b370&selectedJob=78723272
18:35marcojgraham: okok
18:36marcojgraham: I'm not sure what's the best way to distribute grcov; e.g. if I wanted to add this script to the tree
18:38jgrahamYeah, this part is more gecko specific I think. So adding it to the mozilla tree might make some sense
18:40gbrownwlach: interesting. if I use MOZ_LOG in runtests.py, it puts all the extra logging in a separate artifact, mozLogs.zip
18:40marcojgraham: the problem is, how do I make people using the script install grcov?
18:40marcojgraham: one way would be to download the binary from github releases
18:40gbrownwlach: but you have produced the right stuff, and it is kind of handy seeing it in the main test log.
18:41wlachgbrown: it might be mozharness which does that
18:41gbrownwlach: yeah, I'm not an expert on the implementation
18:42wlachwell this sounds good all around
18:45jgrahammarco: Just check if it's there and if not tell them to install it?
18:46marcojgraham: good idea, let them decide how to install it
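[editor's note] The "check if it's there, otherwise tell them to install it" idea can be sketched as below. This is a minimal sketch and not the code of marco's script: the names `find_grcov` and `require_grcov` are hypothetical, and the `binary` parameter stands in for the `--binary` option jgraham mentions wanting, which is an assumption about how it might be wired up.

```python
import os
import shutil


def find_grcov(binary=None):
    """Return a path to a grcov executable, or None if none is found.

    `binary` is a hypothetical explicit override (for example, the value
    of a --binary command-line option); it is not part of any real tool.
    """
    if binary is not None:
        # An explicitly given path must exist and be executable.
        if os.path.isfile(binary) and os.access(binary, os.X_OK):
            return binary
        return None
    # Otherwise search PATH for an executable named "grcov".
    return shutil.which("grcov")


def require_grcov(binary=None):
    """Return the grcov path, or exit with an install hint if it is missing."""
    path = find_grcov(binary)
    if path is None:
        raise SystemExit(
            "grcov not found: install it (for example with `cargo install`) "
            "or pass its location explicitly."
        )
    return path
```

Leaving the install method up to the user, as suggested in the conversation, means the script only needs to report the problem clearly rather than download anything itself.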
18:50jgrahammarco: Seems to be working so far
18:55jgrahammarco: I have something!
18:56marcojgraham: \o/
18:57jgrahammarco: Yeah, pretty impressed. It was rather fast and comparatively easy
18:58marcoglad it worked; I'll work on adding it to the tree if possible
19:00RyanVMmconley: some more fuel for the fire - debug builds leave --enable-profiling set
19:00mconleyRyanVM: hey - dumb question, but in those try builds you pushed... didn't we need pgo?
19:01RyanVMmconley: we never determined that one way or another
19:01RyanVMbut given the way the vanilla Try push is going, I'm beginning to wonder...
19:01RyanVMtests have started on the backout on Aurora too
19:04RyanVMmconley: let's give it another 10-15min to let more e10s results filter in and if it looks like it isn't reproducing on the vanilla run, I'll do another round w/ PGO
19:04RyanVMthat'll be another data point at least
19:06RyanVMmconley: bah, I've seen enough.
19:07RyanVMthe sadness continues
19:10mconleyBoooo. :(
19:11mconleywelp
19:12mconleyRyanVM: wait
19:12RyanVMthis time with 100% more PGO - https://treeherder.mozilla.org/#/jobs?repo=try&author=ryanvm@gmail.com&group_state=expanded&fromchange=c77cc8a2be5f5f59cd83826c423092fc7492e869&tochange=ea2004608caa8ad7763ba7f73c21d4a40c35b84c
19:12mconleyRyanVM: where is the sadness continuing? I don't see anything except for bug 1291926
19:12bugbotBug https://bugzilla.mozilla.org/show_bug.cgi?id=1291926 Mercurial: hg.mozilla.org, major, gps, RESOLVED FIXED, Intermittent Windows builds failing with abort: stream ended unexpectedly (got 131914 bytes, expected 1630418948)
19:12RyanVMmconley: that's the point
19:12RyanVMthe vanilla run was pre-backout and should have reproduced
19:12mconleyoh, that's not sadness
19:12mconleyoh
19:13mconleyohhhh
19:13mconleyso that's some sadness, yeah
19:13RyanVMheh
19:13RyanVMmconley: god have mercy on your soul if it's a combo of PGO and --enable-profiling
19:14mconleyimagine
19:14RyanVMon the bright side, nothing's burning on the win7 e10s tests on the aurora backout push!
19:14mconley\o/
19:14RyanVMyeah, things are looking very good there
19:26RyanVMmconley: oh hey, M-e10s(mda) did hit it on the vanilla run
19:26RyanVMbut so did M-e10s(3) on the --enable-profiling one
19:26mconleyRyanVM: "vanilla" run meaning without PGO? Or without the patch?
19:27RyanVMboth
19:27mconleyRyanVM: so ... the patch is _not_ responsible?
19:28RyanVMsorry, I misunderstood you
19:28RyanVM"the patch" here being --enable-profiling
19:28mconleyohhh, okay
19:28RyanVMno, your patch still looks culpable given the results on Aurora post-backout
19:28mconleyRyanVM: so boil it down for me - what'd we just learn? That --enable-profiling does _not_ help?
19:28mconleyand that PGO is _not_ required?
19:28RyanVM--enable-profiling so far doesn't seem to have made a difference
19:29mconleyokay.
19:29RyanVMstill waiting on results for the version number change to see what NIGHTLY_BUILD does
19:29RyanVMand there&#39;s a new set of PGO builds going now
19:29RyanVMso this could be a low-frequency issue exacerbated by PGO or something, but that remains to be seen
21:47RyanVM|biabmconley: OK, the PGO results were a bit more illuminating
21:49RyanVMmconley: so the Try pushes say that PGO makes things much more likely to happen and that NIGHTLY_BUILD is where the issue appears to lie
21:49bholleydustinm`: yt?
21:52bholleykmoir: ping
21:52RyanVMbholley: I think you want #releng
21:52kmoirbholley: hi
21:52bholleykmoir: hey
21:52RyanVMmconley: no differences with or without --enable-profiling, but solid green with the version number set to 54.0a1
21:53bholleykmoir: so, we're currently only testing the sequential stylo traversal, and I'd like to test both sequential and parallel, without spending more money
21:53bholleykmoir: to that end, I was thinking we should