mozilla :: #ateam

8 Sep 2017
00:36 <bkelly> bc: AWFY seems to be going a bit crazy? https://arewefastyet.com/#machine=35&view=single&suite=speedometer-misc&subtest=score
00:36 <bkelly> 50% regression across chrome and firefox
00:37 <bkelly> maybe a speedometer update?
01:28 <RyanVM> bkelly: yes, they switched to geometric mean for calculating scores
01:28 <RyanVM> https://bugs.webkit.org/show_bug.cgi?id=172968
01:28 <bugbot> Bug 172968: MailNews: Message Display, critical, sspitzer, VERIFIED WORKSFORME, Mozilla crashes when I attempt to open the Mail Client.
04:04 <bc> bkelly: thanks, RyanVM: thanks.
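For context on RyanVM's explanation: a geometric mean is never larger than the arithmetic mean of the same positive values, so a change of scoring formula can lower every browser's score at once without any real slowdown. The sketch below assumes the earlier score was an arithmetic mean (the log does not say so) and uses made-up subtest numbers, not Speedometer's data or its scoring code.

```python
from statistics import geometric_mean, mean

# Hypothetical per-subtest scores; not real Speedometer data.
subtest_scores = [20.0, 45.0, 180.0, 320.0]

# Assumed "old" aggregate: plain arithmetic mean.
arithmetic = mean(subtest_scores)
# "New" aggregate: geometric mean, which damps the large outliers.
geometric = geometric_mean(subtest_scores)

print(f"arithmetic mean: {arithmetic:.1f}")
print(f"geometric mean:  {geometric:.1f}")
```

The wider the spread between subtests, the larger the gap between the two aggregates, which is consistent with a large apparent drop on both Chrome and Firefox.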
11:40 <whimboo> AutomatedTester: I totally 3> this summary: Selenium tests on macOs hugs during startup
12:23 <whimboo> jgraham: so when I use info!() for logging the command in mozrunner, nothing gets added to the log
12:23 <jgraham> whimboo: You are sure you're using the right version of the library?
12:24 <whimboo> jgraham: yes, so I put println lines around
12:24 <whimboo> and get
12:24 <whimboo> https://irccloud.mozilla.com/pastebin/3k7krBpK/
12:25 <whimboo> so not sure where the output ends-up
12:27 <whimboo> i will see if I can find out
12:35 <whimboo> jgraham: maybe its because mozrunner doesnt use slog?
12:37 <jgraham> whimboo: In theory they work together
12:37 <jgraham> It could be broken
12:37 <jgraham> ato might know more
12:39 <ato> We filter out logs from everything but geckodriver and webdriver: https://searchfox.org/mozilla-central/source/testing/geckodriver/src/logging.rs#133
12:39 <whimboo> oh!
12:39 <whimboo> so we should add mozrunner
12:41 <whimboo> so yeah, that makes it work
12:41 <whimboo> ato: thanks!
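The behaviour ato describes is an allowlist on the module that emitted a log record: anything outside the list is dropped silently, which is why the info!() calls in mozrunner vanished. The real filter is Rust code in geckodriver's logging.rs; the following is only a Python analogy, and the names `ALLOWED_PREFIXES`, `AllowlistFilter`, and `visible` are hypothetical.

```python
import logging

# Hypothetical allowlist mirroring the idea from the log; the actual
# filter lives in geckodriver's Rust logging code.
ALLOWED_PREFIXES = ("geckodriver", "webdriver")


class AllowlistFilter(logging.Filter):
    """Pass only records whose logger name is an allowed module or submodule."""

    def __init__(self, prefixes):
        super().__init__()
        self.prefixes = tuple(prefixes)

    def filter(self, record):
        return any(
            record.name == prefix or record.name.startswith(prefix + ".")
            for prefix in self.prefixes
        )


def visible(logger_name, prefixes):
    """Return True if a record from logger_name would survive the filter."""
    record = logging.LogRecord(
        logger_name, logging.INFO, "example.py", 0, "msg", None, None
    )
    return AllowlistFilter(prefixes).filter(record)


print(visible("mozrunner", ALLOWED_PREFIXES))
print(visible("mozrunner", ALLOWED_PREFIXES + ("mozrunner",)))
```

Adding the module to the allowlist is the analogue of whimboo's fix: the second call passes the record through, the first drops it.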
12:54 <whimboo> jgraham: for your pleasure: https://github.com/jgraham/rust_mozrunner/pull/13
12:55 <jgraham> whimboo: s/with args//
12:56 * jgraham wishes that cinnabar would get bundleclone support
12:57 <whimboo> jgraham: sure. and updated
12:57 <jgraham> whimboo: Merged
12:57 <whimboo> thanks
12:58 <whimboo> with the other changes to exit status i would need a new release
12:58 <whimboo> lets see if I can make it for 0.1
12:58 <whimboo> 0.19
13:10 <glandium> jgraham: it has it
13:11 <jgraham> glandium: Oh. So is it just much much slower than a hg clone because it has to generate the metadata every time?
13:12 <glandium> yes
13:12 <glandium> and other reasons
13:12 <jgraham> I see :/
13:12 <glandium> it's actually slower with bundleclone than without
13:13 <glandium> I hope to fix that for 0.5
13:13 <jgraham> Oh. I guess it's not possible to cache the metadata, up/download from (somewhere) and only generate what's missing? As I say it I realise that sounds like a lot of work
13:14 <glandium> jgraham: that's actually kinda possible with current master, and only missing a small thing to work for a grafted gecko-dev
13:17 <glandium> that said, with fast network, a bundleclone should be feasible in less than 20 minutes.
13:18 <glandium> on linux, at least ; mac is... special
13:18 <glandium> and windows has its own set of problems
13:36 <jgraham> glandium: Oh, OK. Well 20 minutes sounds a lot better than the >1 hour I see at the moment, so I'm looking forward to it :)
13:36 <bc> mcote: Could you mark the autophone1 pulse queue as unbounded? autophone2,3 are already unbounded. We lost the setting when we had to delete it several months ago.
13:36 <mcote> yup will do
13:36 <bc> Thanks
13:43 <mcote> bc: done
13:43 <bc> yay. thanks.
13:57 <davehunt> ekyle: are you around to help me to formulate a query?
13:58 <ekyle> davehunt: yes, i am
14:00 <davehunt> ekyle: great.. I want a useful durations chart.. I'm thinking that I could total up the average duration for all tests per day
14:01 <ekyle> davehunt: yes, median, or 70percentile, or 90th percentile would be nicer
14:01 <davehunt> that would prevent the chart being thrown out by variations in the number of executions for each test
14:01 <davehunt> yeah, I was thinking 90th percentile
14:02 <davehunt> so I need to calculate the percentile for each test, accumulate that across all tests, and group by day
14:02 <davehunt> can that be done in a single query?
14:02 <ekyle> davehunt: yes
14:02 <ekyle> davehunt: give me some time to phrase the query...
14:03 <davehunt> ekyle: of course, thank you!
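The aggregation davehunt outlines (a percentile per test, summed across tests, grouped by day) can be sketched in plain Python. This is not an ActiveData query; the record layout, the nearest-rank percentile definition, and the names `percentile` and `daily_totals` are assumptions made for illustration, and the sample rows are invented.

```python
from collections import defaultdict
from math import ceil


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def daily_totals(runs, pct=90):
    """Sum each test's pct-th percentile duration, grouped by day.

    runs is an iterable of (day, test_name, duration_seconds) tuples.
    """
    by_day_test = defaultdict(list)
    for day, test, duration in runs:
        by_day_test[(day, test)].append(duration)

    totals = defaultdict(float)
    for (day, _test), durations in by_day_test.items():
        # One number per test per day, so extra executions of the same
        # test do not inflate that day's total.
        totals[day] += percentile(durations, pct)
    return dict(totals)


# Invented sample data: (day, test, duration in seconds).
runs = [
    ("2017-09-07", "test_a", 1.0),
    ("2017-09-07", "test_a", 3.0),
    ("2017-09-07", "test_b", 2.0),
    ("2017-09-08", "test_a", 1.5),
    ("2017-09-08", "test_b", 2.5),
    ("2017-09-08", "test_b", 4.0),
]

print(daily_totals(runs))
```

Because each test contributes a single percentile value per day, running a test twice as often leaves the daily total unchanged unless its durations shift.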
14:15 <davehunt> Standard8: temporary upgrade?
14:15 * davehunt wonders what new features Standard9 has
14:18 <ekyle> davehunt: an example: https://activedata.allizom.org/tools/query.html#query_id=ibPtQz2E
14:20 <ekyle> davehunt: be sure to limit the number of tests you request in a batch; my example is just one test, you can probably pull a thousand tests at a time.
14:20 <davehunt> ekyle: okay, so I could take that and then group by date and aggregate the sum of 70percentile?
14:21 <davehunt> oh, wait, so that's a single test
14:21 <ekyle> davehunt: that query gives you the per-day 70th percentile test duration
14:22 <ekyle> davehunt: yes, change the where clause to include more tests
14:22 <davehunt> right, I'd like that aggregated over all tests executed on that day
14:22 <davehunt> okay.. when you say about batching, is there no way to get all results in the range?
14:23 <ekyle> davehunt: I am not understanding your questions
14:23 <ekyle> in what "range"?
14:23 <davehunt> ekyle: I mean the expressions, in particular the date range
14:24 <ekyle> the example shows all tests that match the where clause for every day over the past month
14:25 <davehunt> ekyle: in my case I'd want all tests within that period
14:25 <davehunt> although I'd likely limit by some other factors such as branch
14:26 <davehunt> I'd like my chart to give a number for each day, which would indicate an increase or decrease in the duration of the tests.. there could be a better way than what I've suggested
14:27 <ekyle> davehunt: oh, you want a number that represents all tests across all suites?
14:27 <davehunt> ekyle: yes, but I was thinking that I'd want to avoid (or even out) multiple executions of the tests, hence the percentile
14:28 <davehunt> so the sum of each test's average duration, per day
14:29 <ekyle> davehunt: you will probably want to take the geometric mean of (the average duration of (all the test runs in a day))
14:30 <ekyle> davehunt: probably better to track the task time
14:30 <davehunt> ekyle: I was just thinking that could be skewed if some suites are run more frequently than others
14:31 <davehunt> maybe not, perhaps I can take a look at the data
14:31 <ekyle> davehunt: yes, there may be problems with missing tests for such a large statistic
14:33 <ekyle> davehunt: well, if you want to do something like it; then I must make a unittest-summary table so you can get better response time, and perform aggregates on aggregates.
14:34 <davehunt> ekyle: that would be great!
14:34 <davehunt> ekyle: maybe I'll do what you've suggested for the demo next week
14:35 <ekyle> davehunt: I will keep you up to date on the creation (and filling) of the unittest-summary table.
14:35 <davehunt> ekyle: perhaps a summary for fx-test would be good, too?
14:36 <ekyle> davehunt: maybe: are the queries on fx-test slow?
14:36 <davehunt> ekyle: no, they're much faster
14:36 <ekyle> davehunt: oh, maybe you want it for the compound aggregates
14:37 <davehunt> yeah, the two are treated almost identically in the tool
14:37 <davehunt> in fact, the fx-tests are run by pytest, and pytest can run unittest tests, so I wonder if they need to be separate schemas
14:38 <RyanVM> dylan: getting out of dodge?
14:39 <ekyle> davehunt: they can probably be merged, but fx-tests are very slim records compared to unittests, they seem very different
14:43 <dylan> RyanVM: currently no, but we're registered at a shelter.
14:43 <dylan> we're evac zone D, nearly the last ones to leave.
14:43 <RyanVM> well, hang in there - looks like you've got a hell of a weekend coming your way
14:59 <wlach> dylan: yeah, take care
15:02 <davehunt> ekyle: https://screenshots.firefox.com/IF1Z3jNtFgaaf61k/localhost
15:03 <ekyle> davehunt: cool! :)
15:23 <jgraham> AutomatedTester: https://pastebin.mozilla.org/9031864
15:42 <armenzg_mtg> "Reports are queued for processing... Please review report with caution, it may change. "
16:06 <ekyle> anyone know how this time machine works?
16:06 <ekyle> https://archive.mozilla.org/pub/firefox/tinderbox-builds/autoland-win64-pgo/autoland_win8_64_test_pgo-mochitest-chrome-1-bm110-tests1-windows-build313.txt.gz
16:06 <ekyle> https://irccloud.mozilla.com/pastebin/6Y5b5wp3/
16:07 <ekyle> If we download buildprops.json enough times, maybe we can make a dent in build times :)
17:02 <AutomatedTester> jgraham: what is that pastebin?
17:07 <jgraham> AutomatedTester: Changes to test files in different locations since the start of the year
17:07 <jgraham> in m-c
17:07 <AutomatedTester> ahh cool
17:08 <jgraham> Pretty depressing really :p
17:08 <jgraham> But it's a starting point to figure out where we are adding mochitests for things that have or could have wpt, and whether it's a technical issue or a cultural one
17:13 <AutomatedTester> jgraham: definitely
17:13 <AutomatedTester> jgraham: the only way is up right
17:13 <AutomatedTester> so dont be depressed
17:15 <AutomatedTester> whimboo: just looked at the prioritised bugs for marionette... we definitely have enough bugs in there :D
20:03 <RyanVM> rwood: do our tp6 runs use pageloader too?
20:03 <RyanVM> bug 1372942 appears to have improved tp6 quite a bit on the youtube subtests
20:03 <bugbot> Bug https://bugzilla.mozilla.org/show_bug.cgi?id=1372942 Talos, normal, rwood, ASSIGNED , tp5: Wait for the idle-callback before moving to the next page.
20:08 <rwood> RyanVM: yes they do, but that's unexpected as that patch shouldn't change the measurements, only changes when the next iteration can start
20:08 <RyanVM> https://treeherder.mozilla.org/perf.html#/alerts?hideDwnToInv=1
20:08 <RyanVM> 20-30%
20:13 <rwood> interesting... it wouldn't be the backout that caused the improvement would it be? I see it listed in the top improvement jobs: https://treeherder.mozilla.org/index.html#/jobs?repo=autoland&fromchange=9815926c3bc14b22941415a5e036d1be2bc87fdf&tochange=a8d468af2cd355a1b01e566dcfc2ca5d70363dc0
20:16 <RyanVM> i don't see any corresponding regression from when it landed, so I don't think it was the backout
9 Sep 2017
No messages
   