w3 :: #testing

17 Apr 2017
15:43bobholtjgraham: i've done some more sleuthing about my sauce connect issues
15:43bobholti originally thought the connect tunnel was dying randomly
15:43bobholti've since discovered that IF it dies, it ALWAYS dies while the manifest is being generated
15:44bobholti also just found running `python manifest` locally, it uses 100% of my CPU
15:44bobholtso my hypothesis is that `python manifest` is causing travis to kill the sauce connect process (due to cpu usage or some other cause)
15:44bobholt`nice python manifest` doesn't alleviate the CPU issue
15:44bobholtso my question to you is: any ideas? :)
15:46jgrahambobholt: Exciting. We *want* that process to use 100% CPU
15:46jgrahamIn the sense that if it isn't there's clearly performance being left on the table
15:49jgrahamI wonder if it's actually an OOM issue; I can't see why it would die due to CPU usage. Unless there's some keepalive signal it needs that gets delayed
15:50bobholtthe sudo-required VM is supposed to have 7.5GB of memory
15:50bobholtbut the cores is listed as "~2, bursted"
15:51bobholton my machine, the manifest build only uses 5-8% of my available memory
15:51bobholtwhich is 16GB to be fair
15:54jgrahamSeems to peak at about 1.5GB
15:55jgrahamSo probably not OOM
15:58jgrahamHave you tried to reproduce locally? I guess it would be difficult but maybe with a travis-like VM or something. Although Stack Overflow suggests that they don't exist, but it might be possible to ask for ssh access to a debug instance
15:59jgrahamWell, it was in 2013
16:06annevkFor worker tests, is the done() call always required?
16:07annevkIs it not required for .any.js tests somehow?
16:07annevkI don't really understand the standalone done() call
16:08jugglinmike1annevk: It looks that way
16:08jugglinmike1Dedicated and shared workers don't have an event that corresponds to the load event in a document. Therefore these worker tests always behave as if the explicit_done property is set to true.
16:09annevkHmm, so what about any.js? Does it just add that?
16:09jugglinmike1No idea. Probably not, if the recent patch to `master` is any indication
16:09bobholt@jgraham: i've tried a simple test of opening my own tunnel and running `python manifest` at the same time, but it's fine here
16:10bobholttravis has a docker, but only for their "non-sudo" machine image
16:10annevkjugglinmike1: thanks, that seems problematic, and judging by PRs coming by it's a bit of a footgun
16:10bobholti can try that, but may run into other issues
16:10annevkI guess I'll look into it at some point if nobody else beats me to it
16:11bobholti was going to consider trying browserstack, but figure they may have the same issue with process death - but it may also be worth a try
16:14bobholti'll see if eating lunch gives me any epiphanies
16:15jgrahambobholt: I presume the issue is on the travis side
16:15jgrahamRather than on the sauce side
16:16bobholtseems that way
16:16* jgraham wonders about the merits of circleci :)
16:16jgrahamannevk: Hmm I didn't think about that closely but it does seem like a footgun. I'm not sure you can fix it by making done() not required for worker tests though
16:16bobholti've been wondering all weekend how to reduce our reliance on outside dependencies
16:16jgrahamIt's very non-obvious to me how that would work
16:17* jgraham is supposed to be packing, but probably back in a bit
16:17annevkjgraham: yeah, dunno, I don't know enough about the current setup to really help at this point I'm afraid
16:18annevkjgraham: post-Easter vacation time?
16:42gsneddersjgraham: no, we want `manifest` to be I/O bound, because if we're CPU-bound there's performance left on the table
16:43gsneddersalso memory usage of it is... crazy.
17:19jgrahamannevk: Easter is four days here
17:21jgrahamgsnedders: Well yes but since it *is* limited by the CPU required to parse all the html, we want it to use as much CPU as possible
17:22jgrahamIs it even IO bound if we use a C/Rust/etc. parser?
17:23gsneddersfor the hashing case, where there's no change, it absolutely should be I/O limited
17:23gsneddersand really we should do some local caching of mtimes so we don't even need to read the file
17:23jgrahamOh sure
17:23jgrahamBut that isn't the case on Travis
17:24gsneddersfor the HTML case, it'll depend on I/O speeds
17:24jgrahamMaybe making the manifest a cached artifact on travis would help
17:24gsneddersI have no idea what I/O performance on Travis is like
17:25jgrahamIn any case it must be better than regenerating it from scratch
17:25jgrahambobholt: You could try that perhaps. Not really a solution, but it could help
17:26jugglinmikeI was about to suggest the same
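[Editorial sketch] gsnedders's suggestion above, caching mtimes so unchanged files never need to be read, could look roughly like the following. This is a minimal illustration and not the actual wpt manifest code: the function names (`file_hash`, `update_hashes`, `load_cache`, `save_cache`), the JSON cache layout, and the choice of SHA-1 are all assumptions made for the example.

```python
import hashlib
import json
import os


def file_hash(path):
    """Return the SHA-1 hex digest of a file's contents (reads the whole file)."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def load_cache(cache_path):
    """Load the hypothetical mtime cache, or start empty if it does not exist."""
    if not os.path.exists(cache_path):
        return {}
    with open(cache_path) as f:
        return json.load(f)


def save_cache(cache_path, cache):
    """Persist the cache so the next run can skip unchanged files."""
    with open(cache_path, "w") as f:
        json.dump(cache, f)


def update_hashes(paths, cache):
    """Refresh cache entries (path -> mtime, size, hash) for the given paths.

    A file is only opened and hashed when its mtime or size differs from the
    cached values. Returns the list of paths that had to be re-read.
    """
    reread = []
    for path in paths:
        st = os.stat(path)
        entry = cache.get(path)
        if entry and entry["mtime"] == st.st_mtime_ns and entry["size"] == st.st_size:
            continue  # metadata unchanged: trust the cached hash, skip the read
        cache[path] = {
            "mtime": st.st_mtime_ns,
            "size": st.st_size,
            "hash": file_hash(path),
        }
        reread.append(path)
    return reread
```

With a warm cache, an unchanged checkout costs one `os.stat` call per file and no reads. As jgraham notes, that does not help on Travis unless the cache (or the manifest itself) is preserved between builds, since a fresh clone has no prior state to compare against.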
17:28jgrahamjugglinmike: You will be pleased to know i have exactly the same problem you found with screenshots in chromedriver in my faster reftest implementation in marionette/gecko
17:28jgrahamI think I just need to wait for MozAfterPaint to fire
17:29jgrahambut interestingly the solution recommended for content that doesn't have access to MozAfterPaint is requestAnimationFrame
17:32jugglinmikejgraham: I'm no sadist; that sounds frustrating
17:33jugglinmikejgraham: Is there a chance we need to account for this in the WebDriver specification?
17:34jgrahamjugglinmike: Well this is a lower layer than WebDriver
17:34jgrahamI'm adding a specific "run reftest" endpoint
17:35jgrahamIn marionette
17:35jgrahamBut I tend to agree that WebDriver should mention waiting for a paint
17:35jugglinmikeyeah, alrighty
17:35jugglinmikeI'll see about making an issue or patch to discuss
17:36gsneddershttps://github.com/w3c/webdriver/issues/893 is related there
17:39jgrahamjugglinmike: Just wrote an issue
17:41annevkjgraham: here too, so you're going back home I guess?
17:41jgrahamgsnedders: (that comment wasn't a correction for you, just a clarification for the people reading on the WD side who aren't as familiar with the intricacies of document loading)
17:42jgrahamannevk: Yeah, we've been visiting my parents
17:43* gsnedders doesn't want to imagine how bad traffic on your way back home will be
17:44jugglinmikeGot it, thanks jgraham
17:48jugglinmikejgraham: gsnedders Do you have time for https://github.com/w3c/web-platform-tests/pull/5591 and https://github.com/w3c/web-platform-tests/pull/5554 today?
17:54gsneddersjugglinmike: 5591 in general I'm just in favour of landing updates to once they've been reviewed upstream
17:56jugglinmikeMakes sense. Mind pushing the button for me?
17:57jugglinmikeI can then update https://github.com/w3c/web-platform-tests/pull/5536 , which I believe we wanted to incorporate into https://github.com/w3c/web-platform-tests/pull/5554
19:09jgrahamjugglinmike: (FWIW rAF seems to have trivially solved my issue)
19:09jgraham(had a more complex solution in mind but it's hard to justify if this works :)
19:10jugglinmikeI'm going to play the stickler here (alright, I'm always the stickler) and point out that "seems to have solved" is not the same as "solved"
19:12jugglinmikebut I hope I'm not coming across as grumpy. I'm sure this is going to let you get higher-priority work done
19:12jugglinmikeI'm just worried that we're papering over something important, and missing an opportunity to solve this for web developers generally
19:13jugglinmikeI suppose your issue against the WebDriver specification is a good step in that direction
19:13jugglinmikeAlthough technically, if that language makes it into the specification, we could then remove the rAF call from wptrunner
19:14jugglinmikebut I guess that would be mean
19:17jgrahamjugglinmike: Well *in this case* I am solving it at the lower layer. Like I would expect a solution for general screenshots in geckodriver/marionette to either a) wait for a MozAfterPaint event (my original plan) or b) use rAF (equivalent per documentation)
19:18jgrahamIn fact per ato we might already do b)
18 Apr 2017
No messages