mozilla :: #gfx

17 Mar 2017
14:45kvarknical: did you read the Synchronization section?
14:49nicalI did, my understanding is that it gives you a mechanism for synchronization but does not enforce it. In this case there are follow-up questions to answer (scope of UB, etc)
14:53nicalGoogle's proposal makes buffer access thread safe; unless we can prove that the cost of making things safe is prohibitive, not doing it is a hard sell.
14:54nicaland also get confirmation from vendors that unsynchronized access can't cause crashes or vulns
15:13kvarknical: call to the gfx room?
15:19Basjrmuizel: AHAH! So this was a test worth doing.
15:20Basjrmuizel: Even when I draw the quad for the video at 500x500 pixels, it's still slow.
15:20BasBut again, when I size down the window, that 500x500 quad looks smooth.
15:27jrmuizelBas: interesting
15:27jrmuizelBas: that's a pretty valuable finding
15:27Basjrmuizel: I'm going to experiment with changing our swap model not to use a full blit but rather a flipped buffer chain.
15:28BasIf nothing else that should save bandwidth when we're changing the whole screen anyway.
15:47Basjrmuizel: It's not perfect but there's definitely some improvement using a chain with flips over the bitblt.
15:48jrmuizelBas: that makes some sense
15:49Basjrmuizel: So this all seems to confirm the theory that this machine just doesn't have the bandwidth to take the 2-3x fill of 4K BGRA you get with the DWM on.
15:49jrmuizelBas: right
15:50Basjrmuizel: (Which for a Lenovo P50 is sort of disappointing, really)
15:50BasSince its built in screen is 4K.
15:51jrmuizelBas: have you tried doing arithmetic to estimate how many fills of 4K BGRA you should be getting?
15:53Basjrmuizel: Well, so NV12->Backbuf, 12MB read, 33MB write; Backbuf->DWMBuf, 33MB read/write; DWMBuf->DWMBackBuf, 33MB read/write. Totals: 100MB written, 78MB read, 178MB of memory traffic per frame. Is that what you mean?
15:54BasMakes for 11 GB/s@60fps, on a 16 GB/s bandwidth machine
15:54BasAdd the decoder's 12MB read & write per frame (for interframe encoding), I guess. So 202 MB/frame
15:55Bas12GB/s, we're getting there I suppose.
15:56jrmuizelyeah that's what I meant
15:58Basjrmuizel: I guess I can see why that multiplanar overlay business would help :)
15:58jrmuizelBas: yeah
15:58jrmuizelBas: do you have any idea how to use it?
15:59Basjrmuizel: A little bit, it's just used automatically if you create a video thingy in DirectComposite I think.
15:59jrmuizelBas: ah
15:59BasI think it's a driver side feature used internally by DC when it can.
15:59jrmuizelBas: do we have a plan for how we could use DirectComposite?
16:00Basjrmuizel: I would make it a separate compositor side layer manager, but last time I was asked about that, I said in light of WebRender it wouldn't be a worthwhile investment.
16:00jrmuizelBas: but WebRender won't help with this
16:01jrmuizelBas: but I think I see what you mean
16:01jrmuizelBas: with WebRender we'd need to do something differently
16:01Basjrmuizel: Indeed, but it will replace the current layer manager system, so unless we integrate directcomposite with webrender somehow..
16:03jrmuizelBas: even if we didn't ship it, I think it would probably be worth experimenting with DirectComposite in our current infrastructure
16:03jrmuizelBas: maybe even if we made hacks just to be able to get performance numbers
16:03Basjrmuizel: There's a lot of work involved. There isn't really an easy way to 'combine' it with our current compositor in our current architecture, you'd want to basically do a full host side layermanager for this.
16:04BasThat's fair enough.
16:04BasThe plumbing would be very difficult.
16:04BasThe DirectComposite structures aren't cross-process afaik.
16:04jrmuizelBas: I wonder then if we could get some performance numbers from a standalone program
16:05Basjrmuizel: I guess for video those performance numbers are more or less Edge?
16:06jrmuizelBas: yeah, but I think it would be valuable to reproduce the differences
16:07BasYeah, it's tricky; it's not obvious how we'd do it. But now that video decoding and the compositor are in the same process, it may be possible to hack something up.
16:09Basjrmuizel: Performance is so much better with FlipSequential though, particularly for fullscreen. I'm probably going to make a patch this weekend for all of Firefox to use it, always, fwiw. Should save us bandwidth and battery life everywhere.
16:10jrmuizelBas: sounds good
16:11Basjrmuizel: My idea is I'll just use a 2 buffer preserving buffer chain, and add the damaged rect from the previous frame to the area to be re-composited.
16:11BasMuch like we do with our internal double buffering solutions.
16:28katsjrmuizel: ping
16:30jrmuizelkats: pong
16:30katsjrmuizel: if i'm hitting is that interesting to you?
16:31katsthe comment above it makes it sound like i should debug it
16:31jrmuizelkats: more to milan
16:31jrmuizelkats: but it's probably worth figuring out what's going on
16:31katsjrmuizel: ok. do you know what the intent of the code is?
16:32katsugh such a big function
16:58pulsebotCheck-in: - 398 changesets - Merge m-c to graphics
16:58nicalkvark: So I checked and calling await does not return from the JS event loop
16:59nicalit is just syntactic sugar to call resolve
16:59kvarkwell, calling it doesn't return, but the stuff that follows is only executed upon return
17:00kvarkand until then, it lives in a closure
17:00kvarkso you might as well treat it as a return to the event loop ;)
17:02nicalkvark: see
17:03kvarkI know, that's what I'm talking about
17:04kvarkthe code following `await` becomes a part of the "then()" closure
17:04kvarkso it's only going to be executed after the event loop returns
17:07nicalsorry I got mixed up, I'm a bit feverish because of the vaccine from yesterday
17:07nicalI'll call it a day
18:22katsdigitarald: ping
18:27digitaraldkats: pong
18:27katsdigitarald: hey, so about these long frames. i think recording the max contiguous frame drops is probably the most useful
18:28digitaraldkats: seems like the most reasonable right now
18:28digitaraldkats: we can collect some more data from profiles to understand the distributions
18:28katsdigitarald: it's true that a single long jank will report a large number, but if you take the p95 of the telemetry data you should be able to filter that out
18:28digitaraldwe = me
18:30digitaraldkats: could we drop the FT data as markers for the profiler?
18:30digitaraldthat sounds unclear, I mean "drop" as in add markers
18:31katsdigitarald: should be straightforward enough to do, yeah. i can do it in a follow-up bug
18:40digitaraldI see having the compositor process marked up in the profiler for when it animates or scrolls will be really interesting for debugging smoothness
18:58katsdigitarald: do you want the max contiguous frame drop data also to be split by chrome process / content process / APZ?
18:58katsor just one that combines everything?
19:20katswoah JP's coming back
19:48jrmuizelkats: yeah
19:48katsjrmuizel: do you know if he'll be back in the toronto office?
19:49jrmuizelkats: hopefully
19:49jrmuizelkats: but I haven't heard yet
19:49katsjrmuizel: you might have to give up a desk!
19:49jrmuizelkats: unlikely :)
20:02pulsebotCheck-in: - 111 changesets - Merge m-c to graphics
20:08digitaraldkats: long frame per dimension (chrome/content/apz): yes
20:13botondkats: i don't really follow the mean / standard deviation analogy
20:15digitaraldbotond: what are your concerns?
20:16digitaraldit is mostly about capturing the overall experience and distribution of frame times. mean is skewed by outliers, unlike the median
20:16digitaraldthe 95th percentile captures the long tail, or for now, the longest frame
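(To illustrate the point with made-up frame intervals: a single long jank drags the mean up, leaves the median alone, and shows up directly in a high percentile.)

```python
import statistics

# Hypothetical frame intervals in ms: nineteen normal 60fps frames, one long jank.
frames = [16.7] * 19 + [250.0]

mean = statistics.mean(frames)      # skewed upward by the single outlier
median = statistics.median(frames)  # unaffected: still the typical frame time
# Crude p95: the value at the 95% position of the sorted samples;
# here it picks out the jank frame.
p95 = sorted(frames)[int(0.95 * len(frames))]
```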
20:18rhunt jrmuizel: ping
20:18jrmuizelrhunt: pong
20:19botonddigitarald: i was referring to the analogy made here:
20:19rhuntjrmuizel: currently a wr radial gradient supports different start and end centers and this isn't really needed for css gradients
20:20rhuntjrmuizel: and it would make the shader simpler to just assume the same center
20:20digitaraldbotond: it's basically mean and max; max doesn't really map to sd
20:20botonddigitarald: my understanding is that a standard deviation is in the same unit space as a mean, but the quantities being measured have very different units
20:20rhuntjrmuizel: but svg gradients need different centers
20:21jrmuizelrhunt: I think it would be ok to switch to the simpler thing for now
20:21jrmuizelrhunt: we can always add the more complex version back
20:21rhuntjrmuizel: how about moving it into a ComplexRadialGradient or SVGRadialGradient?
20:22jrmuizelrhunt: that sounds good to me too
20:22rhuntjrmuizel: got a preference on the name?
20:23jrmuizelrhunt: Complex
20:23rhuntjrmuizel: sounds good, thanks
20:23jrmuizelrhunt: it conveys that it's more expensive
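(A sketch of why the shared-center case is cheaper, with an illustrative function name: when start and end circles share a center, the gradient parameter for a pixel is just a linear remap of its distance from that center, whereas separate centers, as SVG focal gradients need, require solving a quadratic per pixel.)

```python
import math

def radial_t_same_center(px, py, cx, cy, r0, r1):
    # Shared center: the gradient parameter t is linear in the pixel's
    # distance from the center, mapping radius r0 -> 0 and r1 -> 1.
    d = math.hypot(px - cx, py - cy)
    return (d - r0) / (r1 - r0)
```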
20:23botonddigitarald: max would also have the same units as mean
20:23digitaraldbotond: right
20:25botonddigitarald: but COMPOSITOR_ANIMATION_MAX_CONTIGUOUS_DROPS and COMPOSITOR_ANIMATION_THROUGHPUT_* have different units. so they are not like max and mean, either
20:26katsbotond: it's not an exact analogy
20:26katsbotond: the intent is to capture the shape of the distribution
20:27katsbotond: and you can't do that with a one-axis measure
20:27katsyou need two axes
20:27katsbotond: i'll have to redo that last patch to be per chrome/content/apz
20:28digitaraldbotond: the unit is different because one is a *ratio* and the other is the contiguous dropped frame *count*
20:29botonddigitarald: right, exactly
20:29digitaraldbotond: I am open to better ideas but that was the best we could come up with that makes it vsync independent
20:30botonddigitarald: to be clear, i have no objection to what we're collecting. i just found the analogy strange
20:30digitaraldbotond: ok
20:32digitaraldkats: did you collect some data already for COMPOSITOR_ANIMATION_THROUGHPUT_* how do the shapes look like?
20:32katsdigitarald: the patches haven't landed yet
20:32digitaraldok, just thought maybe you tested them in a build to see the data being collected
20:32digitaraldI'll wait
20:33katswell i tested them to make sure they were generating sane data. but i haven't run with them for an extended period of time
20:34digitaraldok, just wanted to check if the histogram's upper bound makes sense. I assume the center is at 1000 and it tapers off on both sides with a skew towards 0
20:38rhuntjrmuizel: i think i'm going to write a patch standardizing '-' vs '_' in wrench
20:38rhuntjrmuizel: is there a problem with choosing hyphens?
20:42jrmuizelrhunt: nope
18 Mar 2017
No messages