mozilla :: #jsapi

11 Aug 2017
01:29pbonesfink: awake? in GCRuntime::setParameter we assign directly to mode,
01:29pboneshould it be calling setMode() which does some extra stuff?
01:29pboneor is there a reason not to do that stuff.
08:32mrgigglesthe mozilla-inbound tree is now closed (assertion failures from bug 1379814)
08:32firebothttps://bugzil.la/1379814 ASSIGNED, lhansen@mozilla.com Prevent dining philosophers deadlock in HelperThreads in a principled way
08:38mrgigglesthe mozilla-inbound tree is now open
08:42nbplth: locking inbound to avoid deadlocks? :D
08:50lthnbp, i guess... :(
09:42bbouviernbp, ping -- just to clarify your message on js dev internals re: tracelogger: you're not saying it shouldn't be removed, right?
09:44nbpbbouvier: I am saying that if we can we should consider migrating the scope into perf-html labels.
09:45nbpbbouvier: and try to get the same information from perf-html profiles.
09:46nbpI admit that it is cool to be able to script the analysis of tracelogger results, but if people want to remove it. I unfertonately don't have enough usage of it to complain :/
09:46nbpas opposed to iongraph.
09:48bbouviernbp, agreed, i think "if we can" really means "if someone cares enough to step up and do the work"
09:51tillnbp: I agree with your points on presenting too much information in the profiler for normal use. But what do you think about having different views, with the ability to show "low-level" information? I'm thinking along the lines of fitzgen's Culture of Performance essay here https://gist.github.com/fitzgen/e7e42883e902da6646e6b8bedf7c4661
09:52nbptill: honestly, I would think it would be interesting to present the generated assembly, with code-coverage, like bhackett did with his addon for a while.
09:53tillnbp: yes! I do miss that addon
09:54nbptill: while this is not for average users, this is one standard, that we do not have to document.
10:05nbptill: fitzgen summary of the performance tooling of Firefox is quite good. On the other hand you have to consider the overhead of every tool, as well as the domain knowledge.
10:05nbptill: if perf-html was able to display the same information of iongraph, I would not want it enabled by default, as it might cause memory profiling issues.
10:08nbpSounds like Heisenberg relation. The more detailed timely precision you want, the less precise you are going to be in memory :P (is memory analogous to energy :P)
11:18tillnbp: sure, it's not about having everything enabled by default. There always have to be different levels of depth and detail when it comes to tooling like this. But those levels should, where possible, be displayed in a unified way
14:24fitzgennbp: till: have you been following jimb's work to unify all performance sources into on sink?
14:25fitzgenhe ahs the ability to turn individual sources, or even whole subtrees, on and off
14:25nbpfitzgen: I did not follow, do you have pointers?
14:25fitzgentrying to find the bug...
14:26fitzgennbp: till: https://bugzilla.mozilla.org/show_bug.cgi?id=1369955
14:26firebotBug 1369955 ASSIGNED, jimb@mozilla.com Off-the shelf control, enumeration, and documentation of performance sources
14:38nbpfitzgen: I think it would be interesting to see it as a replacement of the file backend of the TraceLogger.
14:39nbpjandem: ^
14:40nbpjandem: we might have one way to keep the probe without the burden of handling the saving of the streamed content.
14:40tcampbellI believe h4writer was looking in to that at one point
14:41tcampbellaren't most of the fuzzbugs and maintenance related to the measurement itself though?
14:41tillfitzgen: ah, thanks - I had not been following that
14:42tilltcampbell: tracelogger feeding into a tool that people are already using might change the equation enough to keep it
14:42tcampbellfair point
14:48jandemtill: there's still the issue of who's going to use/maintain it
14:49tilljandem: oh, sure. I'm not saying it's a done deal in that case. Just that it's worth looking into how that'd change the equation
14:50tilljandem: but I agree that this does seem kinda theoretical, given that any such integration would probably be quite a bit of work that nobody currently has the cycles to do. So it probably doesn't make sense to keep TL around until such a day where it might theoretically become useful again
14:53jandemtill: yeah, it sucks to remove it because TL has its uses.. If we want to resurrect it the code will always be available in VCS though
14:53tilljandem: agreed
14:55nbpjandem: didn't we have a configure flag switch for tracelogger at one time?
14:56nbpjandem: maybe we could just introduce it again, and see where this perfstream is going.
14:56jandemnbp: we do, it it's enabled by default in debug builds and disabled by default in opt builds
14:56jandemif we disable TL by default everywhere, at build time, it will bitrot very fast
14:57nbp!summon jimb
14:57* nbp wait a minute
14:58nbpjandem: what if, we remove all the backend of TL, and just keep the interface of it?
14:58jandemwhat's the interface? the start/stop classes/functions?
14:58nbpyes.
14:58bbouvierand calls, i assume?
14:59tillthose seem easiest to restore from vcs, no?
15:00bbouvierif you also keep the calls to TL functions, it's easier to keep them around rather than restore them from vcs (the latter is more likely to cause rebasing pain)
15:00bbouvierbut it's dead code, so not ideal
15:01nbpI have to say, the worst part of TL interface was dealing with multiple threads.
15:02nbpand that's also the most likely part to bitrot.
15:26nbpok, looking at the patches, there is "stream", and no way to collect them from the different measures.
15:27nbpyet
15:47nbpGreat, I managed to reproduce a leak that is blocking the JSBC from landing. The only issue is that I reproduced while the JSBC is disabled too :/
15:48nbpstandups: Tracking hardware to reproduce issues only seen on CI.
15:48standupsOk, submitted #49449 for https://www.standu.ps/user/nbp/
15:49tcampbellhaha. well, nice work make it repro
16:01_6a68RyanVM: howdy :-) how did the Try run turn out?
16:06RyanVM_6a68: not well, hit some build bustage that I haven't had a chance to really look at it
16:06_6a68doh
16:20RyanVM_6a68: at this point, I hope you're not waiting on Try results to hunt this down :\
16:22_6a68RyanVM: nah, the Talos crashes haven't been observed by QA in ordinary usage, the shutdown timer has been tightened to hopefully avoid breaking Talos in the future, and we're going to attempt a Beta rollout next week
16:23RyanVM_6a68: the good news was that the crashes reproduced on Try *when pushing Beta to Try*
16:23RyanVMso we should be able to test fixes on top of Beta
16:23RyanVMI just haven't been able to figure out how to get Trunk to reproduce
16:24_6a68cool, nice to get some good news
16:25RyanVM_6a68: yeah, because that was how I figured out what needed backing out in the first place (I did a series of Try pushes with different patches from the regressing push backed out) and that's what clearly pointed to that bug as the culprit
17:00bzcan ~CallSetup gc?
17:00mrgigglesbz: Yes, |void mozilla::dom::CallbackObject::CallSetup::~CallSetup(int32)| can GC: https://pastebin.mozilla.org/9029482
18:34lukesfink: if you want to debug the final (DESTROY_RUNTIME) gc, what's an easy way to get a GC dump of everything that is alive?
18:49mccr8luke: js::DumpHeap before the GC
18:50lukemccr8: ah thanks, i had just found my way to that
18:50lukemccr8: what should i search for in that (quite big, it seems) log to find the roots
18:50luke?
18:50lukeoh, hah, i missed "# Roots" at the top
18:51mccr8luke: yeah, everything at the start through the first ==========
18:52lukemccr8: so i see a lot of roots, but i'm guessing it's because this DumpHeap GC isn't a shutdown GC
18:52lukemccr8: can i DumpHeap() *after* the DESTROY_RUNTIME gc?
18:52jorendorffbz: i take it this code is just so obscenely hot that we don't want to root obj for the benefit of debug-only code? http://searchfox.org/mozilla-central/source/dom/bindings/BindingUtils.h#1098-1100
18:52mccr8luke: DumpHeap isn't a GC, but yeah I guess all of the roots are active. I don't remember exactly when we drop roots.
18:52mccr8luke: yeah that should work, if you want to know what is left.
18:53lukegreat
18:53mccr8if you want to figure out why a particular object is alive, you can use the find_roots.py script from here: https://github.com/amccreight/heapgraph
18:53naveedjorendorff: ping
18:54naveedjorendorff: Could you please update https://docs.google.com/spreadsheets/d/1HQtGv1qySTEfWWukTKoGNz8GGXKEVMYTS6Bw67DnX2k/edit#gid=0 with your Q3 goals?
18:54jorendorffack yes
19:26lukemccr8: are "Preserved wrapper"s treated as roots?
19:30naveedjorendorff: \o/ ty
19:32mccr8luke: they are roots from the perspective of the GC, but not the CC.
19:34lukemccr8: would a DESTROY_RUNTIME gc mark them black?
19:34mccr8luke: no, they'd always get marked gray. (if there's a path from a black root also then of course it would end up black)
19:35mccr8luke: there's some fairly recent mechanism for dropping roots at some point during shutdown but I don't know anything about it.
19:35mccr8maybe it didn't land though.
19:37lukemccr8: so i wasn't able to do a DumpHeap() after the DESTROY_RUNTIME gc (assertions, crashes), so i did it right before, after gc.finishRoots(), and the dump shows a "Preserved wrapper" under Roots, marked Block (for a worker global scope)
19:38mccr8luke: you could try using find_roots.py on that object, with the -bro option, and it might show you a path from a black root to the object.
19:39lukemccr8: but it's reported under "# Roots."
19:39mccr8luke: yes, but that includes gray roots (unfortunately...)
19:39lukemccr8: incidentally, the same object (a workerglobalscope) is the mIncumbentJSGlobal of a callback
19:39lukemccr8: but the DumpHeap() is reported it as B(lack)
19:40mccr8luke: well, it'll be black because some other black object holds it alive. though from the GC perspective, it keeps alive things that are either black or gray.
19:40mccr8luke: what are you trying to figure out? Why some object is leaking?
19:42lukemccr8: yeah, checkEmptyArenaList() is reporting live cells
19:42lukemccr8: and i'm trying to figure out what is holding them alive
19:43mccr8luke: well, you should use the find_roots script. That's what it is for. :P
19:44mccr8if there are callbacks, DOM stuff might be involved and you may need CC logs. but maybe there's just some worker shutdown stuff that didn't run
19:46lukemccr8: does find_roots need both GC and CC logs?
19:46lukemccr8: or: what's the right way to produce a dump that find_roots can consume?
19:47mccr8luke: there are two separate scripts (which is not ideal...) so you can just run it on the GC log
19:48jorendorffbz: ping
19:50lukemccr8: ah, ok, where is find_scripts again?
19:50mccr8luke: https://github.com/amccreight/heapgraph/
19:55lukemccr8: ok, so when i ask about some random surviving function (as the target to find_scripts.py) it does say "via Preserved wrapper"
19:56mccr8luke: did you use the -bro option?
19:57mccr8By default, find_roots reports paths from gray roots, too. And it reports a shortest path.
19:57lukemccr8: no, but when i set it now (putting it after filename and target, right?) it has same output
19:57mccr8hmmmm weird.
19:57lukemccr8: i think it makes sense, this one dedicated worker scope is the only black root
19:58lukemccr8: nothing reaches it that isn't reached by it
19:58mccr8Alright. I'm just not sure how a wrapper ends up being a black root...
19:59lukemccr8: i'm assuming this "Preserved" bit?
19:59lukemccr8: http://searchfox.org/mozilla-central/source/dom/base/nsWrapperCache.h#245
20:00mccr8luke: yes
20:00mccr8I guess we do mark wrappers black sometimes, as part of a CC optimization...
20:01mccr8mozilla::dom::TraceBlackJS(). But I don't think that runs on workers.
20:01lukemccr8: so then if PreservingWrapper() is 'true' that could be the bug?
20:02mccr8luke: so, a wrapper (JS) always keeps alive the C++ DOM object. If the wrapper is "preserving" then the C++ object also keeps the JS object alive. And they'll both stay alive until they get CCed.
20:03lukeah, i think we're not running a CC here, in this DESTROY_RUNTIME GC
20:03lukei suppose we assume it has already run/
20:03luke?
20:04mccr8luke: hmm. On workers we synchronously run a CC at the end of each GC.
20:05lukemccr8: via a callback from the GC?
20:05mccr8yeah. let me see if I can find it
20:05mccr8luke: http://searchfox.org/mozilla-central/source/dom/workers/RuntimeService.cpp#1100
20:06lukemccr8: so if i want to get a complete picture, should i get a gc+cc dump?
20:08mccr8luke: yeah running with something like MOZ_CC_LOG_ALL=1 MOZ_CC_LOG_THREAD=worker might be what you want to try next. That will grab both logs at the same time, whenever you CC.
20:08mccr8You also need to disable the sandbox if this is the content process though maybe you've already done that.
20:08lukemccr8: cool, in /tmp?
20:08mccr8yeah, on Linux.
20:08lukemccr8: how do i disable sandbox?
20:10mccr8luke: MOZ_DISABLE_CONTENT_SANDBOX=t
20:10lukemccr8: thanks!
20:13lukemccr8: incidentally, when i do a run of the test with a tweak so it doesn't leak, there aren't any roots when I DumpHeap at the same point
20:14mccr8Ah, ok. So yeah it sounds like you just need a CC log then to get more information.
20:14lukemccr8: ah, so the CC part will help me see why the wrapper is being preserved?
20:15mccr8luke: the wrapper being preserved is not the issue. But it will tell you why the C++ part of whatever is being held alive.
20:22lukemccr8: do i pass both gc-edges-* and cc-edges-* to find_roots?
20:22mccr8luke: one or the other depending on which log you want to analyze.
20:22mccr8luke: probably you'll do the GC log first, iwth the GC object, then it'll trace back to some preserved root edge to some object, then you pass that object along with the CC log.
20:23lukemccr8: ah, got it
20:25lukemccr8: is the last log the one with the highest number before ".log"?
20:26mccr8luke: yes
20:26mccr8you also have to make sure you have the right process and right worker fo course.
20:26mccr8the THREAD thing should at least filter out the main thread.
20:32lukemccr8: hmm, the DedicatedWorkerGlobalScope found to be the root of the GC graph is not present in the CC graph
20:33lukemccr8: what does [known=0] mean after the ========= in the cc dump?
20:33mccr8luke: this is being reported as rooted by a preserved wrapper? Maybe it is getting removed from the CC graph because it is marked black or something?
20:34mccr8luke: that's the result of cycle collection. known means there are no known references to the object.
20:34mccr8luke: maybe baku would have some ideas on how to investigate worker leaks?
20:35lukemccr8: i'm thinking that this PreserveWorkers flag is just being erroneously set and never cleared and thus being treated like a root
20:36mccr8luke: the CC is responsible for freeing preserved wrappers. Setting the flag shouldn't cause a leak.
20:36mccr8it gets set if you do, for instance, someDomThing.ThisFieldIMadeUp = "whatever";
20:37mccr8so it happens all the time
20:37lukemccr8: so if someone simply leaked a ref-count on the C++ worker-scope, how would that look in the logs?
20:37lukemccr8: oh, ok, i get it, so it's probably just from onmessage
20:37mccr8worker lifetime stuff is a little weird. it is possible the worker just isn't being shutdown properly.
20:39lukemccr8: the key change i'm making is destroying a PersistentRooted much later in worker shutdown than i did earlier
20:39mccr8luke: my guess just based on that is that you are moving it until after some CCs we do during shutdown, so there's not enough time for it to be destroyed.
20:40mccr8a C++ JS cycle can take a few iterations to break down, GC -> CC -> GC etc.
20:40mccr8particularly if promises are involved...
20:40lukemccr8: yeah, that sounds quite likely
20:41mccr8luke: these are leaks that show up in debug builds?
20:41lukemccr8: like JS_DestroyContext() is happening after this point in worker shutdown which comments that the global should be collected: http://searchfox.org/mozilla-central/source/dom/workers/RuntimeService.cpp#2939
20:42lukemccr8: i bet i can fix this to do the clearing of roots earlier
20:42mccr8alright. Yeah this stuff can unfortunately be a bit finicky.
20:42lukemccr8: so i've heard
20:45lukemccr8: thanks for all the help, btw, this is probably it now that i think about it
20:45mccr8luke: Great! Really I should just write all of this down somewhere...
12 Aug 2017
No messages
   
Last message: 10 days and 16 hours ago