mozilla :: #e10s

12 Jul 2017
14:45FallenWhat value of dom.ipc.processCount are folks using? Is 10 a reasonable number, or do you developers go higher?
15:39firebotBug 1376998 NEW, Sign e10srollout and get it on the testing channel
15:40elanis this the correct ticket to track enabling multi for WebExtensions users on Firefox 54?
16:38mrbkapelan: yes
16:39elanit looks like we need to make sure Rehan feels good about approvals, etc
16:39elanmrbkap: ^
16:39elanI have been on PTO and have kind of lost track, is there anything I can do to help?
17:19firebotBug 1376493 WONTFIX, Aggregate String Scalars as Simple Counts
20:21mconleybillm: ping
20:22billmmconley: pong
20:22mconleybillm: hey - I'm working with the Activity Stream folks on a crash they're experiencing in automation when running Talos. Here's one of the .extra files for one of those crashes:
20:22mconleyA few things in there:
20:23mconleyI see IPCShutdownState=SendFinishShutdown, and ipc_channel_error=PStorageParent::RecvAsyncPreload
20:23mconleythe stacks themselves don't say much. Each crash seems to have the content process at a different frame - it's kinda random
20:23mconleyand in the parent, we're just inside a runnable getting paired minidumps for some reason
20:23mconleybut from those extra values, can I interpret that as a KillHard because we're getting an IPC error returned by PStorageParent::RecvAsyncPreload ?
20:27billmmconley: yeah, that seems right
20:28mconleybillm: outstanding, thanks
20:28dmosemconley: so why would we be in xpcom-shutdown anyway when those talos tests are still running?
20:28dmosemconley: which presumably they must be since they're in the crashing stacks
20:28billmmconley: you could try adding a reason here to confirm:
20:29billmthe child is in the process of shutting down when this happens
20:29mconleywhich might not mean that the whole browser is shutting down
20:29mconleyjust that the last tab belonging to a content process has perhaps shut down
20:30billmI don't understand why the storage code would be failing if the parent isn't shutting down, though
20:30dmosewhat would cause a content process to shut down if the parent isn't shutting down?
20:30mconleydmose: the last browser for a content process going away
20:30dmoseoh, we don't leave them around as a cache?
20:31mconleyI think gabor's stuff makes it so that short-duration content processes are recycled
20:31mconleybut ones that have been around for a few seconds get tossed once their tabs are all unloaded
20:32mconleydmose: so here's a bunch of places where initting the DB might go wrong:
20:34mconleydmose / Mardak: so, uh, is that enough information to do some printf-debugging or something on automation?
20:34mconleybillm: thanks again for the consult
20:34billmmconley: sure. probably wasn't much help though.
20:35mconleybillm: well, it confirmed my wishy-washyness over where the crash was likely coming from
20:37dmosemconley: seems like a reasonable strategy might be to do stack/printf debugging on automation with krism's patch backed out in order to trigger the problem more easily
20:37dmoseMardak: what do you think? ^
20:38dmosemconley: but that info is tremendously helpful; thanks!
20:39Mardakdmose: sounds good
20:42Mardakbillm: also related to preffing on activity-stream in m-c, we've been trying to track down leaking about:newtab that happens only if we force activity-stream about:newtab to run in the child process that dmose conditionally adds URI_MUST_LOAD_IN_CHILD
20:43billmMardak: sorry, I don't understand the last part of that
20:43Mardakthis happens even if we remove all the code from the activity-stream extension / bootstrap.js and also when turning off newtab preloading
20:44Mardakin m-c, tiles about:newtab runs in main process as it has been. if browser.newtab.activity-stream.enabled is true, we make about:newtab load in child because it's set up to use message passing
20:45billmMardak: so if you load it in the child, it leaks. otherwise it doesn't leak?
20:45Mardakwe can avoid the leak by just forcing activity-stream to be in main process just like existing tiles about:newtab
20:46billmMardak: is there a bug for this? the only way to investigate is to look at leak logs. it's a long process.
20:47Mardakthe current one is
20:47Mardakpreviously it was leaking consistently in another test but it mysteriously stopped last week
20:50Mardakbillm: any tips for ursula to investigate? she's done some debugging for the 2829 issue
20:57billmMardak: you need to log the cycle collections that happen here:
20:58billmMardak: the logs from those CCs will hopefully include the window being leaked and tell you what the path is to it
20:59Mardakare there examples of how to log?
21:00billmMardak: I think mccr8 probably knows better
21:01ursulabillm: i looked at the cycle collection logs yesterday
21:01ursulabut didn't really find anything helpful
21:01billmursula: when were the logs taken?
21:02ursulawhat do you mean?
21:02billmursula: well, how did you collect the logs?
21:02billmursula: we cycle collect a lot, and you need to look at the right cycle collections
21:03billmursula: often only the logs at shutdown matter, but in this case you need to look a little earlier
21:03ursulait's totally possible that i wasn't looking at the right stuff... i was using the find roots tool in:
21:03billmursula: how did you get the logs in the first place though?
21:04ursulai just ran the test with MOZ_CC_LOG_DIRECTORY and MOZ_CC_LOG_SHUTDOWN=1
21:04ursulaand also MOZ_CC_ALL_TRACES=1
21:05ursulaand disabled sandboxing
21:05billmursula: yes, that will only log shutdown CCs, which is too late to get any information for this kind of failure
21:06ursulabillm: ah
21:06mccr8ursula: you need to create an instance of;1 and then pass it in to that location that Bill pointed out.
21:06mccr8which creates an instance of nsICycleCollectorListener.
21:06billmmccr8: thanks, couldn't remember that part
21:06mccr8so that should log the actual CCs we do right before we check for the leaking windows.
21:06ursulai pass it into forceCC ?
21:07mccr8ursula: yes.
21:07mccr8this is the signature for that method:
21:07ursulaoh i see
21:07mccr8I think that should work. You'll still probably want the LOG_DIRECTORY and ALL_TRACES thing. and disabling the sandbox.
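[Editor's sketch] The environment setup discussed above could be scripted roughly as follows. This is a minimal sketch, not something posted in the channel: the helper name cc_logging_env is hypothetical, the value "1" is copied from ursula's messages rather than checked against Firefox documentation, and disabling the sandbox is not shown because the chat does not name the setting used for it.

```python
import os


def cc_logging_env(log_dir, base_env=None, log_shutdown=False):
    """Return a copy of the environment with the CC logging variables set.

    Hypothetical helper: the variable names come from the chat; the
    values are the ones ursula reported using, not verified here.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Directory where cycle collector logs should be written.
    env["MOZ_CC_LOG_DIRECTORY"] = log_dir
    # Ask for all traces, as mccr8 suggests keeping.
    env["MOZ_CC_ALL_TRACES"] = "1"
    if log_shutdown:
        # Shutdown CC logging; billm notes this alone is too late
        # for this kind of leak, so it is off by default here.
        env["MOZ_CC_LOG_SHUTDOWN"] = "1"
    return env


if __name__ == "__main__":
    env = cc_logging_env("/tmp/cc-logs")
    for name in ("MOZ_CC_LOG_DIRECTORY", "MOZ_CC_ALL_TRACES"):
        print(name, "=", env[name])
```

The returned dictionary can be passed as the env argument when launching the test run from a script; the caller's own environment is left untouched.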
21:07ursulaand what do i look for in the logs?
21:07ursulansGlobalWindow still?
21:08mccr8ursula: yes.
21:08mccr8ursula: you look in the CC log for the nsGlobalWindow. Hopefully you can easily match it up with the one that the leak checker is reporting leaked.
21:09ursulamccr8: and it'll show me the path to how the leak was created?
21:09mccr8ursula: It will show you a path from an object that the CC thinks should definitely be alive, to the window.
21:10mccr8So you can figure out what the actual leaking thing is, that is entraining the window.
21:12ursulaok great, i'll give it a shot. thanks mccr8 and billm!
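[Editor's sketch] The idea mccr8 describes, finding a path from an object the CC considers definitely alive to the leaked window, amounts to a graph search. The sketch below illustrates it on a toy in-memory graph. It does not parse real CC log files and is not the find roots tool mentioned above; every node name except nsGlobalWindow is made up for illustration.

```python
from collections import deque


def path_from_roots(edges, roots, target):
    """Breadth-first search from the known-alive roots to the target.

    edges maps each object to the objects it holds references to.
    Returns one shortest path as a list, or None if unreachable.
    """
    queue = deque((root, [root]) for root in roots)
    seen = set(roots)
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for child in edges.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append((child, path + [child]))
    return None


if __name__ == "__main__":
    # Toy graph with hypothetical object names.
    graph = {
        "known-live root": ["some listener"],
        "some listener": ["document"],
        "document": ["nsGlobalWindow"],
        "unrelated root": ["other object"],
    }
    roots = ["unrelated root", "known-live root"]
    print(path_from_roots(graph, roots, "nsGlobalWindow"))
```

Each intermediate object on the returned path is a candidate for the thing entraining the window, which is where the investigation would continue.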
22:10jfmbillm: interesting cookie thread suggestion
22:10jfmbillm: I have already written up a plan for how to send all cookies to each child process and send updates to each process as they occur
22:11jfmbillm: the advantage of that plan to me is that I know exactly how to implement it and we can reuse a bunch of the work from the existing patches
22:11jfmbillm: the disadvantage of your thread plan is that I don't know anything about the IPC pieces that are involved there and I need to spend a bunch more time thinking about how to implement that properly
22:12billmjfm: ok. I just wanted to suggest it. the IPC part is pretty easy. I think the hard part would be allowing cookie access on a separate thread in the parent. but that doesn't seem too difficult.
22:15jfmbillm: is there any model code to look at?
22:16* jfm suspects PBackground might be similar
22:16billmjfm: the process hang monitor is somewhat similar, and much smaller than PBackground:
22:19billmjfm: this is probably the most relevant place to start, since it shows how the IPC is set up:
13 Jul 2017