mozilla :: #e10s

13 Jul 2017
00:31mrbkapReading bug 1331680 makes me so happy that we didn't tackle it during e10s proper.
00:31firebot NEW, Consider not doing sync IPC for document.cookie getter/setter
00:41RyanVM|afkmrbkap: i'm worried about that even making 57 at this point
00:42mrbkapRyanVM|afk: Yeah, no kidding.
15:02bkellygabor: ping
15:03bkellymulti-e10s rollout question if you have any time
15:03gaborbkelly: in a meeting right now... can I get back to you after that?
15:53gaborbkelly: pong
15:54bkellygabor: so I noticed something that was preventing multi-e10s on my wife's mac and on my windows desktop today
15:54bkellynot sure if its expected or a bug
15:55bkellygabor: my cohort value: e10s.rollout.cohortSample.multi;0.724669
15:55bkellygabor: this gets me this pref set: dom.ipc.processCount.web;4
15:55bkellygabor: however, I still only see one content process... I believe because we have this set for some reason: dom.ipc.processCount;1
15:56bkellygabor: if I override dom.ipc.processCount;1 to 4, then I start getting multiple content processes
15:56bkellygabor: is this expected?
15:56bkellythis is all on FF54.0.1
15:57gaborbkelly: was dom.ipc.processCount set manually? (I mean user value)
15:57bkellygabor: no... about:config shows it in a light color indicating default AFAIK
15:58gaborwhat channel is this release or beta?
15:58bkellygabor: I also have this in FF54.0.1, but not sure if its related: dom.ipc.processCount.extension;1
15:58bkellygabor: release
15:59* gabor just realized that you already specified the version
16:00gaborthat's odd... does she have any addons installed?
16:00bkellygabor: no... and I'm seeing it on my desktop as well
16:01bkellyso two separate machines/profiles
16:02gabormrbkap: felipe: ^ do you guys know how can this happen? I haven't been following the system add-on very closely lately and this seems like a bug for me...
16:03bkellygabor: I just created a new profile and got the same behavior (after clearing cohort once and restarting a few times to make sure it took affect)
16:04bkellyso it doesn't seem like a profile migration problem
16:05bkellycreated another new profile... I got the multi4 cohort from the start so didn't have to reset it... same behavior
16:06gaborbkelly: this seems really odd to me, I'll do some experiment as well, thanks for reporting it!
16:06bkellyseems 100% reproduceable for me here... so hopefully its easy to debug
16:06bkellythanks for looking at it
16:07bkellygabor: I guess I had a follow-on question... for telemetry, are we just looking at the value of dom.ipc.processCount.web or something else to determine how many processes are actually running
16:07bkellywondering if this has messed up our data analysis at all
16:12gaborbkelly: this is really odd, I've just tested it and I didn't get multi either on the release channel :(
16:13bkellygabor: I'm on 1.70 of multi-process staged rollout addon
16:14bkellygabor: also, isn't it weird we have the dom.ipc.processCount.extension;1 pref? isn't that only in nightly right now?
16:17gaborbkelly: I don't know... it seems to be 1 on beta too:
16:17gaborI'm not too familiar with that flag
16:18gaborI need to check how the system add-on works, on nightly and beta we have dom.ipc.processCount 4 and in release it's 1, so it can be that the system add-on does the right thing on those channels but not on the release channel, which would be quite bad news...
16:25gaborbkelly: do you have dom.ipc.multiOptOut ?
16:51bkellygabor: no multiOptOut... also, if I had that I don't think I could set dom.ipc.processCount = 4 to fix the issue
16:52bkellygabor: I feel like it must have worked around the time of FF54 release since it got so much attention in the media, etc
16:52bkellyseems like someone would have noticed if it wasn't working then
16:52bkellygabor: I wonder if 54.0.1 broke something... or a new version of the addon
16:55gaborbkelly: I haven't tested it myself on a fresh profile on the release channel (I had a custom pref in my profile that I've just realized now). But I would assume that it used to work... unless everybody else had a custom setting as well
16:55RyanVMbkelly: we didn't update the addon in 54.0.1, at least not in-tree
16:55RyanVM1.7 is the one in-tree too
16:56gaborbkelly: I know that we have a signed version up for review and test for enable multi for users with web extensions only
16:57RyanVMa number of relevant changes have hit 55 for that
16:57RyanVMbug 1373707, bug 1377001 for example
16:57firebot FIXED, mrbkap Include users with only webextensions in Release to receive Multi using the system add-on
16:57firebot FIXED, mrbkap Changes to e10srollout for Beta 55
16:57gaborjimm: do you know if softvision has tested multi on the release channel after the 54 release?
16:59mrbkapbkelly: gabor: have you restarted?
16:59RyanVMsorry, m-r has 1.50 in-tree at the moment
16:59RyanVMso 1.70 would have been updated post-install
16:59gabormrbkap: yeah, a few times
16:59bkellymrbkap: yes... I even made a new profile and restarted
16:59RyanVM(hgweb had me on "tip" instead of "default" so I was looking at some oddball seamonkey branch)
17:00mrbkapbkelly: gabor: dom.ipc.processCount should be 1 and .web overrides it if so.
17:00bkellymrbkap: it doesn't seem to do that...
17:00bkellyin practice
17:00bkellyin release
17:00mrbkapgabor: softvision did test multi on release.
17:00bkellyRyanVM: I just heard "seamonkey broke multi-e10s"
17:00bkellyI know
17:01RyanVMlol, though
17:01RyanVMdamn nonlinear history :)
17:01bkellyit was probably thunderbird
17:01RyanVManyway, we're shipping version 1.50 of teh addon with 54.0.1
17:01RyanVM1.70 is on beta and I would assume we're updating release users to it as well post-install
17:02gabormrbkap: yeah I've checked it and all the logic seems to be fine, except if the the HasUserValue lies about dom.ipc.processCount... but that does not seem to be the case according to my debugger
17:02RyanVMso in theory this *could* be a bug with 1.70 and Fx54?
17:02RyanVMnot sure what QA we've done with that combo specifically
17:04mrbkapgabor: can you pastebin about:support somewhere?
17:04mrbkapbkelly: too?
17:05gabormrbkap: sure, by the way does it work on your machine on a clean profile? Getting a bit paranoid here...
17:11gaboractually... that wasn't the clean profile, this one is: but it does not seem to matter much
17:14mrbkapgabor: I can reproduce on OSX :-/
17:14mrbkapgabor: the prefs are set correctly and the code looks correct.
17:14mrbkapgabor: I wonder if the process selector is failing somehow.
17:14gabormrbkap: I really doubt it, that should not depend on the channel
17:15bkellygabor: mrbkap: do you have a way to load older addon versions?
17:15mrbkapgabor: I'm noticing now that ContentParent::MinTabSelect never creates a new process.
17:15bkellyshould I file a bug?
17:16mrbkapbkelly: yeah
17:16* mrbkap has a meeting in 10 minutes.
17:16mrbkapbkelly: assign it to me.
17:17jimmgabor: sv does a lot of general testing of releases before they go out.
17:17gabormrbkap: you mean the C++ version we fall back to if the JS version fails for some reason?
17:17mrbkapgabor: yes.
17:19gabormrbkap: so if we cannot load ";1" for some reason we would get in trouble
17:20firebotBug 1380725 ASSIGNED, release channel is only running 1 content process even when in the 4 process cohort
17:21gabormrbkap: can you think of any reason why we would fail to load that component on release all of a sudden?
17:27mrbkapgabor: something to do with packaging maybe.
17:43gabormrbkap: so the component seem to be registered and working fine
17:44gaborI could find it via the error console
18:00gabormrbkap: it's weird, I don't have processCount.web set to anything...
18:05gabormrbkap: uhh... so when I set dom.ipc.processCount.web to (integer) 4 after resets it turns into a string 4
18:18gaborbkelly: is the type of your dom.ipc.processCount.web an integer or a string?
18:27felipeI bet the problem is the integer vs. string here
18:27felipesampleName is a string "4" here
18:27felipebut how does it work on beta but not on release?!
18:30gaborfelipe: that's what I'm looking at... maybe on beta, in a previous version we set that value on everybody's machine to an integer and if it's already set then this line won't change its type?
18:31felipethat's possible..
18:31felipeyou could check that theory with a fresh profile on beta
18:32bkellygabor: I do indeed have a string
18:33bkellyso does this tell us when the problem was introduced?
18:33gaborfelipe: if I have a manual value then the add-on resets it for sure
18:33felipegabor: a manual value with a mismatched type?
18:34gaborfelipe: yeah
18:35gaborbkelly: good question... I guess at this point we have more questions than answers
18:35bkellygabor: well seems like it should be easy to fix through an addon
18:50gaborbkelly: yeah, at least... although we should sanity check our measurements...
18:51mrbkapbkelly: we have no automated testing of e10srollout
18:51* mrbkap could have sworn that he checked that this worked on beta, though.
18:51gabormrbkap: felipe: so on beta, there were lots of manual testing around this addon if I recall it correctly, was it ensured that for the multi-1 group we have only 1 process enabled?
18:54gaborI mean on beta the default was 4 so the same bug would have enabled 4 cp for the multi-1 bucket unlike what we're seeing on the release channel
18:54bkellymrbkap: I think gabor tested on beta and it does work there?
18:54mrbkapgabor: ?
18:54* bkelly is not caught up on backscroll
18:55mrbkapgabor: release and beta both default to 1
18:55gaborbkelly: ?
18:55bkellygabor: I thought you said you tested multi-e10s on beta and it did not have the single process problem we see on release
18:55bkellymaybe I'm confused
18:56gabormrbkap: oh, you're right
18:58gaborbkelly: sorry I was confused, sure I tested it a while ago... I wonder if it's working now there
19:02gabornope, it's broken on beta as well
19:05mrbkapThis is a kind of huge problem.
19:08gabormrbkap: are you in touch with someone from softvision? We should figure out what tests they run. There were a bunch of cases we tested if e10s-multi is enabled or not with certain add-ons installed, etc.
19:08gaborI'm trying to understand what went wrong and how.
19:09mrbkapgabor: I'd bet money that they only look at the cohorts and prefs.
19:10gabormrbkap: oh
19:11gaborcanuckistani: elan: I think we have a problem.
19:12elantalk me to
19:12elangabor: ^
19:12elanI read the thread, read the ticket, have been reading this conversation
19:13gaborelan: seems like there is a bug in the e10s-multi activation add-on, and the release population is not getting e10s-multi, further more there is a chance that we're turned off on beta as well because of that
19:13firebotBug 1376926 NEW, [e10s] Adjust Usage Hour Ratio Query to Differentiate Single vs. Multi
19:13elanI've pinged Harald to get this updated
19:14elangabor: that sounds bad
19:14elanwho do we need in a room to talk about this and figure out next action? mrbkap, felipe, who else?
19:15elanfirst, are we 100% certain that it impacts everyone?
19:15elangabor: ^
19:15elanlet's get into a room
19:15elane10s vidyo please
19:16gaborI'll be there in a minute, need to get my headset
19:16elanso, in this kind of situation, I think we identify solutions first
19:16* jimm reads some scrollback
19:16elanmake sure those solutions happen at top speed
19:16elanand then we take the opportunity to do a retrospective once the problem is resolved
19:16mrbkapelan: I have a patch.
19:16elandoes that sounds good?
19:18mrbkapelan: e10s vidyo room?
19:19jimmmrbkap: can you summarize the issue?
19:19elanjimm: e10s vidyo
19:19elanwe're in there
19:30mrbkapfelipe: the patch is in your queue.
19:38digitaraldelan: will check it out. trying to find chutten's query where he used the cohorts for slicing up multi
19:44mrbkapfelipe: should we be allowing webextensoins
19:53chuttendigitarald: anything I can help with?
19:54digitaraldchutten: you found out that we have e10s_cohorts on release. is there a query that buckets cohorts in a sensible way?
19:54RyanVMmrbkap: autoland is closed due to the OSX symbols issue - I'm feeling inclined to directly land the patch on m-c
19:54mrbkapRyanVM: Go ahead.
19:54chuttendigitarald: depends what you mean by "sensible way"
19:55chuttendigitarald: is this the query you were talking about?
19:55ursulamccr8: hey, passing in an instance of;1 to forceCC gives an assertion failure when i run the test... the test completes and i get some cc logs, is the assertion failure not a big deal?
19:55mccr8ursula: what is the assertion?
19:55ursulamccr8: Assertion failure: !aLog->mFile, at /Users/usarracini/mozilla-central/xpcom/base/nsCycleCollector.cpp:1672
19:56mccr8ursula: hmm it looks like the file isn't actually getting saved. But maybe it is still recording a different log...
19:57ursulamccr8: the logs that this should create get dumped on the directory i give it in MOZ_CC_LOG_DIRECTORY right?
19:58mccr8ursula: yeah. Maybe there are GC but not CC logs or something?
19:59ursulamccr8: the directory also has gc logs in it after the test finishes
20:00ursulamccr8: what's the difference between logging it here: or logging it in the second forceCC on line 45?
20:00RyanVMmrbkap: pushed to m-c
20:00mrbkapRyanVM: thanks
20:01mccr8ursula: we run the GC/CC a number of times in a row. so one is just run later.
20:01RyanVMmrbkap: autoland *might* end up complaining in the bug still when it attempts to land after trees open (not sure how it'll handle it) - safe to ignore if it does
20:01mrbkapRyanVM: ok
20:02ursulamccr8: ahh. also, in the cc logs that it does give at the end, i can't find a reference to nsGlobalWindow in a log associated with the process number that is said to be leaking in the test
20:03ursulaso i don't know if that's because the assertion failure is sort of messing up the collection, or if something else is going on
20:14mccr8Yeah I don't know. Sorry this stuff is half-broken. We don't have any testing for it...
20:22ursulano worries, thanks anyways
20:35digitaraldchutten: is there another way than cohorts to determine content process count?
20:35chuttenI believe we collect dom.ipc.processCount...
20:36chuttenYup, we do:
20:37digitaraldchutten: ^
20:38digitaraldre-running right now for beta & release
20:42digitaraldif stmo lets me `Error running query: Should not have nextUri if failed `
21:02digitaraldelan: pinged the data team as the query keeps failing with parquet errors.
21:03digitaraldwhich all look French to me, et ne les comprend pas
21:11chuttenWell, you're missing the = 1 case where someone decided to manually set it to 1, but yes: this should count the number of pings in those cases
21:11chuttenI would recommend limiting the queries by submission_date_s3 to speed things up
21:11chutten(and then limiting them again by sample_id if that's not enough)
21:12RyanVMI assume we're trusting the accuracy of dom.ipc.processCount too
21:18felipethis should be the most reliable process count info on telemetry:
21:18felipechutten ^
21:21felipedigitarald ^
21:25digitaraldchutten: is that available in aggregate or does it require a notebook?
21:25digitaraldfelipe: where do I see in about:? if I have multi process?
21:25digitaraldI see 1/1 in about:support
21:26felipedigitarald: that information is not there
21:26digitaraldfelipe: so 1/1 will never go higher than 1 ...
21:27felipethat means the number of windows which are using e10s
21:27felipeso if you have 3 windows, and none were created with "Open Non-e10s Window"
21:27felipeit will be 3/3
21:29digitaraldfelipe: ok
21:29digitaraldI think I knew that already at some point
21:38RyanVMfelipe: seems like in the long run, we should have a way of verifying how many CPs Fx is launching from within rather than going off what we're telling it do
21:39RyanVMa counter or something?
21:39felipeyeah, we should definitely add something like that in about:support
21:39jimmabout:performance lists child processes btw
21:39jimmeasy way to test
21:48RyanVMTBH, that seems like something we could add a regression test for then too
22:00elanjimm: are you telling me that if we had added "check number of child processes in about:support"
22:01elanto the test plan; this would have been caught?
22:02jimmelan: # of child processes isn't in about:support. the processes are listed in about:performance which is all about poorly performing tabs and memory consumption.
22:03jimmtake a look
22:03elansorry I meat about: performance
22:03jimmit is a useful way to see how many child processes you have running. although not great at telling you how many of those are content processes
22:04jimmbut you're right, we could have used that to validate better
22:08elangot it
22:09elanemail = super thumbs up
22:09elanJimm: ^
22:09elanthat was perfect
22:09elanthanks for that
22:18jimmelan: hey, fyi most likely will not make it to the cross functional tomorrow.
22:18jimmI have a local hoa thing I have to attend. will be in in the afternoon.
22:19elanI think that is ok; let me pull together the agenda and if you think of things to weigh in on; feel free to add it there
22:21mrbkapelan: fwiw, from my perspective, the hardest thing about this is that we have no automated tests for e10srollout.
22:22elanok; I feel like we need to do a retrospective
22:22elantomorrow is not the day to do it
22:22elanif that's ok... let's keep focused on the solutions but capture our retro thoughts
22:23mrbkapelan: Sure. I guess from my perspective the solution is done.
22:24mrbkapelan: I've tested that the new e10srollout code behaves correctly.
22:25RyanVMelan: yes, we'll definitely want a retrospective. I think we already have a few clear action items for the future coming out of this
22:26mrbkapRyanVM: Do you have thoughts about testing e10srollout?
22:26elanyes, I'm not questioning whether or not we have a retro
22:26elanI'm kind of reflecting on when and how
22:26mrbkaphm, about:support doesn't list the type of each preference.
22:27RyanVMI was about to say that we could test that the pref does what it's supposed to outside of testing e10srollout, but I'm thinking that wouldn't have actually caught this problem
22:27RyanVMsince our test would probably be setting the pref with the correct integer value
22:27RyanVMso hrm
22:28mrbkapRyanVM: I don't think this fits cleanly into any of our existing testsuites.
22:28RyanVMI'm wondering if UI tests are the answer
22:28RyanVMor the screenshots folks are using mocha IIRC?
22:29RyanVMbut it seems like we need some way for Gecko to tell us how many web content processes it currently has spawned independent of however we're setting prefs
22:30RyanVMmrbkap: anyway, UI tests would give us the ability to restart the browser
23:29elanmrbkap: I need you to connect with shell
23:29elanI'm hearing that we need to whitelist screen shots
23:29elanor else we will invalidate the experiment
23:29elanshell: ^
23:30shellSo we just want to remember that when screenshots became a webextebsion that goes to everyone.
23:30mrbkapelan: shell: is screen shots still a webextension?
23:30elanmrbkap see above
23:31elanShell: wairt
23:31elanthey didn't go to 100% today
23:31shellThat if we target people with no addons, and everyone has screenshots webextebsion...then we probably want to allow screenshots specifically
23:31elanI don't know what % they are at but they need to disable themselves in private browsing before they get to be at 100
23:31elanbut they aren't at 100%
23:31elanshell: ^
23:31shellAh, ok
23:31mrbkapshell: elan: I don't think we have to worry about this yet.
23:32mrbkapshell: elan: since we allow webextensions on beta.
23:32shellOk, so beta experiment will allow people with no addons on only webextebsions? Or will the be separate cohorts at first?
23:32shellor only
23:32mrbkapshell: elan: currently beta is 10/90 non-e10s e10s and 50/50 multi1 multi4 allowing webextensions (for the webextensions-multiN cohorts)
23:33mrbkapI see; the concern is that this will put *everybody* in the webextensions cohorts?
23:39mrbkapshell: is screen shots not a system addon?
23:44elanmrbkap: if screenshots is not shipping to 100% to the population do we have to solve this for beta 9?
23:44elanI'm just trying to figure out if I need to stop the presses or we can proceed with with b9
23:44elanplease confirm
23:45mrbkapelan: I'm not convinced that there's a problem, yet.
23:46elanin the event we think we need to respin beta, we just need to alert liz asap
23:46elanmrbkap: ^
23:52mrbkapHow does the screenshots rollout work?
23:52mrbkapcan I test it myself?
23:52mrbkapshell: ?
14 Jul 2017
No messages
Last message: 72 days and 20 hours ago