mozilla :: #jsapi

10 Aug 2017
00:34pbone\/win 14
01:01Caspy7can I ask about the status of Quantum DOM? Is it still planned for 57?
01:03emiliosfink: RyanVM: FYI none of those worked. :(
01:03emilioRyanVM: I think you can back out and the followup commit without needing servo changes.
01:03sfinkbleh. What's the latest?
01:03emiliosfink: so moveEntry on its own didn't work, I think because it's an indirect call?
01:04sfinkoh! Sorry, I didn't read it properly.
01:05emiliosfink: is the try run fwiw
01:05sfinkbleh. It does PLDHashMoveEntry moveEntry = mOps->moveEntry; then later calls moveEntry.
01:06emiliosfink: then there's
01:06sfinkthat's with the nsIAtom one?
01:06sfinkoh, yep
01:06emiliosfink: that is, whitelisting nsIAtom along with nsISupports didn't make moveEntry go away (maybe wasn't considered a virtual?)
01:07sfinkwell, that one was to make the whole Release go away
01:07RyanVMemilio: I'm going to backout to get autoland back to a good state and to take some of the pressure off you to sort this out right this second :)
01:07sfinkoh, hm. That would be a more direct way of doing this. I think we want to annotate PLDHashMoveEntry moveEntry = mOps->moveEntry
01:08sfinkargh, stale cut & paste
01:08* RyanVM will get the servo commit and the 3 revs from bug 1362338
01:08firebot NEW, Consider devirtualizing refcounting of nsIAtom
01:08sfinkannotate nsIAtom::Release()
01:08emilioRyanVM: no need for the servo commit fwiw
01:08emilioRyanVM: backing out the last two gecko patches will work
01:08emilioRyanVM: and you can do it right now if you want
01:09RyanVMi'm confused
01:09RyanVMthe hazard builds starting failing on
01:09emilioRyanVM: that is, 19b8d672b55d and 2980183d98fb
01:09emilioRyanVM: that commit busts the build on its own
01:09RyanVMemilio: I'd rather play it conservative here
01:10emilioRyanVM: so it was red
01:10RyanVMautoland has been broken for 13 hours now
01:10emilioRyanVM: but because it didn't build at all awaiting for the gecko patches
01:10emilioRyanVM: backing out those two gecko patches will fix it, I'm pretty sure, since are the ones that devirtualize that function per se
01:11sfinkemilio: ok, my latest attempt. Put '_ZN7nsIAtom7ReleaseEv$uint32 nsIAtom::Release()' just after
01:11sfinkuh... which looks like we ran into exactly this problem before when marking nsXPConnect final
01:12sfinkI should've remembered that
01:12sfinkI wonder if I should break down and do some limited dataflow analysis specific to PLDHashTable. It's such a pain.
01:12emiliosfink: symbol + signature, or just signature?
01:13sfinkthe exact string I gave
01:13emiliosfink: k, will try
01:13emilioRyanVM: thanks for the backout
01:13emilioRyanVM: now I can go to sleep without waiting for that try run :)
01:13* RyanVM was thinking it was awful late for you :P
01:14* emilio goes push that and power off
01:14sfinksorry about this. It really is a limitation in the analysis, and my wallpapering over the problem is bleeding through
01:14emilioRyanVM: kinda, yeah... My schedules aren't quite stable :)
01:15emiliosfink: no problem, it's fine... I'm responsible for a good amount the bindgen stuff, I know it's pretty hard :)
01:15sfinknassssty dragonses
01:16pbonesfink: is this related to the strings in the nursery stuff?
01:16* pbone trying to follow along.
01:16sfinkpbone: no, completely unrelated. This is a rooting hazard that emilio stumbled across.
01:16sfinkby simplifying the callgraph in a way that the analysis can notice, ironically enough
01:17pboneoh, it's good that he found it then ;-)
01:18sfinkit enabled the analysis to see things that normally it (conservatively) considers to be problematic, and which it hadn't been complaining about because I had a big exclusion for it
01:18sfinkwell, it wasn't a real hazard
01:18sfinkit uncovered a false positive
01:18pboneah okay.
01:19sfinkin an area of the code known to be problematic for the analysis (because it relies on function pointers that are set to values that the analysis can't really know, at least not without doing a bunch of dataflow work)
01:20pboneYeah, higher order stuff is tricky for analyses../
01:22sfinkpersonally, I think we should scrap the function pointer-happy hashtable, and replace it with either plain C++ vtables, or better yet, templates
01:22sfinkwhoa, I just said "better yet, templates"
01:22sfinkI think that may be a first
01:22sfinkI hate templates
01:23pboneI saw that.
01:23pboneI havn't see this code, but if there's a fixed number of things for the higher order values they could be replaced with an enum, then switch on the enum to decide what to call.
01:24pbone(I prefer higher order values, except like you said, analysis.)
01:24sfinkthe set of things isn't known within one compilation unit
01:25sfinkto use one of these hashtables, you create a table of function pointers specific for your use, and create a hashtable with that table
01:25sfinkpoor-man's vtable
01:25sfinkand I suspect for a given use, the values of all those pointers is constant, and so could be known statically
01:25pboneYeah, maybe templates then... :-(
06:59daurnimatorIs there a user-available API for creating a function from source and providing a source location?
06:59daurnimatore.g. eval("some_code()", {source: ""})
07:23araidaurnimator: no. I think you need to retrieve it yourself with XMLHttpRequest or fetch or something
07:23araior perhaps, if you want the code.js to run off main thread, Worker might be what you want
07:24daurnimatorarai: I *am* doing an xhr + eval
07:25daurnimatorI just wanted debugging to be easier for a user
08:15mrgigglesthe mozilla-inbound tree is now closed (bustage from bug 1320656)
08:15firebot NEW, BitwiseCast(const From aFrom, To* aResult) syntax will change with C++17 and gcc is warning us
08:25mrgigglesthe mozilla-inbound tree is now open
08:37bbouvierfirebot, what is 0xe5e5e5e5
08:37mrgigglesbbouvier: e5e5e5e5 is 0 bytes after the magic value e5e5e5e5
08:37mrgiggles0xe5 is jemalloc freed memory
08:37mrgigglesbbouvier: if you're seeing a crash with this pattern, you have a use-after-free on your hands
08:37firebotbbouvier: everyone knows that! 0xe5e5e5e5 is jemalloc freed junk memory
08:37mrgigglesbbouvier: and you'd better fix it. They tend to be security bugs!
08:38bbouvierCHILL OUT BOTS
09:10bbouvierwould it make sense to run the JS tests suite in a very constrained environment, using cgroups to lower available memory?
09:10bbouvieror is it equivalent to just use all the GC intrinsics to simulate OOM
10:25nbpfirebot: what is 0xe5e5e5e5
10:25mrgigglesnbp: e5e5e5e5 is 0 bytes after the magic value e5e5e5e5
10:25mrgiggles0xe5 is jemalloc freed memory
10:25mrgigglesnbp: if you're seeing a crash with this pattern, you have a use-after-free on your hands
10:25firebotnbp: 0xe5e5e5e5 is jemalloc freed junk memory
10:26mrgigglesnbp: and you'd better fix it. They tend to be security bugs!
10:26nbpwhoa what a question I asked
10:29bbouviernbp, the same happened to me around 10:37am utc+2
10:30nbpindeed, I should read the log.
10:30nbpI just wanted to confirmed that this was jemalloc :/
12:22araievilpie: you're going to post another patch that adds warning with JSCompartment field, in bug 934669, right?
12:22firebot ASSIGNED, Deprecate Object.prototype.{,un}watch, and make them warn when used
12:22evilpieyeah sure
12:24araiokay :)
15:25tcampbellAn of the ARES-6 perf problems is that in , the let allocates an environment, but it doesn't look like it needs to
15:25tcampbellarai: does the presence of a yield* change anything? It looks to me like our analysis is just confused by it.
15:26tcampbell(attaching debugger to parse now..)
15:30araianalysis at which point?
15:32araimaybe something related to the stack values used by the yield* loop?
15:32tcampbellsorry, analysis being when the parser decide if we need to PUSHLEXICALENV or simply pretend the user said |var|
15:34tcampbelllet me try to rephrase. Looking at that code, do you see any reason why turning the |let| into a |var| would have *any* difference in behaviour? If it was a normal function, there shouldn't be. I don't think the yield* changes anything?
15:35* tcampbell should probably read the JSOP_RESUME logic
15:35araiyeah, I also think suspend/resume can be related
15:37tcampbellwild guess: maybe the suspend/resume marks the variable as being preserved, and we confuse that indication with the need for it to be in an allocated lexical environment
15:38tcampbellI'll do some digging, just wanted to make sure I wasn't missing something obvious about generators :)
15:39araiah... might be a bit different
15:39araipushlexicalenv is emitted even if there's not yield*
15:39araibut just being generator changes the result
15:41tcampbellah, good point
15:45araialso emitted when there's |with (o) {}|
15:45araimaybe SharedContext::allBindingsClosedOver ?
15:46tcampbellI think you are right
15:47tcampbellI wonder if it is possible to make the lexical binding non-lexical if it wouldn't be outside a generator
15:51mrgigglesthe mozilla-inbound tree is now closed (linting failure)
15:59araithis removes pushlexicalenv (but should be buggy)
16:02tcampbellarai: I wonder if we can just split the bindingsDynamicallyAccessed case from the generator case
16:02tcampbellso we can still force to stack, but avoid needing lexical environments
16:13araihmm, not sure how those conditions are connected to the generator implementation...
16:21tcampbellI guess I'll play around with it. I'm hoping I can split the closedover flag and update AnalyzeEntrainedVariables
16:25tcampbelloh, hmm.. that may only be a debug function
16:26mrgigglesthe mozilla-inbound tree is now open
16:34_6a68Hi all, I'm trying to understand how a system addon could have caused this crash
16:34firebotBug 1387598 NEW, Intermittent sessionrestore_no_auto_restore | application crashed [@ GetNameOperation]
16:35_6a68Here's the diff that seems to have caused the (intermittent, windows-only) regressions:
16:35_6a68Looks like it's somehow crashing in the JS interpreter, but I'm not sure how to debug further
16:52bbouviertill ^
16:53bbouvieror shu maybe ^
17:06tcampbellnaveed: do we have a meeting?
17:38djvjtcampbell: team meeting or 1:1?
17:38tcampbelloh, was just a 1:1
17:38djvjphew, thought I missed an e-mail
17:38tcampbellyou didn't miss anything
17:38tcampbellnext jit meeting is next wednesday
17:45djvjjandem: ping
17:47till_6a68: the patch you linked to does more than just switch to native Promises - it also chains Promises by replacing appStartupPromise instead of calling .then() on the same promise all the time
17:48till_6a68: that might change the timing of things, which might for example uncover preexisting bugs that so far didn't show symptoms
17:50till_6a68: sorry, the patch doesn't have any "switch to native Promises" in it, I just realize. The rest is still valid: I think it's most likely that changed timing uncovered preexisting issues
17:51_6a68till: sure, that makes sense. but how could addon code cause that deep of a crash?
17:51_6a68till: possibly related, apparently this code is causing all kinds of strange stack traces when crashing ts_paint on beta:
17:51firebotBug 1386333 FIXED, Remove Screenshots rollout pref
17:51till_6a68: it shouldn't be able to, and it's certainly not really responsible for it
17:52_6a68Another interesting one was
17:52firebotBug 1380267 FIXED, Crash in AsyncShutdownTimeout | profile-change-teardown | Extension Shutdown: jid0-GXjLLfbCoAx0LcltE
17:54till_6a68: I agree with RyanVM - those are scary stack traces
17:54_6a68till: we're supposed to be shipping this thing to 55 like, any day now
17:55_6a68who is a good person to dig into these bugs? addons folk steered me over here
17:55_6a68Also, is it possible that these are just pgo artifacts, or does this point to something deeper?
17:56till_6a68: these crashes are from normal opt builds
17:57_6a68The Talos tests use pgo builds on windows, I'm told, but you're right about the other crash bug
17:57RyanVM(they might be mislabeled - I need to verify which ones they're running on)
17:57till_6a68: so this doesn't happen if you don't remove the pref, just set it to the desired value?
17:57_6a68till: I dunno, we run Talos tests with each export, and didn't see this
17:58_6a68but again, I think there's a class of bugs emerging, all related to embedded webextensions not being properly sandboxed?
17:58RyanVM_6a68: it's not PGO - my Try pushes were definitely with PGO disabled
17:58_6a68RyanVM: good to know
17:58_6a68jgruen: ICYMI, here's some logs
17:59_6a68start near the bottom, and
18:00till_6a68: the stacks don't contain actual promise reactions, so it seems extremely unlikely that promises are the culprits, which leaves changed timings
18:00till_6a68: what's the reason for chaining the promises instead of adding reactions to the one?
18:01_6a68till: ironically, I was trying to fix a race condition where the webextension could be initialized twice
18:01crowbotIssue #3257: Empty and unusable icon where Screenshots is supposed to be in Fx55 RC1 -
18:01_6a68that's a RyanVM bug, too :-)
18:01till_6a68: if delaying each additional reaction by one turn of the microtask queue isn't explicitly desired, then that's strictly worse because it slows things down
18:01tillah, ok
18:02RyanVMtill: are there any differences in promises between NIGHTLY_BUILD and RELEASE_OR_BETA?
18:02RyanVMjust wondering how to explain why this showed up on an uplift
18:03_6a68I'm wondering if we should start an etherpad and collect what we know from these different screenshots bugs, or if the issues are different enough that it wouldn't be worth it
18:04_6a68There's a little added urgency here, based on the fact that we're supposed to start turning on 1% of release 55 next week
18:04_6a68er, turning on Screenshots
18:04tillRyanVM: tiny amounts of refactoring, but absolutely nothing that'd affect the timing. Given that promises don't show in the stacks, I don't see how they'd otherwise be involved
18:04_6a68also, QA hasn't turned up any of these bugs, they show up way later, which I find strange
18:04_6a68till: thanks again for your help. These stack traces exceed my neckbeard level
18:04tillRyanVM: except if the refactoring would somehow have fixed a bug that causes stack or heap corruption in the old code
18:06till_6a68: you're welcome. I'm afraid I don't know how to help further, either
18:06_6a68till: hmm, any idea who might know? Is this a "just ask mossop" kind of problem?
18:06till_6a68: my suggestion would be to try to change back the chaining, but then the other bug will show up again, I guess
18:07jgruentill, RyanVM issues showed up on win7 specifically, any reason that might be the case?
18:07jgruenor is that an artifact of where we test
18:07RyanVMthere were a couple win10 x64 as well
18:07RyanVMbut mostly win7 32-bit
18:07RyanVMnothing on linux/mac
18:07_6a68till: I guess I'm wondering who can look for the underlying bug. I'm just not comfortable trying to mask the issue by tweaking the addon code, we don't know it'll actually prevent users from crashing
18:08_6a68any suggestions would be great, or I can ask around in fx-team maybe
18:08tillRyanVM: it doesn't show up on linux32, though
18:08RyanVMwe don't run talos on linux32
18:08tilloh, right
18:08tillcan we?
18:08RyanVMno machines to do it
18:09tillI mean as a one-off, not regularly
18:09RyanVMwe don't have any physical machines to do it
18:09tillah, ok
18:09tillwell then
18:09_6a68we have AWS accounts
18:09RyanVMthey all got repurposed
18:09_6a68can we use VMs for this kinda thing?
18:09RyanVMyou can talk to jmaher about attempting it in a VM since we don't care about perf here
18:10RyanVMin-tree hacking would be needed beyond my expertise
18:10_6a68till: I'm not sure I understand why that would be a useful diagnostic exercise
18:10_6a68Are we trying to figure out if it's an out of memory type of error?
18:10till_6a68: I understand the desire to get to the bottom of this, and it's certainly vastly preferable. It might also be impossible in the given time frame
18:11_6a68till: yeah, I understand. I just want to gather what information I can, so that the Deciders are well-informed when they make yet another go/no go call
18:11till_6a68: partly because debugging under Linux is easier, partly because it rules out a lot of potential causes
18:11_6a68ah, that makes sense
18:11till_6a68: e.g., this could be caused by something MSVC does, for all I know
18:12till_6a68: smaug might be able to help analyze this, too
18:13djvjtcampbell: hey, where can I find example code for the CacheIR pattern matching to identify them?
18:13_6a68till: ok, do you think the multiple bugs are connected, or should I just ask about the Beta backout?
18:13till_6a68: I really don't know, I'm afraid :(
18:14tillRyanVM: oh, I just realize that it's possible that microtask handling was changed on Nightly. I'm not aware of any changes, but it's possible
18:16jgruenwait so this could be a compiler error?
18:17tcampbelldjvj: umm
18:17tcampbelloh. I know what you mean. let me see
18:18djvjI'm trying to find methods on CacheIRStubInfo that let me visit the CacheIR bytecode...
18:18djvjtcampbell: Or should I just open up a CacheIRReader on the raw byte pointer it exposes?
18:19djvjah, it seems that's how its done...
18:19tcampbelllook at CacheIRReader uses in BaselineInspector
18:19djvjyeah just came across those..
18:20_6a68till: ok, I'm trying to summarize what we do and don't know in an etherpad. would you mind taking a look briefly and adding any corrections or comments?
18:20tcampbelldjvj: what's your end goal?
18:20djvjthis feels really brittle to me.. kind of feels like this matching code should be methods hanging off CacheIRStubInfo instead..
18:20djvjtcampbell: working on std_Array now. Ion currently needs the baseline ICCall_Native stub to exist, because it uses that to pull out a template object for the array.
18:21tcampbellyeah, I got the impression it was just putting the mess with the other baselineinspector mess. The whole inspector is less ideal
18:21djvjtcampbell: if we move to custom std_Array CacheIR stub, we won't generate the ICCall_Native stub, and the ion code needs to pull the data from the CacheIR stub instead
18:22tcampbellah, yeah baselineinspector is exactly what you'd want it seems
18:22tilljgruen: it has happened before. More likely that it's something in our code, perhaps in win32-specific code, though
18:24till_6a68: based on the crashes we see, I'd expect these to affect users, too :(
18:24till_6a68: to confirm - this only happens on beta, not nightly at all?
18:24_6a68till: I think so. RyanVM ^^ is that right? beta not nightly?
18:25RyanVMcertainly the ts_paint issues only hit on Beta
18:25RyanVMwe could try a 57-as-Beta Try push to see if it hits there too
18:25RyanVMbut 57-as-Nightly doesn't appear to be
18:26jgruenRyanVM: what would i be looking for on crashstats?
18:26RyanVMno clue
18:26RyanVMthe stacks are all over the place in the ts_paint crashes
18:31tillRyanVM: are the ts_paint crashes completely reproducible?
18:31RyanVMnearly 100% on Beta, yes
18:32tillone thing to try is bisecting the patch
18:32tillbut 57-as-beta would also be useful
18:32_6a68till: looking more closely at bug 1386333, all that bug did was remove a pref
18:32firebot FIXED, Remove Screenshots rollout pref
18:32_6a68so the synchronization code I was thinking about isn't the change that triggered this
18:33jgruen_6a68: this is dumb, but what if we tried a patch that just forced the pref to false instead of removing
18:33_6a68jgruen: we could, my concern is that there may be a class of really gnarly process-crashing bugs that we keep bumping into
18:34till_6a68: ah, good point
18:34jgruenpossible there's just some bat shit error where there's a race inside the webExtension b/c as of the pref existing at last shutdown?
18:34_6a68jgruen: I mean, the fix for one of the other crasher bugs was to add a shutdown timeout
18:34firebotBug 1380267 FIXED, Crash in AsyncShutdownTimeout | profile-change-teardown | Extension Shutdown: jid0-GXjLLfbCoAx0LcltE
18:34till_6a68: I think the pref *removal* is a red herring: the crashes only occur on beta, and this patch happens to enable the extension on beta
18:35_6a68till: ah, good point
18:35tillRyanVM: can you do a 57-as-beta run with the extension enabled?
18:36RyanVMtill: bug 1386333 is landed on trunk, so I'm just going to push to try on top of m-c tip
18:37tillRyanVM: sounds good
18:37RyanVMif that's broken, then we can try to bisect on trunk
18:37_6a68till: you know, up till now, we've had shield pref users on in beta
18:37_6a68so we never started with the pref flipped
18:37_6a68it may indeed be a super subtle timing issue (but where...)
18:37till_6a68: ah, that's an interesting data point
18:39_6a68dunno if it's useful to mention, but screenshots double-delays startup. most of the code doesn't run until sessionstore-windows-restored fires, then the webextension shell is inited, and the webextension code isn't loaded until the user clicks the button for the first time
18:40_6a68there may be certain assumptions about the timing of addon startup that we're violating
18:40_6a68especially as the webextension code is new
18:42RyanVMok, did the 57-as-Beta Try push. Probably won't have results for 1.5-2hr
18:45jgruenRyanVM: just to make sure i'm understanding correctly will that tell us if is due to some underlying diff btwn 57 and 56
18:45firebotBug 1386333 FIXED, Remove Screenshots rollout pref
18:45RyanVMbetween Nightly and Beta
18:45RyanVMwe have code that's compiled different depending on whether it's nightly or beta
18:45jgruenokay got it
18:45RyanVMso literally the same rev could behave differently depending on which branch we're on
18:46jgruenthis is a subtlety about which i was unaware, and I assume beta and release are compiled the same way
18:46clouserwthanks for kicking that off RyanVM
18:46jgruenmeaning we're borked either way :)
18:47_6a68RyanVM: what's the Try link?
18:47jgruenand yes, thank you and till, who is staying on late
18:48_6a68thx! I'll add it to the etherpad
18:49RyanVMif these tests are green, then it's a difference between 56 and 57. If they fail, it's a difference between Beta and Nightly.
18:49RyanVMeither way, we can bisect once we know
18:49RyanVM(and I'm betting on the latter)
18:52tillgiven how recent the fork was, I'd bet on the latter, too
19:13jgruenRyanVM: what would the real world equivalent of a ts_paint crash look like, out of curiosity?
19:14jgruenwould a tab go down or hang or something else?
19:14RyanVMcontent process crash
19:14jgruenwould it summon Baphomet?
19:14jgruenokay cool
19:37mrgigglesthe mozilla-inbound tree is now closed (packaging bustage)
19:45mrgigglesthe mozilla-inbound tree is now open
22:06RyanVM_6a68: till: exciting, all green
22:10_6a68RyanVM: fascinating. what are the next steps?
22:14RyanVM_6a68: let me think about it for a bit
22:28RyanVM_6a68: i need to make dinner, I'm going to stew on it a bit more
22:28RyanVMthose weren't the results I was expecting to see
22:28_6a68Sounds good to me. Thanks for your help today!
22:33RyanVMI think the next thing I'm going to try is running the simulation on a trunk rev closer to the merge a week ago + the pref change on top
22:33RyanVMif *that* fails, we can bisect what made it green
22:40_6a68Sounds great
22:42RyanVM_6a68: ok, next round:
22:42RyanVMbottom push is on top of rev 357ef8a67e88, which was the merge to m-c that first included bug 1386333
22:42firebot FIXED, Remove Screenshots rollout pref
22:43RyanVMone on top is rev 52285ea5e54c, which was m-c immediately after the bump to 57 with that patch applied on top
22:44RyanVMoh hrm, that top push might need another patch on top of that to avoid completely busted runs for other reasons
22:48RyanVMk, new push on top might be a tad greener than the middle one
22:48RyanVMor at least, less broken for unrelated reasons
11 Aug 2017
No messages
Last message: 11 days and 7 hours ago