7 Sep 2017
01:00nagios-scl3New Sysadmin OnDuty is ryanc
04:51johnbryanc: ignore any alerts out of BER3 for the next hour... moving optics and getting things ready for the second circuit. thanks
04:52ryancAlright johnb
05:16justdaveSo we just got the RFO for that mysterious config change in Paris from Level 3. And it was essentially "we can't figure out what the RFO is"
05:17justdavethey've got no idea how or why that config change was made, but they can prove the live config was different than what was in the backup and they had to restore it.
05:18ryancjustdave: OK
05:19justdaveAnd I know from talking to him on the phone that it theoretically wasn't the change we requested getting done early because he was the one supposed to be making that change and he hadn't done it yet.
05:35johnbryanc: should be coming back... Telia dampend us for the up/down for a few
05:38fauwehjustdave: that is a troubling thing to hear from a carrier.
05:40johnbgood thing we are getting rid of Level3 sooner than later
05:40johnbnothing but a pain all around
05:43justdavethis is just re-enforcing the reason we were already planning on leaving them
05:46fauwehtotally. that's a completely unacceptable RFO lol
09:00nagios-scl3New Sysadmin OnDuty is Usul
12:45tofumatthey ops, seems like is down
12:48Usultofumatt: hightlight my name
12:49* Usul looks
12:50tofumattUsul: oops, sorry 'bout that.
12:50Usulso I get a white page is that what you get tofumatt ?
12:52tofumattsomeone in #amo reported it was a 504
12:52tofumattdidn't check myself
12:52Usulloads for me now ....
12:52Usulcan you check again ?
12:52tofumattseems back
12:52tofumattYou fixed it!
13:40dhouseHi Usul, can I open an RCA for bug 1397674? Are rca bugs still created or is the process in ServiceNow?
13:40firebot NEW, talos-ix-linux64 pool is not doing any work
13:45Usulprocess is in service now
13:46Usulpir can you confirm ^^^
13:46* Usul is just back from PTO
15:01Usulcrap netsplit
15:27unixfairyjen: are you aware of the google drive issues
15:33mpoessyunixfairy: hi. i will pm you
16:44Usuldhouse: I'll open the RCA
16:46arrdhouse: we generally run our own RCA process for internal releng stuff
16:48dhousearr: okay cool
16:48dhouseUsul: did you make a bug? I can move it over and reassign, or you can skip making the bug/rca
16:49UsulI'm doing the things will assign the bug to you
16:49Usulso you can assign on whom needs to work on RCA
16:51firebotBug 1397821 is not accessible
16:53Usuland sorry for the delay
16:55dhousethank you for setting it up
17:00nagios-scl3New Sysadmin OnDuty is fauweh
18:54whimboofauweh: hi. is there anything wrong with DXR again?
18:54whimbooI do not get any response here
18:56whimboowell, looks like this file is huge!
18:56whimbooso yeah, it results in an internal server error
19:02fauwehwhimboo: no issues noted on our side
19:02fauwehmight check in #vcs?
19:02fauweh17mb is definitely large heh
19:03fauwehbut shouldn't generate an ISE I wouldn't think
19:03fauweh(and I get the same response)
19:04whimboofauweh: does dxr load this file directly from
19:07fauwehwhimboo: I do not know about how that flows
19:36bcfauweh: Hey. I'd like to take down to finalize the change to max_filedesc. Should be on the order of 15 minutes. If you could downtime it to shut the nagios alerts up while I'm working on it, that would be great.
19:59fauwehhey bc, sorry for the delay, it is downtimed for you now
19:59bcfauweh: Thanks. I'd already started updating the other workers so I'll have to wait for that to complete before working on proxy. Will be a bit longer than 15 minutes.
20:00fauwehok no worries, we can always extend if needed
20:50bcfauweh: All done.
20:55fauwehbc: great! and I got a host up notice for proxy too
20:55bcThanks. The max_filedesc change is now permanent.
20:56fauweh++ thanks bc!
22:07justdavedo we have a timeline of that paris outage incident somewhere yet?
22:08saljustdave: is it for the rca?
22:10justdaveyeah, sort of. Got the ear of a channel manager at level3 and want to send him a rundown of what happened to help explain why we're so pissed at them right now. :-)
22:10saljustdave: failover happened at 16:30 UTC thats what went out on the resolution email from statuspage
22:10salbut thats not the time when they reverted the change
22:10sallet me check for that time
22:11sal6:17:10 PM <@justdave> sal: FYI, PAR1 is fully back online
22:11salthats when lvl3 reverted fixed whatever they messed up and the office went back to normal
22:12justdavelooks like 8:47UTC is when nagios alerted
22:12justdaveer, wait, I'm looking at the wrong bug
22:12* justdave glares at the awesome bar for not being awesome enough
22:13saljustdave: the start time is noted in the rca doc
22:13salhaha awesome bar!
22:13justdave1397270 there we go
22:14saljustdave: and here's the rca bug
22:14firebotBug 1397272 NEW, RCA 2017-09-06 PAR1 office connectivity failure
22:14justdave6:39am Pacific is when the ticket was filed with level3 that never got touched despite getting escalated twice
22:14justdavethat's the important times I need to give him is when the ticket was filed and when we escalated
22:17saljustdave: I called twice not sure if they noted, at 1320 and 1420 pst
8 Sep 2017
