mozilla :: #moc

9 Aug 2017
01:00 <nagios-scl3> New Sysadmin OnDuty is jlaz
04:05 <safwan> jlaz: Hey
04:05 <safwan> Some days ago, you pinged the sumo room about an issue with sumo
04:05 <safwan> Can you rephrase it?
04:14 <jlaz> safwan: ahh, when trying to load sumo, i would get a 302
04:14 <safwan> jlaz: Yep. 302 to your proper locale
04:14 <jlaz> it did not last long though, maybe a few mins, before recovery
04:15 <safwan> jlaz: You were redirected with a 302 and got a blank page?
04:15 <jlaz> correct, but it would eventually load the correct page (in the correct locale)
04:21 <safwan> oh!
04:22 <safwan> maybe a request hung
04:22 <safwan> this redirect is done on the django end
04:22 <safwan> and maybe it had some problem
04:22 <safwan> there was no deployment going on at that time
04:25 <safwan> jlaz: BTW, which timezone are you in?
04:27 <jlaz> safwan: i am on pacific time
04:33 <safwan> firebot: help
04:33 <firebot> safwan: help info /msg'ed
07:08 <aselagea|buildduty> good morning
07:08 <aselagea|buildduty> kms02.ad.mozilla.com seems to be having issues :-/
07:09 <aselagea|buildduty> lots of socket timeouts in #buildduty
07:14 <jlaz> hmmm
07:15 <jlaz> I'll downtime these, these checks are not yet live
07:15 <aselagea|buildduty> jlaz: thanks
07:16 <jlaz> no problemo, sorry about that
07:20 <ryanc> Yup
09:00 <nagios-scl3> New Sysadmin OnDuty is pir
13:11 <arr> pir: looks like ryanc added a bunch of checks to nagios that are using the wrong hostname for ntp checks in mdc1. Can you downtime all of those, please, till he gets a chance to fix them?
13:11 <arr> (any host using ntp time check in mdc1)
13:11 <arr> for releng
13:11 <arr> they're bombarding irc with alerts
13:12 <pir> arr: I'll need something more specific than "a bunch of checks"?
13:12 <arr> all "ntp time" checks on releng-mdc1
13:13 * pir takes a look
13:16 <arr> pir: think I got em all, nm
13:18 <pir> yeah, I just got a git error trying to commit commenting them out...
13:18 <arr> I don't think you can easily comment them out
13:18 <arr> since they're using a hostgroup that crosses datacenters
13:18 <arr> I just downtimed them
13:19 <pir> but they're configured in the releng mdc1 file
13:19 <pir> which should just be mdc1
13:19 <arr> I don't want to disable all monitoring, just ntp time
13:19 <pir> I didn't suggest disabling all monitoring
13:19 <pir> I was just going to comment out ntp time in mdc1
13:20 <arr> which file?
13:20 <arr> because I think it's defined in the common services.pp
13:20 <pir> git-internal/puppet/modules/nagios4/manifests/prod/releng/mdc1.pp
13:20 <pir> # service_description => 'ntp time',
13:21 <arr> helps if I do a git pull
13:21 <pir> base services.pp only applies them to 'nagios-releng'
13:22 <arr> and now I see where I can fix it
13:22 <pir> which is not mdc1
13:27 <arr> wow, there's a lot of non-mdc1 stuff in that mdc1 config file
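
For readers unfamiliar with what "downtiming" a check involves: Nagios accepts a SCHEDULE_SVC_DOWNTIME external command per host/service pair. The sketch below only builds those command lines for the "ntp time" service. The host names, author, comment, and two-hour window are hypothetical placeholders, and nothing here is taken from how arr actually scheduled the downtime (it may well have been done through the web UI).

```python
import time


def schedule_svc_downtime(host, service, duration_s, author, comment, now=None):
    """Build one Nagios SCHEDULE_SVC_DOWNTIME external command line.

    Field order follows the Nagios external command format:
    host;service;start;end;fixed;trigger_id;duration;author;comment
    """
    start = int(time.time() if now is None else now)
    end = start + duration_s
    # fixed=1 means the downtime runs exactly from start to end;
    # trigger_id=0 means it is not triggered by another downtime.
    return (
        f"[{start}] SCHEDULE_SVC_DOWNTIME;{host};{service};"
        f"{start};{end};1;0;{duration_s};{author};{comment}"
    )


if __name__ == "__main__":
    # Hypothetical host names; the real releng-mdc1 hosts are not listed in the log.
    hosts = ["host1.example.mdc1", "host2.example.mdc1"]
    for h in hosts:
        print(schedule_svc_downtime(h, "ntp time", 2 * 3600,
                                    "moc", "wrong ntp hostname, pending fix"))
```

Each printed line would then be written to the Nagios command file, whose path is site-specific and not given in the log. Downtiming per service, rather than commenting the check out of the config, leaves the rest of the hostgroup's monitoring untouched, which is the concern arr raises above.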
15:01 <smaug> Does anyone know how to use irccloud?
15:01 <pir> You log into it and type?
15:02 <pir> sorry, that's a very vague question
15:03 <smaug> instructions say I should load https://irccloud.mozilla.com/ and login
15:03 <smaug> it only gives me okta login
15:03 <smaug> but then "Sorry, you can't access IRCCloud because you are not assigned this app in Okta."
15:03 <pir> are you an employee?
15:03 <smaug> yes
15:03 <pir> file a servicenow ticket?
15:04 <smaug> uh
15:04 <pir> in general the moc deals with service issues, the servicedesk deals with individual user issues
15:04 <smaug> aha
15:04 <pir> #servicedesk or the hub should be able to get you an answer
17:00 <nagios-scl3> New Sysadmin OnDuty is sal
17:32 <rmfd> Did you guys happen to notice anything go down in MDC1? If so, what and for how long?
17:32 <pir> rmfd: when?
17:32 <rmfd> As far back as 45 minutes ago?
17:33 <pir> 17:24:31 UTC -> 17:25:45 UTC a set of alerts came in
17:33 <cknowles> rmfd: check out #postscl3 - we've been updating with status of the network from our watching of things.
17:33 <gcox> 16:38 UTC, yes, we lost mdc1 #postscl3
17:33 <rmfd> I'm interested in what the MOC saw on their monitoring
17:36 <pir> at 16:38 UTC... nothing
17:36 <cknowles> yeah, over in #sysadmins - they saw several alerts for mdc1 stuff around 10:23 pacific.
17:36 <pir> but yeah, #sysadmins is the place to be for that
17:37 <gcox> https://gcox.pastebin.mozilla.org/9029274 # mdc1 alerts from #sysadmins over the last hourish.
17:37 <pir> 16:43:25 UTC master1.ldap.mdc1.mozilla.com has an NRPE timeout
17:37 <pir> recovered at 16:47:40 UTC
17:45 <rmfd> What network gear are you currently monitoring in MDC1?
17:47 <pir> not much. There's an open bug that's very confused over IP addresses
17:48 <fauweh> rmfd: I sent you the nagios URL via PM if you want to check that list
17:48 <pir> https://bugzilla.mozilla.org/show_bug.cgi?id=1387223
17:48 <firebot> Bug 1387223 is not accessible
17:49 <pir> https://bugzilla.mozilla.org/show_bug.cgi?id=1387225
17:49 <firebot> Bug 1387225 is not accessible
17:50 <pir> time for me to get out of here, 'night all
17:50 <cknowles> pir++
17:50 <sal> pir++
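
The exchange above mixes UTC timestamps with "10:23 pacific", so a quick conversion helps when lining the reports up. This is a minimal sketch assuming "pacific" means the America/Los_Angeles zone and that the date is the log's own 9 Aug 2017; the helper name `pacific_to_utc` is made up for illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+, needs system tz data

PACIFIC = ZoneInfo("America/Los_Angeles")


def pacific_to_utc(year, month, day, hour, minute):
    """Interpret a wall-clock time as US Pacific and return it in UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=PACIFIC)
    return local.astimezone(timezone.utc)


if __name__ == "__main__":
    alert_time = pacific_to_utc(2017, 8, 9, 10, 23)
    print(alert_time.strftime("%H:%M UTC"))  # 17:23 UTC
```

Under those assumptions, 10:23 Pacific is 17:23 UTC, which would place the #sysadmins alerts next to the 17:24:31 UTC set pir mentions rather than the 16:38 UTC loss gcox reports.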
10 Aug 2017
No messages
   