mozilla :: #taskcluster

10 Aug 2017
07:54 <franziskus> jhford: when do you come in today?
08:59 <aobreja|buildduty> wcosta: regarding the autologin issue, bug 1376807, there was a change of plan; the change will probably be rescheduled for next week after discussion with folks on #releaseduty. Sorry for bothering you :-(
08:59 <firebot> https://bugzil.la/1376807 NEW, jwatkins@mozilla.com Autologin is failing on the mac machines
09:00 <aobreja|buildduty> there is a release in progress and we don't want to affect its status with this change
09:01 <aobreja|buildduty> wcosta: I will update the bug. Thank you for your support and sorry again for the delay
09:02 <aobreja|buildduty> I will be on PTO next week, but maybe someone from the team will make the change; if not, I will come back to it after next week
09:08 <wcosta> aobreja|buildduty: np
09:08 * wcosta goes back to bed
09:42 <xidorn> based on what data do we currently chunk tasks?
10:01 <jgraham> xidorn: That doesn't seem like a TC question, but I'm not sure there's a good answer for "what data". What problem are you trying to solve?
10:27 <xidorn> jgraham: so, I enabled a bunch of new mochitest tasks for stylo today, and it seems there is one chunk that runs especially slowly on debug builds: it usually takes ~88 min, and intermittently takes over 90 min and causes a timeout failure.
10:28 <xidorn> jgraham: so I wonder whether it is possible to split the chunks in a different way so it doesn't time out
10:29 <jgraham> xidorn: you probably want to talk to ahal on #ateam, but it sounds like you should split that up
10:29 <jgraham> Yeah, it's not automatic (or rather, I don't quite know how mochitest does it; whether there are fixed lists of manifests per chunk or it just runs 1/N of the manifests per chunk)
10:30 <jgraham> But anyway, it's possible to change it, including just for that specific configuration
10:31 <xidorn> jgraham: ok, thanks
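For context on the chunking question above: a minimal sketch, assuming mochitest uses the naive 1/N scheme jgraham mentions (he is unsure himself), of why one chunk can end up far slower than its siblings. Manifests are assigned by position, not by runtime, so a chunk that happens to collect the slow manifests can creep past the 90-minute task limit.

    # Illustrative only: assign every Nth manifest to a chunk, the way
    # a naive 1/N chunker would. Nothing balances runtimes.
    def chunk_manifests(manifests, total_chunks, this_chunk):
        ordered = sorted(manifests)
        return [m for i, m in enumerate(ordered)
                if i % total_chunks == this_chunk - 1]

    # e.g. the manifests that chunk 5 of 16 would run:
    # chunk_manifests(all_manifests, 16, 5)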
11:59 <Callek> ...hrm, weren't all L3 workers' logs restricted, or just L3 builders?
11:59 * Callek sees https://tools.taskcluster.net/groups/MxdlUuJTS6GSH4B3K_6HYA/tasks/HnyZwzE1TiWrFzXyLf6TPw/details having a live log.
12:06 <garndt> Just builders, and we do still upload that artifact at the end
12:06 <garndt> We just do not stream it live during task execution
12:41 <gerard-majax> hm
12:41 <gerard-majax> just thinking about something annoying
12:41 <gerard-majax> the fact that on GitHub, when a PR has one task that fails, it stays in a failed status
12:42 <gerard-majax> I think it stays that way even if the same task has been retriggered and succeeded
12:47 <dustin> just builders
12:47 <dustin> oops
12:48 <dustin> IRCCloud is weird, I swear it didn't show me that scrollback until I typed something
12:48 <dustin> gerard-majax: yeah, retriggering just adds to the set of tasks, which is still not 100% success
12:48 <dustin> I don't think there's any way for GitHub to know that passing task B is "the same as" failing task A
12:49 <dustin> for tc-github to know
12:49 <gerard-majax> dustin: I don't know the details of tc-gh, but does tc-gh have a way to know that task A and task B are the same?
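A minimal sketch of the constraint dustin describes, using the Python taskcluster client (this is not tc-github's actual implementation, just the shape of the problem): the PR status is derived from every task in the group, and a retrigger only adds a new task alongside the failed one, so the group never gets back to all-green.

    import taskcluster

    def group_all_green(task_group_id):
        queue = taskcluster.Queue()
        result = queue.listTaskGroup(task_group_id)
        while True:
            for task in result['tasks']:
                # The original failed run is still in the group, even
                # after a successful retrigger of "the same" task.
                if task['status']['state'] != 'completed':
                    return False
            token = result.get('continuationToken')
            if not token:
                return True
            result = queue.listTaskGroup(
                task_group_id, query={'continuationToken': token})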
13:41 <tarek> dustin: hello!
15:11 <bstack> https://github.com/Teevity/ice
15:15 <franziskus> garndt: did you have a chance to look at https://bugzilla.mozilla.org/show_bug.cgi?id=1389035? Weird things are happening
15:15 <firebot> Bug 1389035 NEW, nobody@mozilla.org Disk space full
15:16 <garndt> I did, but ran into a meeting. The worker defaults to requiring 10 GB free to be able to claim a task, but it does not limit the task to using only 10 GB of disk, so it (or another task running on the machine) can easily eat up space while that task is running
15:18 <franziskus> I see. Maybe the worker should require a little more then? Or say at the beginning that it can't handle it.
15:19 <garndt> it's possible to override that with a different default
15:19 <garndt> how much space do you think tasks running on that worker type will typically need?
15:19 <garndt> I think your team is the main team using the hg-worker
15:21 <garndt> thanks for running that, dustin
15:22 <jhford> Eli: I was a little too good to go
15:22 <Eli> lol
15:23 <franziskus> well, the docker image is 4.1 GB at the end. I'm not sure how much space it needs during generation
15:27 <garndt> ok, let's change it to 20 GB per task and see where that takes us
15:29 <garndt> done
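For the record, a hedged sketch of the kind of change garndt made. It assumes docker-worker's free-space check is driven by its capacityManagement.diskspaceThreshold setting, that the worker type's userData can override it, and that the AwsProvisioner client methods behave as below; verify all three against the real hg-worker definition before reusing this.

    import taskcluster

    provisioner = taskcluster.AwsProvisioner()
    wt = provisioner.workerType('hg-worker')

    # Raise the free space docker-worker requires before claiming a
    # task from the ~10 GB default to 20 GB (value is in bytes).
    wt.setdefault('userData', {}).setdefault(
        'capacityManagement', {})['diskspaceThreshold'] = 20 * 10**9

    # updateWorkerType rejects server-managed fields, so drop them.
    for key in ('workerType', 'lastModified'):
        wt.pop(key, None)
    provisioner.updateWorkerType('hg-worker', wt)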
15:56 <whimboo> garndt: hi. is that known?
15:56 <whimboo> https://login.taskcluster.net/?target=https%3A//treeherder.mozilla.org%3A443/%23/login&description=Treeherder
15:56 <whimboo> internal server error
15:56 <whimboo> I cannot log in from Treeherder
15:57 <whimboo> I will file a bug
15:57 <garndt> hrm, I get it too, I'm not sure if it's known. dustin, any ideas? ^
15:57 <dustin> sorry, I messed something up
15:59 <whimboo> I filed bug 1389135
15:59 <firebot> https://bugzil.la/1389135 NEW, nobody@mozilla.org login.taskcluster.net is down due to internal server failure
16:07 <franziskus> thanks garndt, I'll retrigger and see if it works
16:07 <dustin> tarek: hi?
16:07 <whimboo> dustin: thanks!
16:48 <tarek> dustin: hey, I was wondering how to deal with big files in taskcluster. Let's say a 300 MB zip file a test would need in order to run
16:48 <tarek> would cloud-mirror be the thing to use?
16:49 <garndt> tarek: cloud-mirror is a service that users do not have to interact with directly.
16:49 <garndt> what are you attempting to do?
16:50 <garndt> handling large artifacts is not an issue; we already do that for the docker images used by tasks, which can be many gigabytes
16:50 <tarek> garndt: a test that would download a big profile (300 MB zip) to do some perf tests
16:50 <tarek> neat
16:51 <tarek> is there an example I could study to see if it fits the use case?
16:54 <garndt> most of our tasks that do such a thing use mozharness, I believe, to download the files. There isn't anything fancy about it. Do you have something that can run locally that does such a thing? Perhaps I can translate that into an example for you
16:55 <tarek> for now I just hack talos' ffsetup, which clones the profile, so it would use the big one
16:56 <tarek> I guess I can come back to you with a more advanced prototype that pulls it from S3
16:56 <garndt> where does this profile come from? is it something users put somewhere, or is it automatically generated by another job?
16:56 <tarek> for now it's a profile generated by a script (see https://bugzilla.mozilla.org/show_bug.cgi?id=1365296)
16:57 <firebot> Bug 1365296 ASSIGNED, dstrohmeier@mozilla.com Create reference profiles for heavy users based on Telemetry data
16:57 <tarek> the idea is to generate it from time to time and run tests with it to see if performance is impacted by big profiles
16:58 <garndt> is it one test run that you have, or are there multiple that you would like to run in parallel?
16:59 <garndt> based on what I see, you probably want a set of tasks: one that generates the profile and uploads it as a task artifact, and downstream test jobs that use that profile for whatever they're doing
16:59 <tarek> I don't know yet
16:59 * tarek nods
17:00 <tarek> sounds like I can defer the task that generates the profile for now and focus on the downstream test jobs
17:00 <tarek> thanks a lot for your feedback - I'll dig :)
17:01 <garndt> no problem, let me know if I can help put anything together for ya
17:01 <tarek> ++
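A minimal sketch of the downstream side garndt describes, with a hypothetical task ID and artifact name. Nothing special is needed for cloud-mirror: the queue's artifact endpoint redirects, and the copy is served from the task's own region transparently.

    import requests

    TASK_ID = 'abc123...'  # hypothetical profile-generation task
    ARTIFACT = 'public/profiles/heavy-user.zip'  # hypothetical name
    url = ('https://queue.taskcluster.net/v1/task/%s/artifacts/%s'
           % (TASK_ID, ARTIFACT))

    # Stream the ~300 MB zip to disk; requests follows the queue's
    # redirect to S3 (or a cloud-mirror copy) automatically.
    resp = requests.get(url, stream=True)
    resp.raise_for_status()
    with open('profile.zip', 'wb') as f:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            f.write(chunk)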
17:01 <dustin> thanks greg :)
17:08 <garndt> np
17:31 <garndt> !t-rex deploying new AMIs to gecko-t-linux-large that include some misc fixes as well as a fix for superseding.
17:32 <dustin> we should rename it awesomeseding
17:32 <dustin> now that it works
18:57 <RyanVM> I'm getting "abort: repository /home/worker/checkouts/gecko: timed out waiting for lock held by befbe76f813a:7" failures on a Try push for my decision task. I heard yesterday that requires killing the worker?
18:57 <RyanVM> https://tools.taskcluster.net/task-inspector/#flfu3WpGTZeRtxPzfs0iuw
18:57 <dustin> that sounds like a server-side thing
18:57 <dustin> hgmo
18:58 <dustin> hm, now that I look at the logs, maybe not
18:58 <RyanVM> gps pointed at bug 1297153 yesterday
18:58 <firebot> https://bugzil.la/1297153 NEW, nobody@mozilla.org Detect and recover from active locks and transactions
18:58 <RyanVM> see also: bug 1388944
18:58 <dustin> ah, ok
18:58 <firebot> https://bugzil.la/1388944 DUPLICATE, nobody@mozilla.org Gecko Decision Task frequently fails with "repository /home/worker/checkouts/gecko: timed out waitin
18:58 <dustin> I can kill the instance
18:58 <RyanVM> thx
18:59 <dustin> boom
19:04 <garndt> there was a PR to try to fix this in the worker, I wonder if we can resurrect it
19:04 <garndt> I'll check in on it
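For reference, a hypothetical sketch of the kind of recovery bug 1297153 asks for, not the actual worker PR: --force-lock and --force-wlock are real `hg debuglocks` flags, but deciding when a lock is genuinely stale is the part the worker has to get right.

    import subprocess

    REPO = '/home/worker/checkouts/gecko'

    proc = subprocess.Popen(['hg', '-R', REPO, 'debuglocks'],
                            stdout=subprocess.PIPE,
                            universal_newlines=True)
    out, _ = proc.communicate()
    # `hg debuglocks` exits 0 only when no locks are held.
    if proc.returncode != 0:
        print('possibly stale lock:\n%s' % out)
        # DANGEROUS if another hg process really is alive; shown only
        # to illustrate the mechanism.
        subprocess.check_call(['hg', '-R', REPO, 'debuglocks',
                               '--force-lock', '--force-wlock'])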
20:02 <gerard-majax> garndt, dustin: hello, are you working?
20:02 <gerard-majax> garndt, dustin: I'm properly logged in with my LDAP account, and I cannot retrigger this task: https://tools.taskcluster.net/groups/COqxkk0cTR29xYTBdBSbCA/tasks/PfC9eC6fQnGCpmm2tjl-pA/details
20:03 <gerard-majax> garndt, dustin: "You are not authorized to perform the requested action. Please sign in and try again."
20:04 <garndt> did you just recently sign in? could you try signing out and back in?
20:04 <gerard-majax> I tried that already, yes
20:05 <gerard-majax> I was able to retrigger other similar tasks earlier today
20:06 <gerard-majax> garndt: In other words you are missing scopes from one of the options: * Option 0: - "queue:create-task:lowest:aws-provisioner-v1/deepspeech-worker"
20:06 <garndt> oh, there's more to the error
20:06 <garndt> hrm ok
20:06 <gerard-majax> garndt: this is what I see in the debugger when looking at the PUT 403
20:07 <garndt> but that error did not show in the retrigger popup?
20:07 <gerard-majax> garndt: the other day, pmoore|PTO-back-aug21 had to play around with queue:create-task:lowest:aws-provisioner-v1/deepspeech-worker and queue:create-task:aws-provisioner-v1/deepspeech-worker
20:07 <gerard-majax> nope
20:07 <gerard-majax> the popup only shows the error I pasted at first
20:07 <gerard-majax> I will need to be able to perform a lot of retriggers tomorrow :/
20:08 <gerard-majax> one of my tests is intermittent
20:08 <garndt> Eli: we used to display scope errors in the dialogs... it doesn't seem to appear in this case ^
20:08 <garndt> what login did you use?
20:08 <garndt> Okta and your LDAP creds?
20:08 <gerard-majax> garndt: yes
20:09 <Eli> garndt: I don't think that has changed
20:09 <kats> seeing a bunch of "abort: could not lock repository /home/worker/workspace/build/src: Permission denied" on jobs on my try push. anybody know what's going on? https://treeherder.mozilla.org/#/jobs?repo=try&revision=a611cdd4761b43d9ed29d8ffe3388f91bc1a3f1c
20:11 <gerard-majax> garndt: okay, I only have queue:create-task:aws-provisioner-v1/deepspeech-*
20:11 <gerard-majax> garndt: not queue:create-task:lowest:aws-provisioner-v1/deepspeech-*
20:11 <garndt> yeah, I was just checking some stuff out
20:12 <garndt> I'm not sure how you were able to retrigger earlier, though
20:12 <gerard-majax> well, we changed scopes
20:12 <gerard-majax> it is possible I was relying on previous credentials matching
20:13 <gerard-majax> garndt: can you add it?
20:13 <garndt> gerard-majax: want to check again?
20:13 <gerard-majax> good
20:13 <gerard-majax> :)
20:13 <gerard-majax> thanks!
20:14 <garndt> np
20:14 <RyanVM> kats: someone with the ability to do so needs to kill the instance that task is trying to run on
20:15 <garndt> kats: I killed the 2 instances causing an issue and issued a request to purge the caches
20:15 <kats> RyanVM: garndt: thanks
20:15 <garndt> np, sorry about that!
20:17 <gerard-majax> garndt: 20 retriggers :)
20:50 <garndt> wcosta: when looking at the coalescing thing, you're not going to have luck with OS X and Windows, because looking at the in-tree task config, coalescing seems to only be set up for Linux builds
20:51 <garndt> https://dxr.mozilla.org/mozilla-central/search?q=coalesce-name&redirect=false
20:52 <wcosta> story of my life
20:53 * garndt hugs wcosta
20:55 <wcosta> S2
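Roughly what coalesce-name expands to for the Linux build kinds, as an illustrative fragment with made-up names and URL (see the dxr link above for the real ones): the task gains a coalesce route, and docker-worker's payload gains a supersederUrl it consults so a newer push can supersede a still-pending build.

    # Illustrative fragment of a generated task, not copied from the tree.
    task = {
        'routes': [
            'coalesce.v1.builds.linux64-opt',  # made-up name
        ],
        'payload': {
            # docker-worker checks this URL before running; if a newer
            # task is listed, this one resolves as superseded instead.
            'supersederUrl':
                'https://coalesce.example/v1/list/builds.linux64-opt',
            # ... image, command, env, etc.
        },
    }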
11 Aug 2017
No messages