mozilla :: #taskcluster

11 Aug 2017
07:50franziskusanyone around?
11:01gerard-majaxdustin, https://tools.taskcluster.net/index/artifacts/project.deepspeech.deepspeech.native_client.master/cpu
11:01gerard-majaxdustin, we used to have stable, route-based links there
11:01gerard-majaxdustin, it's not the case anymore (
11:01gerard-majax:(
12:33dustinI don't think you can have a slash in a route name?
12:34dustinoh
12:34dustinyou mean you could right-click and copy the link
12:34dustinyeah, that's a regression -- file a bug plz
12:34dustinor re-open the old one
12:34dustinhttps://bugzilla.mozilla.org/show_bug.cgi?id=1355090
12:34firebotBug 1355090 FIXED, dustin@mozilla.com In the indexed artifact browser, use links constructed to include the route, not the taskId
12:36gerard-majaxdustin, reopened :)
12:37dustinthanks :)
12:37gerard-majaxI'm trying to kill taskcluster: https://tools.taskcluster.net/groups/PFHA9NiyTK216P8t-XUufw
12:37gerard-majaxmuch retriggers :)
12:57gerard-majaxdustin, would you know if we can bump the amount of machines for deepspeech-worker?
12:57gerard-majaxdustin, I have many retriggers pending, I'd like them to complete faster
12:57gerard-majaxdustin, so more machines for an hour or two would be enough
12:57gerard-majaxdustin, taskGroup: https://tools.taskcluster.net/groups/PFHA9NiyTK216P8t-XUufw
12:58dustinthere are 21 running now
12:58gerard-majaxyeah
12:58dustinwhich is odd since maxCapacity is 20 :)
12:59gerard-majaxI'm a magician, remember.
12:59dustinok, bumped to 30 :)
12:59dustinhaha
12:59gerard-majaxdustin, would it be possible to go to 50 ?
13:00gerard-majaxdustin, just the time needed to absorb the current load, it can be switched back to 20 after
13:00gerard-majaxI'm waiting for those retriggers to complete to be able to finish a PR
13:00dustinwe'll calculate the cost and select an appropriately-priced beer in Cancun :)
13:00dustindone
13:01gerard-majaxthanks :)
13:05gerard-majaxdustin, and 50 running :)
13:06dustinping me when your queue is empty and I'll dial it back
13:08gerard-majaxdustin, it should be okay within 20 mins, some tasks take ~3min some ~8min
13:16gerard-majaxdustin, you can probably start to decrease
13:16gerard-majaxdustin, there are now less than 20 jobs to finish :)
13:17dustink, thanks
13:18franziskusgarndt: looks like 20gb is too much for the machines :( now we can't run anything.
13:25garndtHrm. I'll take a look soon. I'm relocating somewhere that can caffeinate me
13:28dustinI support this relocation
13:46garndtfranziskus: ugh, yea I see what I did there....
13:47franziskus;)
13:57franziskusgarndt: looks like jobs are being picked up again
13:58franziskusthanks!
13:58franziskuslet's see if the original issue is solved as well :)
14:02garndtsorry about that, I'm just tweaking some things behind the curtain here
16:11mystorWhat triggers a task failing on taskcluster for IDLENESS_LIMIT_EXCEEDED?
16:15dustinthat doesn't sound familiar to me...
16:15dustinhttps://github.com/search?q=org%3Ataskcluster+IDLENESS_LIMIT_EXCEEDED
16:16dustinoh, that search is not what I thought :)
16:16dustinanyway, where are you seeing that
16:18dustinhm, googling finds https://bugzilla.mozilla.org/show_bug.cgi?id=1333957
16:18firebotBug 1333957 NEW, nobody@mozilla.org Make "Aborting task - max run time exceeded!" a Treeherder-parseable message
16:19dustinand indeed https://github.com/taskcluster/generic-worker/blob/7158947c8e23bf72d1dbd0c90fb43c64e1cd9eff/process/process_windows.go#L39
16:20dustinI still have no idea what it means
16:20gerard-majaxmystor, process running for long time but nothing on stdout ?
16:21dustinthat's about all I can guess too
16:21dustinthanks :)
16:21gerard-majaxI do remember hitting that once
16:21gerard-majaxI also had tasks generating too much log and getting killed for that
16:21dustinah
16:21dustinhttps://github.com/taskcluster/generic-worker/blob/a262293216c8e5a39b15f87a1e2b5783c923c2d1/main.go#L181
16:22dustinhm, maybe that's unrelated
16:22gerard-majaxjust chatting, indeed
16:22gerard-majaxI'm technically officially PTO
16:22gerard-majaxso I cannot talk about work here.
16:23dustinwouldn't want to get reported to the authorities
16:23dustinmystor: anyway, can you put a comment on the bug I just linked?
16:23dustinI'm starting to suspect it's a bug in generic-worker, which thinks it's idle while it's running a task
16:36mystorhey, sorry for disappearing
16:39mystordustin: bug 1333957?
16:39firebothttps://bugzil.la/1333957 NEW, nobody@mozilla.org Make "Aborting task - max run time exceeded!" a Treeherder-parseable message
16:39dustinyes
16:40mystordustin: cool, So basically this is just "we ran out of time to keep running tests"?
16:40mystordustin: Not "we got no output for > 3 minutes" or something like that?
16:40dustinno, I think it's "I'm going to shut down because I'm not running a task"
16:40dustinwhich is clearly not the case, since it's running a task
16:40dustinbut I'm not sure
16:41mystorHmm,
16:44garndtwhat task did this happen on?
16:45mystorgarndt: https://treeherder.mozilla.org/logviewer.html#?job_id=122463149&repo=mozilla-inbound
16:45garndtah yea, the task took too long https://treeherder.mozilla.org/logviewer.html#?job_id=122463149&repo=mozilla-inbound&lineNumber=4706
16:45mystorgarndt: This was caused by a patchset I was working on and I'm trying to figure out what caused it (x86 windows only)
16:46garndtmax runtime is probably set to 60 minutes
16:46garndtand this was running for about that long
16:46mystorgarndt: The timestamps for the most recent test and the timeout seem pretty close together, so the tests were probably still making progress, but being slow, right?
16:48garndtyea, somewhere in running the test suite took awhile
16:48garndtsuite started at 4:00 and the task was killed at 4:57
16:48dustinthat sounds like a billing hour..
16:48mystorgerard-majax: cool, thanks
16:48mystorgarndt: thanks
16:48garndtdustin: sounds like maxruntime being hit :)
16:48mystor(sorry random person I accidentally pinged)
16:49dustinI think that's a different error
16:49garndt[taskcluster 2017-08-11T04:57:41.999Z] Aborting task - max run time exceeded!
16:49dustinIDLENESS_LIMIT_EXCEEDED is only in process/process_windows.go
16:49garndthrm
16:49dustinhuh
16:50dustinanyway, max run time exceeded is definitely the proximal cause
16:50dustinwhy that also triggered IDLENESS_LIMIT_EXCEEDED, who knows
16:51dustinlooking around that code, I don't know what a Verdict is either
16:51dustinah
16:51dustin case r.SuccessCode&(subprocess.EF_INACTIVE|subprocess.EF_TIME_LIMIT_HARD) != 0:
16:51dustin return IDLE
16:51mystorahh, ok
16:51mystorSo basically the IDLENESS_LIMIT_EXCEEDED is a red herring?
16:52dustinI say "ah" like that meant anything to me
16:52dustinyeah, I think so
16:52mystorAnd we just use it as the default if we hit TIME_LIMIT_HARD?
16:52dustinwhatever that is, yes
16:52mystorcool - I'll stop looking into things being idle
16:52mystorI'm guessing it's 1 h
16:52dustinthere's a maxRunTime configured for the task, yes
16:52dustinthat can be increased, but for most tests it's 1h
16:53gerard-majaxdustin, so this is what I suggested earlier?
16:55dustinyeah
16:55dustinwe didn't have the task at the time so all I could guess on was IDLENESS_LIMIT_EXCEEDED
16:55dustinseeing the task shows it's something else
16:56garndt11:20 AM <gerard-majax> mystor, process running for long time but nothing on stdout ?
16:56garndtgerard-majax: ^ even if things are logging to stdout we will still kill it when hitting the maxruntime
16:57dustinBuildbot used to have a no-output timeout, but I don't think any of the TC workers do
16:57gerard-majaxI'm pretty sure I have hit that
16:57gerard-majaxbut maybe a long time ago
16:57mystorcool, I think I get what's going on then
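
The check dustin pasted at 16:51 is a bitmask test: two distinct exit flags collapse into the same IDLE verdict, which is why a max-run-time kill surfaces as IDLENESS_LIMIT_EXCEEDED. A minimal Python sketch of that pattern follows. Only the flag names come from the snippet; the bit values, the `OTHER` flag, and the `verdict` helper are hypothetical and are not generic-worker's actual code.

```python
from enum import IntFlag


class ExitFlag(IntFlag):
    # Hypothetical bit values; only the names mirror the pasted Go snippet.
    INACTIVE = 0x01
    TIME_LIMIT_HARD = 0x02
    OTHER = 0x04  # illustrative placeholder for any unrelated flag


def verdict(success_code: int) -> str:
    """Return "IDLE" when the inactivity OR the hard-time-limit bit is set."""
    if (success_code & (ExitFlag.INACTIVE | ExitFlag.TIME_LIMIT_HARD)) != 0:
        return "IDLE"
    return "OTHER"


if __name__ == "__main__":
    # A hard time limit is reported with the same verdict as real inactivity.
    print(verdict(ExitFlag.TIME_LIMIT_HARD))
    print(verdict(ExitFlag.INACTIVE))
```

Under these assumptions the caller cannot tell the two causes apart from the verdict alone, which matches the "red herring" conclusion in the discussion.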
17:41RyanVMis "Retrigger Task" from the Actions dropdown in the Task Inspector equivalent to |taskcluster task rerun| from taskcluster-cli?
17:41gerard-majaxRyanVM, I think retrigger will create a new taskId
17:41RyanVMso closer to a TH retrigger then?
17:50dustinRyanVM: yes
17:54RyanVM*sigh* up to retrigger #6 on this job trying to get a non-broken windows worker
17:54garndtcloser, not quite though...I think retrigger right now in the task inspector will just retrigger that task with a new task ID, not recreate the node and dependencies in the graph like retrigger in TH will do
18:09RyanVMyay, retrigger #8 is getting a not-slow worker
22:12dustinlogins are broken again, working on it
22:16dustinback now
22:18dustinhm, what else can I break at 6:15 on a friday...
22:18* dustin eyes mozilla-taskcluster
22:20glandiumis it possible to use docker images created on taskcluster with the github integration?
22:21dustinish
22:21dustinyou'd have to hard-code the task reference, and when that task expired your task would stop working
22:21dustinadmittedly that's usually in 1 year
22:21dustinalternately you could pull the image and push it to docker hub under some identifier you own
22:22dustinand just refer to it that way
22:22glandiumI guess my question was too open ended :) can I create a docker image from .taskcluster.yml on github and then use it?
22:26glandiumOne problem is the optimization part, which would require a decision task-like task
22:27dustinright
22:27dustinthere's currently an extra scope required, as well, to enable use of docker-in-docker
22:27dustinwe could enable that for a specific repo tho
22:27dustinother than that, yes, absolutely
22:27glandiumand there's a scope to create tasks, isn't there?
22:28dustinyou already have that (for the github-worker workerType)
22:32dustinok, off to walk the dog
22:32dustinhi & bye :)
22:32glandiumargh, I was about to ask how I get that scope
22:41jonasfjyou'll essentially end up reinventing the decision task logic...
22:42jonasfjI wish we had some sort of easy to use... docker image that contained a subset of that...
22:43glandiumjonasfj: I don't mind reinventing the decision task logic. Plus, I'll have enough jobs that I'll want to generate the task definitions anyways
22:43jonasfjI've been dreaming of making a simple decision task that would just combine json-e + a few macros to do everything.
22:43jonasfjin that case: warning dind is horrific... steal the in-tree logic that communicates directly with dind using curl
22:44jonasfjor use my experimental qemu stuff to build images :) hehe
22:45jonasfjwith the qemu stuff you get full docker control and can upgrade it... but I'm still a bit away from having it deployed the right way
22:45jonasfjcurrently working towards that so I can rebuild in-tree docker images using a per-task VM
22:48glandiumjonasfj: how do I get the right scopes for dind?
22:49glandiumalso, what index paths are allowed for github projects?
22:50jonasfjwe have "garbage.*" for whatever, but let's make up a scheme for github projects...
22:50jonasfjnote: you'll want to use garbage for development...
22:51glandiumbtw, scopes appear empty on the task inspector
22:52jonasfjyeah, I was just looking it up; doesn't seem like dind requires any...
22:52glandiumjonasfj: dustin was saying there is, though
22:52glandiumI guess I'll just try...
22:52jonasfjbut dind is only enabled on some workerTypes, the gecko-images and github-worker I think too
22:53jonasfjmaybe dustin was just hoping there was :)
22:53glandiumah yeah, docker images are using aws-provisioner-v1/gecko-images worker type
22:53glandiumcan I use that from github?
22:54jonasfjno, but I think github-worker also allows dind
22:54jonasfjnote: We can always grant you or your repo additional scopes if you end up missing some
22:54glandiumI guess the important part is using the image_builder docker image
22:55jonasfjoh, yeah, you can just use the one we use in gecko...
22:55jonasfjhmm, maybe I think that one actually runs mach...
22:55glandiumso if I want things in the index, I add a route for index.garbage.glandium.something?
22:55jonasfjyeah
22:56jonasfjand if it complains about scopes we'll add index.garbage.* to wherever it's needed, garbage.* is a great prototyping area
22:56jonasfjglandium: curious, what is this for and how stable do you want it?
22:57glandiumjonasfj: I want to replace travis for git-cinnabar builds and tests
22:57glandiumand possibly appveyor
22:58jonasfjso in tc-worker I use the following:
22:58jonasfjhttps://github.com/taskcluster/taskcluster-worker/blob/master/.taskcluster.yml#L176-L197
22:58jonasfjthis task is a per-task VM, the clone-and-exec.sh is inside the VM
22:58jonasfjand it clones from env vars: REPOSITORY and REVISION
22:58jonasfjthen does "exec $@"
22:59jonasfjso in the case I linked it does "make tc-worker-env-tests" which is the makefile I have in my repo
22:59jonasfjthis is a minimal ubuntu server VM downloaded from S3
23:00jonasfjso one could make an ugly bash script that installs docker and runs docker build :)
23:01jonasfjdownside: the workerType isn't stable yet, but a stable variant will arrive for gecko image builds in some future
23:01jonasfjnote: I think the vm already has docker installed too
23:03glandiumerr, the taskcluster image_builder image doesn't have git
23:06jonasfjyeah,
23:07jonasfjand it doesn't contain docker cli either.. because newer docker cli clients don't work with dind, because our dind image is so old
23:07jonasfjhence, why it's not the nicest option..
23:11glandiumok, let's try that ubuntu-worker image
23:14gpsjonasfj: https://hg.mozilla.org/hgcustom/version-control-tools/file/tip/testing/vcttesting/docker.py
23:14glandiumjonasfj: why did I only get one task with this https://github.com/glandium/git-cinnabar/blob/tc/.taskcluster.yml ?
23:14glandiumcorresponding task group: https://tools.taskcluster.net/groups/BLb540v9RsifTRTI7I3vFw
23:15jonasfjI'll give you some scopes...
23:15jonasfjglandium: is glandium/git-cinnabar/ the canonical repo?
23:15glandiumjonasfj: yes
23:15glandiumah, got a mail
23:16glandium(for the insufficient scopes)
23:16jonasfjyeah...
23:16jonasfjglandium: did it tell you what scopes it wanted?
23:17glandiumjonasfj: https://pastebin.mozilla.org/9029522
23:17jonasfjglandium: try again: https://tools.taskcluster.net/auth/roles/repo%3Agithub.com%2Fglandium%2Fgit-cinnabar%3A*
23:21jonasfjif you want the scopes with your LDAP in the task-creator, you can just request membership of:
23:21jonasfjhttps://mozillians.org/en-US/group/taskcluster-contributors/
23:21jonasfjand add your LDAP email as a secondary email to your mozillians account
23:23jonasfjthat group gets: mozillians-group:taskcluster-contributors
23:23jonasfjwhich is a lot of random scopes...
23:23jonasfjincluding: project:taskcluster:worker-test-scopes :)
23:24jonasfjalso setting task.payload.interactive = {} on the qemu-worker things will give you shell and display
23:24glandiumthe scopes on that group seem too broad
23:25glandiumjonasfj: got another email with the same complaint
23:26glandiumsee it seems I do need queue:scheduler-id:taskcluster-github too
23:26glandiums/see/so/
23:26jonasfjbstack: ^ ?
23:27jonasfjhmm,... but it already creates tasks with that schedulerId
23:28jonasfj - clone-and-exec.sh
23:28jonasfj - dpkg -l
23:28jonasfj - ps aux
23:29jonasfjnice trick, if only it wouldn't cause bash to do exec "dpkg -l" "ps aux"
23:29glandiumah, I thought it was one command per line
23:30glandiumso it's arguments to the script
23:30jonasfjthe clone-and-exec.sh is: https://github.com/taskcluster/taskcluster-worker/blob/master/examples/ubuntu-image/data/clone-and-exec.sh
23:31glandiumoic
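
The pitfall above is that every list item after the script name becomes one argument to a single `exec "$@"`, not a separate command. A small Python sketch of that behaviour, assuming the script does nothing with its arguments except clone and then `exec "$@"` as described at 22:58; the `exec_argv` helper is hypothetical, and the `bash -c` wrapper is one possible workaround rather than anything suggested in the channel.

```python
def exec_argv(command):
    """Mimic `exec "$@"` in a wrapper script.

    The first list item is the wrapper script itself; everything after it is
    handed to one exec call as (program, arguments).
    """
    _script, *rest = command
    if not rest:
        raise ValueError("nothing to exec")
    return rest[0], rest[1:]


if __name__ == "__main__":
    # The payload from the log: "dpkg -l" becomes the program name,
    # and "ps aux" becomes its single argument.
    print(exec_argv(["clone-and-exec.sh", "dpkg -l", "ps aux"]))

    # One way to run several commands: hand them to a shell as one string.
    print(exec_argv(["clone-and-exec.sh", "bash", "-c", "dpkg -l && ps aux"]))
```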
23:33jonasfjglandium: okay, try again -- I think tc-gh relies on some legacy scopes..
23:33glandiumerr I should have changed the command
23:33glandiummeh
23:34glandium\o/ yay
23:34glandiumhttps://public-artifacts.taskcluster.net/NUlbD4n7R9eWZ1pEWSZMbA/0/public/logs/live_backing.log
23:36glandiumok, so docker-engine is installed and running. I'm not sure that includes the docker command line, though
23:36jonasfjI think it does...
23:38jonasfjglandium: https://tools.taskcluster.net/groups/GCAG19i7QlGXlWwSgjsdAQ/tasks/GCAG19i7QlGXlWwSgjsdAQ/runs/0/artifacts
23:38jonasfjglandium: it should have an interactive shell so you can see what's in there...
23:38glandiumjonasfj: so, once I'm done creating my image, I do a docker image save and ... put it in the artifacts directory?
23:39jonasfjnot save... I think it's "docker export <image-name> | zstd -c -3 > image.tar.zstd"
23:39glandiumjonasfj: https://docs.docker.com/engine/reference/commandline/image_save/
23:39glandiummmmm https://docs.docker.com/engine/reference/commandline/export/
23:40jonasfjah
23:41jonasfjso: docker save <image-name> | zstd > image.tar.zstd
23:41jonasfjor something like it :)
23:42jonasfjglandium: note: zstd must be installed from source on ubuntu, like: https://github.com/taskcluster/taskcluster-worker/blob/master/tc-worker-env.Dockerfile#L17-L20
23:42jonasfjhmm... somehow my interactive task failed... maybe sleep isn't a real command :)
23:43glandiumjonasfj: is it in that ubuntu-worker image already?
23:43jonasfjprobably not...
23:45jonasfjglandium: you could also do: task.payload.command = [docker, run, -v, /var/run/docker.sock:/var/run/docker.sock, you-image]
23:45jonasfjand just use the qemu to launch docker with docker socket mounted :)
23:45jonasfjwhich is probably easier to test/develop with locally
23:46jonasfjin fact, it's what my makefile does... it runs my tests inside docker.. because customizing a VM is painful..
23:47jonasfjanyways, if this works for you let me know... and I'll be sure to migrate you along when a stable QEMU workerType is available (at this point I haven't published payload schema as docs yet)
23:48glandiumjonasfj: I'm not sure how qemu is related to what I'm going to do, though :)
23:49glandiumor is that worker type in qemu?
23:49jonasfjglandium: that workertype is running tc-worker on packet.net and the worker creates a QEMU VM per task
23:50jonasfjthe task.payload.image URL is an ubuntu VM :)
23:50glandiumI see
23:50jonasfjI hope to use it for building docker images in-tree, since upgrading docker is a lot easier then...
23:51jonasfjof course to make that easy, I need to build qemu images in-tree too. Thankfully we have nested virtualization, even if it's slow :)
23:55glandiumjonasfj: what scope do I need to add to allow myself to connect to the interactive shell of a task I create myself?
23:56jonasfjglandium: you have queue:get-artifact:private/* because you're in team_moco (I hope you are)
23:56glandiumah, maybe the owner?
23:57jonasfjqueue:get-artifact:private/interactive/shell.html will do...
23:57jonasfjwhich is implied from queue:get-artifact:private/*
23:57jonasfjif you are not in team_moco join the mozillians group and you'll get some scopes :)
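
The "implied from" relationship above is prefix matching: a held scope ending in `*` covers any required scope that starts with the text before the `*`. A minimal Python sketch of that rule, assuming only the trailing-`*` behaviour described here; the function names are hypothetical, and role expansion is not modelled.

```python
def satisfies(held: str, required: str) -> bool:
    """True if a single held scope covers the required scope."""
    if held.endswith("*"):
        # A trailing star matches any scope sharing the prefix before it.
        return required.startswith(held[:-1])
    return held == required


def any_satisfies(held_scopes, required: str) -> bool:
    """True if at least one of the held scopes covers the required scope."""
    return any(satisfies(held, required) for held in held_scopes)


if __name__ == "__main__":
    needed = "queue:get-artifact:private/interactive/shell.html"
    print(any_satisfies(["queue:get-artifact:private/*"], needed))
    print(any_satisfies(["queue:get-artifact:public/*"], needed))
```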
23:58glandiumah, I was logged on tc with my non-moco login
23:58jonasfjoh, right... those still aren't merged...