mozilla :: #pdfjs

14 Apr 2017
15:36github_pdfjs[pdf.js] yurydelendik created package-version (+1 new commit): https://github.com/mozilla/pdf.js/commit/3b36a1709a9c
15:36github_pdfjspdf.js/package-version 3b36a17 Yury Delendik: Changing package.json version to 1.0.0...
15:37github_pdfjs[pdf.js] yurydelendik pushed 1 new commit to master: https://github.com/mozilla/pdf.js/commit/5feb2a253fc432a817b71c2f9cd5a7051b64e255
15:37github_pdfjspdf.js/master 5feb2a2 Yury Delendik: Merge pull request #8286 from mozilla/package-version...
15:37soakbotPull 8286: Changing package.json version to 1.0.0. https://github.com/mozilla/pdf.js/pull/8286
15:37github_pdfjs[pdf.js] yurydelendik deleted package-version at 3b36a17: https://github.com/mozilla/pdf.js/commit/3b36a17
15:39github_pdfjs[pdf.js] yurydelendik pushed 2 new commits to master: https://github.com/mozilla/pdf.js/compare/5feb2a253fc4...f6d4de989883
15:39github_pdfjspdf.js/master 30bee9f Yury Delendik: Moves Uint32ArrayView and hasCanvasTypedArrays into compatibility.js.
15:39github_pdfjspdf.js/master f6d4de9 Yury Delendik: Merge pull request #8283 from yurydelendik/uint32arrayview...
15:39soakbotPull 8283: Moves Uint32ArrayView and hasCanvasTypedArrays into compatibility.js.. https://github.com/mozilla/pdf.js/pull/8283
15:49github_pdfjs[pdf.js] pdfjsbot force-pushed gh-pages from f462dcb to 7cc8a1f: https://github.com/mozilla/pdf.js/commits/gh-pages
15:49github_pdfjspdf.js/gh-pages 7cc8a1f pdfjsbot: gh-pages site created via make.js script...
16:00github_pdfjs[pdf.js] pdfjsbot force-pushed gh-pages from 7cc8a1f to 0bf8667: https://github.com/mozilla/pdf.js/commits/gh-pages
16:00github_pdfjspdf.js/gh-pages 0bf8667 pdfjsbot: gh-pages site created via make.js script...
16:17mukulyury: Hi
16:17yuryhi
16:17mukulI am working on streams-getTextContent branch
16:17mukulbut is failing unittest for getTextContent
16:18mukulI think i have to modify the test
16:21mukulwhat you say?
16:30yurysomething wrong then
16:30yurymukul: why the test is failing?
16:31mukulbecause it is for sendWithPromise
16:31yurythe core api functionality must stay without change
16:31yuryit's a stable api
16:32yuryso getTextContent function output must stay as was, just internals might change
16:33yurymukul: do you know why getTextContent outputs different result?
16:33mukuldifferent result?
16:34mukulI am not getting what do you mean by different results
16:34yurywhich unittest fails?
16:35github_pdfjs[pdf.js] Snuffleupagus pushed 2 new commits to master: https://github.com/mozilla/pdf.js/compare/f6d4de989883...bc1f4dd9c806
16:35github_pdfjspdf.js/master 8e681ce Yury Delendik: Change amd to cjs path in ES6 modules
16:35github_pdfjspdf.js/master bc1f4dd Jonas Jenwald: Merge pull request #8285 from yurydelendik/webcjspath...
16:35soakbotPull 8285: Change amd to cjs path in ES6 modules. https://github.com/mozilla/pdf.js/pull/8285
16:35mukulyury: https://github.com/mozilla/pdf.js/blob/master/test/unit/api_spec.js#L966
16:37yuryso data[0] is different from data[1]
16:37yury?
16:37mukulbecause i am modifying the code to run sendWithStream in api.js at getTextContent
16:38mukulit should call read() to get data
16:38yurygetTextContent must return array
16:38mukulof items and styles?
16:39yuryright, object of items array and styles map
16:39yuryonly internals of getTextContent will change atm
16:39yuryyou can add streamTextContent function which will return stream
16:40mukulokay, that means i have to call read inside getTextContent function
16:40mukulto return the data
16:40yurybut getTextContent behavior is immutable
16:41mukulHmm..., we can create brother function as you are saying earlier
16:41mukullike streamTextContent for getTextContent
16:43github_pdfjs[pdf.js] pdfjsbot force-pushed gh-pages from 0bf8667 to 686c41e: https://github.com/mozilla/pdf.js/commits/gh-pages
16:43github_pdfjspdf.js/gh-pages 686c41e pdfjsbot: gh-pages site created via make.js script...
16:50SnuffleupagusAs Yury suggested previously, I also think that it's a good idea to add a new |streamTextContent| function to the API. That way you won't inadvertently break the existing getTextContent API when adding Stream support.
16:50Snuffleupagusmukul: ^
16:59* yury makes today as "bug reporting day" and will try to create bugs/tests for various tools he used this week
16:59yurySnuffleupagus: yeah, babel is getting killed by glyphlist
17:00yuryit can still process unicode in 20sec
17:03Snuffleupagusyury: Yeah, that's a really bad one. But even if you *only* convert |src/shared/util.js| and nothing more to an ES6 module, it seems that's enough to slow everything to a crawl o.O
17:04yurywhen you will update the PR I will try to profile
17:04SnuffleupagusAlready on it, give me a couple of minutes :-)
17:05yuryrunning in from speed system.js is not really a concern
17:05yuryif it is only one time hit
17:05yuryduring startup
17:07yurywe might stop using this method anyway
17:07yuryI just wanted to experiment with two ways
17:07yuryit's nice to not perform transform in a browser
17:14yuryI wonder if modern browsers&#39; <script type=&quot;module&quot; loading will work for us
17:20mukulSnuffleupagus: yes, i also thinking the same
17:21mukulby this we can do streams-lib project with breaking the existing code
17:21mukul*without
17:21mukul:)
18:27mukulyury: ping
18:27yurypong
18:28mukulhere is the streams-getTextContent branch https://github.com/mukulmishra18/pdf.js/tree/streams-getTextContent
18:29yurymukul: actually I wanted you to use sendWithStreams in the getTextContent
18:29yurynot create parallel function
18:30yurymukul: as you said, use .read() inside getTextContent
18:30mukulHmm...., we can also do this way
18:31yurymukul: I don&#39;t see how you return items one-by-one
18:32yuryyou just sending entire textContent as one big chunk -- this will not save memory
18:35mukulOkay, then we have to modify the handler in worker to send data in chunk?
18:40yuryYes!
18:41mukulI think we have to change extractTextContent to send data in chunks
18:41yuryand futhermore modify evaluator to send not array but chunks
18:42yurybut for now just split result of extractTextContent by smaller chunks
18:42yurymukul: I just want to see complete implementation of sendWithStream
18:43yuryfor now more than half functionality still missing, modifying handler and getTextContent will give you test coverage
18:45mukulHmm..., we have to modify getTextCotent in evaluator?
18:54yurynot at the moment
18:54yurymukul: small steps
18:55yury... but if you can make it work .. :)
18:56mukulso instead of returning partialEvaluator.getTextContent() we can return a readableStream
18:56mukulfrom document.js
18:57yury... once sendWithStream works
18:57Snuffleupagusyury: Sorry for the delay, PR #8284 has now been updated with new module paths.
18:57soakbotPull 8284: Convert the files in the `/src` folder to ES6 modules. https://github.com/mozilla/pdf.js/pull/8284
18:57yurythanks
18:59mukulyury: so i am going one more level up, then we have to return array of items of textContent in chunk
18:59mukulfrom worker.js
19:00mukulalso styles object :)
19:01yurythere is desiredSize
19:01yuryso you process only desiredSize items with new styles with return those via enqueue
19:02yurySo it&#39;s still be textContent object with few items and new styles
19:03mukulokay, how to process only desiredSize items?
19:03yurythat said in the evalutor you may pass sink since you will need to control
19:04yuryat https://github.com/mozilla/pdf.js/blob/master/src/core/worker.js#L901 enqueue <= desiredSize items (not more)
19:05yuryper onPull
19:07yuryat https://github.com/mozilla/pdf.js/blob/master/src/display/api.js#L970 , you read() stream and add to the resulting object chunks
19:08yurysendWithStream shall optionally receive stream parameters/options as described at https://streams.spec.whatwg.org/#rs-constructor
19:09yury`size` is a function that calc amount of items in the chunk, `highWaterMark` is a chunk size
19:10mukulokay, then we need to define highWaterMark as desiredSize
19:10yuryokay
19:11yuryI&#39;ll give you some freedom to define those
19:11yury:)
19:11mukul:-D
19:11yurythere are two places ^ you need to worry about at the moment
19:11mukul?
19:12yuryit does not make sense to go any futrther and change internals of the pdf.js until sendWithStream will work for the getTextContent and its handler modification
19:13mukulyes, and i think it also mess up things
19:14yuryas it stands atm, sendWithStream is not ready yet but it&#39;s moving in right direction
19:15yuryonce it&#39;s done getTextContent patch will look simple
19:16mukulyeah,
19:16yurycan you give me code for getTextContet with sendWithStream
19:16yurytheoritical code (because it will not work atm)
19:18yurymukul: it will be based on https://streams.spec.whatwg.org/#example-read-all-chunks
19:18mukulTheoritical code?
19:20yurythe code you will right
19:20mukulI will modify the streams-getTextContent branch to work getTextContent with stream
19:20* yury does not need entire patch just new getTextContent body
19:21yuryit shall take you 10-20 min, but it will simply your life
19:23mukulsimply or simplify o.O
19:29yurysimplify
19:29mukul:) i am working
19:40yurySnuffleupagus: https://github.com/mozilla/pdf.js/pull/8287/files#diff-b9e12334e9eafd8341a6107dd98510c9R117 ?
19:47github_pdfjs[pdf.js] Snuffleupagus pushed 2 new commits to master: https://github.com/mozilla/pdf.js/compare/bc1f4dd9c806...fd51a7cb8c70
19:47github_pdfjspdf.js/master 5855c0a Yury Delendik: Allow to convert (some of) ES6 code to ES5.
19:47github_pdfjspdf.js/master fd51a7c Jonas Jenwald: Merge pull request #8287 from yurydelendik/babel-es2015-preset...
19:47soakbotPull 8287: Allow to convert (some of) ES6 code to ES5.. https://github.com/mozilla/pdf.js/pull/8287
19:49Snuffleupagusyury: Tested locally before merging, and it seems to work perfectly :-)
19:50yurySnuffleupagus: tried https://pastebin.mozilla.org/9019029
19:51yuryso we are good to go for classes and lambdas
19:52yuryfor await I think we need to change some settings
19:53yuryfor now we are good
19:54SnuffleupagusSure, but given the discussion in bug 1167472 we may want to hold off on using Classes in /src/core/ files for now.
19:55yurysounds like a plan
20:06github_pdfjs[pdf.js] pdfjsbot force-pushed gh-pages from 686c41e to 137fef1: https://github.com/mozilla/pdf.js/commits/gh-pages
20:06github_pdfjspdf.js/gh-pages 137fef1 pdfjsbot: gh-pages site created via make.js script...
20:08yurySnuffleupagus: okay, make glyphlist and unicode as common.js modules
20:08yurythe same story as I had with webpack
20:10yurySnuffleupagus: I have 8sec startup without those two
20:12* yury is fine with 8src
20:13yurySnuffleupagus: what times do you have?
20:15Snuffleupagusyury: The load times in |gulp server| mode is OK for me, but the run time for tests is catastrophic. With nothing more than https://github.com/mozilla/pdf.js/pull/8284/commits/17ff307fd21edd191ce6d7653b84de0275d63f06 applied, the run time for |gulp unittest| goes from ~25 seconds to ~75 seconds :-(
20:15yuryhttps://pastebin.mozilla.org/9019030
20:18yurySnuffleupagus: yes, it&#39;s recompiling all modules
20:18yurydo you think it&#39;s amd style naming that fails
20:18yury?
20:19yurywe probably need to fix unit tests as well
20:21SnuffleupagusI really have no idea what the problem might be, unfortunately.
20:21Snuffleupagus... but if it&#39;s recompiling modules over and over at every import, that could certainly explain why things aren&#39;t working
20:22Snuffleupagus*very well
20:32mukulyury: ping
20:33Snuffleupagusyury: Bottom line, for me the performance suffers too much with all /src files converted to ES6 modules (even using your pastebin above). So trying to do any sort of serious development and/or testing locally would just be too slow to be practical.
20:40yurymukul: pong
20:40yurySnuffleupagus: we will fix it
20:40mukuli am not getting how to enqueue the data with <= desiredsize
20:41mukulI am just looping through the textCotent and defining onPull function
20:45mukulHow to access size function inside the worker?
20:47yurygood question
20:47yurymukul: do you have link to my pastebin
20:47yuryto my jsbin
20:49yurymukul: http://jsbin.com/wiyovuv/edit?html,js,output
20:50mukulokay
20:52mukulthen, how we are going to do it?
20:57yurynot sure what you are asking
20:58mukulI am not clear how to enqueue the data in in worker
20:58mukulso that we can get chunks
20:59mukulI thought we can loop through the textContent and enqueue items and styles on by one
21:00mukulbut it seems to messy and wrong approach to me, as we are not using highWaterMark and size function anywhere
21:00mukulAm i going wrong way?
21:01yuryyou are going to get desiredSize
21:01yurynew version http://jsbin.com/wiyovuv/edit?js,console
21:02yurybut it will not work since pull call has not promise like we has in start
21:02yuryhence there are error
21:03mukulwhen we are going to resolve the promise in pull?
21:03mukulafter every success pull?
21:04yurysure
21:05* yury is not interested in sendWithStream implementation but rather getTextContent modufication
21:25yurywithout good example of future getTextContent and its handler code, it&#39;s hard to understand what need to be done with sendWithStream
21:29mukulhmm...., i am thinking about it
21:29mukulyury, we have to send desiredSize elements from items array
21:29mukul?
21:30mukuland when it is empty we start sending styles
21:30mukuldoes it makes sense?
21:31yuryprobably styles with first chunk of items
21:31yuryand, yes, desiredSize elements from items array
21:32yuryideally, new styles must appear when item(s) starts using it
21:33yurywe are not going count styles into desiredSize
21:34mukulthen for an array of items [item1, item2...] we have styles: {style1, style2...}
21:34mukulitem1 uses style1, item2 uses style2 and so on
21:34yuryitem2 can use style1
21:35yurywe don&#39;t know
21:36mukulThen it is better to have all styles in main thread before we start fetching items
21:36yurysure
21:36mukulas if we start rendering continuously it may cause problem, not to have required style for an item
21:47yurywe can cache styles
22:01github_pdfjs[pdf.js] yurydelendik pushed 2 new commits to master: https://github.com/mozilla/pdf.js/compare/fd51a7cb8c70...f8b3b75e424a
22:01github_pdfjspdf.js/master e7a3ea2 Tim van der Meij: Firefox extension: remove unused preference cleanup from `bootstrap.js`...
22:01github_pdfjspdf.js/master f8b3b75 Yury Delendik: Merge pull request #8289 from timvandermeij/firefox-extension-cleanup...
22:01soakbotPull 8289: Firefox extension: remove unused preference cleanup from `bootstrap.js`. https://github.com/mozilla/pdf.js/pull/8289
22:03github_pdfjs[pdf.js] timvandermeij pushed 2 new commits to master: https://github.com/mozilla/pdf.js/compare/f8b3b75e424a...27c3c33eec4e
22:03github_pdfjspdf.js/master 89df5ef Yury Delendik: Moves &#39;web&#39; target to the gulpfile.
22:03github_pdfjspdf.js/master 27c3c33 Tim van der Meij: Merge pull request #8288 from yurydelendik/mv-make-web...
22:03soakbotPull 8288: Moves &#39;web&#39; target to the gulpfile.. https://github.com/mozilla/pdf.js/pull/8288
22:07yuryonce we done with make.js removal, we need to upgrade to gulp4
22:52yurySnuffleupagus: got it, https://github.com/systemjs/plugin-babel
22:52yurywe need to add babelOptions: { es2015: false }
22:52yurywe don&#39;t need transforms
22:53yurySnuffleupagus: we are running workers and new worker instance requires new compilation
22:54yury(that&#39;s is probably good to use in addition to making glyphlist, unicode commonjs
22:55yuryeventually we need to unit test compiled code
23:03Snuffleupagusyury: Nice find, but for me that only seems to help marginally :(
23:04yuryswitching unit tests for compiled lib?
23:04yuryto
23:05SnuffleupagusWe probably need to switch the all test (browsertest, makeref) as well in that case, since they&#39;re slow as well. This wasn&#39;t as easy as I&#39;d hoped.
23:06yuryI was going to do it anyway, but wanted to setup source maps first
23:07yurySnuffleupagus: so we can migrate api?
23:07yuryI mean display/
23:08yuryand keep core/ or shared/ for later
23:11yuryAnother backward solution: since we are going with es2015:false, we can just use regex to make commonjs module from es6 :)
23:11yuryand use it as transpiler for systemjs :)
23:11SnuffleupagusYes, it seems that just /src/display would be OK for now, I can test more tomorrow and prepare a patch if everything seems to work!
23:11yurybabel just does not work for us
23:12SnuffleupagusYour last solution/workaround sounds intriguing :)
23:16yuryseeking bdahl&#39;s help with troubleshoting https://github.com/mozilla/pdf.js/issues/8290
23:17SnuffleupagusIs it actually broken though, or is it just that the Babel conversion adding in PR #8287 is super slow on the bots?
23:17soakbotPull 8287: Allow to convert (some of) ES6 code to ES5.. https://github.com/mozilla/pdf.js/pull/8287
23:18yurygood question
23:19yuryworks okay on previews, no?
23:19yurycould be just bot
23:21* Snuffleupagus decides to call it a night
23:21yurygood night
23:55bdahlyury: don&#39;t you have the rdp for new windows bot?
23:55bdahlanyway i started the script up again
23:58yurybdahl: I don&#39;t know how to type password
23:59bdahlyury: i thought password was in file
23:59yuryyeah, but it says it&#39;s invalid
15 Apr 2017
No messages
   
Last message: 160 days and 10 hours ago