mozilla :: #pdfjs

20 Apr 2017
01:31ArashHi
01:32ArashIs anyone here?
01:33ArashI need help with scrolling to a specific section of the pdf
14:53mukulyury: Hi
15:02yurymukul: hi
15:04mukulI published the sendWithStream branch: https://github.com/mukulmishra18/pdf.js/commit/89e5b2b2d662277c81f773565cd69faeb54bc370
15:05mukuland it is working on my local machine with streams-getTextContent branch
15:05yuryso viewer works and text selection works too?
15:07yuryand unit tests
15:07mukulI didn't test those, let me test.
15:09mukulHmm..., unit tests are passing
15:12mukulyury: viewer is also working
15:42yurymukul: really good
15:43yurymukul: did you check if size function works?
15:43yuryincrease highWaterMark to e.g. 100
15:47yurymukul: extract streamTextContent from getTextContent and let the latter use the former
15:48mukulyou mean getTextContent will call streamTextContent?
15:48yurycorrect
15:49mukulokay
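The refactoring agreed on above (getTextContent calling streamTextContent) could look roughly like the sketch below. This is only an illustration, not the pdf.js code: both functions here are hypothetical stand-ins that take a plain array of chunks, the `{ items, styles }` chunk shape is an assumption, and a runtime with a global `ReadableStream` is assumed.

```javascript
// Hypothetical stand-in: emits already-parsed text content chunk by chunk.
function streamTextContent(chunks) {
  return new ReadableStream({
    start(controller) {
      for (const chunk of chunks) {
        controller.enqueue(chunk);
      }
      controller.close();
    },
  });
}

// getTextContent is built on top of the stream: it drains the reader and
// merges every chunk into a single { items, styles } object.
async function getTextContent(chunks) {
  const reader = streamTextContent(chunks).getReader();
  const textContent = { items: [], styles: Object.create(null) };
  for (;;) {
    const { value, done } = await reader.read();
    if (done) {
      return textContent;
    }
    textContent.items.push(...value.items);
    Object.assign(textContent.styles, value.styles);
  }
}
```

With this layering, callers that want everything at once keep using getTextContent, while callers that can work incrementally read from the stream directly.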
15:50mukulyury: how do I check that the size function is working?
15:50yurybreakpoint or just console.log
15:51mukuli don't understand how increasing highWaterMark will help
15:51mukulI am using helloworld pdf
15:51mukuli am console logging inside size, but it is not printing anything
15:51mukulmaybe it is not called
15:52yurynot good... can you look into the reference implementation to find out how it works?
15:53mukulokay, i am looking into it
15:53yurywe will need unit tests that check what we are doing in getTextContent
15:55yuryonce we put all ducks in a row (sendWithStream + unit tests), you can submit a PR
15:55mukulyes, i also think we need unit tests
15:56yurypolyfill needs to be in commonjs form
15:56yurygetTextContent will need more work
15:57mukulI think we changed polyfill into commonjs form earlier
15:58yuryokay
17:40mukulyury: ping
17:40yurymukul: pong
17:42mukulI checked the reference implementation of ReadableStream and it is not calling the size function to calculate desiredSize
17:42mukulrather it is using it when we call enqueue
17:43mukulour enqueue operation satisfies the condition at https://github.com/whatwg/streams/blob/master/reference-implementation/lib/readable-stream.js#L1082
17:43mukuland hence the size function is not called
17:43yurysounds fine
17:43mukulit is fulfilling the read operation with the above if condition
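The behaviour described above can be reproduced with a small experiment. The sketch assumes a runtime with a spec-conforming global `ReadableStream` (for example a recent browser or Node.js); the counter and variable names are made up for illustration. It enqueues once with no reader waiting and once with a read already pending, and records how often `size()` ran.

```javascript
// Count how often the queuing strategy's size() is invoked.
let sizeCalls = 0;
let controller;
const stream = new ReadableStream(
  { start(c) { controller = c; } },
  { highWaterMark: 100, size(chunk) { sizeCalls++; return chunk.length; } }
);

// Case 1: nobody is reading yet, so the chunk goes into the internal queue.
controller.enqueue("hello");
const queued = { sizeCalls, desiredSize: controller.desiredSize };

// Case 2: drain the queue, then leave a second read pending.
const reader = stream.getReader();
reader.read(); // takes "hello" out of the queue
reader.read(); // queue is empty now, so this read waits
controller.enqueue("world"); // handed straight to the waiting read
const direct = { sizeCalls, desiredSize: controller.desiredSize };

console.log(queued, direct);
```

If the stream behaves as the reference implementation does, `size()` runs only for the first enqueue, which matches what was seen with the helloworld PDF: when every chunk is consumed by a waiting read, a `console.log` inside `size` never fires.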
17:44yurywe need to send data faster and in chunks
17:46github_pdfjs[pdf.js] Snuffleupagus pushed 4 new commits to master: https://github.com/mozilla/pdf.js/compare/2928578164fc...c44fd3d6e21c
17:46github_pdfjspdf.js/master 3888a99 Jonas Jenwald: Remove the `URL` checks in the `createObjectURL` utility function, since the `URL` polyfill have made them redundant...
17:46github_pdfjspdf.js/master 84472b3 Jonas Jenwald: Change `getPDFFileNameFromURL` to ignore `data:` URLs for performance reasons (issue 8263)...
17:46github_pdfjspdf.js/master 7bd8b97 Jonas Jenwald: Change `PDFAttachmentViewer` to only open PDF attachments in the viewer, instead of downloading them, when `PDFJS.disableCreateObjectURL = false`...
17:47yurymukul: so we need to make desiredSize as a property of sink (vs param of pull)
17:48yuryand send and update it during pull/start
17:48yuryenqueue will reduce it, so it needs second parameter e.g. size
17:50mukulwe are going to update desiredSize?
17:50yurymukul: bottom line: we will be expected to send as fast as we can, stop when desiredSize reaches 0, and wait for the next pull
17:50yurymukul: only at streamSinks
17:50yuryjust to keep track of it on the other side
17:51yurymain side will have it at the controller
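The bookkeeping proposed here (desiredSize as a property of the sink, reduced by each enqueue, refreshed on pull) might be sketched as follows. Everything in it is hypothetical: `createSink`, `produce`, the `pull(newDesiredSize)` method and the `post` callback are illustrative names, not the sendWithStream API, and the size of a chunk is taken to be its string length.

```javascript
// Hypothetical worker-side sink: it tracks desiredSize locally so the
// producer knows when to pause. `post` stands in for the message channel.
function createSink(post, desiredSize) {
  return {
    desiredSize,
    onPull: null,
    enqueue(chunk, size = 1) {
      this.desiredSize -= size; // each enqueue uses up part of the budget
      post(chunk);
    },
    // Assumed to be called when the main side pulls and reports its
    // current desiredSize.
    pull(newDesiredSize) {
      this.desiredSize = newDesiredSize;
      if (this.onPull) {
        this.onPull();
      }
    },
  };
}

// Producer: send as fast as possible, stop at zero, resume on the next pull.
function produce(sink, chunks) {
  let i = 0;
  const pump = () => {
    while (i < chunks.length && sink.desiredSize > 0) {
      sink.enqueue(chunks[i], chunks[i].length);
      i++;
    }
  };
  sink.onPull = pump;
  pump();
}

const received = [];
const sink = createSink((chunk) => received.push(chunk), 8);
produce(sink, ["abcd", "efgh", "ijkl"]);
const sentBeforePull = received.length; // producer paused once the budget hit 0
sink.pull(8); // main side asks for more
```

The point of keeping a copy of desiredSize on the sink is that the worker cannot read the controller on the main side synchronously, so it needs its own running estimate between pulls.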
17:52yurymukul: so it's time to change extractTextContent
17:53yuryso pass sink here https://github.com/mozilla/pdf.js/blob/master/src/core/document.js#L339
17:54mukulOkay, it is called in worker
17:55mukulso i have to pass sink from there
17:56yurymukul: the next function at https://github.com/mozilla/pdf.js/blob/master/src/core/evaluator.js#L1469 will enqueue the current textContent and reset it
17:59mukulnext function is called whenever we have textContent chunk?
17:59yurymukul: once this starts sending data directly, we will take care of desiredSize
17:59yuryit's called when some items in textContent were created
18:00yuryall you need to do is send this chunk as is and zero/reset the textContent that you sent
18:00mukulOkay, and reset
18:01yurythat's why I chose this format in first place
18:01mukullike stateManager.restore()
18:02yurynope
18:02yurymukul: like https://github.com/mozilla/pdf.js/blob/master/src/core/evaluator.js#L1232-L1235
18:04yuryonPull can do nothing for now, but it will be needed when we start handling sink.desiredSize, just don't forget to
18:05mukulso i have to pass the sink and enqueue the textContent whenever it is created and reset the textContent
18:05yuryso i have to pass the sink and enqueue the textContent
18:05github_pdfjs[pdf.js] pdfjsbot force-pushed gh-pages from f1ab499 to 6729300: https://github.com/mozilla/pdf.js/commits/gh-pages
18:05github_pdfjspdf.js/gh-pages 6729300 pdfjsbot: gh-pages site created via gulpfile.js script...
18:05yurywhen next is called, enqueue and reset the textContent
18:06yurysink.onPull = function() {}; sink.onCancel = function() {};
18:09mukulwe defined that in handler
18:09mukulwe have to define it somewhere else also?
18:09yuryremove that from there, also don't return the promise from the handler
18:10yuryuse this promise to .then(()=>sink.close(), (reason) =>sink.error(reason));
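Put together, the plan from the last few messages (pass the sink in, enqueue and reset textContent each time next runs, then close or error the sink from the resulting promise) could be prototyped like this. It is a toy model under stated assumptions: `extractTextContent` here is a hypothetical stand-in that "parses" an array of words in batches, the recording `sink` is made up, and the `{ items, styles }` shape is assumed rather than taken from the evaluator.

```javascript
// Hypothetical sink that records what it is told. A real channel would
// copy the chunk when posting it; here the items array is copied by hand.
const log = [];
const sink = {
  enqueue(chunk) { log.push(["enqueue", chunk.items.slice()]); },
  close() { log.push(["close"]); },
  error(reason) { log.push(["error", reason]); },
};
sink.onPull = function () {}; // nothing to do until desiredSize is handled
sink.onCancel = function () {};

// Stand-in for the evaluator: parses `words` in batches and takes a break
// after each batch, like the incremental parser's next function.
function extractTextContent(words, batchSize, sink) {
  const textContent = { items: [], styles: Object.create(null) };

  function flush() {
    if (textContent.items.length === 0) {
      return;
    }
    sink.enqueue(textContent); // send the chunk as is...
    textContent.items = []; // ...then reset what was sent
    textContent.styles = Object.create(null);
  }

  return new Promise((resolve) => {
    let i = 0;
    (function next() {
      flush();
      if (i >= words.length) {
        resolve();
        return;
      }
      const end = Math.min(i + batchSize, words.length);
      for (; i < end; i++) {
        textContent.items.push({ str: words[i] });
      }
      Promise.resolve().then(next); // take a break, then continue
    })();
  });
}

// The handler does not return this promise; it only uses it to finish
// the stream one way or the other.
const finished = extractTextContent(["a", "b", "c"], 2, sink)
  .then(() => sink.close(), (reason) => sink.error(reason));
```

In this version pull requests and desiredSize are ignored on purpose, so data is sent as soon as each batch is ready.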
18:11yurymukul: would you like to prototype it in pastebin?
18:13mukulHmm..., actually i have to try it somewhere with a small code base.
18:13mukuljust like we started with sendWithPromise
18:13yurysure
18:14mukulit is actually confusing
18:15mukulcan you please explain how the next function works?
18:15yurymukul: what's confusing? we implemented everything in sendWithStream
18:15yuryit somewhat works except desiredSize
18:16yurymukul: getTextContent's next function is called when parsing needs to take a break
18:16yurye.g. it spent too much time or a promise needs to be resolved
18:17yurymukul: so it's good place to send a chunk of processed data back to the main thread
18:17mukulwe are using `try { promiseBody(resolve, reject) }`
18:18yury?
18:18yurypromiseBody is a function that is doing incremental parsing
18:19yuryhttps://github.com/mozilla/pdf.js/blob/master/src/core/evaluator.js#L1468
18:19mukulOkay it is calling the function recursively
18:23mukulThen we have to modify the next function to enqueue the data and reset textContent object
18:23yuryright, what will happen is we will just send data as soon as it's ready
18:24mukulin future can we use pull to call next?
18:24yury(ignoring pull requests and desiredSize, which needs to be fixed)
18:24mukulOkay, for now i am ignoring these things. :)
22:15github_pdfjs[pdf.js] timvandermeij pushed 2 new commits to master: https://github.com/mozilla/pdf.js/compare/c44fd3d6e21c...96cb599e938d
22:15github_pdfjspdf.js/master d76f299 Jonas Jenwald: [Firefox addon] Remove the `window.FirefoxCom` hack from `web/viewer.js`, since it was made redundant by the `setExternalLocalizerServices` re-factoring (PR 7202 follow-up)...
22:15github_pdfjspdf.js/master 96cb599 Tim van der Meij: Merge pull request #8322 from Snuffleupagus/rm-window-FirefoxCom...
22:15soakbotPull 8322: [Firefox addon] Remove the `window.FirefoxCom` hack from `web/viewer.js`, since it was made redunant by the `setExternalLocalizerServices` refactoring (PR 7202 follow-up). https://github.com/mozilla/pdf.js/pull/8322
22:35github_pdfjs[pdf.js] pdfjsbot force-pushed gh-pages from 6729300 to d4ba64f: https://github.com/mozilla/pdf.js/commits/gh-pages
22:35github_pdfjspdf.js/gh-pages d4ba64f pdfjsbot: gh-pages site created via gulpfile.js script...
23:30bdahlyury: did you ever have any luck figuring out why the bot is so slow?
23:31yurynope
21 Apr 2017
No messages