freenode :: #microformats

27 Apr 2017
02:08ben_thatmustbeme Woo, making great progress on my rewrite of the parser
02:08aaronpkwow awesome
02:09ben_thatmustbemeSuper basic parsing is already working.
02:11Loqiben_thatmustbeme has 2 karma in this channel (203 overall)
02:53ben_thatmustbemeIt's actually pretty interesting as I'm learning little edge cases of microformats I didn't know about
02:54gRegorLoveI think I need some clarification on the implied URL parsing related to:
02:54Loqi[gRegorLove] #110 Fix implied u-url when multiple links
02:54gRegorLove"else if .h-x>a[href]:only-of-type:not[.h-*], then use that [href] for url" from
02:54Loqi[Tantek Çelik] microformats2 parsing specification
02:55gRegorLove".h-x > a[href]:only-of-type" means .h-x has only one direct child <a>, correct?
02:56tantekhas only one direct child that is an <a> tag with an 'href' attribute
02:56gRegorLoveMeaning, :only-of-type doesn't restrict sibling elements from having <a> as children
02:57tantekcorrect - haven't run into that case though
02:57gRegorLoveSee the github issue. The second link is inside a sibling <b>
02:58gRegorLoveMaybe a product of weird MediaWiki formatting
02:59gRegorLove(Speaking of edge cases, ben_thatmustbeme. Heh)
03:01gRegorLovetantek: So is the parser technically correct in this example?
03:02ben_thatmustbemeI haven't gotten to much of the implied properties part yet. May get messy, not sure yet
03:02gRegorLoveselectoracle is your friend when you get there:
03:03gRegorLovemf2py also returns the implied URL for that HTML
03:04gRegorLoveAnd microformat-shiv
03:05ben_thatmustbemeI suppose it would, assuming the > means direct descendant in the html
03:05ben_thatmustbemeI suppose it would be correct
03:06* tantek is distracted by some in person things - may have to look into later tonight
03:07ben_thatmustbemeAnd it doesn't mean descendants that are not inside sub [h,p,e,dt,u]-*
03:08gRegorLoveYeah, it's direct descendant afaik.
03:08gRegorLoveReasoning probably being to prevent really unexpected implied values
03:09gRegorLoveYeah, moving the </b> to the end gives no implied URL
03:10ben_thatmustbemeSo the conclusion is, stop issues <b> tags already
03:11ben_thatmustbemeAlso if it weren't direct descendants the parsing would get way more messy
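
The direct-child reading of ".h-x>a[href]:only-of-type:not[.h-*]" discussed above can be sketched in a few lines. This is a minimal illustration of that single clause only, not any parser's actual code: the helper names, the h-event wrapper, and the "/other" link are assumptions made up for the example, and the wrapped variant is just one possible reading of "moving the </b> to the end".

```python
import xml.etree.ElementTree as ET

def has_root_class(el):
    """True if the element carries any h-* root class name."""
    return any(c.startswith("h-") for c in el.get("class", "").split())

def implied_url(root):
    """Return the href for `.h-x > a[href]:only-of-type:not[.h-*]`, else None."""
    # Iterating an Element yields direct children only, so an <a> nested
    # inside a sibling element is never counted here.
    anchors = [child for child in root if child.tag == "a"]
    if len(anchors) != 1:
        return None
    a = anchors[0]
    if a.get("href") is None or has_root_class(a):
        return None
    return a.get("href")

# Hypothetical markup: the second link sits inside a sibling <b>.
sibling_b = ET.fromstring(
    '<div class="h-event"><a href="/2017/Bellingham">Bellingham</a> '
    '<b><a href="/other">other</a></b></div>'
)
# Hypothetical markup: the <b> wraps both links, so no <a> is a direct child.
wrapping_b = ET.fromstring(
    '<div class="h-event"><b><a href="/2017/Bellingham">Bellingham</a> '
    '<a href="/other">other</a></b></div>'
)

print(implied_url(sibling_b))   # /2017/Bellingham
print(implied_url(wrapping_b))  # None
```

In the first snippet the link inside <b> does not count as a sibling of the same type, so the rule still matches; in the second no <a> is a direct child, so there is no implied URL.
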
03:21KartikPrabhugRegorLove: is mf2py giving the correct implied URL not the one in the <b>
03:21KartikPrabhuand so is pin13
03:23KartikPrabhuso they seem to be playing by the parsing rules
03:24KartikPrabhuif you put a u-url on the /2017/Bellingham link then they both return that link as expected
04:48gRegorLoveKartikPrabhu: The HTML's already been fixed to get the desired u-url explicitly. The issue appeared to be php-mf2 not following the implied u-url algorithm correctly.
04:48KartikPrabhuaah ok. I was wondering if mf2py is doing it right, and I think it is
04:48gRegorLoveBut after review, it appears it is parsing correctly, just the weird HTML didn't give the desired u-url as a result
04:49gRegorLoveAll of the parsers are doing it, and it appears all it takes is moving the </b> to the end, then no implied u-url
04:49KartikPrabhuyeah, that is what the parsing-algo says atm
04:49gRegorLoveSo pretty sure there's no parsing bug. Will await tantek's confirmation to be sure.
04:50KartikPrabhualso, traversing down children of h-* is going to be very annoying
04:50gRegorLoveYeah, the more I looked at it, the reasoning for the very strict implied algo makes sense
04:51gRegorLoveshort version: if you really want the property, add it explicitly :)
04:52KartikPrabhuyeah I think that is true for more complex markup
04:52KartikPrabhubut implied-properties are cool too :P
05:05gRegorLove!tell tantek summarized the conversation on github:
05:05LoqiOk, I'll tell them that when I see them next
05:05Loqi[gRegorLove] #110 Fix implied u-url when multiple links
08:40Loqi[@rashidnoorani] for all types of researched predefined #schemas. #gids17 #microformats. (
08:40tanteklol "researched"
08:40Loqitantek: gRegorLove left you a message 3 hours, 34 minutes ago: summarized the conversation on github:
08:41tantek!tell gRegorLove thanks!
08:41LoqiOk, I'll tell them that when I see them next
13:18ben_thatmustbemehmm, noticed a difference between pin13 and unmung as far as stripping whitespace
13:20ben_thatmustbemespecifically the html:
13:21KartikPrabhubefore the <p> tag?
13:22KartikPrabhuthat might be due to the HTML parsers used and not the mf2 part
13:24KartikPrabhuin fact pin13 removes the next line \n in the value and unmung does not
13:28ben_thatmustbemethat too
13:29KartikPrabhuben_thatmustbeme: what is your HTML so I can try it on my mf2py
13:29Loqisome such thing
13:30ben_thatmustbemethanks loqi
13:30Loqiyou're welcome, ben_thatmustbeme
13:30* ben_thatmustbeme hands loqi the dictionary entry on sarcasm
13:31KartikPrabhuinteresting, my mf2py installation preserves the space before <p> in html property and keeps the \n in the value property
13:31KartikPrabhuben_thatmustbeme: try it here
13:36ben_thatmustbemelikely some of this is due to what is considered whitespace by the language
13:36ben_thatmustbemethough some don't try to strip at all, others do
13:37ben_thatmustbemeor rather what the stripping function considers whitespace
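
As an illustration of how a stripping function's notion of whitespace can change a parsed value, here is a small Python sketch. It does not claim to reproduce the pin13/unmung difference; the input string is a made-up example, and HTML_WHITESPACE is a name introduced here for the five ASCII whitespace characters HTML defines.

```python
# ASCII whitespace as defined by HTML: space, tab, LF, FF, CR.
HTML_WHITESPACE = " \t\n\f\r"

# Hypothetical content with a leading no-break space and a trailing newline.
raw = "\u00a0 <p>Hello</p>\n"

# Python's default strip() removes any Unicode whitespace, U+00A0 included.
unicode_stripped = raw.strip()

# Restricting the set to HTML whitespace leaves the no-break space in place,
# and the ordinary space after it survives because stripping stops there.
html_stripped = raw.strip(HTML_WHITESPACE)

print(repr(unicode_stripped))  # '<p>Hello</p>'
print(repr(html_stripped))     # '\xa0 <p>Hello</p>'
```

Two parsers that both "strip whitespace" can therefore disagree on the same input, depending on which definition their language's helper uses.
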
13:38ben_thatmustbemetrying to understand the .e-*.h-* interaction in my parser, making me rethink a few things
13:39ben_thatmustbemewould that be the only time you can have anything other than type, properties, children and value?
13:39ben_thatmustbemeis having an html as well
15:52ben_thatmustbemeworking on
15:52Loqi[Tantek Çelik] microformats2 parsing specification
15:52ben_thatmustbemei'm confused what the difference is between the name and photo sections for example
15:52ben_thatmustbemevs .h-x>img[src]:only-of-type:not[.h-*]
15:53ben_thatmustbemejust getting lost in them a bit
16:22gRegorLoveben_thatmustbeme: First one means: .h-x with an img[src] as its only child where the alt is not empty and the img does not have an .h-x
16:22LoqigRegorLove: tantek left you a message 7 hours, 40 minutes ago: thanks!
16:23gRegorLoveSecond is: .h-x with only one img as a child and the img does not have .h-x
16:23ben_thatmustbeme"with an img[src]" means with an image with a src attribute
16:24ben_thatmustbemedang, i just wrote this as only-of-type instead of only-child
16:26ben_thatmustbemei think it was the difference in ordering that was confusing me
16:26ben_thatmustbemeimg:only-child[alt] vs img[src]:only-of-type
16:41ben_thatmustbemelast questions gRegorLove to make sure i have this right,
16:41ben_thatmustbemeif it has more than one img tag, say 4, one has h-*, one has no alt, one has an empty alt, one has a non-empty alt and no h-*....
16:41ben_thatmustbemeoh wait, only, ONLY CHILD, basically cuts that all
16:42ben_thatmustbemei guess that's a question for only-of-type
16:43ben_thatmustbemebut i'm just going to assume it's actually only of that type, not only of that type with that attribute ...
16:49gRegorLoveCorrect, I'm pretty sure only-of-type applies only to the selector it comes after, not the following attributes
16:51KartikPrabhuyes, that's how it works in CSS too
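
The only-child versus only-of-type distinction worked out above can be sketched as follows. This is an illustrative approximation, not code from any of the parsers mentioned: the function names and the sample h-card markup are assumptions for the example, and the attribute tests from the spec (src, alt, h-*) would be applied separately to whatever element is returned.

```python
import xml.etree.ElementTree as ET

def only_child(root, tag):
    """Element matching `tag:only-child`: the sole element child of root."""
    children = list(root)
    if len(children) == 1 and children[0].tag == tag:
        return children[0]
    return None

def only_of_type(root, tag):
    """Element matching `tag:only-of-type`: the sole child with that tag name."""
    same = [child for child in root if child.tag == tag]
    return same[0] if len(same) == 1 else None

# Hypothetical markup: one <img> next to a <span>.
mixed = ET.fromstring(
    '<div class="h-card"><img src="a.png" alt="A"/><span>x</span></div>'
)
# Hypothetical markup: two <img> children, only one of which has a src.
two_imgs = ET.fromstring(
    '<div class="h-card"><img src="a.png"/><img alt="b"/></div>'
)

print(only_child(mixed, "img") is None)          # True: the <span> is also a child
print(only_of_type(mixed, "img").get("src"))     # a.png
print(only_of_type(two_imgs, "img") is None)     # True: two <img>, attributes ignored
```

The last case is the one asked about: the type test counts every <img> child, so a second <img> without src still prevents a match for img[src]:only-of-type.
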
16:52gRegorLoveAre you using xpath in the parser?
16:53ben_thatmustbemeit's using nokogiri and i'm descending the tree myself
16:54ben_thatmustbemethough i suppose that might make more sense huh
16:55gRegorLoveMaybe, not sure. Was just going to suggest php-mf2 has several of them, like in parseImpliedPhoto()
16:56ben_thatmustbemei sort of don't want to look directly at other parsers, lest it confuse me more
16:56gRegorLoveHaha, fair enough.
16:56LoqigRegorLove: lol
17:21KartikPrabhuben_thatmustbeme: that is actually a good idea. independently written parser might find inconsistencies in the already existing ones
17:21ben_thatmustbeme*write a big pile of code to handle implied properties* *rerun tests* *number changes from 56 failures to 55 failures* *SIGH*
17:21ben_thatmustbemeyeah, that was the other reason
17:22KartikPrabhuben_thatmustbeme: also please document the "space collapsing" difference you found.
17:22ben_thatmustbemesure, where?
17:22KartikPrabhuerr good point :P
17:24KartikPrabhuben_thatmustbeme: maybe see ?
17:24gRegorLoveMay be related to Haven't checked the HTML you're referring to
17:24Loqi[ghost] #69 `<br>` between `<span>` tags are not interpreted as whitespace
17:25Loqisome such thing
19:34ben_thatmustbeme\me wipes brow, failing on 43 of the 92 tests now but i'm only testing the v2 folder yet
19:34ben_thatmustbemepretty good progress though
19:38ben_thatmustbeme curious on this one, I don't see why the child h-card h-org has a value attribute
19:38LoqiMitchell Baker
19:39KartikPrabhuben_thatmustbeme: all h-* get at least a value
19:40KartikPrabhuso people can use value as fallback text representation for any h-* in case they don't understand the particular vocabulary
19:41ben_thatmustbemeexcept for those in items[] ?
19:41KartikPrabhuI think all h-* get a value
19:41KartikPrabhudo you have an example?
19:41ben_thatmustbemethe parsing for that one
19:42ben_thatmustbemealso, not finding the part in the parsing spec of where it gets that value from
19:42ben_thatmustbemei see it for if .p-*.h-* etc
19:43KartikPrabhuoops maybe I misspoke
19:43KartikPrabhumf2py does not give value for that markup in any h-*
19:44KartikPrabhuvalue is for e-* things I think, so you have html property and a value property for plaintext representation
19:46KartikPrabhustrange pin13 i.e. php-mf2 does give a value just like the tests!
19:46Loqi[Tantek Çelik] microformats2 parsing specification
19:46ben_thatmustbemeso value is used inif p-*.h-* e-* u-*.h-*
19:48ben_thatmustbemethat section under value: is not terribly clear
19:48KartikPrabhubut that is only if the child microformat is also a property
19:49KartikPrabhuin this example markup it isn't
19:49ben_thatmustbemei don't see anywhere that value: should be set for children
19:50KartikPrabhumight be a bug in the tests, maybe leave a !tell to tantek to confirm
19:50KartikPrabhubut then either php-mf2 is wrong or mf2py is
19:51KartikPrabhuben_thatmustbeme++ for thorough checking of mf2 tests
19:51ben_thatmustbemenot sure what unmung uses
19:51Loqiben_thatmustbeme has 3 karma in this channel (204 overall)
19:51KartikPrabhumf2py i am guessing
19:51KartikPrabhuso it does not have the "value"
19:51ben_thatmustbemei'm basing all of this parser on the tests, so if it doesn't pass things, i'll know
19:52KartikPrabhuyes, that is good. you are simultaneously checking the tests, the spec and other parsers :P
19:52KartikPrabhuI think I did something like this while writing code for mf2py :P
19:52KartikPrabhubut now have forgotten everything
19:53ben_thatmustbeme!tell tantek hitting what is either an error in the mf2 tests and a bug in php-mf2 or something missing in the spec and a bug in mf2py. children elements seem to be getting a value: set, but not sure why.
19:53LoqiOk, I'll tell them that when I see them next
19:54ben_thatmustbeme!tell tantek h-card/nested.html parses without value for child h-org h-card in mf2py and with one via php-mf2
19:54LoqiOk, I'll tell them that when I see them next
19:55ben_thatmustbemethis might actually answer a LOT of my non-passing tests
19:55ben_thatmustbemejust looking through
19:55ben_thatmustbememy only real points left to add are proper date parsing, and backcompat... i think
20:09ben_thatmustbemethis one is wrong in the other direction, p-affiliation h-card should have a value
20:11ben_thatmustbemeat least the parsers seem to all agree on that one, pretty clear that's a bug in the test