From fffb11ce891a6446eeeceac80ef82ea3dd704ad4 Mon Sep 17 00:00:00 2001 From: Norman Walsh Date: Wed, 8 Feb 2017 14:25:16 +0100 Subject: [PATCH] Created Minutes for day two: 8 February 2017 (markdown) --- Minutes-for-day-two:-8-February-2017.md | 577 ++++++++++++++++++++++++++++++++ 1 file changed, 577 insertions(+) create mode 100644 Minutes-for-day-two:-8-February-2017.md diff --git a/Minutes-for-day-two:-8-February-2017.md b/Minutes-for-day-two:-8-February-2017.md new file mode 100644 index 0000000..dfaa258 --- /dev/null +++ b/Minutes-for-day-two:-8-February-2017.md @@ -0,0 +1,577 @@ +# XProc Workshop Minutes, Day 2, 8 February 2017 + + + +## Background links: + +- Spec repo: +- Online specs: +- Issues list: +- Norm’s @depends proposal: + - + - +- Norm’s proposal for issue #30: + - + - + + + + +## Agenda / Minutes + +- Present: Norm, Romain, Achim, Christophe, Matthieu, David, Geert, + Gerrit, Martin, Henry, Liam + + + + +## New Action Items + +- Martin to write a proposal for @p:message and how to extend + the messaging to log levels and other features using possibly + p:pipeinfo or a new p:message element. + +- Nic and Ari to discuss the status of the community group + and report back on xproc-dev by 15 Feb 2017. + +- Norm to propose a template for the extension steps repo. + + + + +### Time-boxed review of this issue + +- About requirements, not proposals +- How do documents figure into our relations with our XPath interface +- We need precision in the requirements + +Henry begins by projecting a document: + + +Liam: XPath is very primary; but we’re talking about +doing other kinds of expressions. Yesterday, +I wondered if handles could be +functions. Now I’m wondering if the expression language +could be pluggable. +Henry: That’s been in-and-out. I think for 3.0, it’s +a big step to introduce, at the language level, +multiple expression languages. +Norm: I think it’s a question of how to mix the languages. 
+ +Norm: + +Observation: ./function() is conceptually the same as function($x) + +1. The output of a step can go in a variable, $x. +2. I must be able to get the properties of the variable, $x. +3. $x must have identity +4. select="p:document-properties($x)" + +David: Why a document node, why not a string? +Norm: It could be a string. +Henry: The problem of collision arises. +Norm: A string would work, but string-length($x) would be 5 or 19. + +Norm: We could also just make it a view: an XPath document, a +JSON object, a text node, a hexBinary node. + +Romain: We need access to the content from the empty document +node. +Norm: So you want to be able to use the binary functions on +the hexBinary. +Romain: Yes. +Henry: Stipulate the empty document node solution for a moment. +And the existence of a way to ask for the type of something. +If the answer isn’t XDM or JSON, you’d like to be able to say +coerce to XDM (string, hexBinary). + +It would also be useful to be able to go the other way, coerce +a hexBinary into an image/jpeg. + +Achim: If I get a text/plain document, do I get a string or +a blob? +Romain: You get an empty document node, you can coerce it into +a string. +Achim: Users are going to find that confusing. But we can +tell users that they have to explicitly take the risk of attempting +to get a large amount of data. +Henry: We also talked about trying to put some logic into +p:load. + +Norm: We’re 50 minutes into the hour. +Henry: Here’s the procedural question: do we have enough of an +answer to proceed? It feels to me like we do. +Norm: I think we do. I propose that we proceed with some spec drafting +using the empty document node answer and see if it works or raises +other problems on closer inspection. + +Any objections? None heard. + +David: For me it’s hard to talk about coercing media types. +What it means is to process an entity in a different way. +It’s not coercion. +Norm: It’s not really coercion; it’s changing the label. 
+David: If you have the load step and you say you want to +load it as text/plain. If you use an HTTP URI, then this would +mean that the implementation would send an Accept: header +of text/plain. +Norm: Alternatively, you just treat the octet stream returned +as text/plain. +David: The other way around is just a claim. +Norm: I agree, “coercion” isn’t the right word. + + + + +### Overrides? + +Matthieu: I’d like to be able to override steps on p:import, +is that something we could consider? +Norm: Yes, but I suggest we have an issue about that and have +the discussion when we’ve had a chance to think about it. +Gerrit: We solve this by dynamically generating the pipeline +and then importing that. + + + + +## Review of open issues + +On reflection, trying to do this in the face-to-face seems +like it won’t end well. + + + + +### Proposals for resolving two issues + +Norm: The depends attribute, PR #33. +Henry: Remind me, what’s it for? +Norm: It’s for the case where you have a step with a side-effect +which isn’t manifest in the data flow: don’t start step B until +step A has finished, irrespective of what you think you know. +… It’s been implemented and appears to be sufficient. +Liam: If you called it ‘wait-for’ instead of ‘depends’, then +it would have been clearer. +Henry: The simplicity of this depends on the answer to my +question about streaming tomorrow: no. +Christophe: I prefer ‘wait-for’. +Henry: I prefer ‘depends’ because it leaves open the question +of what exactly it means; we may need that space in the future. + +Some discussion of the semantic variance between “depends” and +“wait-for”. + +Liam: I don’t think we really have to find an answer here. +Norm: Let’s try taking this one to email. + +Proposal: Merge the PR. Any objections? + +None heard. + +Norm: Allow attribute value templates in extension attributes, +PR #32. +Achim: Some of the extension attributes are handled in static +analysis and some dynamically. 
It only makes sense for attributes +that are being evaluated dynamically. +Norm: What attributes are evaluated statically? +Achim: depends are extension attributes today. +Norm: Errmm…can we finesse this by saying that processors +are free to forbid AVTs in any extension attributes that they +wish? (Added comment.) + +Some discussion of what the semantics of forbidding might mean. +Could mean curly braces not allowed, could mean not interpreted +as an AVT. Up to the implementation. + +Proposal: Merge the PR with that note, at editor’s discretion. +Any objections? + +None heard. + + + + +### Proposed list of issues that warrant face-to-face time: + +**\*** How to improve debugging + **\*\*** issue 18 + +- Norm: The observation that implementations should do better + with error messages is a point taken. +- Achim: What do I get if an error occurs is one question. + Another question is what do I get on p:catch? How good + is the error vocabulary? Today we only get the name of + the error. Maybe we could improve that. +- Norm: It’s very difficult to define error output. I think + a proposal in this area would be a very good thing. +- Achim: The question is how useful would this be to users? +- Martin: Sometimes it would be very useful. We’re currently + using Schematron that we extended with spans to handle + reporting where the error occurred. +- Achim: Step names or step types both seem like they’d be + valuable. Just more information. +- Norm: Someone should make a proposal. +- David: I find it hard to get the error messages at all. + It was never a problem to figure out where the error was, + but getting the message is hard. +- Norm: Yeah, my implementation sucks. +- David: A tutorial on p:log would be good. +- Norm: Yes, it’s a short tutorial but it would be good to have. + +**\*\*** @cx:message + +- Norm mutters on about @cx:message…proposes @p:message + to avoid stepping on the “message” option name. +- Martin: What about adding a terminate attribute? 
+- Gerrit: Isn’t this p:error? +- Norm: Do you want to do this conditionally? +- Henry: The advantage of having it as an attribute on + a step is that it’s much simpler vis-a-vis the plumbing. +- Norm: So a step with p:terminate on it runs the step + and then aborts. +- Henry: I propose p:abort with a message. +- Romain: Can we add a severity to p:message so that it + works like proper logging? +- Gerrit: Maybe we can have p:error work like logging and + messaging. +- Norm: Risking design on the fly: if p:message is a single + string then print it. If it’s a sequence of two strings + then the first string is a log level and the second string + is the message. What the processor does with the log + level is implementation defined. +- Achim: What about a function that returns the log level? +- Geert: I’ve been experimenting with overloaded steps. + We could put the message and other options in the step + content. That makes the mechanism extensible. +- Norm: Yes. We also have p:pipeinfo, you could use that. +- Henry: Maybe pipeinfo is a better way to do this altogether. +- Norm: I’m still attracted to @p:message for the simplicity + of printing a message, and maybe p:pipeinfo for log levels + and such. +- Romain: Or p:message element for more complicated messages. +- Norm: Right. So this wasn’t simple. I think we need a proposal. + + ACTION: Martin to write a proposal + + + + +### Return to yesterday’s discussion for a moment + +- Achim: There’s one more point in the specs dealing with XML + vs. non-XML documents that I find inconvenient. If I have a + heterogenous sequence, when this sequence hits a select + expression, an error is thrown. + + + + …I’d prefer to ignore the select expression for non-XML. +- Henry: We now have so many things flowing through pipelines + that I think this kind of defaulting behavior will be + surprising. Note also that under Norm’s proposal, these + will be documents and you’ll get the empty sequence. +- Achim: Ah, yes. 
We’ll just have to live with it. Never mind. +- Norm: We might be able to provide less verbose solutions + with, for example, a function that takes a sequence of + nodes and an XPath expression and does the right thing. + + + + +### Reopening the question from this morning + +- Henry: The fact that a sequence of pipe content types can be + heterogenous means we may need to think about new tooling. + Dispatching on type will be more common so we might like to make + that easy. + + + + +### Important workflows (for publishing use cases) + +- Henry: I was just wondering if there were obvious mismatches + between the functionality of 1.0 (steps or architectural) from + the perspective of publishing workflows. If you regularly say + “oh rats, it’s that problem again”… +- Romain: In our publishing workflows, we’re dealing with file + sets. The way we do that is we have the in-memory documents + that can be processed by XProc and we have an XML representation + of the filesets. A directory structure, for example. A lot + of our steps have these two things as inputs and outputs. + We have to connect them explicitly everywhere. +- Henry: Two things? +- Romain: The in-memory documents and the fileset description. + It would be convenient if some (more) connections could be + implicit. If outputs of a particular type were automatically + connected to specific inputs, for example. + +Some discussion of how a zip step is wrapped for this purpose. + +- Romain: More implicit connections would be useful. +- Gerrit: So the connections are grouped automatically? You + want to have multiple primary ports? How are they connected? +- Norm mumbles something about using media types to connect + ports. +- Romain: There are other ideas here: both implicitly connecting + from preceding steps and for grouping connections together. +- Martin: So if you have a p:xslt step and it’s preceded by two + steps that produce XML and XSLT, they’d both be connected? 
+- Romain: If I have a step that produces HTML and binary images + and the following step receives an HTML document and binary + images, I want them to be connected implicitly. +- Norm: I think this is an interesting idea; but it’s complicated + and we need a proposal to review. +- Romain: That’s one thing that we do often. It depends on + how we define document sets. +- Henry: That (re)raises the question of whether we need the + concept of document sets. Whether this is the same as the + document collection idea or whether it’s more pipeline + appropriate, I’m not sure. But in any case, “in the publishing + workflow we often move document collections around” is worth + considering. That’s not something we directly support in XProc. + Whether the idea of document sets as I have them in my head from + the Markup Pipeline from 15 years ago is actually what’s needed, + I’m not sure. But thinking hard about sets is worth doing. + I don’t think it’s for 3.0. It’s too big a change. It raises + a whole bunch of questions. Whether you have to have steps + designed to work with document sets or whether there’s a story + about default plumbing is unclear. +- Romain: We might be able to leverage non-XML document ideas to + solve this. +- Henry: If you want to change the composition of a set, you need + the whole set, and doing that on a sequence representation of + the set will be very un-intuitive. + … So we could imagine having document collections flowing + through pipes, not sequences of documents but collections to which + you have random access. +- Norm: Uh, er, maybe. :-) +- Gerrit: It’s hard to say if there are other things because we + have worked around them. We have, for example, a catalog resolver + for non-XML types so that we can refer to fonts and things. + It’s an XProc step, we don’t need it anymore. We created + extension steps to do image-metadata extraction, resizing, + etc. Maybe one can have EXProc steps at some point for + processing images. 
We have an extension step that does unzip + a bit differently than the proposal for pxp:unzip. It extracts + the whole archive to disk and then other steps are able to + work on them (on disk). This could be improved with the new + concept that you have binary data flowing through steps. +- Martin: What we currently use is a step called file-uri + to work with URIs and operating system paths. +- Gerrit: This is also encapsulated in a step. +- Romain: At the language level, there’s not much missing in XProc + 3.0. There are a variety of utility steps that could be + standardized or not. +- Henry: What about interfaces to databases? +- Gerrit: I once wrote an issue about whether it could be a + good idea to have implicit validation. Could you read + the xml-model PI and do the right thing? You’d also want + to have an easy way to prepend the PIs to documents. + And you’d want to have a way to produce an SVRL report + for validated documents. +- Norm: So in addition to a general p:validate step, this + includes the idea of, for example, a @validate attribute + on p:output to say “validate any document with an + xml-model PI”. + +Some discussion of the xml-model PI: + + +- Jirka: It is now also an ISO standard. + +Probably lunch. Probability, 100%. + + + + +### What’s the status of the W3C community group? + +Nic Gibson joins us by telepresence. + +- Norm: What about the community group? +- Nic: Ari and I both think it would be good if we tried to do something. + +Ari in particular seems interested. The question is, what value is there +in having both a mailing list and a community group? + +- Liam: I don’t think they’re mutually exclusive? +- Ari: That’s the question. +- Liam: There are a couple of lists. +- Norm: I think there’s only one, xproc-dev. +- Liam: I think the community group has a mailing list as well. I think + +we should keep using the xproc-dev list. + +- Norm: What’s the title of the community group these days? +- Nic: Data Pipeling Use Cases. 
+- Norm: Does keeping a moribund community group open help us, hurt us, + +or is it neutral? + +- Nic: The question is, do Ari or I (or anyone else) have time to make + +it not moribund. + +- Norm: Is that something you can answer now? +- Nic: I think it would be good to talk to Ari about it. And I have some + +actions that I never got around to finishing: mailing various lists and +anyone who’s ever posted to xproc-dev. + +- Norm: You’d want to craft the message carefully. +- Nic: I reckon that Ari and I can chat by the time the XML Prague conference + +is over. + +ACTION: Nic and Ari to discuss and report back on xproc-dev by next week. + +- Norm: Anything else? +- Liam: At this point “no”, eventually we’ll want to see some sort of a draft + +of use cases. + +- Achim: The description of what the community group does is out of date + +with respect to our current plans. We need to say that XProc is still +alive even if the working group no longer exists. We need to assert +that the community group is the center of XProc activities. + +- Norm: That is an interesting point. It sounds like we need to rewrite + +the description. Maybe change the name. + +- Liam: I’m not sure if you can change the name of a community group. +- Norm: Ok. Let’s see what Nic and Ari come up with and then + +consider next steps. + + + + +### What’s our thinking on the “resource manager”? + +- What are the *semantics* of pipelines? + - What are the lowest-level abstractions needed to + describe/discuss pipelines? +- There’s metadata flowing through the pipe on the one hand + and a resource manager for local copies of things being + fetched and stored. And then variables are right in the + middle. + +On reflection, we all feel that we’ve covered these items sufficiently +earlier today or yesterday. Henry may come back with a simplified +proposal after further consideration. + + + + +### Any other business? 
+ +- Achim: We should discuss how we can encourage the community + to suggest new steps. +- Norm: Couldn’t we just use the ‘extensions’ XProc repo? +- Romain: Yes, and we could make a custom template for that. + +ACTION: Norm to propose a template for the extension steps repo + +- Norm: After we have the template, let’s avoid “blank page syndrome” + by populating the repo with the existing exproc steps. Maybe + then have the exproc.org redirect there. +- Romain: And then have PRs to add them to the step spec. + +Some discussion of how to organize the specs and repos. Must have +a single entry point for the user. + +- Achim: If we put the exproc.org steps there, we should all read + them again and see if we can clarify them. Norm and I have + interpreted some of them differently. +- Romain: Should the extension steps target XProc 3.0 or 1.0 or + what? +- Achim: I think we should target 3.0. +- Norm: Yeah… +- Romain: When is a step considered ready to be implemented? + And how can I tell if the implementation is conformant with + the spec? +- Achim: We should have test cases. +- Romain: Something like semver perhaps. +- Norm: So if you want version 1.3.5 of a step, you look to see + if the implementor claims to support 1.3.5. + + + + +### Next steps + +- How do we do this? +- Henry: The best way this works in open source projects is BDFL. +- Achim: We should divide the work up, the test suite, the + documentation could be done separately. +- Norm: Achim and I seem to be signed up to do the spec editing. +- Achim: It would be nice to have one more editor who isn’t + an implementor. +- Norm: Henry, you’re the obvious candidate. +- Henry: I’ll say yes, but you have to tell me if I’m doing + a good job. +- Achim: It would be nice to have a user. +- Norm: Yeah. You have someone in mind? +- Gerrit: I’ll work on editing too. +- Henry: This is the SGML working group model: editors, a core + group, and a broader group. 
+ + + + +#### Work items + +- A spec: Norm, Achim, Gerrit, Henry as time permits +- A test suite: David +- Step proposal curator: Geert +- Documentation: Christophe, Matthieu + + + + +#### Status updates + +- Monthly reports to xproc-dev on the second Tuesday of each + month, starting on 14 March 2017. + + + + +#### Where do we start? + +- The documents at spec.xproc.org are the current head of development. + + + + +#### How long do we expect this to take? + +- Goal: approaching functional completeness by XML Prague 2018 + (Henry proposes beta release) + + + + +### Next meeting? + +- XML Prague 2018? +- Henry: Having one at the beginning or end of the summer might + be useful, but we don’t know yet. +- Maybe XML Amsterdam? + + + + +### Thank our hosts + +- Thanks to Jirka, XML Prague, and the University of Economics. + + + + +### Adjourned + -- 2.11.4.GIT