1 # XProc Workshop Minutes, Day 1, 7 February 2017
5 - Spec repo: <https://github.com/xproc/1.1-specification>
6 - Online specs: <http://spec.xproc.org/>
7 - Issues list: <https://github.com/xproc/1.1-specification/issues>
8 - Norm’s @depends proposal:
9 - <https://github.com/xproc/1.1-specification/pull/33>
10 - <http://xpspectest.nwalsh.com/depends/head/xproc11/>
11 - Norm’s proposal for issue #30:
12 - <https://github.com/xproc/1.1-specification/pull/32>
13 - <http://xpspectest.nwalsh.com/fix-30/head/xproc11/>
16 <a id="orgee13667"></a>
21 <a id="org88438ac"></a>
25 - Present: Norm, Romain, Achim, Christophe, Matthieu, David, Geert,
26 Gerrit, Martin, Henry, Liam [afternoon]
29 <a id="org08ee05f"></a>
33 - Norm to ask Nic for feedback about the state of the community group
34 - Norm to write a message (periodically?) to xproc-dev explaining
35 that it’s being used both for user and spec-dev discussions.
36 - Norm to send Romain a couple of example tests to try converting
38 - Norm to write better documentation of the test suite format
40 - Henry to write a straw man for the non-XML documents question
43 <a id="orgcb61fce"></a>
45 ### Review of the agenda
47 - Review of the test suite mechanics
48 - What’s the status of the W3C community group?
49 - Do we need to make it more present?
50 - What version number do we use? What is “V.next”?
51 - Are we going to call it “XProc”?
52 - Development infrastructure
53 - GitHub? Wikis provide no notifications
54 - What are the important workflows?
55 - Center of gravity of group is clearly publishing
56 - Revisit “pain points” from the bottom up
57 - Are database (relational or XML) interfaces important?
58 - What’s our thinking on the “resource manager”?
61 <a id="orgcf8688b"></a>
63 ### Brief review of the previous workshop minutes
65 - What about backwards compatibility?
66 - A goal, not a requirement; should be able to write a tool
67 to transform old pipelines to the new spec
68 - Do 1.1 pipelines ever have to run 1.0 steps?
69 - Open question remains: are we going to try to do a “1.1” quickly
70 that’s a small(ish) delta on 1.0 or are we doing something grander?
73 <a id="org02f97a6"></a>
75 ### What is it called, what version is it?
77 - Norm mutters on about the options a bit. 1.1, 2.0, 3.0?
78 - Martin: I’d like a 2.0 version to better engage folks who
79 aren’t currently using it.
80 - Achim: Then maybe we should call it 3.0?
81 - Geert: I think 3.0 is a real possibility. Leaves behind the
83 - Norm: We could also change the name.
84 - Henry: No, we should keep it. It doesn’t have to carry baggage.
85 - Achim: I think 3.1 makes sense; it’s using XPath 3.1 and XDM 3.1
86 - Matthieu: I don’t think these have to be this closely aligned.
87 What about other versions of XPath in things like Schematron?
88 - Christophe: If we go to 3.x, we need to bring something new.
89 If we only optimize 1.x, it won’t be “new enough”.
90 - Matthieu: Wouldn’t the same be true for 2.0?
91 - Henry: We have to come back to the question of what it is we’re
93 - Christophe: It’s not clear what 1.1/2.0 is. But maybe 3.0 could
97 <a id="org953a61e"></a>
99 ### What is it we’re building then?
101 - Norm gives a potted summary of the “1.1 will be a better 1.0”;
102 no, “we need a totally new 2.0”. No, the community wants a better
103 1.0 (at least an XML syntax). So here we are. What do all y’all
105 - Romain: I’d like two things: a better XProc 1.0, trying to fix
106 the usability shortcomings. And the other would be a more ambitious
107 2.0 with a compact syntax, etc.
108 - Norm: I think the only certainty is you can’t have both.
109 - Romain: Pragmatically then an improved XProc 1.0.
110 - Achim: I agree; the compact syntax in particular has technical
112 - Henry: None of us are programming language designers; most design
113 fails. There’d be real benefit to expertise in that space. We
114 should stay with what we’re good at.
115 - Achim: So a better XProc 1.0, is that the consensus?
116 - Matthieu: I think there are things that we can do to make it
117 easier to use that go a little bit beyond “an improved 1.0”.
118 - Achim: The current spec drops the XML centric view.
119 - Henry: There’s a spectrum between “a data flow language” on one
120 hand and “a web-centric language with a first-class place for
122 - Geert: Do we know what we want with non-XML documents? Are we
123 building database migration tools? I think we all want the same:
124 an occasional CSV, images, zips. As publishers, we want to work
125 with XML documents. I don’t need the extreme spectrum of non-XML
126 documents, it’ll be an XML workflow with the occasional non-XML
128 - Henry: I think the crucial first step is to say that “here’s a JPG,
129 I can route it, I can package it,” but I don’t need to have
130 steps that work with it.
131 - Norm: I think JSON documents are the elephant in the room.
132 - Martin: Do we agree that if we want to use JSON, we have to
133 represent them as XML internally?
134 - Norm: Well. Maybe. That’ll always be possible, but other users
136 - Henry: Documents have media types. But that’s not a complete answer.
137 Is it just what you got off the wire, is it a serialization, or
138 does it move XDM data structures around? The question then is:
139 do we support anything other than documents with a media type
142 Some discussion of the steps that would/could/might implement
143 operations on random documents.
145 - Achim: What about the mapping to XDM, given that we use XPath
146 as the expression language?
147 - Norm: I think you just have to decide what it means.
148 - Henry: I think the minimum needed to declare victory is to
149 give implementations some freedom leaving “error” as a reasonable
150 alternative. We don’t need to provide *in the spec* the extension
153 - Henry: There’s a question of what flows through the pipeline:
154 is it just sequences of XDMs? Right now we have both: a port
155 can declare if it accepts a sequence or not. We could get
156 rid of that in two ways: by saying everyone has to accept
157 sequences or by saying that the compilation step will take
158 care of that: if a sequence arrives, a step will be invoked
162 <a id="org5170dce"></a>
164 #### Properties of what we’re building
166 - It has an XML syntax reminiscent of the XProc 1.0 syntax
167 - It has a 3.1 data model
168 - It uses XPath 3.x as the expression language
169 - Variables/options are now allowed to be any data model instance
171 - Metadata flows through the pipes
172 - Documents in the pipes have media types
173 - Attribute value templates (AVT) and
174 text value templates (TVT) are supported
175 - Ability to import user-defined functions (XSLT or XQuery)
176 - New if/then/else compound step?
177 - Variables everywhere
178 - Dynamic evaluation of pipelines
180 ##### Step improvements
182 - SVRL output for all validation steps (a report port)
183 - XML-centric step vocabulary
184 - Move the eXProc.org steps into a standard step document
187 <a id="orgeec1887"></a>
189 ### What is it called, what version is it? (Continued)
191 - Norm: Now that we have more clarity on what we’re building,
192 do we know what to call it?
193 - Henry: I think “XProc 3.0” is good. If we ever do the
194 research project, we call it something else.
196 General murmurs of agreement around the table.
198 - Norm: “XProc 3.0” is proposed; any objections?
200 The “2.0” version has been burned; it’s too difficult to explain
201 what it means. “3.0” is a clean break and will use the XPath 3.x
205 <a id="orgdb3e29d"></a>
207 ### What are the *semantics* of pipelines?
209 - What are the lowest-level abstractions needed to
210 describe/discuss pipelines?
212 - Norm waffles on a bit; his simplification is just
213 steps with connections between them.
215 Some discussion of variables and ordering of steps. There’s
216 clearly a question of expressing the order in a pipeline syntax
217 that’s not quite the same as what the underlying implementation
220 - Henry: The draft I have (which I’ve never published) retains
221 from XProc 1.0 the idea of a context that has a bunch of
222 bindings. It also has the idea that setting a variable is
223 a little step. Then I realized that that set variables off to
224 the side of the flow of documents through the pipeline between
225 “real” steps. I was worried about pushing/popping context
226 but maybe that’s not the right question; that’s an artifact
227 of the surface syntax.
228 … The other aspect of complexity in the system I was developing
229 was probably not necessary for what we’re calling 3.0. It attempted
230 to distinguish between steps/pipes and sources/sinks.
231 … If you have something that steps outside the dependency flow,
232 how do you manage the synchronization issues (for example,
233 updates to a database) that arise. That something is a resource
235 - Norm: I’d not considered the question of synchronizing something
236 like database access across several steps.
237 - Henry: The Markup pipeline had many extra steps to compile
238 schemas and stylesheets that it put in the resource manager.
239 The steps that implemented schema validation and XSLT always
240 expected a URI (discharged by the resource manager) to their
241 stylesheets and schemas. That imposed an extrinsic order.
242 - Henry: Why a resource manager? It was a simplification: it was
243 easier to think about producing an output document and giving it
244 a name and then later using that name. It was an alternative
246 - Norm: Use pipes if you can, but a resource manager does handle
247 the case of wanting to generate a document that will be
248 XIncluded (for example).
249 - Henry: You could also imagine that you have code that produces
250 a RELAX NG document and your RELAX NG engine doesn’t have a standard
251 input port, it expects a URI.
252 … The other use for a resource manager was for internal use.
253 Some notion of preprocessing or compilation that you want to
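
The resource manager discussed above can be pictured as a URI-keyed store: one step publishes a document under a name, and a later step that only accepts a URI looks it up. A minimal sketch in Python, assuming hypothetical names (`ResourceManager`, `store`, `resolve`, the `urn:example:` URI) that do not come from any XProc draft:

```python
class ResourceManager:
    """Hypothetical URI-keyed store for documents produced inside a pipeline."""

    def __init__(self):
        self._documents = {}

    def store(self, uri, document):
        # An earlier step publishes its output under a name (a URI).
        self._documents[uri] = document

    def resolve(self, uri):
        # A later step that only accepts a URI looks the document up here.
        # A KeyError means the producing step has not run yet: this is the
        # extrinsic ordering constraint mentioned in the discussion.
        return self._documents[uri]


def generate_schema(manager):
    # Stand-in for a step that generates a RELAX NG schema.
    manager.store("urn:example:schema.rng", "<grammar/>")


def validate(manager, schema_uri, document):
    # Stand-in for a validator whose engine expects a schema URI.
    schema = manager.resolve(schema_uri)
    return {"document": document, "schema": schema}


manager = ResourceManager()
generate_schema(manager)
result = validate(manager, "urn:example:schema.rng", "<doc/>")
```

Nothing in the pipe connections says that `generate_schema` must run before `validate`; the ordering lives only in the shared store, which is why pipes are preferable where they are possible.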
257 <a id="org409905c"></a>
259 ### Development infrastructure
261 - Norm: I think it’s important to do the work in the open.
262 GitHub is certainly convenient in terms of tooling. The
263 wiki notification problem is real, but are there other
265 - David: What do we need?
266 - A mailing list (we have one)
267 - GitHub is good for code and specs
268 - Documentation wiki is just not very good for docs
269 - Achim: Do we have a mailing list? The xproc-dev list
270 isn’t very good for 3.0 discussions because many folks
272 - David: If we discuss in email then maybe they don’t need
274 - Norm: The other possibility is to just commit documents
275 to a repo; we can get notifications for that.
276 - Geert: We need to make sure we aren’t overwhelming the
277 “occasional users” on the xproc-dev mailing list.
278 … There will be other work on tests, tutorials and other
279 documents that we can make to help 3.0 users.
280 - Norm: I think it would be great if other community members
281 could take on maintaining the tutorials and things. It’s
282 very hard to do both the spec and tutorials.
284 Some discussion of writing a primer or tutorial. Norm observes
285 that if someone writes one, we can check it into the repo and
286 it’ll be published automatically.
288 Some discussion of existing resources: the spec, tutorials, etc.
290 - Christophe: I volunteer to work on some tutorial material.
291 - Matthieu: I will work on that as well.
293 (Thunderous applause.)
295 - Achim: The other point is test cases. We need new tests.
298 <a id="org06fd04f"></a>
300 ### Test suite/running tests
302 - Norm summarizes the current state of tests
303 - Romain: There’s also an XSpec test.
304 - Achim: There’s also the problem of multiple outputs.
305 … If, for example, you don’t support PSVI, then the output
306 of the test should be “an error” if you don’t specify PSVI
307 and a document if you do. Both of those should qualify as
309 - Norm: We could probably extend the test suite to cover that.
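
One way to picture such an extension is a test that lists several acceptable outcomes, each tied to whether the processor supports an optional feature. This is only a sketch: the names (`Outcome`, `requires_feature`, `acceptable`) are hypothetical and are not part of the existing test suite format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    kind: str               # "error" or "document"
    requires_feature: bool  # True if this outcome applies when the feature is supported


def acceptable(outcomes, feature_supported, actual_kind):
    """Return True if actual_kind matches an outcome valid for this processor."""
    return any(
        o.kind == actual_kind and o.requires_feature == feature_supported
        for o in outcomes
    )


# A PSVI-dependent test: a document is expected from processors that
# support PSVI, an error from processors that do not.
psvi_test = [
    Outcome(kind="document", requires_feature=True),
    Outcome(kind="error", requires_feature=False),
]
```

With this shape, `acceptable(psvi_test, False, "error")` counts as a pass for a processor without PSVI support, while the same error from a PSVI-capable processor does not.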
312 <a id="org6bd3e15"></a>
314 ### Review of the current working drafts
316 - Current state of the specs
318 - Achim leads discussion of
319 <https://github.com/xproc/Workshop-2017-02/wiki/Some-comments-on-XProc-1.1-(2017-01-22)>
321 - Discussion highlights:
323 - What do we do about the p:document-properties() function; if the document
324 is an argument to the function, what XDM value does it have when the
326 - Some discussion of the impact of sequences on this function
327 - Achim: What is the content-type of a sequence of three integers?
328 - Henry: Maybe only documents have a media type.
329 - Norm: “(1,2,3)” content-type=“x.vnd/xdm-item”
330 - Henry: We have said that arbitrary documents flow through the
331 pipeline; but variables can hold arbitrary XDM types
332 - Achim: But only documents have metadata
333 - Henry: Variables can contain any XDM item. Pipelines
334 consist of sequences (possibly unitary) of documents with
335 a media type and associated metadata.
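
Henry's summary can be sketched as a small data model: variables hold arbitrary values, while pipes carry only sequences of documents that each have a media type and metadata. The class and function names here (`Document`, `connect`) and the `base-uri` property are illustrative assumptions, not spec vocabulary.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    content: Any
    media_type: str
    properties: dict = field(default_factory=dict)  # associated metadata


def connect(items):
    """Model a pipe: a (possibly single-item) sequence of Documents only."""
    port = list(items)
    for item in port:
        if not isinstance(item, Document):
            raise TypeError(f"only documents flow through pipes, got {item!r}")
    return port


# Variables may hold any value (standing in for arbitrary XDM items).
variables = {"limit": 3, "names": ["a", "b"]}

# Pipes hold documents with a media type and metadata.
pipe = connect([
    Document("<doc/>", "application/xml", {"base-uri": "file:/tmp/doc.xml"}),
])
```

Under this model the "sequence of three integers" question has a direct answer: `connect([1, 2, 3])` is rejected, and the integers would have to be wrapped in a document (or held in a variable) first.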
337 Further discussion of what limits we can/must/should put on
338 the kinds of things that can flow through a pipeline.
339 What happens to an XQuery that produces three numbers?
341 Achim outlines the ideas from his comments document about mapping
342 from arbitrary documents to XDM instances.
344 Henry: I’d do it by having the output be a sequence of documents
345 plus metadata. What we get is the media type and maybe a URI. (Or
346 the filename in the case of a package.) Maybe you have to run a
347 step to do the “file” magic to add a media type if it was absent.
348 Now I can write a choose, based on the media type, that is inside
349 a for-each. I can use document-properties() for that single
352 Achim: I think we can put the properties in the context so that
353 we don’t have to load the document.
355 Henry: You’re absolutely right on two counts: it has to not be
356 an error and you have to be able to get at the properties.
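
The two points above (properties must be reachable without loading the document, and a choose inside a for-each can branch on the media type) can be sketched as follows. `LazyDocument`, `route`, and the `content-type` property key are assumptions made for illustration.

```python
class LazyDocument:
    """A document whose properties are available before its content is loaded."""

    def __init__(self, media_type, loader, properties=None):
        self.properties = {"content-type": media_type, **(properties or {})}
        self._loader = loader
        self._content = None
        self.loaded = False

    def load(self):
        # The content is only fetched on first use.
        if not self.loaded:
            self._content = self._loader()
            self.loaded = True
        return self._content


def route(documents):
    """A for-each over the sequence with a choose on each media type."""
    routed = []
    for doc in documents:
        media_type = doc.properties["content-type"]
        if media_type == "application/xml":
            routed.append(("xml", doc))
        elif media_type.startswith("image/"):
            routed.append(("image", doc))
        else:
            routed.append(("other", doc))
    return routed


docs = [
    LazyDocument("application/xml", lambda: "<doc/>"),
    LazyDocument("image/jpeg", lambda: b"\xff\xd8"),
]
routed = route(docs)
```

Routing inspects only the properties, so neither document is loaded until a step actually calls `load()`.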
358 Henry and Romain discuss the prospect of using a handle and
359 a special load function to get the data.
361 David: I’m concerned about using MIME types. What about encoding
362 and MIME type parameters?
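
For reference, a media type string already separates the base type from parameters such as the encoding. A small sketch using Python's standard `email.message.Message` parser; the helper name `parse_media_type` is made up for this example.

```python
from email.message import Message


def parse_media_type(value):
    """Split a media type string into (base type, parameters)."""
    message = Message()
    message["Content-Type"] = value
    base_type = message.get_content_type()
    # get_params() returns the type itself first, then each parameter.
    parameters = dict(message.get_params()[1:])
    return base_type, parameters


base_type, parameters = parse_media_type("text/plain; charset=UTF-8")
```

Here `base_type` is the part a pipeline could branch on, and `parameters` carries the encoding separately.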
364 Achim: Mapping from text/plain to string isn’t quite right.
365 But there are only four types.
369 Achim: Two questions: what flows on ports and what happens if
370 that becomes a context node. Why not restrict whatever arrives
371 on the port to XDM? The problem is with the current, abstract
372 notion of documents in the spec.
374 Henry: We could just use an XProc private namespace to wrap
375 a handle for everything else.
377 Achim: What happens if some non-XML representation appears as
378 a context node? The spec says that this is implementation
379 defined or an error. I tried to rescue this by having a
380 mapping just for the case where the non-XML document node
381 is used as a context element.
383 David: We’re not talking about MIME types, we don’t mean
384 MIME types. We’re talking about types. We can map from
385 media types to our types, but we should have our own types.
387 Liam: I don’t really think that’s a good idea. This is what
388 MIME types are for. You could make parameters available as
391 Norm: If we were going to go the handles route:
393 1. An XML document would be … an XML document
395 1b. A JSON document would be … a JSON XDM representation
397 1c. XDM items other than documents … ???
399 2. A text/\* document would be
400 <p:handle type="text/plain" handle="12345">…the text…</p:handle>
401 3. Any other document would be
402 <p:handle type="media/type" handle="5678"/>
404 The argument to the p:document-properties function is a node
405 and it has to be a valid node or handle.
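
The handle mapping Norm outlines can be sketched with Python's standard `xml.etree.ElementTree`. The function name `to_pipeline_item` is hypothetical, binding `p:` to the XProc 1.0 namespace URI is an assumption (the discussion only mentions "an XProc private namespace"), and the JSON and other-XDM-item cases are left out because the minutes leave them open.

```python
import xml.etree.ElementTree as ET

# Assumption: p: is bound to the XProc 1.0 namespace.
XPROC_NS = "http://www.w3.org/ns/xproc"
ET.register_namespace("p", XPROC_NS)


def to_pipeline_item(media_type, content, handle_id):
    """Map a document to the node that would flow through the pipeline."""
    is_xml = media_type in ("application/xml", "text/xml") or media_type.endswith("+xml")
    if is_xml:
        # Case 1: an XML document is just an XML document.
        return ET.fromstring(content)

    handle = ET.Element(
        f"{{{XPROC_NS}}}handle",
        {"type": media_type, "handle": str(handle_id)},
    )
    if media_type.startswith("text/"):
        # Case 2: text documents carry their text inside the handle.
        handle.text = content
    # Case 3: any other document is an empty handle; the bytes stay elsewhere.
    return handle


xml_item = to_pipeline_item("application/xml", "<doc/>", 1)
text_item = to_pipeline_item("text/plain", "the text", 12345)
blob_item = to_pipeline_item("image/jpeg", b"\xff\xd8", 5678)
```

Every item is then a node, so a function that takes a node argument has something valid to receive even for binary content.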
407 Achim: This is the solution I like least. We can do things
408 which are not XML. When people ask, we show them XML documents.
410 Henry: No, this is just a way of having blobs that flow
411 through the pipeline.
413 Henry: Step implementations never see the handle markup.
415 Norm: This abstraction is too leaky.
417 Henry: There’s basically a difference of view here. I’d like to
418 deal with the general case in the general way and look at special
419 cases for text later. Others are trying to deal with the special
420 cases but don’t care so much about the general case.
422 Achim: What bothers me is this: whatever the representation is, what
423 happens if it becomes the context node?
425 Geert: I’ve been wondering if the way NetKernel solves some
426 issues is relevant. They have a URI for every object and these
427 URIs have types. There’s a “transreption” when necessary to use
428 the representation. Most of this is an implementation detail.
429 If you want to use it, for example for XSLT, they define a
430 mapping from each representation to something appropriate
433 Norm: Yes, I think a layer for doing transformations of
434 representations is a good idea.
436 Geert: One of the things that makes it work is that
437 the transreption library is small and fixed.
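
A small, fixed conversion layer of the kind Geert describes might look like the sketch below. It is a generic registry, not NetKernel's API; the `python/object` and `python/lines` labels are invented stand-ins for in-memory representations, and `transrept` is a hypothetical name.

```python
import json

# A small, fixed table of conversions between representations.
TRANSREPTORS = {
    ("application/json", "python/object"): json.loads,
    ("python/object", "application/json"): json.dumps,
    ("text/plain", "python/lines"): str.splitlines,
}


def transrept(value, from_type, to_type):
    """Convert value between representations, or fail if no mapping exists."""
    if from_type == to_type:
        return value
    try:
        convert = TRANSREPTORS[(from_type, to_type)]
    except KeyError:
        raise LookupError(f"no conversion from {from_type} to {to_type}") from None
    return convert(value)


parsed = transrept('{"a": 1}', "application/json", "python/object")
lines = transrept("first\nsecond", "text/plain", "python/lines")
```

Because the table is closed, a step can know in advance which representations it may request, and anything outside the table is a clear error rather than implementation-defined behaviour.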
439 Romain: I think we are missing key use cases that would help
440 us make this decision.
442 Achim: We could compare the solution with some different use
443 cases. We have the one that I defined.
445 David: I’m thinking about cases where I create RDF and I want
446 to serialize it as ntriples.
448 Liam: Going the other way would give you non-XML: loading
449 some ntriples into an RDF graph.
451 Norm: zip manipulation, RDF graphs, JSON objects, text documents