# XProc Workshop Minutes, Day 1, 7 February 2017

## Background links:

-   Spec repo: <https://github.com/xproc/1.1-specification>
-   Online specs: <http://spec.xproc.org/>
-   Issues list: <https://github.com/xproc/1.1-specification/issues>
-   Norm’s @depends proposal:
    -   <https://github.com/xproc/1.1-specification/pull/33>
    -   <http://xpspectest.nwalsh.com/depends/head/xproc11/>
-   Norm’s proposal for issue #30:
    -   <https://github.com/xproc/1.1-specification/pull/32>
    -   <http://xpspectest.nwalsh.com/fix-30/head/xproc11/>


<a id="orgee13667"></a>

## Agenda / Minutes


<a id="org88438ac"></a>

### Introductions

-   Present: Norm, Romain, Achim, Christophe, Matthieu, David, Geert,
    Gerrit, Martin, Henry, Liam [afternoon]


<a id="org08ee05f"></a>

### New Action Items

-   Norm to ask Nic for feedback about the state of the community group.
-   Norm to write a message (periodically?) to xproc-dev explaining
    that it’s being used both for user and spec-dev discussions.
-   Norm to send Romain a couple of example tests to try converting
    to XProcSpec.
-   Norm to write better documentation of the test suite format
    (and fix the schema).
-   Henry to write a straw man for the non-XML documents question.


<a id="orgcb61fce"></a>

### Review of the agenda

-   Review of the test suite mechanics
-   What’s the status of the W3C community group?
    -   Do we need to make it more present?
-   What version number do we use? What is “V.next”?
    -   Are we going to call it “XProc”?
-   Development infrastructure
    -   GitHub? Wikis provide no notifications
-   What are the important workflows?
    -   Center of gravity of the group is clearly publishing
    -   Revisit “pain points” from the bottom up
    -   Are database (relational or XML) interfaces important?
-   What’s our thinking on the “resource manager”?


<a id="orgcf8688b"></a>

### Brief review of the previous workshop minutes

-   What about backwards compatibility?
    -   A goal, not a requirement; we should be able to write a tool
        to transform old pipelines to the new spec
    -   Do 1.1 pipelines ever have to run 1.0 steps?
-   Open question remains: are we going to try to do a “1.1” quickly
    that’s a small(ish) delta on 1.0, or are we doing something grander?


<a id="org02f97a6"></a>

### What is it called, what version is it?

-   Norm mutters on about the options a bit. 1.1, 2.0, 3.0?
-   Martin: I’d like a 2.0 version to better engage folks who
    aren’t currently using it.
-   Achim: Then maybe we should call it 3.0?
-   Geert: I think 3.0 is a real possibility. It leaves behind the
    legacy.
-   Norm: We could also change the name.
-   Henry: No, we should keep it. It doesn’t have to carry baggage.
-   Achim: I think 3.1 makes sense; it’s using XPath 3.1 and XDM 3.1.
-   Matthieu: I don’t think these have to be this closely aligned.
    What about other versions of XPath in things like Schematron?
-   Christophe: If we go to 3.x, we need to bring something new.
    If we only optimize 1.x, it won’t be “new enough”.
-   Matthieu: Wouldn’t the same be true for 2.0?
-   Henry: We have to come back to the question of what it is we’re
    naming.
-   Christophe: It’s not clear what 1.1/2.0 is. But maybe 3.0 could
    bring clarity.


<a id="org953a61e"></a>

### What is it we’re building then?

-   Norm gives a potted summary of the history: “1.1 will be a better
    1.0”; no, “we need a totally new 2.0”; no, the community wants a
    better 1.0 (at least with an XML syntax). So here we are. What do
    all y’all want?
-   Romain: I’d like two things: a better XProc 1.0, trying to fix
    the usability shortcomings. And the other would be a more ambitious
    2.0 with a compact syntax, etc.
-   Norm: I think the only certainty is you can’t have both.
-   Romain: Pragmatically then, an improved XProc 1.0.
-   Achim: I agree; the compact syntax in particular has technical
    problems.
-   Henry: None of us are programming language designers; most language
    design fails. There’d be real benefit to expertise in that space. We
    should stay with what we’re good at.
-   Achim: So a better XProc 1.0, is that the consensus?
-   Matthieu: I think there are things that we can do to make it
    easier to use that go a little bit beyond “an improved 1.0”.
-   Achim: The current spec drops the XML-centric view.
-   Henry: There’s a spectrum between “a data flow language” on one
    hand and “a web-centric language with a first-class place for
    non-XML formats” on the other.
-   Geert: Do we know what we want with non-XML documents? Are we
    building database migration tools? I think we all want the same:
    an occasional CSV, images, zips. As publishers, we want to work
    with XML documents. I don’t need the extreme spectrum of non-XML
    documents; it’ll be an XML workflow with the occasional non-XML
    document.
-   Henry: I think the crucial first step is to say that “here’s a JPG,
    I can route it, I can package it,” but I don’t need to have
    steps that work with it.
-   Norm: I think JSON documents are the elephant in the room.
-   Martin: Do we agree that if we want to use JSON, we have to
    represent it as XML internally?
-   Norm: Well. Maybe. That’ll always be possible, but other users
    might want more.
-   Henry: Documents have media types. But that’s not a complete answer.
    Is it just what you got off the wire, is it a serialization, or
    does it move XDM data structures around? The question then is:
    do we support anything other than documents with a media type
    and XDM instances?

Some discussion of the steps that would/could/might implement
operations on arbitrary documents.

-   Achim: What about the mapping to XDM, given that we use XPath
    as the expression language?
-   Norm: I think you just have to decide what it means.
-   Henry: I think the minimum needed to declare victory is to
    give implementations some freedom, leaving “error” as a reasonable
    alternative. We don’t need to provide *in the spec* the extension
    story.

-   Henry: There’s a question of what flows through the pipeline:
    is it just sequences of XDM instances? Right now we have both: a port
    can declare whether it accepts a sequence or not. We could get
    rid of that in two ways: by saying everyone has to accept
    sequences, or by saying that the compilation step will take
    care of that: if a sequence arrives, a step will be invoked
    multiple times.
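
Henry’s second option can be illustrated with a minimal Python sketch. It assumes a toy model invented for this illustration, not anything from a spec draft: a document is a plain dict, and `Step`, `accepts_sequence` and `run_step` are hypothetical names. A step that declares a single-document input is invoked once per document when a sequence arrives, and the results are concatenated.

```python
from dataclasses import dataclass
from typing import Callable, List

Document = dict  # hypothetical stand-in: {"media-type": ..., "content": ...}


@dataclass
class Step:
    name: str
    accepts_sequence: bool
    body: Callable[[List[Document]], List[Document]]


def run_step(step: Step, inputs: List[Document]) -> List[Document]:
    """Run a step, iterating implicitly if it cannot accept a sequence."""
    if step.accepts_sequence:
        return step.body(inputs)
    # Single-document input: invoke once per document, concatenate results.
    outputs: List[Document] = []
    for doc in inputs:
        outputs.extend(step.body([doc]))
    return outputs


# Hypothetical single-document step: upper-cases the text content.
upper = Step("upper", False,
             lambda docs: [{**docs[0], "content": docs[0]["content"].upper()}])
# Hypothetical sequence step: reports how many documents it received.
count = Step("count", True,
             lambda docs: [{"media-type": "text/plain", "content": str(len(docs))}])

docs = [{"media-type": "text/plain", "content": "a"},
        {"media-type": "text/plain", "content": "b"}]
print([d["content"] for d in run_step(upper, docs)])  # ['A', 'B']
print([d["content"] for d in run_step(count, docs)])  # ['2']
```

In this sketch the iteration lives in the runner rather than in each step, so a step author only ever handles one document at a time.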


<a id="org5170dce"></a>

#### Properties of what we’re building

-   It has an XML syntax reminiscent of the XProc 1.0 syntax
-   It has a 3.1 data model
-   It uses XPath 3.x as the expression language
-   Variables/options are now allowed to be any data model instance
-   No parameter ports
-   Metadata flows through the pipes
-   Documents in the pipes have media types
-   Attribute value templates (AVT) and
    text value templates (TVT) are supported
-   Ability to import user-defined functions (XSLT or XQuery)
-   New if/then/else compound step?
-   Variables everywhere
-   Dynamic evaluation of pipelines

##### Step improvements

-   SVRL output for all validation steps (a report port)
-   XML-centric step vocabulary
-   Move the eXProc.org steps into a standard step document


<a id="orgeec1887"></a>

### What is it called, what version is it? (Continued)

-   Norm: Now that we have more clarity on what we’re building,
    do we know what to call it?
-   Henry: I think “XProc 3.0” is good. If we ever do the
    research project, we call it something else.

General murmurs of agreement around the table.

-   Norm: “XProc 3.0” is proposed; any objections?

The “2.0” version has been burned; it’s too difficult to explain
what it means. “3.0” is a clean break and will use the XPath 3.x
expression language.


<a id="orgdb3e29d"></a>

### What are the *semantics* of pipelines?

-   What are the lowest-level abstractions needed to
    describe/discuss pipelines?

-   Norm waffles on a bit; his simplification is just
    steps with connections between them.

Some discussion of variables and ordering of steps. There’s
clearly a question of expressing the order in a pipeline syntax
that’s not quite the same as what the underlying implementation
has to do.

-   Henry: The draft I have (which I’ve never published) retains
    from XProc 1.0 the idea of a context that has a bunch of
    bindings. It also has the idea that setting a variable is
    a little step. Then I realized that this set variables off to
    the side of the flow of documents through the pipeline between
    “real” steps. I was worried about pushing/popping context,
    but maybe that’s not the right question; that’s an artifact
    of the surface syntax.
    … The other aspect of complexity in the system I was developing
    was probably not necessary for what we’re calling 3.0. It attempted
    to distinguish between steps/pipes and sources/sinks.
    … If you have something that steps outside the dependency flow,
    how do you manage the synchronization issues (for example,
    updates to a database) that arise? That something is a resource
    manager.
-   Norm: I’d not considered the question of synchronizing something
    like database access across several steps.
-   Henry: The Markup pipeline had many extra steps to compile
    schemas and stylesheets that it put in the resource manager.
    The steps that implemented schema validation and XSLT always
    expected a URI (discharged by the resource manager) to their
    stylesheets and schemas. That imposed an extrinsic order.
-   Henry: Why a resource manager? It was a simplification: it was
    easier to think about producing an output document and giving it
    a name and then later using that name. It was an alternative
    to pipes.
-   Norm: Use pipes if you can, but a resource manager does handle
    the case of wanting to generate a document that will be
    XIncluded (for example).
-   Henry: You could also imagine that you have code that produces
    a RELAX NG document and your RELAX NG validator doesn’t have a
    standard input port; it expects a URI.
    … The other use for a resource manager was for internal use:
    some notion of preprocessing or compilation that you want to
    use.
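
The resource-manager idea Henry describes (produce a document, give it a name, later resolve that name) can be sketched in a few lines of Python. This is a hypothetical illustration: `ResourceManager`, `put`, `get`, `validate` and the example URIs are invented here, and documents are plain strings. Documents registered by an earlier step shadow whatever the URI would otherwise resolve to, which is also why the registering step must run before any step that reads the URI, the extrinsic order Henry mentions.

```python
from typing import Callable, Dict


class ResourceManager:
    """Hypothetical sketch: maps URIs to documents produced earlier in a pipeline."""

    def __init__(self, fallback: Callable[[str], str]) -> None:
        self._resources: Dict[str, str] = {}
        self._fallback = fallback  # resolves URIs that no step has registered

    def put(self, uri: str, document: str) -> None:
        """Called by a step that generates a document (e.g. a schema)."""
        self._resources[uri] = document

    def get(self, uri: str) -> str:
        """Called by a step that only accepts a URI (e.g. a validator)."""
        if uri in self._resources:
            return self._resources[uri]
        return self._fallback(uri)


def validate(schema_uri: str, rm: ResourceManager) -> str:
    # Hypothetical validation step: it never sees a pipe, only a URI,
    # and asks the resource manager to discharge it.
    schema = rm.get(schema_uri)
    return f"validated against {schema}"


rm = ResourceManager(fallback=lambda uri: f"<external {uri}>")
rm.put("http://example.com/generated.rng", "<grammar/>")  # the producing step
print(validate("http://example.com/generated.rng", rm))  # validated against <grammar/>
print(validate("http://example.com/other.rng", rm))      # validated against <external http://example.com/other.rng>
```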


<a id="org409905c"></a>

### Development infrastructure

-   Norm: I think it’s important to do the work in the open.
    GitHub is certainly convenient in terms of tooling. The
    wiki notification problem is real, but are there other
    difficulties?
-   David: What do we need?
    -   A mailing list (we have one)
    -   GitHub is good for code and specs
    -   The documentation wiki is just not very good for docs
-   Achim: Do we have a mailing list? The xproc-dev list
    isn’t very good for 3.0 discussions because many folks
    on it won’t understand them.
-   David: If we discuss in email, then maybe we don’t need
    notifications.
-   Norm: The other possibility is to just commit documents
    to a repo; we can get notifications for that.
-   Geert: We need to make sure we aren’t overwhelming the
    “occasional users” on the xproc-dev mailing list.
    … There will be other work on tests, tutorials and other
    documents that we can make to help 3.0 users.
-   Norm: I think it would be great if other community members
    could take on maintaining the tutorials and things. It’s
    very hard to do both the spec and tutorials.

Some discussion of writing a primer or tutorial. Norm observes
that if someone writes one, we can check it into the repo and
it’ll be published automatically.

Some discussion of existing resources: the spec, tutorials, etc.

-   Christophe: I volunteer to work on some tutorial material.
-   Matthieu: I will work on that as well.

(Thunderous applause.)

-   Achim: The other point is test cases. We need new tests.


<a id="org06fd04f"></a>

### Test suite/running tests

-   Norm summarizes the current state of tests
-   Romain: There’s also an XSpec test.
-   Achim: There’s also the problem of multiple outputs.
    … If, for example, you don’t support PSVI, then the output
    of the test should be “an error” if you don’t specify PSVI
    and a document if you do. Both of those should qualify as
    a “pass”.
-   Norm: We could probably extend the test suite to cover that.
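
Achim’s point amounts to letting a test declare several acceptable outcomes. A minimal Python sketch of such a check follows; the classes, the `passes` function and the error code `ex:no-psvi` are hypothetical, and documents are compared as plain strings, whereas a real harness would compare XML more carefully.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class ExpectedError:
    code: str  # hypothetical error code, e.g. for "PSVI not supported"


@dataclass(frozen=True)
class ExpectedDocument:
    serialization: str  # expected output, compared as a string for simplicity


Outcome = Union[ExpectedError, ExpectedDocument]


@dataclass(frozen=True)
class Result:
    error_code: Optional[str] = None  # set if the pipeline raised an error
    document: Optional[str] = None    # set if the pipeline produced output


def passes(result: Result, acceptable: List[Outcome]) -> bool:
    """A test passes if the actual result matches ANY acceptable outcome."""
    for outcome in acceptable:
        if isinstance(outcome, ExpectedError) and result.error_code == outcome.code:
            return True
        if (isinstance(outcome, ExpectedDocument)
                and result.error_code is None
                and result.document == outcome.serialization):
            return True
    return False


# A PSVI test: an implementation without PSVI support may raise the error;
# one with PSVI support must produce the expected document.
acceptable = [ExpectedError("ex:no-psvi"), ExpectedDocument("<doc typed='yes'/>")]
print(passes(Result(error_code="ex:no-psvi"), acceptable))        # True
print(passes(Result(document="<doc typed='yes'/>"), acceptable))  # True
print(passes(Result(document="<doc/>"), acceptable))              # False
```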


<a id="org6bd3e15"></a>

### Review of the current working drafts

-   Current state of the specs

-   Achim leads discussion of
    <https://github.com/xproc/Workshop-2017-02/wiki/Some-comments-on-XProc-1.1-(2017-01-22)>

-   Discussion highlights:

    -   What do we do about the p:document-properties() function? If the document
        is an argument to the function, what XDM value does it have when the
        document is not XML?
    -   Some discussion of the impact of sequences on this function
    -   Achim: What is the content-type of a sequence of three integers?
    -   Henry: Maybe only documents have a media type.
    -   Norm: “(1,2,3)” content-type=“x.vnd/xdm-item”
    -   Henry: We have said that arbitrary documents flow through the
        pipeline, but variables can hold arbitrary XDM types.
    -   Achim: But only documents have metadata.
    -   Henry: Variables can contain any XDM item. Pipelines
        consist of sequences (possibly unitary) of documents with
        a media type and associated metadata.

    Further discussion of what limits we can/must/should put on
    the kinds of things that can flow through a pipeline.
    What happens to an XQuery that produces three numbers?

    Achim outlines the ideas from his comments document about mapping
    from arbitrary documents to XDM instances.

    Henry: I’d do it by having the output be a sequence of documents
    plus metadata. What we get is the media type and maybe a URI. (Or
    the filename in the case of a package.) Maybe you have to run a
    step to do the “file” magic to add a media type if it was absent.
    Now I can write a choose, based on the media type, that is inside
    a for-each. I can use document-properties() for that single
    document.

    Achim: I think we can put the properties in the context so that
    we don’t have to load the document.

    Henry: You’re absolutely right on two counts: it has to not be
    an error, and you have to be able to get at the properties.

    Henry and Romain discuss the prospect of using a handle and
    a special load function to get the data.

    David: I’m concerned about using MIME types. What about encoding
    and MIME type parameters?

    Achim: Mapping from text/plain to string isn’t quite right.
    But there are only four types.
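
Henry’s for-each/choose pattern, combined with Achim’s suggestion that properties be available without loading the document, could look roughly like this in Python. Everything here is a hypothetical illustration: `Doc`, `route`, the `content-type` and `base-uri` property names, and the loader functions are invented for the sketch. Parameters such as `charset` are stripped before comparing media types (one possible answer to David’s question), and text is assumed to be UTF-8.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Doc:
    properties: Dict[str, str]   # metadata that flows with the document
    loader: Callable[[], bytes]  # hypothetical: fetches the content on demand
    _cache: Optional[bytes] = None

    def content(self) -> bytes:
        # Load lazily, at most once.
        if self._cache is None:
            self._cache = self.loader()
        return self._cache


def base_media_type(doc: Doc) -> str:
    # Drop parameters such as "; charset=utf-8" before comparing.
    raw = doc.properties.get("content-type", "application/octet-stream")
    return raw.split(";")[0].strip().lower()


def route(docs: List[Doc]) -> List[str]:
    """A for-each over the sequence with a choose on the media type."""
    results = []
    for doc in docs:
        media_type = base_media_type(doc)
        if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
            results.append("xml: " + doc.content().decode("utf-8"))
        elif media_type.startswith("text/"):
            results.append("text: " + doc.content().decode("utf-8"))
        else:
            # e.g. image/jpeg: routed by its properties alone, never loaded
            results.append("pass-through: " + doc.properties.get("base-uri", "?"))
    return results


loaded: List[str] = []


def loader_for(uri: str, data: bytes) -> Callable[[], bytes]:
    def load() -> bytes:
        loaded.append(uri)  # record which documents were actually read
        return data
    return load


docs = [
    Doc({"content-type": "application/xml", "base-uri": "a.xml"}, loader_for("a.xml", b"<a/>")),
    Doc({"content-type": "text/plain; charset=utf-8", "base-uri": "b.txt"}, loader_for("b.txt", b"hi")),
    Doc({"content-type": "image/jpeg", "base-uri": "c.jpg"}, loader_for("c.jpg", b"\xff\xd8")),
]
print(route(docs))  # ['xml: <a/>', 'text: hi', 'pass-through: c.jpg']
print(loaded)       # ['a.xml', 'b.txt']
```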

(Break.)

Achim: Two questions: what flows on ports, and what happens if
that becomes a context node? Why not restrict whatever arrives
on the port to XDM? The problem is with the current, abstract
notion of documents in the spec.

Henry: We could just use an XProc private namespace to wrap
a handle for everything else.

Achim: What happens if some non-XML representation appears as
a context node? The spec says that this is implementation
defined or an error. I tried to rescue this by having a
mapping just for the case where the non-XML document node
is used as a context element.

David: We’re not talking about MIME types; we don’t mean
MIME types. We’re talking about types. We can map from
media types to our types, but we should have our own types.

Liam: I don’t really think that’s a good idea. This is what
MIME types are for. You could make parameters available as
metadata.

Norm: If we were going to go the handles route:

1.  An XML document would be … an XML document
2.  A JSON document would be … a JSON XDM representation
    (an array or a map)
3.  XDM items other than documents … ???
4.  A text/\* document would be
    `<p:handle type="text/plain" handle="12345">…the text…</p:handle>`
5.  Any other document would be
    `<p:handle type="media/type" handle="5678"/>`

The argument to the p:document-properties function is a node,
and it has to be a valid node or handle.
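
Norm’s handles proposal can be sketched as a single mapping function in Python. This is a hypothetical illustration of the proposal as minuted, not a decided design: `represent` and the handle counter are invented names, the `p` prefix is taken to be the XProc namespace `http://www.w3.org/ns/xproc`, Python’s `xml.etree.ElementTree` and `json` stand in for XML and JSON XDM values, and text is assumed to be UTF-8. Case 3 (XDM items other than documents) was left open in the discussion and is not covered.

```python
import itertools
import json
import xml.etree.ElementTree as ET

P_NS = "http://www.w3.org/ns/xproc"  # namespace assumed for the p: prefix
_handles = itertools.count(1)        # hypothetical source of handle ids


def represent(media_type: str, content: bytes):
    """Map a document to the value an XPath expression would see."""
    base = media_type.split(";")[0].strip().lower()
    # Case 1: XML documents are just XML.
    if base in ("application/xml", "text/xml") or base.endswith("+xml"):
        return ET.fromstring(content)
    # Case 2: JSON becomes a map or array (here: a Python dict or list).
    if base == "application/json" or base.endswith("+json"):
        return json.loads(content)
    # Cases 4 and 5: everything else is wrapped in a p:handle element.
    handle = ET.Element(f"{{{P_NS}}}handle",
                        {"type": base, "handle": str(next(_handles))})
    if base.startswith("text/"):
        handle.text = content.decode("utf-8")  # text/* carries its text
    return handle                              # other types: empty handle


h = represent("text/plain; charset=utf-8", b"hello")
print(h.get("type"), h.text)                            # text/plain hello
img = represent("image/jpeg", b"\xff\xd8")
print(img.get("type"), img.text)                        # image/jpeg None
print(represent("application/json", b'{"a": [1, 2]}'))  # {'a': [1, 2]}
print(represent("application/xml", b"<doc/>").tag)      # doc
```

If, as Henry says, step implementations never see the handle markup, this mapping would only describe what XPath expressions (such as the argument to p:document-properties) see, not what steps receive.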

Achim: This is the solution I like least. We can do things
which are not XML. When people ask, we show them XML documents.

Henry: No, this is just a way of having blobs that flow
through the pipeline.

Henry: Step implementations never see the handle markup.

Norm: This abstraction is too leaky.

Henry: There’s basically a difference of view here. I’d like to
deal with the general case in the general way and look at special
cases for text later. Others are trying to deal with the special
cases but don’t care so much about the general case.

Achim: What bothers me is, whatever the representation is, what
happens if it becomes the context node?

Geert: I’ve been wondering if the way NetKernel solves some
issues is relevant. They have a URI for every object and these
URIs have types. There’s a “transreption” when necessary to use
the representation. Most of this is an implementation detail.
If you want to use it, for example for XSLT, they define a
mapping from each representation to something appropriate
for XSLT.

Norm: Yes, I think a layer for doing transformations of
representations is a good idea.

Geert: One of the things that makes it work is that
the transreption library is small and fixed.
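
The “small and fixed” transreption layer Geert describes can be pictured as a lookup table of converters between named representations. The Python below is loosely inspired by that description and is not NetKernel’s API; `Transreptor`, the representation names (`bytes`, `string`, `json`, `csv-rows`) and the registered converters are all hypothetical.

```python
import csv
import io
import json
from typing import Any, Callable, Dict, Tuple

Converter = Callable[[Any], Any]


class Transreptor:
    """Hypothetical: a fixed table of one-step representation conversions."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[str, str], Converter] = {}

    def register(self, source: str, target: str, fn: Converter) -> None:
        self._table[(source, target)] = fn

    def convert(self, value: Any, source: str, target: str) -> Any:
        if source == target:
            return value
        fn = self._table.get((source, target))
        if fn is None:
            # No chaining or guessing: an unknown conversion is an error.
            raise LookupError(f"no transreption from {source} to {target}")
        return fn(value)


t = Transreptor()
t.register("bytes", "string", lambda b: b.decode("utf-8"))
t.register("string", "json", json.loads)
t.register("string", "csv-rows", lambda s: list(csv.reader(io.StringIO(s))))

text = t.convert(b"a,b\n1,2\n", "bytes", "string")
print(t.convert(text, "string", "csv-rows"))     # [['a', 'b'], ['1', '2']]
print(t.convert('{"k": 1}', "string", "json"))   # {'k': 1}
```

Keeping the table fixed and single-step, rather than searching for chains of conversions, is one way to get the predictability Geert attributes to a small library.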

Romain: I think we are missing key use cases that would help
us make this decision.

Achim: We could compare the solutions against some different use
cases. We have the one that I defined.

David: I’m thinking about cases where I create RDF and I want
to serialize it as N-Triples.

Liam: Going the other way would give you non-XML: loading
some N-Triples into an RDF graph.

Norm: zip manipulation, RDF graphs, JSON objects, text documents,
images, CSVs.

END OF DAY ONE