Extraction-information.md

# Information to extract #

This is the information we want to extract.

## From the article ##

  * Source documents: `article.tex`, `headers.tex`, etc. Frequently the source comes as a single archive, say `foo.tar.gz`, which contains several documents. Open question: should we record every source document, or is it enough to record the URL where we found the source?
  * URL where we found the article. For example [arxiv:1406.3018](http://arxiv.org/e-print/1406.3018) or [http://www.foo.com/bla.tex](http://www.foo.com/bla.tex)
  * Type of document: LaTeX, ConTeXt, PDF, etc. See [[Suported formats]]
  * Title
  * Authors
  * Date
  * Abstract
  * Propositions and definitions
  * References
  * License
  * Mathematics Subject Classification (MSC)
  * Keywords
  * Words: list of all distinct words in the article
  * Language (probably inferred statistically)

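As a rough illustration of the per-article fields listed above, the record could be sketched as a small Python data class. This is only a sketch under assumptions: the names `ArticleRecord` and `distinct_words`, the field names, and the choice to lowercase words before deduplicating are all hypothetical and not fixed by this page.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArticleRecord:
    """Hypothetical container for the per-article fields (names are assumptions)."""
    source_url: str                      # URL where the article was found
    document_type: str                   # e.g. "LaTeX", "ConTeXt", "PDF"
    source_documents: List[str] = field(default_factory=list)  # e.g. ["article.tex"]
    title: str = ""
    authors: List[str] = field(default_factory=list)
    date: str = ""
    abstract: str = ""
    references: List[str] = field(default_factory=list)
    license: str = ""
    msc_codes: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    words: List[str] = field(default_factory=list)
    language: str = ""                   # inferred, so possibly wrong


def distinct_words(text: str) -> List[str]:
    """Return the different words of `text`, lowercased and sorted.

    A "word" is taken here to be a run of letters; this is an assumption.
    """
    return sorted(set(re.findall(r"[^\W\d_]+", text.lower())))
```

For instance, `distinct_words("The square of the hypotenuse")` gives `['hypotenuse', 'of', 'square', 'the']`, which could fill the `words` field.
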
## From the propositions ##

  * Statement
  * Hypothesis: the prerequisites. This is the part of the statement that is assumed
  * Thesis: the conclusion. This is the part of the statement that follows from the hypothesis
  * Name, if it has one. For example `Theorem of Pythagoras`
  * References, if it cites any. For example, `Theorem of Pythagoras [1, 2]` cites `[1]` and `[2]`. `[1]` and `[2]` are links to documents, so we want to record these documents.
  * List of articles it belongs to, i.e. the articles in which this proposition appears
  * Words (for quicker search): all the words of the proposition

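The proposition fields, and the `Theorem of Pythagoras [1, 2]` example in particular, might be handled along these lines. This is a minimal sketch: `PropositionRecord`, `split_name_and_references`, and the assumption that reference labels sit in one trailing bracket pair separated by commas are all hypothetical. Real sources (for example LaTeX `\cite` commands) will need syntax-specific handling.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PropositionRecord:
    """Hypothetical container for the per-proposition fields (names are assumptions)."""
    statement: str
    hypothesis: str = ""
    thesis: str = ""
    name: Optional[str] = None           # None when the proposition is unnamed
    references: List[str] = field(default_factory=list)
    articles: List[str] = field(default_factory=list)  # articles it appears in
    words: List[str] = field(default_factory=list)


def split_name_and_references(heading: str) -> Tuple[str, List[str]]:
    """Split a heading such as 'Theorem of Pythagoras [1, 2]' into name and labels."""
    match = re.search(r"\[([^\]]*)\]\s*$", heading)
    if match is None:
        # No trailing bracket group: the whole heading is the name.
        return heading.strip(), []
    labels = [part.strip() for part in match.group(1).split(",") if part.strip()]
    return heading[:match.start()].strip(), labels
```

With this sketch, `split_name_and_references("Theorem of Pythagoras [1, 2]")` returns `("Theorem of Pythagoras", ["1", "2"])`. The labels still have to be resolved to the documents they point to before they can be recorded.
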
# Documents for specific syntax #

  * [[LaTeX|latex:extraction]]