# Information to extract #
This is the information we want to extract.
* Source documents: `article.tex`, `headers.tex`, etc. Frequently, the source is contained in a single compressed archive, say `foo.tar.gz`, which contains several documents. Open question: should we really record all the source documents, or is it enough to record the URL where we found the source?
* URL where we found the article. For example, [arxiv:1406.3018](http://arxiv.org/e-print/1406.3018) or [http://www.foo.com/bla.tex](http://www.foo.com/bla.tex).
* Type of document: LaTeX, ConTeXt, PDF, etc. See [[Suported formats]].
* Propositions and definitions
* Mathematics Subject Classification
* Words: list of all distinct words
* Language (probably inferred statistically)
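
The fields above could be collected in a single record per document. The following is only a minimal sketch: the class name `DocumentRecord`, its field names, and the helper `distinct_words` are hypothetical and not part of any existing code, and the word splitting rule (runs of letters, lowercased) is an assumption.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DocumentRecord:
    """Hypothetical container for the per-document fields listed above."""
    url: str                                           # where the article was found
    doc_type: str                                      # e.g. "LaTeX", "ConTeXt", "PDF"
    source_files: list = field(default_factory=list)   # e.g. ["article.tex", "headers.tex"]
    propositions: list = field(default_factory=list)   # propositions and definitions
    msc_codes: list = field(default_factory=list)      # Mathematics Subject Classification
    words: list = field(default_factory=list)          # all distinct words
    language: str = ""                                 # filled in once it has been inferred


def distinct_words(text):
    """Return the sorted list of distinct lowercase words in text.

    Assumption: a word is a maximal run of letters; digits and
    punctuation are treated as separators.
    """
    return sorted(set(re.findall(r"[^\W\d_]+", text.lower())))


record = DocumentRecord(url="http://arxiv.org/e-print/1406.3018", doc_type="LaTeX")
record.source_files = ["article.tex", "headers.tex"]
record.words = distinct_words("The theorem, the proof")
```

Storing only the URL and leaving `source_files` empty would be one way to answer the open question in the first bullet, at the cost of depending on the remote source staying available.
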
## From the propositions ##
* Hypothesis: the prerequisites. This is the part of the statement that is assumed.
* Thesis: the conclusion. This is the part of the statement that follows from the hypothesis.
* Name: if it has one. For example, `Theorem of Pythagoras`.
* References: if it links to any. For example, `Theorem of Pythagoras [1, 2]` refers to `[1]` and `[2]` as references. `[1]` and `[2]` are links to documents, so we want to record these documents.
* List of articles it belongs to, that is, the articles in which this proposition appears.
* Words (for quicker search): all the words of the proposition.
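
As an illustration of the name and references fields, the sketch below separates a title such as `Theorem of Pythagoras [1, 2]` into its name and its reference keys. The `Proposition` class, its field names, and `split_name_and_references` are hypothetical, and the code assumes references appear as one trailing bracketed, comma-separated list, which real documents may not follow.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Proposition:
    """Hypothetical container for the per-proposition fields listed above."""
    hypothesis: str                                   # the prerequisites
    thesis: str                                       # the conclusion
    name: str = ""                                    # empty if the proposition is unnamed
    references: list = field(default_factory=list)    # reference keys, e.g. ["1", "2"]
    articles: list = field(default_factory=list)      # articles in which it appears
    words: list = field(default_factory=list)         # all words, for quicker search


def split_name_and_references(title):
    """Split 'Name [k1, k2]' into ('Name', ['k1', 'k2']).

    Assumption: the references form a single bracketed list at the end
    of the title. Titles without such a list are returned unchanged
    with an empty reference list.
    """
    match = re.fullmatch(r"(.*?)\s*\[([^\]]*)\]\s*", title)
    if match is None:
        return title.strip(), []
    name, keys = match.groups()
    return name.strip(), [key.strip() for key in keys.split(",") if key.strip()]


name, references = split_name_and_references("Theorem of Pythagoras [1, 2]")
proposition = Proposition(
    hypothesis="a and b are the legs and c the hypotenuse of a right triangle",
    thesis="a^2 + b^2 = c^2",
    name=name,
    references=references,
)
```

The keys are kept as strings because they only identify bibliography entries; resolving each key to the document it links to is a separate step that this sketch does not cover.
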
# Documents for specific syntax #
* [[LaTeX|latex:extraction]]