For [[extracting|Extraction Information]] information from the syntax of LaTeX documents, we have to pick up:
* authors (`\author`). On some occasions there is more than one author, and the institutions they belong to are given as well.
* abstract (`\abstract` or `\begin{abstract} \end{abstract}`)
* propositions and definitions: with amsthm, `\newtheorem{name}[...]{Theorem}` declares an environment, so `\begin{name}` starts a theorem. Definitions are usually declared after `\theoremstyle{definition}`. See the amsthm documentation.
* references: support `\bibitem` entries, BibTeX and AMSRefs
* Mathematics Subject Classification (MSC); the AMS document classes provide `\subjclass` for it
* keywords (is there a command for that? The AMS document classes provide `\keywords`)
* encoding of the text (the option given to the `inputenc` package, e.g. `\usepackage[utf8]{inputenc}`)
* We could collect word frequencies (which tokens are really words and not mathematical symbols?). Are the most frequent words the most important ones? The key words are in the introduction and in the abstract.
* In definitions, the words inside `{\em }` are the terms we want to extract.
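
Several of the items above can be tried out with plain regular expressions. The following is only a minimal sketch in Python, not a real LaTeX parser: it assumes that command arguments contain no nested braces, that math is written inline as `$...$`, and that a "word" is a run of at least three ASCII letters. The names `SAMPLE`, `extract_metadata`, `defined_terms` and `word_frequencies` are hypothetical and exist only for this illustration.

```python
import re
from collections import Counter

# Hypothetical sample document, used only for illustration.
SAMPLE = r"""
\documentclass{amsart}
\usepackage[utf8]{inputenc}
\newtheorem{thm}{Theorem}
\theoremstyle{definition}
\newtheorem{defn}[thm]{Definition}
\author{Jane Doe}
\begin{document}
\begin{abstract}
We study compact groups and compact spaces.
\end{abstract}
\begin{defn}
A group is {\em compact} if its topology is compact.
\end{defn}
\end{document}
"""


def extract_metadata(tex):
    """Collect a few fields with simple regular expressions."""
    info = {}
    # \author{...}; assumption: no nested braces inside the argument.
    info["authors"] = re.findall(r"\\author\{([^{}]*)\}", tex)
    # Body of the abstract environment, if there is one.
    m = re.search(r"\\begin\{abstract\}(.*?)\\end\{abstract\}", tex, re.DOTALL)
    info["abstract"] = m.group(1).strip() if m else None
    # The encoding is the option passed to the inputenc package.
    m = re.search(r"\\usepackage\[([^\]]*)\]\{inputenc\}", tex)
    info["encoding"] = m.group(1) if m else None
    # \newtheorem{env}{Title} or \newtheorem{env}[counter]{Title}
    info["theorem_envs"] = dict(
        re.findall(r"\\newtheorem\*?\{(\w+)\}(?:\[\w+\])?\{([^{}]*)\}", tex)
    )
    return info


def defined_terms(tex, env):
    """Return the emphasized terms found inside every `env` environment."""
    name = re.escape(env)
    pattern = r"\\begin\{%s\}(.*?)\\end\{%s\}" % (name, name)
    terms = []
    for body in re.findall(pattern, tex, re.DOTALL):
        terms += [t.strip() for t in re.findall(r"\{\\em\s+([^{}]*)\}", body)]
    return terms


def word_frequencies(text):
    """Count words after dropping inline math and control sequences."""
    text = re.sub(r"\$[^$]*\$", " ", text)    # inline math $...$
    text = re.sub(r"\\[A-Za-z]+", " ", text)  # commands such as \em
    return Counter(w.lower() for w in re.findall(r"[A-Za-z]{3,}", text))


info = extract_metadata(SAMPLE)
print(info["authors"], info["encoding"], info["theorem_envs"])
print(defined_terms(SAMPLE, "defn"))
print(word_frequencies(info["abstract"]).most_common(3))
```

The environment names returned in `theorem_envs` tell us which `\begin{...}` blocks to scan for definitions, so `defined_terms` can be called once per declared environment. Documents with nested braces, display math or several `\author` arguments would need a proper tokenizer instead of these patterns.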