mespeak/voices-and-languages.html

<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="utf-8" />
	<title>meSpeak &ndash; Voices &amp; Languages</title>
	<link href="http://fonts.googleapis.com/css?family=Open+Sans&amp;subset=latin" rel="stylesheet" type="text/css" />
	<link href="http://fonts.googleapis.com/css?family=Lato:300&amp;subset=latin" rel="stylesheet" type="text/css" />
<style type="text/css">
	html
	{
		margin: 0;
		padding: 2em 1.5em 4.5em 1.5em;
		background-color: #e2e3e4;
	}
	body
	{
		max-width: 900px;
		padding: 2px 40px 60px 40px;
		margin: 0 auto 0 auto;
		background-color: #fafafb;
		color: #111;
		font-family: 'Open Sans',sans-serif;
		font-size: 13px;
		line-height: 19px;
	}
	h1,h2,h3,h4
	{
		font-family: 'Lato',sans-serif;
		font-weight: 300;
	}
	h1 {
		font-size: 46px;
		line-height: 46px;
		color: #2681a7;
		margin-top: 0.5em;
		margin-bottom: 0.5em;
		padding: 0;
	}
	h2
	{
		font-size: 36px;
		color: #111;
		margin-top: 0;
		margin-bottom: 1.5em;
		clear: both;
	}
	h3
	{
		font-size: 24px;
		color: #111;
		margin-top: 2em;
	}
	h4
	{
		font-size: 20px;
		color: #111;
		margin: 1.5em 0 1em 0.25em;
	}
	h1 span.pict { font-size: 38px; color: #ccc; margin-left: 0.5em; letter-spacing: -2px; }
	p.codesample,xmp
	{
		margin: 1em 0;
		padding: 1em 0 1em 2em;
		white-space: pre;
		font-family: monospace;
		line-height: 18px;
		background-color: #f2f3f5;
		color: #111;
	}
	p.codesample strong { color: #222; }
	a { color: #006f9e; }
	a:hover,a:focus { color: #2681a7; }
	a:active { color: #cd360e; }
	p.action { margin: 1em 0 1em 1.5em; }
	li { margin-bottom: 0.5em; }
</style>
</head>
<body>
<h1>meSpeak  <span class="pict">(( &bull; ))</span></h1>
<h2>Voices &amp; Languages</h2>

<p>A short guide to setting up languages and voices for meSpeak.<br />
Please note that meSpeak is based on an Emscripten port of <a href="http://espeak.sourceforge.net/" target="_blank">eSpeak</a>, so all of eSpeak's grammar applies to meSpeak as well.</p>


<h3>Standard Language Files</h3>

<p>meSpeak's language files bundle eSpeak's language (dictionary) file and voice file in a single package.<br />(Since a voice usually refers to a language and its dictionary, it makes sense to keep them together in one file.)<br />The language files have the following structure (JSON):</p>

<xmp>{
  "voice_id": "<filename>",
  "dict_id":  "<filename>",
  "dict":     "<base64-encoded octet stream>",
  "voice":    "<base64-encoded octet stream>"
}
</xmp>


<p>The values of <em>voice_id</em> and <em>dict_id</em> are UNIX file paths: <em>dict_id</em> is relative to eSpeak's data directory &quot;<code>espeak-data/</code>&quot;, and <em>voice_id</em> is relative to &quot;<code>espeak-data/voices/</code>&quot;.</p>
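<p>To make the two path conventions concrete, here is a small Python 3 sketch. The helper <code>espeak_paths</code> is hypothetical (it is not part of meSpeak or eSpeak); it only shows where each ID points inside a local copy of eSpeak's data directory, using the files for &quot;en-en&quot; as an example:</p>

<xmp>from pathlib import PurePosixPath

def espeak_paths(data_dir, voice_id, dict_id):
    """Map a voice_id and dict_id to paths inside an eSpeak data directory."""
    base = PurePosixPath(data_dir)
    # voice_id is relative to <data_dir>/voices/
    voice_path = base / "voices" / voice_id
    # dict_id is relative to <data_dir>/ itself
    dict_path = base / dict_id
    return voice_path, dict_path

voice_path, dict_path = espeak_paths("espeak-data", "en/en-en", "en_dict")
print(voice_path)  # espeak-data/voices/en/en-en
print(dict_path)   # espeak-data/en_dict
</xmp>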

<p>If we were to embed the files for the language &quot;<code>en-en</code>&quot;, these would be:</p>
<ul>
	<li>&quot;<code>en/en-en</code>&quot; for the voice, and</li>
	<li>&quot;<code>en_dict</code>&quot; for the dictionary used by &quot;en-en&quot;.</li>
</ul>

<p>For a standard language file, set <em>dict</em> and <em>voice</em> to the base64-encoded contents of the respective eSpeak files.</p>
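<p>As a minimal sketch (Python 3, standard library only), assembling a standard language file comes down to base64-encoding both files and serializing the result as JSON. The function name <code>build_language_entry</code> and the file locations in the usage comment are assumptions for illustration, not part of meSpeak's tooling:</p>

<xmp>import base64
import json

def build_language_entry(voice_id, dict_id, voice_bytes, dict_bytes):
    """Return a dict in meSpeak's standard language-file layout."""
    return {
        "voice_id": voice_id,  # relative to espeak-data/voices/
        "dict_id": dict_id,    # relative to espeak-data/
        # base64 turns the raw bytes into ASCII text that fits in a JSON string
        "dict": base64.b64encode(dict_bytes).decode("ascii"),
        "voice": base64.b64encode(voice_bytes).decode("ascii"),
    }

# Usage (assumes a local eSpeak data directory at ./espeak-data):
# from pathlib import Path
# data = Path("espeak-data")
# entry = build_language_entry(
#     "en/en-en", "en_dict",
#     (data / "voices" / "en" / "en-en").read_bytes(),
#     (data / "en_dict").read_bytes(),
# )
# Path("en-en.json").write_text(json.dumps(entry, indent=2))
</xmp>

<p>The function takes bytes rather than file names so that the encoding step stays independent of where the eSpeak files happen to live.</p>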


<h3>Customizing</h3>

<p>There is an alternate layout for meSpeak's language files, which is especially useful for customizing and testing:</p>

<xmp>{
  "voice_id": "<filename>",
  "dict_id":  "<filename>",
  "dict":     "<base64-encoded octet stream>",
  "voice":    "<text-string>",
  "voice_encoding": "text"
}
</xmp>

<p>Since eSpeak's voice files are plain-text files, you may use a simple string for the voice, provided that you also add the property <code>&quot;voice_encoding&quot;: &quot;text&quot;</code>.</p>
<p><em>Dictionaries are binary files in eSpeak; for these, see the note at the end of this page.</em></p>
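<p>Put differently, the only thing that changes between the two layouts is how the <em>voice</em> property is decoded. The following Python 3 sketch illustrates that distinction; the function name <code>voice_bytes</code> is hypothetical, this is not meSpeak's own loading code, and treating text voices as UTF-8 is our assumption:</p>

<xmp>import base64

def voice_bytes(entry):
    """Return the raw voice-file bytes from a parsed language-file entry."""
    if entry.get("voice_encoding") == "text":
        # Alternate layout: the voice definition is stored as a plain string
        # (UTF-8 is assumed here).
        return entry["voice"].encode("utf-8")
    # Standard layout: the voice is a base64-encoded octet stream.
    return base64.b64decode(entry["voice"])
</xmp>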

<h4>Example</h4>

<p>As an example, we will configure a basic female voice for &quot;en-us&quot;, which will be named &quot;en-us-f&quot;.</p>

<ol>
<li>Make a copy of the meSpeak language file (JSON) you want to modify (in this case &quot;<code>voices/en/en-us.json</code>&quot;).</li>

<li>Rename the file (e.g. &quot;<code>en-us-f.json</code>&quot;) and open it in an editor.</li>

<li>Download the source of <a href="http://espeak.sourceforge.net/" target="_blank">eSpeak</a> and go to the &quot;<code>espeak-data/</code>&quot; directory.</li>

<li>The eSpeak file &quot;<code>espeak-data/voices/en/en-us</code>&quot; looks like this:

<xmp>// moving towards US English
name english-us
language en-us 2
language en-r
language en 3
gender male
// and more, skipped here
</xmp></li>

<li>Change the &quot;<code>name</code>&quot; parameter to make it unique (e.g. &quot;<code>name english-us-f</code>&quot;).</li>

<li>Change any parameters as you wish; in this case, change &quot;<code>gender male</code>&quot; to &quot;<code>gender female</code>&quot; for a female voice.</li>

<li>You should have arrived at something like this (the first line has been removed, since it is just a comment):

<xmp>name english-us-f
language en-us 2
language en-r
language en 3
gender female
</xmp></li>

<li>Replace every line break with &quot;<code>\n</code>&quot; in order to get a valid JSON string:

<xmp>"name english-us-f\nlanguage en-us 2\nlanguage en-r\nlanguage en 3\ngender female"</xmp>

Use this as the value of the &quot;<code>voice</code>&quot; property in the JSON file.</li>

<li>Add the line <code>&quot;voice_encoding&quot;: &quot;text&quot;</code> to the JSON to indicate that the voice is plain text (mind the comma after the preceding property).<br />Your voice file should now look like this:

<xmp>Content of file: "en-us-f.json":

{
  "voice_id": "en-us-f",
  "dict_id":  "en_dict",
  "dict":     "<base64-encoded octet stream>",
  "voice":    "name english-us-f\nlanguage en-us 2\nlanguage en-r\nlanguage en 3\ngender female",
  "voice_encoding": "text"
}
</xmp></li>
<li>Save it and load it into meSpeak.</li>
</ol>
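<p>Steps 5 to 9 above can also be scripted. The following Python 3 sketch is one way to do it under our own assumptions: the function <code>make_text_voice</code> and its parameters are hypothetical, only whole-line <code>//</code> comments are dropped, and <code>name</code> and <code>gender</code> are replaced only if those lines already exist. The base64 dictionary string is simply carried over from the language file you started from:</p>

<xmp>import json

def make_text_voice(espeak_voice, name, gender, voice_id, dict_id, dict_b64):
    """Derive a custom voice and wrap it as a text-encoded language file."""
    lines = []
    for line in espeak_voice.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("//"):
            continue  # drop blank lines and whole-line comments
        key = stripped.split()[0]
        if key == "name":
            stripped = "name " + name        # step 5: unique name
        elif key == "gender":
            stripped = "gender " + gender    # step 6: change a parameter
        lines.append(stripped)
    entry = {
        "voice_id": voice_id,
        "dict_id": dict_id,
        "dict": dict_b64,
        "voice": "\n".join(lines),
        "voice_encoding": "text",            # step 9
    }
    # json.dumps writes each line break as \n (step 8).
    return json.dumps(entry, indent=2)

# Usage (paths are illustrative):
# from pathlib import Path
# base = json.loads(Path("voices/en/en-us.json").read_text())
# src = Path("espeak-data/voices/en/en-us").read_text()
# out = make_text_voice(src, "english-us-f", "female", "en-us-f",
#                       base["dict_id"], base["dict"])
# Path("en-us-f.json").write_text(out)
</xmp>

<p>Letting <code>json.dumps</code> do the escaping also takes care of quotes and backslashes, which are easy to miss when replacing line breaks by hand.</p>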

<p><em>Please note that eSpeak is not very forgiving of syntax errors in a voice definition: it simply throws an error, which, in the case of meSpeak, shows up in the console log.</em></p>

<p>For further details on voice parameters and fine-tuning, please refer to the eSpeak documentation: <a href="http://espeak.sourceforge.net/voices.html" target="_blank">http://espeak.sourceforge.net/voices.html</a>.</p>

<h3>Custom Dictionaries</h3>
<p>eSpeak's dictionaries are binary files, which must first be compiled with eSpeak.<br />
You would have to install eSpeak and compile a dictionary following the <a href="http://espeak.sourceforge.net/docindex.html" target="_blank">eSpeak documentation</a>.<br />
Then you would insert the base64-encoded content of the resulting compiled file as the value of the <em>dict</em> property of a meSpeak language file.<br />
Finally, you would set a suitable and unique value for the property <em>dict_id</em> (a UNIX file path).</p>
<p>There is no shortcut to this. Sorry.</p>
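<p>Compiling still has to be done with eSpeak itself, but the last two steps (embedding and naming) can be scripted. A minimal Python 3 sketch, where the function <code>embed_dictionary</code> and the example <em>dict_id</em> &quot;enx_dict&quot; are hypothetical. Whether the voice definition must also refer to the new dictionary (eSpeak's <code>dictionary</code> voice attribute) depends on your setup; please check the eSpeak voice documentation.</p>

<xmp>import base64
import json

def embed_dictionary(language_json, dict_bytes, dict_id):
    """Return language-file JSON with a compiled eSpeak dictionary embedded.

    dict_bytes: contents of the compiled dictionary, read beforehand.
    dict_id:    a unique UNIX path relative to espeak-data/.
    """
    entry = json.loads(language_json)
    entry["dict"] = base64.b64encode(dict_bytes).decode("ascii")
    entry["dict_id"] = dict_id
    return json.dumps(entry, indent=2)

# Usage (paths are illustrative):
# from pathlib import Path
# updated = embed_dictionary(Path("en-us-f.json").read_text(),
#                            Path("espeak-data/enx_dict").read_bytes(),
#                            "enx_dict")
# Path("en-us-f.json").write_text(updated)
</xmp>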

<p>Please also see the section on the <em>extended voice format</em> on the <a href="./">main page</a>.</p>


<p>&nbsp;</p>
<p>Norbert Landsteiner<br />
Vienna, July 2013</p>

</body>
</html>