<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>meSpeak &ndash; Voices &amp; Languages</title>
<link href="http://fonts.googleapis.com/css?family=Open+Sans&amp;subset=latin" rel="stylesheet" type="text/css" />
<link href="http://fonts.googleapis.com/css?family=Lato:300&amp;subset=latin" rel="stylesheet" type="text/css" />
<style type="text/css">
html
{
	margin: 0;
	padding: 2em 1.5em 4.5em 1.5em;
	background-color: #e2e3e4;
}
body
{
	max-width: 900px;
	padding: 2px 40px 60px 40px;
	margin: 0 auto 0 auto;
	background-color: #fafafb;
	color: #111;
	font-family: 'Open Sans',sans-serif;
	font-size: 13px;
	line-height: 19px;
}
h1,h2,h3,h4
{
	font-family: 'Lato',sans-serif;
	font-weight: 300;
}
h1 {
	font-size: 46px;
	line-height: 46px;
	color: #2681a7;
	margin-top: 0.5em;
	margin-bottom: 0.5em;
	padding: 0;
}
h2
{
	font-size: 36px;
	color: #111;
	margin-top: 0;
	margin-bottom: 1.5em;
	clear: both;
}
h3
{
	font-size: 24px;
	color: #111;
	margin-top: 2em;
}
h4
{
	font-size: 20px;
	color: #111;
	margin: 1.5em 0 1em 0.25em;
}
h1 span.pict { font-size: 38px; color: #ccc; margin-left: 0.5em; letter-spacing: -2px; }
p.codesample,xmp
{
	margin: 1em 0;
	padding: 1em 0 1em 2em;
	white-space: pre;
	font-family: monospace;
	line-height: 18px;
	background-color: #f2f3f5;
	color: #111;
}
p.codesample strong { color: #222; }
a { color: #006f9e; }
a:hover,a:focus { color: #2681a7; }
a:active { color: #cd360e; }
p.action { margin: 1em 0 1em 1.5em; }
li { margin-bottom: 0.5em; }
</style>
</head>
<body>
<h1>meSpeak <span class="pict">(( &bull; ))</span></h1>
<h2>Voices &amp; Languages</h2>

<p>A short guide to setting up languages and voices for meSpeak.<br />
Please note that meSpeak is based on an Emscripten port of <a href="http://espeak.sourceforge.net/" target="_blank">eSpeak</a>, so all of the eSpeak grammar also applies to meSpeak.</p>

<h3>Standard Language Files</h3>

<p>meSpeak's language-files provide eSpeak's language- and voice-files in a single package.<br />(Since a voice usually refers to a language and its dictionary, it makes sense to bundle them together in a single file.)<br />The language-files have the following structure (JSON):</p>

<xmp>{
  "voice_id": "<filename>",
  "dict_id": "<filename>",
  "dict": "<base64-encoded octet stream>",
  "voice": "<base64-encoded octet stream>"
}
</xmp>

<p>The values of <em>voice_id</em> and <em>dict_id</em> are actually UNIX filenames: <em>dict_id</em> is relative to the path of eSpeak's data-directory &quot;<code>espeak-data/</code>&quot;, and <em>voice_id</em> is relative to &quot;<code>espeak-data/voices/</code>&quot;.</p>

<p>If we were to embed the files for the language &quot;<code>en-en</code>&quot;, these would be:</p>
<ul>
<li>&quot;<code>en/en-en</code>&quot; for the voice and</li>
<li>&quot;<code>en_dict</code>&quot; for the dictionary used by &quot;en-en&quot;</li>
</ul>

<p>For a standard language-file, you would add base64 representations of the respective eSpeak-files as the string values of <em>dict</em> and <em>voice</em>.</p>
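
<p>As a rough illustration, the structure described above could be assembled in Node.js as follows. This is only a sketch: <code>buildLanguageFile</code> is a hypothetical helper (not part of meSpeak or eSpeak), and it assumes that the raw bytes of the eSpeak voice- and dictionary-files are already at hand.</p>

<xmp>// Sketch only: buildLanguageFile is a hypothetical helper, not a meSpeak API.
// voiceBytes and dictBytes are the raw contents of the eSpeak files
// (a Buffer, a Uint8Array or an array of byte values).
function buildLanguageFile(voiceId, dictId, voiceBytes, dictBytes) {
  return {
    voice_id: voiceId,   // relative to "espeak-data/voices/"
    dict_id: dictId,     // relative to "espeak-data/"
    dict: Buffer.from(dictBytes).toString('base64'),
    voice: Buffer.from(voiceBytes).toString('base64')
  };
}

// Possible usage (the paths are assumptions about a local eSpeak checkout):
//   var fs = require('fs');
//   var lang = buildLanguageFile(
//     'en/en-en', 'en_dict',
//     fs.readFileSync('espeak-data/voices/en/en-en'),
//     fs.readFileSync('espeak-data/en_dict'));
//   fs.writeFileSync('en-en.json', JSON.stringify(lang));
</xmp>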

<h3>Customizing</h3>

<p>There is an alternate layout for meSpeak's language-files, which is especially useful for customizing and testing:</p>

<xmp>{
  "voice_id": "<filename>",
  "dict_id": "<filename>",
  "dict": "<base64-encoded octet stream>",
  "voice": "<text-string>",
  "voice_encoding": "text"
}
</xmp>

<p>Since eSpeak's voice-files are actually plain-text files, you may use a simple string for these, provided you also add the property <code>&quot;voice_encoding&quot;: &quot;text&quot;</code>.</p>
<p><em>For dictionaries, which are binary files in eSpeak, see the note at the end of the page.</em></p>

<h4>Example</h4>

<p>As an example, we will configure a basic female voice for &quot;en-us&quot;, which will be named &quot;en-us-f&quot;.</p>

<ol>
<li>Make a copy of the meSpeak language-file (JSON) that you want to modify (in this case &quot;<code>voices/en/en-us.json</code>&quot;).</li>

<li>Rename the file (e.g.: &quot;<code>en-us-f.json</code>&quot;) and open it in an editor.</li>

<li>Download the source of <a href="http://espeak.sourceforge.net/" target="_blank">eSpeak</a> and go to the &quot;<code>espeak-data/</code>&quot; directory.</li>

<li>The eSpeak-file &quot;<code>espeak-data/voices/en-us</code>&quot; looks like this:

<xmp>// moving towards US English
name english-us
language en-us 2
language en-r
language en 3
gender male
// and more, skipped here
</xmp></li>

<li>Change the value of the &quot;<code>name</code>&quot; parameter to make it unique (e.g.: &quot;<code>name english-us-f</code>&quot;).</li>

<li>Change any parameters as you wish. In this case, change &quot;<code>gender male</code>&quot; to &quot;<code>gender female</code>&quot; for a female voice.</li>

<li>You should have arrived at something like this (first line removed, since it is just a comment):

<xmp>name english-us-f
language en-us 2
language en-r
language en 3
gender female
</xmp></li>

<li>Replace every line-break with &quot;<code>\n</code>&quot; in order to get a valid JSON-string:

<xmp>"name english-us-f\nlanguage en-us 2\nlanguage en-r\nlanguage en 3\ngender female"</xmp>

Use this as the value of the &quot;<code>voice</code>&quot;-property of the JSON-file.</li>

<li>Add the line <code>&quot;voice_encoding&quot;: &quot;text&quot;</code> to the JSON to indicate that the voice is plain-text.<br />Your voice file should now look like this:

<xmp>Content of file: "en-us-f.json":

{
  "voice_id": "en-us-f",
  "dict_id": "en_dict",
  "dict": "<base64-encoded octet stream>",
  "voice": "name english-us-f\nlanguage en-us 2\nlanguage en-r\nlanguage en 3\ngender female",
  "voice_encoding": "text"
}
</xmp></li>
<li>Save it and load it into meSpeak.</li>
</ol>
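
<p>The manual steps above could also be scripted. The following is a minimal sketch in plain JavaScript, not a feature of meSpeak: <code>customizeVoice</code> and <code>toTextVoiceFile</code> are hypothetical helpers, and the sketch assumes that comment lines always start with &quot;<code>//</code>&quot; and that each parameter sits on a line of its own. <code>JSON.stringify</code> takes care of escaping the line-breaks as &quot;<code>\n</code>&quot;.</p>

<xmp>// Sketch only: both functions are hypothetical helpers, not meSpeak APIs.

// Drops empty lines and comment lines, then replaces the values of the
// "name" and "gender" parameters of an eSpeak voice definition.
function customizeVoice(voiceText, newName, gender) {
  return voiceText
    .split(/\r?\n/)
    .map(function (line) { return line.trim(); })
    .filter(function (line) { return line !== '' && line.indexOf('//') !== 0; })
    .map(function (line) {
      if (/^name\s/.test(line)) return 'name ' + newName;
      if (/^gender\s/.test(line)) return 'gender ' + gender;
      return line;
    })
    .join('\n');
}

// Returns a copy of a language-file object that carries the voice as plain
// text; dict_id and dict are taken over unchanged from the original.
function toTextVoiceFile(languageFile, voiceId, voiceText) {
  return Object.assign({}, languageFile, {
    voice_id: voiceId,
    voice: voiceText,
    voice_encoding: 'text'
  });
}

// Possible usage, with enUs being the parsed content of "en-us.json" and
// enUsVoice the text of "espeak-data/voices/en-us":
//   var voice = customizeVoice(enUsVoice, 'english-us-f', 'female');
//   var json = JSON.stringify(toTextVoiceFile(enUs, 'en-us-f', voice));
</xmp>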

<p><em>Please note that eSpeak is not very forgiving of syntax errors in a voice-definition and will just throw an error, which, in the case of meSpeak, will show up in the console-log.</em></p>

<p>For further details on voice-parameters and fine-tuning, please refer to the eSpeak-documentation: <a href="http://espeak.sourceforge.net/voices.html" target="_blank">http://espeak.sourceforge.net/voices.html</a>.</p>

<h3>Custom Dictionaries</h3>
<p>eSpeak's dictionaries are binary files, which must be compiled with eSpeak first.<br />
You would have to install eSpeak and compile a file following the <a href="http://espeak.sourceforge.net/docindex.html" target="_blank">eSpeak documentation</a>.<br />
Next, you would insert a base64-encoded string of the resulting object-file's content as the value of the <em>dict</em> property of a meSpeak language-file.<br />
Finally, you would set a suitable and unique value for the property <em>dict_id</em> (UNIX file path).</p>
<p>There is no shortcut to this. Sorry.</p>
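
<p>Once a dictionary has been compiled with eSpeak, the last two steps could be sketched in Node.js as follows. <code>withCustomDict</code> is a hypothetical helper (not part of meSpeak), and the file names in the usage comment are assumptions for illustration only.</p>

<xmp>// Sketch only: withCustomDict is a hypothetical helper, not a meSpeak API.
// Returns a copy of a language-file object with the dictionary replaced by
// the base64-encoded content of a compiled eSpeak dictionary.
function withCustomDict(languageFile, dictId, dictBytes) {
  return Object.assign({}, languageFile, {
    dict_id: dictId,   // must be unique, relative to "espeak-data/"
    dict: Buffer.from(dictBytes).toString('base64')
  });
}

// Possible usage (file names are assumptions):
//   var fs = require('fs');
//   var lang = JSON.parse(fs.readFileSync('en-us.json', 'utf8'));
//   var custom = withCustomDict(lang, 'my_dict', fs.readFileSync('my_dict'));
//   fs.writeFileSync('en-us-custom.json', JSON.stringify(custom));
</xmp>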

<p>Please also see the section on the <em>extended voice format</em> on the <a href="./">main page</a>.</p>

<p>&nbsp;</p>
<p>Norbert Landsteiner<br />
Vienna, July 2013</p>

</body>
</html>