Howto: Write website script
===========================

1.1. Read $prefix/doc/quvi/CodingStyle
1.2. Work with the development code
1.4. Amp up libquvi verbosity
1.5. Isolate the problem
1.6. Lua script search paths
2.1.1. Generating HTTP traffic logs
2.2. When you have chosen a website
2.2.1. Additional documentation
2.3. Writing a website script
2.3.1. Working with the development code
2.3.2. Working with precompiled binaries
2.3.3. Typical steps to write the script
2.3.3.1. Testing your script
2.4. Generate the patch
2.5. Before you submit your website script

1.1. Read $prefix/doc/quvi/CodingStyle
--------------------------------------

Please read the Lua guidelines in particular. You can find the same
file in the $top_srcdir/doc/ directory.


1.2. Work with the development code
-----------------------------------

The web interface can be found at:
    <http://repo.or.cz/w/quvi.git>

    % git clone git://repo.or.cz/quvi.git

We use git in the examples of this documentation. Even if you are new to
git, you will see that generating patches with git is very easy. git also
preserves the patch contributor information along with the changes in the
repository log. In the long run, this benefits both the project and
the contributor.

Therefore, please make sure that you have set the following details in
the ~/.gitconfig file:

    email = your_email_here

You may, of course, choose to use diff(1) or some other SCM instead of

1.4. Amp up libquvi verbosity
-----------------------------

You can increase the library verbosity, which may help while you are
working on your scripts.

    % env QUVI_SHOW_SCANDIR=1 quvi

Makes libquvi dump the Lua script search directories to stderr.

    % env QUVI_SHOW_SCRIPT=1 quvi

Similar, but additionally dumps the full paths to the Lua scripts.

    % quvi --verbose-libcurl

Flip this switch on if you want to see what libcurl does behind the

1.5. Isolate the problem
------------------------

While working on the Lua patterns, you may prefer to work with local
files instead of directly with (lib)quvi, e.g.:

    % wget PAGE_URL -O output.html

    io.input("output.html")
    local page = io.read("*all")

    local _,_,s = page:find(...)

Once you have perfected the patterns, you can then go ahead and write
the actual website script.
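
A minimal sketch of such a throwaway test script follows. It is only an
illustration: the page markup, the `capture' helper and the attribute
names are made up here, and an inline string stands in for the contents
of output.html so that the script runs on its own.

```lua
-- Stand-in for the contents of output.html (hypothetical markup).
local page = [[
<html><head><title>Foo video title</title></head>
<body><embed vid_id="1234" vid_url="http://foo.bar/media/1234.flv"></body>
</html>
]]

-- Hypothetical helper: return the first capture of `pattern' in `text',
-- or raise an error that names the detail we failed to find.
local function capture(text, pattern, what)
    local _, _, s = text:find(pattern)
    return s or error("no match: " .. what)
end

local title = capture(page, '<title>(.-)</title>', 'video title')
local id    = capture(page, 'vid_id="(.-)"', 'video id')

print(title, id)
```

To run it against a page saved with wget, the inline string could be
replaced with the io.input()/io.read() lines shown above.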

1.6. Lua script search paths
----------------------------

    $prefix/share/quvi/lua/README

    $top_srcdir/share/lua/README

You can choose to work with the development source code of quvi or
precompiled binaries that you have installed onto your system. We cover
these briefly in "2.3".

Typical quvi website script:
  * Parses video details (ID, title, media URL) from fetched page
  * Returns parsed details

Some scripts may be more complex:
  * Compare buzzhumor.lua to youtube.lua
  * Compare funnyhub.lua to dailymotion.lua

Be sure to also read "1. General tips" for the supported environment

2.1. Choose a website
---------------------

If you have none in mind, please visit our Trac at:
    <http://sourceforge.net/apps/trac/quvi/report/1>

and check for any unassigned tickets. We could always use help.


2.1.1. Generating HTTP traffic logs
-----------------------------------

Note that analyzing HTTP traffic is not always necessary, but it can
help. For example, if the media URL is not visible in the video page
HTML, digging deeper is often required.

Find yourself a system that can execute Adobe Flash object code and
capture the generated HTTP traffic. You may be able to figure out
how the media URLs are constructed by analyzing this data.

Some contributors have reported using Wireshark for this. Others have
reported that some Firefox add-ons, e.g. "Live HTTP Headers", can be
used as well.

Even if you are not a programmer, you can always contribute log data.
We have seen this help the work before.

2.2. When you have chosen a website
-----------------------------------

How difficult this task turns out to be depends on how the website was
designed. Please compare some of the existing scripts to get a better
understanding of this.

Note also that some websites use additional protocols (e.g. RTMP, RTSP,
MMS). If you are working on a script for a website that uses, say, RTMP,
you need to define this in the protocol category in the `ident' function
in your script. See francetelevisions.lua for an example of this.

quvi defaults to HTTP for historical reasons. This means that if you
expect quvi to support a non-HTTP website, make sure you also pass
"--category-all, -a" or "--category-$scheme" when you run quvi, e.g.:
    % quvi --category-rtmp URL


2.2.1. Additional documentation
-------------------------------

    $prefix/share/quvi/lua/website/README

    $top_srcdir/share/lua/website/README

This file details the functions expected to be found in each website script.

2.3. Writing a website script
-----------------------------

Let's assume that you have figured out how to parse the video details.


2.3.1. Working with the development code
----------------------------------------

The web interface to our git repository can be found at:
    <http://repo.or.cz/w/quvi.git>

You can grab the development code with:
    % git clone git://repo.or.cz/quvi.git

If you prefer to work with precompiled quvi binaries, please jump to 2.3.2.

We will use a VPATH ("tmp") in this example.
    % cd quvi ; mkdir tmp ; cd tmp
    % ../configure ; make

Use buzzhumor.lua as a template script.
    % cp ../share/lua/website/buzzhumor.lua ../share/lua/website/foo.lua
    (open foo.lua in an editor)

Jump to 2.3.3 to continue.

2.3.2. Working with precompiled binaries
----------------------------------------

Make sure that you have at least 0.2.0 installed on your system:

Use buzzhumor.lua as a template script.
    % mkdir -p foo/lua/website/ ; cd foo
    % cp -r $prefix/share/quvi/lua/util/ lua/
    % cp -r $prefix/share/quvi/lua/website/quvi/ lua/website/
    % cp $prefix/share/quvi/lua/website/buzzhumor.lua lua/website/foo.lua

So the foo/ dir should look like:

    ./lua/website/foo.lua

    ./lua/website/quvi/const.lua
    ./lua/website/quvi/url.lua
    ./lua/website/quvi/util.lua
    ./lua/website/quvi/bit.lua

    ./lua/util/content_type.lua
    ./lua/util/charset.lua

    (open foo.lua in an editor)

Jump to 2.3.3 to continue.

2.3.3. Typical steps to write the script
----------------------------------------

If you are familiar with regular expressions, you will find many
similarities in Lua patterns. You can find more about Lua patterns
at: <http://www.lua.org/pil/> -- Programming in Lua
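
For readers coming from regular expressions, the short sketch below
shows a few places where Lua patterns differ. The URL and the HTML
snippet are made-up sample data, not taken from any real website.

```lua
local url = "http://foo.bar/watch/1234/"

-- "%d+" matches one or more digits (a regex would use \d+).
local id = url:match("/watch/(%d+)/")              -- "1234"

-- A bare "." matches any character; "%." matches a literal dot.
local loose  = ("fooXbar"):find("foo.bar")  ~= nil -- true
local strict = ("fooXbar"):find("foo%.bar") ~= nil -- false

-- ".-" takes the shortest possible match (regex: .*?),
-- ".*" takes the longest.
local html   = "<b>one</b><b>two</b>"
local lazy   = html:match("<b>(.-)</b>")           -- "one"
local greedy = html:match("<b>(.*)</b>")           -- "one</b><b>two"

print(id, loose, strict, lazy, greedy)
```

The lazy ".-" form is what the website scripts in this howto use, since
it stops at the first closing delimiter.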

1) Change the copyright line (add year, your name, your email).

Or you can use the project default:
    "Copyright (C) year quvi project <http://quvi.sourceforge.net/>"

2) Modify the `ident' function in your script.

    - r.domain = 'buzzhumor.com'
    + r.domain = 'foo.bar'

A word about r.handles: when the `ident' function gets called, quvi
checks whether the script can handle the user-defined page URL. To
do this, we define at least one domain signature and one path
signature to compare against the URL. For the sake of brevity, let's

    - r.handles = U.handles(self.page_url, {r.domain}, {"/videos/"})
    + r.handles = U.handles(self.page_url, {r.domain}, {"/watch/"})

Is enough. It may be easier to understand if we take a look at

    http://www.buzzhumor.com/videos/32561/Girl_Feels_Shotgun_Power
    http://foo.bar/watch/1234/

The `handles' function (of quvi/util.lua) that we call confirms that:
  * the domain pattern (e.g. "buzzhumor.com") is found in the URL
  * the path pattern (e.g. "/videos/") is found in the URL
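
The two checks above can be sketched in plain Lua as follows. This is
a hypothetical stand-in written for illustration only; the actual
`handles' function in quvi/util.lua may be implemented differently.

```lua
-- Hypothetical stand-in for the `handles' helper: true only if at
-- least one domain pattern and at least one path pattern are found
-- in the URL.
local function handles(url, domains, paths)
    local function any(patterns)
        for _, p in ipairs(patterns) do
            if url:find(p) then return true end
        end
        return false
    end
    return any(domains) and any(paths)
end

print(handles("http://www.buzzhumor.com/videos/32561/Girl_Feels_Shotgun_Power",
              {"buzzhumor.com"}, {"/videos/"}))                          -- true
print(handles("http://foo.bar/watch/1234/", {"foo.bar"}, {"/watch/"}))   -- true
print(handles("http://foo.bar/about/", {"foo.bar"}, {"/watch/"}))        -- false
```

Note that the signatures are Lua patterns, so an unescaped "." in a
domain matches any character; "foo%.bar" would match the dot literally.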

You can leave r.formats untouched, unless you know that the website
supports more than one ("default") video format and you know how to
access those. If you are not sure, that's OK too, we can always add

See youtube.lua, dailymotion.lua and vimeo.lua for examples of scripts
with additional formats.

3) Change r.categories only if you know that the website uses a
non-HTTP protocol.

See francetelevisions.lua for an example of this.

4) Move to the `parse' function. This is where most of the magic happens.

To keep things simple, let's go ahead and assume that our 'foo'
website is nearly identical to 'buzzhumor'.

4.1) Update the host_id.

    - self.host_id = 'buzzhumor'
    + self.host_id = 'foo'

4.2) In order to have something to work with, let's grab the video page
from the user-specified URL.

    local page = quvi.fetch(self.page_url)

4.3) Now that we have the page, it's time to parse the video details
from it. Let's start with the video title:

    local _,_,s = page:find('<title>(.-)</title>')
    self.title = s or error("no match: video title")

4.4) Grab the video ID.

    local _,_,s = page:find('vid_id="(.-)"')
    self.id = s or error("no match: video id")

4.5) We're almost done. The only remaining detail is the media URL:

    local _,_,s = page:find('vid_url="(.-)"')
    self.url = {s or error("no match: video url")}

Note the {}: we place the URL in a table.
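
Put together, steps 4.1 to 4.5 might look roughly like the sketch
below. To keep it runnable outside of quvi, the `quvi' table is
replaced with a stub that returns made-up markup; in a real script
libquvi provides quvi.fetch. Returning `self' at the end is an
assumption here, so check the template script for the actual
convention.

```lua
-- Stub standing in for the `quvi' table that libquvi provides.
-- It returns canned, made-up markup instead of fetching anything.
local quvi = {
    fetch = function(url)
        return '<title>Foo video</title>'
            .. '<embed vid_id="1234" vid_url="http://foo.bar/m/1234.flv">'
    end
}

-- Sketch of a `parse' function assembled from steps 4.1 to 4.5.
local function parse(self)
    self.host_id = 'foo'
    local page = quvi.fetch(self.page_url)

    local _,_,s = page:find('<title>(.-)</title>')
    self.title = s or error("no match: video title")

    local _,_,s = page:find('vid_id="(.-)"')
    self.id = s or error("no match: video id")

    local _,_,s = page:find('vid_url="(.-)"')
    self.url = {s or error("no match: video url")}

    return self -- assumption: mirrors what the template script does
end

local video = parse({page_url = "http://foo.bar/watch/1234/"})
print(video.host_id, video.title, video.id, video.url[1])
```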

2.3.3.1. Testing your script
----------------------------

If you are working with the development code (see 2.3.1), run:
    (still in $top_srcdir/tmp)
    % env QUVI_BASEDIR=../share ./src/quvi TEST_URL

Or if you are working with precompiled quvi binaries (see 2.3.2), run:

You will most likely spend most of your time tweaking the patterns in
your script. It often helps to "isolate the problem", e.g. copy
page data to a local file and write an additional script to perfect
the Lua patterns. See "Isolate the problem" in "1. General tips".

Also read $top_srcdir/tests/README for tips on how you can use the
existing test suite files to test your script.

2.4. Generate the patch
-----------------------

If you are working with the quvi development code (see 2.3.1):
    (still in $top_srcdir/tmp)
    % git add ../share/lua/website/foo.lua

Or if you are working with precompiled quvi binaries (see 2.3.2):

    % git init ; git add lua/website/foo.lua

    % git commit -am 'Add foo support'
    % git format-patch -M -1

    $prefix/doc/quvi/HowtoSubmitPatches

    $top_srcdir/doc/HowtoSubmitPatches

2.5. Before you submit your website script
------------------------------------------

  * Does your script set and parse everything as expected?

  * Does the website support more than one video format?
    - If yes, see if you can add support for them
    - We can, of course, add the support later

  * Does the parsed video title contain extra characters?
    - We want the video title *only*
    - Anything else, e.g. domain name, should be left out

  * If you are unsure about something, don't hesitate to ask
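
As an illustration of the title check in the list above, the sketch
below strips a trailing site name from a parsed title. The
`clean_title' helper, the " - site" suffix format and the sample title
are all assumptions made for this example; real websites decorate
their titles in different ways.

```lua
-- Hypothetical raw title, as it might come out of a <title> element.
local raw = "  Funny cat video - Foo.bar  "

-- Hypothetical helper: remove a trailing " - <site>" suffix, then trim
-- the surrounding whitespace. Punctuation in the site name is escaped
-- so that characters such as "." are matched literally.
local function clean_title(title, site)
    local escaped = site:gsub("%p", "%%%0")
    title = title:gsub("%s*%-%s*" .. escaped .. "%s*$", "")
    return (title:gsub("^%s+", ""):gsub("%s+$", ""))
end

print(clean_title(raw, "Foo.bar"))
```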

    $prefix/doc/quvi/HowtoSubmitPatches

    $top_srcdir/doc/HowtoSubmitPatches