1 .TH AGG 1 2011-04-04 agg "the news aggregator"
13 is a news aggregator following the UNIX philosophy. It
14 simply reads a news feed (currently RSS only) from stdin
15 and creates or updates a filesystem representation of
20 creates or updates the following directory structure in the
21 current working directory:
32 uses the mtime of files and directories to represent dates
35 If the feed directory does not exist,
37 will create it and store all items in the feed there.
38 The mtimes of the files will be set to the corresponding
39 date of publication; the mtime of the feed directory will
40 be set to the date of publication of the most recent item.
42 If the feed directory already exists (e.g. on subsequent
45 checks the mtime of the feed directory and only fetches
46 items with a newer date of publication, again setting the
47 mtimes for the items fetched in this run. The mtime of the
48 feed directory will be set to the date of publication of
49 the most recent item that was fetched in this run.
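.P
Since the dates are plain mtimes, the usual tools can be used to
inspect the result ("somefeed" below is just a made-up feed
directory name):
.nf
    ls -lt somefeed                  # items, newest first
    find somefeed -type f -mtime -1  # items published within the last day
.fi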
51 If an item does not have a publication date, it is set to
54 By manually changing the mtime of the feed directory, you
55 can make agg either skip unfetched items or refetch old
58 To avoid unintentionally changing the mtime and thus
59 skipping items, you can use a tiny wrapper called
65 Writing file names that are specified in the feed?
69 removes all slashes from file and directory names before
70 they are written, so everything ends up where it belongs.
71 You should run it in a dedicated directory, though.
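.P
For example ("~/news" and "feed.xml" are arbitrary names; feed.xml
stands for a previously downloaded feed):
.nf
    mkdir -p ~/news && cd ~/news
    agg < feed.xml
.fi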
73 But a malicious feed could use up all space/inodes.
75 Depends on your operating system (configuration). It's not
76 the job of a news aggregator to enforce quotas.
78 Why no download mechanism?
80 Because it's a news aggregator, not a
81 download-and-news-aggregation-program.
83 Why no user interface?
85 Because it's a news aggregator, not a
86 download-and-news-aggregation-and-news-reader-program.
87 The file system hierarchy created is pretty much usable
88 with the default UNIX tools. Feel free to write your own
91 No way! This program writes HTML!
93 Yes, I like to be able to subscribe to xkcd and similar,
94 even if it means I have to launch a graphical browser once
95 in a while. Anyway, there's
98 cat "$item" | elinks -dump
101 But do I have to download the feed by hand?
107 But this wastes traffic when there are no new items!
110 quits when it assumes that there are no new items (see
112 ). How much data is read unnecessarily depends on the
113 ratio of processing speed to download speed.
116 wget $URL -O - --limit-rate=10K | agg
119 Okay. But it only works on a single feed!
124 for feed in `cat feeds`; do
125 (wget $feed -qO - --limit-rate=10K | agg) &
129 How to fetch only new items from feeds that don't use
134 itself, since it would require a second level of storage that
135 contains (hashes of) everything the
137 directory contained -- including items you
138 explicitly deleted. You can easily build such functionality
139 on top using a few lines of shell code.
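.P
A possible sketch, not part of agg: the feed directory name
"somefeed", the state files "seen" and "kept", $URL and the use of
sha1sum are assumptions, and unchanged items are assumed to be
written identically when refetched. The idea is to reset the mtime
of the feed directory so that agg reconsiders every item, remember
a hash of everything ever fetched, and remove refetched items that
had already been deleted by hand:
.nf
    ls somefeed > kept 2>/dev/null   # items we still have
    touch -t 197001010000 somefeed   # make agg reconsider every item
    wget "$URL" -qO - | agg
    for item in somefeed/*; do
        [ -f "$item" ] || continue
        name=`basename "$item"`
        sum=`sha1sum < "$item" | cut -d' ' -f1`
        if grep -q "$sum" seen 2>/dev/null; then
            # fetched in an earlier run: keep it only if it was
            # still around before this run, i.e. not deleted by hand
            grep -qxF "$name" kept || rm "$item"
        else
            echo "$sum" >> seen      # genuinely new item, remember it
        fi
    done
.fi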
147 Uses fixed-size buffers to simplify the code. This may lead
148 to cut-off news texts or links. The chances of this happening
149 are low and the consequences minor.
151 Assumes items are ordered by descending publication date
152 (newest items on top). Processing stops as soon as an
153 old item is encountered.
155 Assumes items only change if their publication date
156 changes. Again, for simplicity.
158 Creates a "sub-feed" directory if the channel contains an
159 element that has a title tag but is not an item.
161 Supports only dates whose time zone is given as a numeric
162 offset (e.g. +0200), not as an abbreviation (e.g. CEST).
164 Item titles may conflict, especially if they were too long
165 and have been truncated.
167 Items will always be (over-)written in the order they are
170 HTML output is formatted badly.
172 The default mtime for items without pubDate should be now().
180 http://programmers.at/work/on/agg
184 git://repo.or.cz/agg.git
188 Andreas Waidler <arandes@programmers.at>