# agg, the news aggregator

agg is a news aggregator for POSIX-compliant systems.

It follows the UNIX philosophy and simply reads a news feed
from stdin and creates or updates a filesystem
representation of that feed.

No command line parameters, no user interface, not even

* 2011-04-16 agg-0.1.1 released
* 2011-04-08 agg-0.1.0 released
* 2011-04-01 development started

### 2011-04-16 agg-0.1.1

* Included proper README.
* Included nomtime in make targets.

### 2011-04-08 agg-0.1.0

To install somewhere else, see Make.config.

Please run the test suites. They have been written for *you*
and take only two seconds on a 500 MHz CPU anyway.

### Writing file names that are specified in the feed? What about security?

agg removes all slashes from file and directory names
before they are written, so everything ends up where it
belongs. You should run it in a dedicated directory,
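
To illustrate the idea, here is a minimal shell sketch of slash removal. It is not agg's actual code, and the function name `sanitize_name` is made up for this example.

```sh
#!/bin/sh
# Illustration only, not taken from agg's source.
# sanitize_name NAME: print NAME with every slash removed, so that
# the result is a single path component.
sanitize_name() {
    printf '%s\n' "$1" | tr -d '/'
}
```

A title such as `../../etc/passwd` comes out as `....etcpasswd`, which names one entry in the current directory instead of a path leading out of it.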

### But a malicious feed could use up all space/inodes.

Depends on your operating system (configuration). It's not
the job of a news aggregator to enforce quotas.

### Why no download mechanism?

Because it's a news aggregator, not a
download-and-news-aggregation-program.

### Why no user interface?

Because it's a news aggregator, not a
download-and-news-aggregation-and-news-reader-program.
The file system hierarchy it creates is quite usable
with the default UNIX tools. Feel free to write your own

### No way! This program writes HTML!

Yes. I like to be able to subscribe to xkcd and similar
feeds, even if it means I have to launch a graphical browser
once in a while. Anyway, there's

    cat "$item" | elinks -dump
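
Because the feed is stored as ordinary files, picking something to read needs nothing beyond `ls` and `head`. The sketch below assumes that each item is a file inside the feed's directory and that its mtime reflects the publication date, which the notes on mtime in this README suggest but do not spell out. `newest_items` is a made-up helper name.

```sh
#!/bin/sh
# Hypothetical helper, not part of agg.
# newest_items DIR [N]: print the names of the N most recently
# modified entries of DIR, newest first (N defaults to 10).
newest_items() {
    ls -t -- "$1" | head -n "${2:-10}"
}
```

The newest item of a feed directory could then be read with something like `cat "$dir/$(newest_items "$dir" 1)" | elinks -dump`, where `$dir` stands for whatever directory agg created.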

### But do I have to download the feed by hand?

### But this wastes traffic when there are no new items!

agg quits when it assumes that there are no new items (see
bugs). How much surplus data gets downloaded depends on the
ratio of processing speed to download rate.

    wget "$URL" -O - --limit-rate=10K | agg

### Okay. But it only works on a single feed!

    for feed in $(cat feeds); do
        (wget "$feed" -qO - --limit-rate=10K | agg) &
    done
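
A slightly more defensive variant of the same loop is sketched here. It reads the `feeds` file line by line, so URLs are not subject to word splitting or globbing, and it waits for all background pipelines before returning. The function name `update_feeds` and the convention of skipping blank lines and lines starting with `#` are assumptions of this example, not features of agg.

```sh
#!/bin/sh
# Hypothetical wrapper, not part of agg.
# update_feeds FILE: fetch every URL listed in FILE (one per line)
# and pipe each download into its own agg process, all in parallel.
update_feeds() {
    while IFS= read -r feed; do
        # Skip blank lines and comment lines (assumed convention).
        case $feed in
            ''|'#'*) continue ;;
        esac
        (wget -qO - --limit-rate=10K "$feed" | agg) &
    done < "$1"
    # Return only after every background pipeline has finished.
    wait
}
```

Calling `update_feeds feeds` from the dedicated directory then updates all feeds in one go.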

### How to fetch only new items from feeds that don't use publication dates?

Not supported by agg itself, since it would require a
second-level storage that contains (hashes of) everything
the agg directory contained, including items you
explicitly deleted. You can easily build such functionality
on top using a few lines of shell code.
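
One way such a layer might look is sketched below. It is an assumption-laden example, not something shipped with agg: the helper name `remember_and_prune`, the `seen.list` and `kept.list` files, and the use of `cksum` (a simple checksum, not a cryptographic hash) are all choices made for this illustration. It also assumes that items are plain files directly inside the feed directory and that their names contain no newlines.

```sh
#!/bin/sh
# Hypothetical helper, not part of agg.
# remember_and_prune DIR SEEN KEPT
#   DIR   directory agg wrote the items into
#   SEEN  file holding one checksum per line for every item ever stored
#   KEPT  file listing the item names that existed before agg ran
remember_and_prune() {
    dir=$1 seen=$2 kept=$3
    touch "$seen"
    for item in "$dir"/*; do
        [ -f "$item" ] || continue
        hash=$(cksum < "$item")    # prints "checksum size"
        if grep -qxF -e "$hash" "$seen"; then
            # Known content: remove it again unless it was still
            # present before this run (the user has not deleted it).
            grep -qxF -e "${item##*/}" "$kept" || rm -f "$item"
        else
            # New content: remember it once the loop is done.
            printf '%s\n' "$hash" >> "$seen.new"
        fi
    done
    if [ -f "$seen.new" ]; then
        cat "$seen.new" >> "$seen"
        rm -f "$seen.new"
    fi
}

# Possible use (directory and file names are placeholders):
#   ls "Some Feed" > kept.list
#   wget "$URL" -qO - | agg
#   remember_and_prune "Some Feed" seen.list kept.list
```

Items you deleted are removed again after agg rewrites them, while items you kept stay in place.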

* Currently only tested on GNU/Linux.
* Uses fixed-size buffers to simplify the code. This may lead
  to cut-off news texts. The chances of this happening are
  rather low and the consequences minor (you can always
  follow the link). If you encounter a link that is larger
  than 8KiB, let me know.
* Assumes items are ordered descending by publication date
  (newest items on top). Processing is stopped as soon as
  an old item is encountered.
* Assumes items only change if their publication date
  changes. Again, for simplicity.
* Creation of a "sub-feed" directory if the channel
  contained an element that had a title tag but is not an
* Supports only dates that have their time zone formatted
  as +xxxx, not as their abbreviation.
* Item titles may conflict, especially if they were too
  long and have been cut.
* Items will always be (over-)written in the order they
  are placed in the feed.
* HTML output is formatted badly.
* Standard mtime for items without pubDate should be now.
* Sometimes, mtime of feed directory is set to current
  time. This seems to happen only when a "new" item is not
  already stored locally. If it is, the mtime is not
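
Until the time zone limitation listed above is lifted, a feed could be preprocessed before it reaches agg. The following filter is only a sketch: the name `normalize_zones` is made up, it covers just a handful of RFC 822 zone abbreviations, and it assumes that each `<pubDate>` element sits on a single line with the zone directly before the closing tag.

```sh
#!/bin/sh
# Hypothetical filter, not part of agg.
# normalize_zones: copy a feed from stdin to stdout, replacing a few
# zone abbreviations at the end of <pubDate> with numeric offsets.
normalize_zones() {
    sed -e 's| GMT</pubDate>| +0000</pubDate>|' \
        -e 's| UT</pubDate>| +0000</pubDate>|' \
        -e 's| EST</pubDate>| -0500</pubDate>|' \
        -e 's| EDT</pubDate>| -0400</pubDate>|' \
        -e 's| PST</pubDate>| -0800</pubDate>|' \
        -e 's| PDT</pubDate>| -0700</pubDate>|'
}
```

It would sit between the download and agg, as in `wget "$URL" -qO - | normalize_zones | agg`. Dates that already use a numeric offset pass through unchanged.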

* Andreas Waidler <arandes@programmers.at>

* git://repo.or.cz/agg.git
* http://www.repo.or.cz/w/agg.git

* http://programmers.at/work/on/agg

* http://programmers.at/work/on/agg/agg-0.1.1.tar.gz
* http://programmers.at/work/on/agg/agg-0.1.0.tar.gz

Copyright (C) 2011 Andreas Waidler <arandes@programmers.at>

Permission to use, copy, modify, and/or distribute this
software for any purpose with or without fee is hereby
granted, provided that the above copyright notice and this
permission notice appear in all copies.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS
ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO
EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT,
INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE
USE OR PERFORMANCE OF THIS SOFTWARE.