# agg, the news aggregator
agg is a news aggregator (currently RSS 2.0 only) for
POSIX-compliant systems (currently tested on GNU/Linux).

It follows the UNIX philosophy and simply reads a news feed
from stdin and creates or updates a filesystem
representation of that feed.

No command line parameters, no user interface, not even
* 2011-05-11 agg-0.2.1 released
* 2011-05-10 agg-0.2.0 released
* 2011-04-16 agg-0.1.1 released
* 2011-04-08 agg-0.1.0 released
* 2011-04-01 development started
### 2011-05-11 agg-0.2.1

* Adjusted documentation.
* Fixed the install target of the makefile.

### 2011-05-10 agg-0.2.0

* Tests and refactoring.
* New output format, no HTML output anymore.
* Items are now required to start with a title or description,
and the title has to come before the description.
* Made nomtime work from outside of the feed directory.

### 2011-04-16 agg-0.1.1

* Included proper README.
* Included nomtime in make targets.

### 2011-04-08 agg-0.1.0
For configuration, see Make.config.

Please run the test suites; they've been written for *you*
and take only a few seconds on a 500 MHz CPU anyway.
### Writing file names that are specified in the feed? What about security?

agg removes all slashes from file and directory names
before they are written, so everything ends up where it
belongs. You should run it in a dedicated directory,
### But a malicious feed could use up all space/inodes.

That depends on your operating system (configuration). It's not
the job of a news aggregator to enforce quotas.
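The slash removal mentioned in the security answer can be pictured with a one-line shell filter. `sanitize_name` is a hypothetical helper written for this README; it only mimics the documented behaviour (drop every `/`) and is not agg's actual code.

```sh
#!/bin/sh
# Hypothetical helper, not part of agg: mimics the documented rule that
# every "/" is removed from names taken from the feed before they are
# used as file or directory names.
sanitize_name() {
    printf '%s' "$1" | tr -d '/'
}

# A title trying to escape the feed directory collapses into a single
# name without any path separators.
sanitize_name '../../etc/passwd'   # prints "....etcpasswd"
```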
### Why no download mechanism?

Because it's a news aggregator, not a
download-and-news-aggregation-program.

### But do I have to download the feed by hand?
### But this wastes traffic when there are no new items!

agg quits as soon as it assumes that there are no new items
(see bugs). How much surplus data is read depends on the
ratio of processing speed to download rate, so limiting the
download rate keeps it small:

    wget "$URL" -O - --limit-rate=10K | agg
### Okay. But it only works on a single feed!

    while read -r feed; do
        (wget "$feed" -qO - --limit-rate=10K | agg) &
    done < feeds
### Why no user interface?

Because it's a news aggregator, not a
download-and-news-aggregation-and-news-reader-program. The
file system hierarchy is quite usable with various

The sky is the limit. Feel free to write your own frontend; you
should be able to find mine on my blog.
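As an illustration of how small a frontend can be, the following function prints the most recently modified entries below an agg directory. It is a hypothetical sketch, not shipped with agg: it assumes one directory per feed, one file (or directory) per item inside it, and item mtimes that follow the publication dates, as the mtime notes in the bug list suggest. Adjust the glob if your layout differs.

```sh
#!/bin/sh
# Hypothetical mini frontend, not part of agg.
# latest_items DIR [N]: print the N (default 10) most recently modified
# entries two levels below DIR, newest first. Assumes DIR/<feed>/<item>.
latest_items() {
    dir=$1
    n=${2:-10}
    # -t sorts by mtime (newest first); -d lists item directories
    # themselves instead of their contents.
    ls -td "$dir"/*/* 2>/dev/null | head -n "$n"
}
```

For example, `latest_items ~/news 5` would list the five newest items, assuming agg was run in `~/news`.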
### How to fetch only new items from feeds that don't use publication dates?

Not supported by agg itself, since it would require a
second-level storage that contains (hashes of) everything
the agg directory ever contained, including items you
explicitly deleted. You can easily build such functionality
on top with a few lines of shell code.

Again, it's a news aggregator, not a caching program.
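One way to build the second-level storage described above is sketched below. It is a hypothetical helper, not part of agg: agg writes into a fresh scratch directory on every run, and only items whose content checksum has never been seen are copied into the directory you actually read. Because the checksum list is never pruned, items you delete do not come back. The layout (one file per item inside a feed directory) and all names here are assumptions.

```sh
#!/bin/sh
# Hypothetical helper, not part of agg.
# merge_new_items SRC DST SEEN
#   SRC  : feed directory freshly written by agg into a scratch location
#   DST  : the feed directory you actually read (items may be deleted)
#   SEEN : file with one checksum per line; only ever appended to
merge_new_items() {
    src=$1
    dst=$2
    seen=$3
    mkdir -p "$dst"
    touch "$seen"
    for item in "$src"/*; do
        [ -f "$item" ] || continue           # skip sub-directories and an unmatched glob
        # POSIX cksum on stdin prints "CRC size" without a file name.
        # It is a weak checksum; sha256sum could be swapped in where available.
        sum=$(cksum < "$item")
        if ! grep -qxF -- "$sum" "$seen"; then
            cp -p "$item" "$dst"/            # -p keeps the mtime agg set
            printf '%s\n' "$sum" >> "$seen"  # remembered even if you delete it later
        fi
    done
}
```

For example, run agg inside a directory created with `mktemp -d`, then call `merge_new_items "$tmp/FEED" ~/news/FEED ~/.agg-seen` for each feed directory it produced (`FEED` and `~/.agg-seen` are placeholders). Items with identical content are treated as one item.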
* Currently only tested on GNU/Linux.
* Uses fixed-size buffers to simplify the code. This may lead
to cut-off news texts. The chances of this happening are
rather low, and the consequences are minor (you can always
follow the link). If you encounter a link that is longer
than 8 KiB, let me know.
* Assumes items are ordered by publication date, descending
(newest items on top). Processing stops as soon as
an old item is encountered.
* Assumes items only change if their publication date
changes. Again, for simplicity.
* Creation of a "sub-feed" directory if the channel
contained an element that had a title tag but is not an
* Supports only dates whose time zone is formatted
as +xxxx, not as an abbreviation.
* Item titles may conflict, especially if they were too
long and have been cut.
* Items will always be (over-)written in the order they
are placed in the feed.
* The default mtime for items without a pubDate should be
the current time.
* Sometimes, the mtime of the feed directory is set to the
current time. This seems to happen only when a "new" item is not
already stored locally. If it is, the mtime is not
* agg requires that the first element of an item is either
title or description, and that the former comes before the
latter. Many feeds are not formatted this way, and agg will
abort when encountering this issue.
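For feeds that spell out the time zone, one possible workaround is to rewrite the zone names into numeric offsets before agg sees the feed. `normalize_tz` is a hypothetical pre-filter, not part of agg; it assumes each `<pubDate>` ends in ` ZONE</pubDate>` on a single line and uses the offsets RFC 822 assigns to these names. Military single-letter zones are not handled.

```sh
#!/bin/sh
# Hypothetical pre-filter, not part of agg: rewrites RFC 822 zone names
# (UT, GMT and the US zones) into the numeric +xxxx/-xxxx form agg
# understands. Assumes each date ends in " ZONE</pubDate>" on one line.
normalize_tz() {
    sed -e 's# UT</pubDate># +0000</pubDate>#g' \
        -e 's# GMT</pubDate># +0000</pubDate>#g' \
        -e 's# EST</pubDate># -0500</pubDate>#g' \
        -e 's# EDT</pubDate># -0400</pubDate>#g' \
        -e 's# CST</pubDate># -0600</pubDate>#g' \
        -e 's# CDT</pubDate># -0500</pubDate>#g' \
        -e 's# MST</pubDate># -0700</pubDate>#g' \
        -e 's# MDT</pubDate># -0600</pubDate>#g' \
        -e 's# PST</pubDate># -0800</pubDate>#g' \
        -e 's# PDT</pubDate># -0700</pubDate>#g'
}
```

It slots into the usual pipeline as `wget "$URL" -qO - | normalize_tz | agg`.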
* Andreas Waidler <arandes@programmers.at>

* git://repo.or.cz/agg.git
* <http://www.repo.or.cz/w/agg.git>

* <http://programmers.at/work/on/agg>

* <http://programmers.at/work/on/agg/agg-0.2.1.tar.gz>
* <http://programmers.at/work/on/agg/agg-0.2.0.tar.gz>
* <http://programmers.at/work/on/agg/agg-0.1.1.tar.gz>
* <http://programmers.at/work/on/agg/agg-0.1.0.tar.gz>

Copyright (C) 2011 Andreas Waidler <arandes@programmers.at>

Permission to use, copy, modify, and/or distribute this
software for any purpose with or without fee is hereby
granted, provided that the above copyright notice and this
permission notice appear in all copies.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS
ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO
EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT,
INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE
USE OR PERFORMANCE OF THIS SOFTWARE.