From: Alex Schroeder <alex@emacswiki.org>
Subject: Re: MS Word mode?
Date: Fri, 08 Nov 2002 00:40:15 +0100

Roger Mason <rmason@sparky2.esd.mun.ca> writes:

> There was a question about this recently on this forum. Look for
> undoc.el, I got it from the wiki (I think). It has worked very well for
> me to date, although I have not attempted to read complex documents.

Well, it makes things readable, but it is far from perfect -- it seems
to just delete any non-ascii characters, such that sometimes you will
see words such as "Alex8" where "8" is some garbage that just looked
like being part of a real word... In other words, interfacing to
something like catdoc, antiword, or wvText (included with AbiWord)
might be cool. Actually all you need is this:

(add-to-list 'auto-mode-alist '("\\.doc\\'" . no-word))

(defun no-word ()
  "Run antiword on the entire buffer."
  (shell-command-on-region (point-min) (point-max) "antiword - " t t))

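A slightly more defensive variant might look like the sketch below. It assumes antiword may be missing from the PATH, and that the converted text should not be saved back over the original file by accident. The name `no-word-safe` is made up for illustration and is not part of the code posted above.

```elisp
;; Hypothetical variant of `no-word'; the name is illustrative only.
(defun no-word-safe ()
  "Run antiword on the entire buffer, if antiword is installed."
  (if (not (executable-find "antiword"))
      (message "antiword not found; leaving buffer unchanged")
    (shell-command-on-region (point-min) (point-max) "antiword - " t t)
    ;; The buffer now holds converted text, not the .doc bytes, so
    ;; mark it unmodified and read-only to discourage saving it.
    (set-buffer-modified-p nil)
    (setq buffer-read-only t)))
```

It can be hooked up the same way as the original, by putting `no-word-safe` in the `auto-mode-alist` entry instead of `no-word`.
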
===============================================================================

From: Arnaldo Mandel <am@ime.usp.br>
Subject: Re: MS Word mode?
Date: Fri, 8 Nov 2002 11:52:33 -0200

Alex Schroeder wrote (on Nov 8, 2002):

> Actually all you need is this:
>
> (add-to-list 'auto-mode-alist '("\\.doc\\'" . no-word))
>
> (defun no-word ()
>   "Run antiword on the entire buffer."
>   (shell-command-on-region (point-min) (point-max) "antiword - " t t))

On my system there are lots of filenames ending in .doc whose files
are not Word files. So I modified your function thusly:

(defun no-word ()
  "Run antiword on the entire buffer."
  (if (string-match "Microsoft "
                    (shell-command-to-string (concat "file " buffer-file-name)))
      (shell-command-on-region (point-min) (point-max) "antiword - " t t)))

Works in Solaris and Linux, and should work on other unixes as well.
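
Where "file" is not available, one possible alternative is to look at the first bytes of the file from Lisp. The sketch below assumes the document is an OLE2 compound file, whose signature is the eight bytes D0 CF 11 E0 A1 B1 1A E1. That covers the usual Word 6 through Word 2003 files but also matches other OLE2 files such as .xls, and it will not recognise older non-OLE2 Word formats. All `my-` names are hypothetical.

```elisp
;; Hypothetical helpers; names and approach are illustrative assumptions.
(defconst my-ole2-signature
  (unibyte-string #xD0 #xCF #x11 #xE0 #xA1 #xB1 #x1A #xE1)
  "First eight bytes of an OLE2 compound document.")

(defun my-ole2-file-p (file)
  "Return non-nil if FILE begins with the OLE2 signature."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    ;; Read only the first eight bytes, with no decoding.
    (insert-file-contents-literally file nil 0 8)
    (string= (buffer-string) my-ole2-signature)))

(defun my-no-word-checked ()
  "Run antiword on the buffer only if its file looks like OLE2."
  (when (and buffer-file-name
             (my-ole2-file-p buffer-file-name))
    (shell-command-on-region (point-min) (point-max) "antiword - " t t)))
```

Reading the bytes from disk rather than from the visiting buffer avoids depending on whatever coding system Emacs used to decode the buffer.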

===============================================================================

From: Alex Schroeder <alex@emacswiki.org>
Subject: Re: MS Word mode?
Date: Fri, 08 Nov 2002 18:24:07 +0100

Arnaldo Mandel <am@ime.usp.br> writes:

> (defun no-word ()
>   "Run antiword on the entire buffer."
>   (if (string-match "Microsoft "
>                     (shell-command-to-string (concat "file " buffer-file-name)))
>       (shell-command-on-region (point-min) (point-max) "antiword - " t t)))

Cool. I did not know about "file"... :)

My stuff is on the wiki, btw:

* http://www.emacswiki.org/cgi-bin/wiki.pl?AntiWord

===============================================================================

From: Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de>
Subject: Re: emacs rmail. How to convert .doc to plain text
Date: 24 Nov 2002 18:08:22 +0100

Puff Addison <puff@theaddisons.demon.co.uk> writes:
> Yes, please post your Emacs integration code.

Ok, see below. I should note that it is probably also possible to
(ab-)use jka-compr for this, which would make my two functions
unnecessary.

(defun benny-antiword-file-handler (operation &rest args)
  ;; First check for the specific operations
  ;; that we have special handling for.
  (cond ((eq operation 'insert-file-contents)
         (apply 'benny-antiword-insert-file args))
        ((eq operation 'file-writable-p)
         nil)
        ((eq operation 'write-region)
         (error "Word documents can't be written"))
        ;; Handle any operation we don't know about.
        (t (let ((inhibit-file-name-handlers
                  (cons 'benny-antiword-file-handler
                        (and (eq inhibit-file-name-operation operation)
                             inhibit-file-name-handlers)))
                 (inhibit-file-name-operation operation))
             (apply operation args)))))

(defun benny-antiword-insert-file (filename &optional visit beg end replace)
  (set-buffer-modified-p nil)
  (setq buffer-file-name (file-truename filename))
  (setq buffer-read-only t)
  (let ((start (point))
        (inhibit-read-only t))
    (if replace (delete-region (point-min) (point-max)))
    (save-excursion
      (let ((coding-system-for-read 'utf-8)
            (filename (encode-coding-string
                       buffer-file-name
                       (or file-name-coding-system
                           default-file-name-coding-system))))
        (call-process "antiword" nil t nil "-m" "UTF-8.txt"
                      filename))
      (list buffer-file-name (- (point) start)))))

(setq file-name-handler-alist
      (cons '("\\.doc\\'" . benny-antiword-file-handler)
            file-name-handler-alist))
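
Since the handler makes .doc files read-only and refuses to write them, it may be handy to switch it on and off. A minimal sketch, assuming the two functions above are already loaded; the `my-antiword-` names are hypothetical and not part of the posted code.

```elisp
;; Hypothetical helpers; only `benny-antiword-file-handler' comes from
;; the code above, and it must be loaded before opening a .doc file.
(defvar my-antiword-handler-entry
  '("\\.doc\\'" . benny-antiword-file-handler)
  "Association registered in `file-name-handler-alist'.")

(defun my-antiword-enable ()
  "Register the antiword handler; safe to call more than once."
  (interactive)
  (add-to-list 'file-name-handler-alist my-antiword-handler-entry))

(defun my-antiword-disable ()
  "Remove the antiword handler so .doc files open unconverted again."
  (interactive)
  (setq file-name-handler-alist
        (delete my-antiword-handler-entry file-name-handler-alist)))
```

To see which handler Emacs would pick for a given name, evaluate `(find-file-name-handler "/tmp/report.doc" 'insert-file-contents)`, where the path is just an example.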