1 GNU sed NEWS -*- outline -*-
3 * Noteworthy changes in release 4.9 (2022-11-06) [stable]
7 'sed --follow-symlinks -i' no longer loops forever when its operand
8 is a symbolic link cycle.
9 [bug introduced in sed 4.2]
11 a program with an execution line longer than 2GB can no longer trigger
12 an out-of-bounds memory write.
14 using the R command to read an input line of length longer than 2GB
15 can no longer trigger an out-of-bounds memory read.
17 In locales using UTF-8 encoding, the regular expression '.' no
18 longer sometimes fails to match Unicode characters U+D400 through
19 U+D7FF (some Hangul Syllables, and Hangul Jamo Extended-B) and
20 Unicode characters U+108000 through U+10FFFF (half of Supplemental
21 Private Use Area plane B).
22 [bug introduced in sed 4.8]
24 I/O errors involving temp files no longer confuse sed into using a
25 FILE * pointer after fclosing it, which has undefined behavior in C.
29 The 'r' command now accepts address 0, allowing a file to be inserted before the first line.
32 ** Changes in behavior
34 Sed now prints the less-surprising variant in a corner case of
35 POSIX-unspecified behavior. Before, this would print "n".
38 printf n | sed 'sn\nnXn'; echo
41 * Noteworthy changes in release 4.8 (2020-01-14) [stable]
45 "sed -i" now creates temporary files with correct umask (limited to u=rwx).
46 Previously sed would incorrectly set umask on temporary files, resulting
47 in problems under certain fuse-like file systems.
48 [bug introduced in sed 4.2.1]
52 distribute gzip-compressed tarballs once again
56 a year's worth of gnulib development, including improved DFA performance
59 * Noteworthy changes in release 4.7 (2018-12-20) [stable]
63 Some uses of \b in the C locale and with the DFA matcher would fail, e.g.,
64 the following would mistakenly print "123-x" instead of "123":
65 echo 123-x|LC_ALL=C sed 's/.\bx//'
66 Using a multibyte locale or certain regexp constructs (some ranges,
67 backreferences) would avoid the bug. [bug introduced in sed 4.6]
70 * Noteworthy changes in release 4.6 (2018-12-19) [stable]
74 sed now prints a clear error message when r/R/w/W (and s///w) commands
75 are missing a filename. Previously, w/W commands would fail with a confusing
76 error message, while r/R would be a silent no-op.
78 sed now uses fully-buffered output (instead of line-buffered) when
79 writing to files. This should noticeably improve performance of "sed -i"
80 and other write commands.
81 Buffering can be disabled (as before) with "sed -u".
83 sed in non-Cygwin Windows environments (e.g., MinGW) now properly handles
84 '\n' newlines in -b/--binary mode.
88 sed no longer accesses invalid memory (heap overflow) when given invalid
89 backreferences in 's' command [bug#32082, present at least since sed-4.0.6].
91 sed no longer adds an extraneous NUL when given the s/$//n command.
92 [related to bug#32271, present since sed-4.0.7]
94 sed no longer accesses invalid memory (heap overflow) with s/$//n regexes.
95 [bug#32271, present since sed-4.3].
99 New option, --debug: print the input sed script in canonical form
100 and annotate program execution.
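As an illustration only (assuming GNU sed 4.6 or later; the variable name
`out` is hypothetical), --debug can be tried on a one-line script. The
annotation layout may differ between releases, so this sketch relies only on
the program header and the final output line:

```shell
# Run a trivial substitution under --debug and capture everything sed prints.
out=$(echo a | sed --debug 's/a/b/')
printf '%s\n' "$out"
# The canonical form of the script appears under a "SED PROGRAM:" header;
# the normal result of the script ("b") is still printed as the last line.
```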
103 * Noteworthy changes in release 4.5 (2018-03-31) [stable]
107 sed now fails when matching very long input lines (>2GB).
108 Before, sed would silently ignore the regex without indicating an
109 error. [Bug present at least since sed-3.02]
111 sed no longer rejects comments and closing braces after y/// commands.
112 [Bug existed at least since sed-3.02]
114 sed -E --posix no longer ignores the special meaning of '+', '?', '|'.
115 [Bug introduced in the original implementation of --posix option in
118 sed -i now creates the SELinux context based on the context of the symlink
119 instead of the symlink target. [Bug present since at least sed-4.2]
120 sed -i --follow-symlinks remains unchanged.
122 sed now treats the sequence '\x5c' (ASCII 92, backslash) as a literal
123 backslash character, not as an escape prefix character.
124 [Bug present since sed-3.02.80]
126 $ echo z | sed -E 's/(z)/\x5c1/' # identical to 's/(z)/\1/'
129 $ echo z | sed -E 's/(z)/\x5c1/'
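A minimal sketch of the new behavior, assuming GNU sed 4.5 or later (the
variable name `out` is illustrative):

```shell
# \x5c now yields a literal backslash, so the replacement is the two
# characters '\' and '1' rather than a backreference to group 1.
out=$(echo z | sed -E 's/(z)/\x5c1/')
printf '%s\n' "$out"    # prints \1
```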
133 * Noteworthy changes in release 4.4 (2017-02-03) [stable]
137 sed could segfault when invoked with a specific combination of newlines
138 in the input and regex pattern. [Bug introduced in sed-4.3]
141 * Noteworthy changes in release 4.3 (2016-12-30) [stable]
145 sed's regular expression matching is now typically 10x faster
147 sed now uses unlocked-io where available, resulting in faster I/O
152 sed no longer mishandles anchors ^/$ in multiline regex (s///mg)
153 with -z option (NUL terminated lines). [Bug introduced in sed-4.2.2
154 with the initial implementation of -z]
156 sed no longer accepts a ":" command without a label; before, it would
157 treat that as defining a label whose name is empty, and subsequent
158 label-free "t" and "b" commands would use that label. Now, sed emits
159 a diagnostic and fails for that invalid construct.
161 sed no longer accesses uninitialized memory when processing certain
162 invalid multibyte sequences. Demonstrate with this:
163 echo a | LC_ALL=ja_JP.eucJP valgrind sed/sed 's/a/b\U\xb2c/'
164 The error appears to have been introduced with the sed-4.0a release.
166 The 'y' (transliterate) operator once again works with a NUL byte
167 on the RHS. E.g., sed 'y/b/\x00/' now works like tr b '\0'. GNU sed
168 has never before recognized \x00 in this context. However, sed-3.02
169 and prior did accept a literal NUL byte in the RHS, which was possible
170 only when reading a script from a file. For example, this:
171 echo abc|sed -f <(printf 'y/b/\x00/\n')|cat -A
172 is what stopped working. [bug introduced some time after sed-3.02 and
173 prior to the first sed-4* test release]
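A small sketch of the restored behavior, assuming GNU sed 4.3 or later. The
NUL byte is made visible by mapping it to 'X' with tr; `out` is an
illustrative name:

```shell
# Transliterate 'b' to a NUL byte, then turn the NUL into 'X' so that
# the result can be shown as ordinary text.
out=$(echo abc | sed 'y/b/\x00/' | tr '\0' 'X')
printf '%s\n' "$out"    # prints aXc
```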
175 When the closed-above line number ranges of N editing commands
176 overlap (N>1), sed would apply commands 2..N to the line just
177 beyond the largest range endpoint.
178 [bug introduced some time after sed-4.09 and prior to release in sed-4.1]
179 Before, this command would mistakenly modify line 5:
180 $ seq 6|sed '2,4d;2,3s/^/x/;3,4s/^/y/'
185 $ seq 6|sed '2,4d;2,3s/^/x/;3,4s/^/y/'
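For reference, a sketch of what the fixed behavior looks like (assuming GNU
sed 4.3 or later; `out` is an illustrative name). Lines 2-4 are deleted
before the two s commands can act on them, and lines 1, 5 and 6 fall outside
every range, so they pass through unmodified:

```shell
out=$(seq 6 | sed '2,4d;2,3s/^/x/;3,4s/^/y/')
printf '%s\n' "$out"
# prints:
# 1
# 5
# 6
```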
190 An erroneous sed invocation like "echo > F; sed -i s//b/ F" no longer
191 leaves behind a temporary file. Before, that command would create a file
192 alongside F with a name matching /^sed......$/ and fail to remove it.
194 sed --follow-symlinks now works again for stdin.
195 [bug introduced in sed-4.2.2]
197 sed no longer elides invalid bytes in a substitution RHS.
198 Now, sed copies such bytes into the output, just as Perl does.
199 [bug introduced in sed-4.1 -- it was also present prior to 4.0.6]
201 sed no longer prints an extraneous character when a backslash follows \c.
202 '\c\\' generates control character ^\ (ASCII 0x1C).
203 Other characters after the second backslash are rejected (e.g. '\c\d').
204 [bug introduced in the sed-4.0.* releases]
206 sed no longer mishandles incomplete multibyte sequences in s,y commands
207 and valid multibyte SHIFT-JIS characters in character classes.
208 Previously, the following commands would fail:
209 LC_ALL=en_US.UTF-8 sed $'s/\316/X/'
210 LC_ALL=ja_JP.shiftjis sed $'/[\203]/]/p'
211 [bug introduced some time after sed-4.1.5 and before sed-4.2.1]
215 The "L" command (format a paragraph like the fmt(1) command would)
216 has been listed in the documentation as a failed experiment for at
217 least 10 years. That command is now removed.
221 "make dist" now builds .tar.xz files, rather than .tar.gz ones.
222 xz is portable enough and in wide-enough use that distributing
223 only .tar.xz files is enough. It has been fine for coreutils, grep,
224 diffutils and parted for a few years.
229 new --sandbox option rejects programs with r/w/e commands.
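A hedged sketch of how --sandbox might be exercised (assuming GNU sed 4.3 or
later; the names `tmp`, `sandbox_result` and `plain` are illustrative):

```shell
tmp=$(mktemp -d)
# A script containing a 'w' command is rejected, so sed exits non-zero.
if echo data | sed --sandbox "w $tmp/out" 2>/dev/null; then
  sandbox_result=accepted
else
  sandbox_result=rejected
fi
# A script without r/w/e commands runs as usual.
plain=$(echo data | sed --sandbox 's/data/ok/')
printf '%s %s\n' "$sandbox_result" "$plain"    # prints: rejected ok
rm -rf "$tmp"
```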
232 * Noteworthy changes in release 4.2.2 (2012-12-22) [stable]
234 * don't misbehave (truncate input) for lines of length 2^31 and longer
236 * fix endless loop on incomplete multibyte sequences
238 * -u also does unbuffered input, rather than unbuffered output only
240 * New command `F' to print current input file name
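A brief sketch of the `F' command, assuming GNU sed 4.2.2 or later (`tmp`
and `out` are illustrative names):

```shell
tmp=$(mktemp)
printf 'one\ntwo\n' > "$tmp"
# '1F' prints the name of the current input file when line 1 is read;
# -n suppresses the normal printing of the input lines.
out=$(sed -n '1F' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```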
242 * sed -i, s///w, and the `w' and `W' commands also obey the --binary option
243 (and create CR/LF-terminated files if the option is absent)
245 * --posix fails for scripts (or fragments as passed to the -e option) that
246 end in a backslash, as they are not portable.
248 * New option -z (--null-data) to separate lines by ASCII NUL characters.
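A small sketch of -z, assuming GNU sed 4.2.2 or later. Each NUL-terminated
record is treated as one "line"; tr converts the NULs to newlines for
display only (`out` is an illustrative name):

```shell
# Two NUL-terminated records, "a" and "b"; prefix each with "x".
out=$(printf 'a\0b\0' | sed -z 's/^/x/' | tr '\0' '\n')
printf '%s\n' "$out"
# prints:
# xa
# xb
```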
250 * \x26 (and similar escaped sequences) produces a literal & in the
251 replacement argument of the s/// command, rather than including the matched text.
254 ----------------------------------------------------------------------------
257 * fix parsing of s/[[[[[[[[[]//
259 * security contexts are preserved by -i too under SELinux
261 * temporary files for sed -i are not made group/world-readable until
264 ----------------------------------------------------------------------------
267 * now released under GPLv3
269 * added a new extension `z` to clear pattern space even in the presence
270 of invalid multibyte sequences
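A minimal sketch of the `z' command (the input here is plain ASCII; `out` is
an illustrative name):

```shell
# 'z' empties pattern space, so the following s command sees an empty line.
out=$(echo hello | sed 'z;s/^$/empty/')
printf '%s\n' "$out"    # prints empty
```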
272 * a preexisting GNU gettext installation is needed in order to compile
273 GNU sed with NLS support
275 * new option --follow-symlinks, available when editing a file in-place.
276 This option may not be available on some systems (in this case, the
277 option will *not* be a no-op; it will be completely unavailable).
278 In the future, the option may be added as a no-op on systems without
279 symbolic links at all, since in this case a no-op is effectively
280 indistinguishable from a correct implementation.
282 * hold-space is reset between different files in -i and -s modes.
284 * multibyte processing fixed
286 * the following GNU extensions are turned off by --posix: options [iImMsSxX]
287 in the `s' command, address kinds `FIRST~STEP' and `ADDR1,+N' and `ADDR1,~N',
288 line address 0, `e' or `z' commands, text between an `a' or `c' or `i'
289 command and the following backslash, arguments to the `l' command.
290 --posix disables all extensions to regular expressions.
292 * fixed bug in 'i\' giving a segmentation violation if given alone.
294 * much improved portability
296 * much faster in UTF-8 locales
298 * will correctly replace ACLs when using -i
300 * will now accept NUL bytes for `.'
302 ----------------------------------------------------------------------------
305 * fix parsing of a negative character class not including a closed bracket,
306 like [^]] or [^]a-z].
308 * fix parsing of [ inside an y command, like y/[/A/.
310 * output the result of commands a, r, R when a q command is found.
312 ----------------------------------------------------------------------------
315 * \B correctly means "not on a word boundary" rather than "inside a word"
317 * bugfixes for platforms without internationalization
319 * more thorough testing framework for tarballs (`make full-distcheck')
321 ----------------------------------------------------------------------------
324 * regex addresses do not use leftmost-longest matching. In other words,
325 /.\+/ only looks for a single character, and does not try to find as
326 many of them as possible like it used to do.
328 * added a note to BUGS and the manual about changed interpretation
329 of `s|abc\|def||', and about localization issues.
331 * fixed --disable-nls build problems on Solaris.
333 * fixed `make check' in non-English locales.
335 * `make check' tests the regex library by default if the included regex
336 is used (regex tests had to be enabled separately up to now).
338 ----------------------------------------------------------------------------
341 * fix bug in 'y' command in multi-byte character sets
343 * fix severe bug in parsing of ranges with an embedded open bracket
345 * fix off-by-one error when printing a "bad command" error
347 ----------------------------------------------------------------------------
350 * preserve permissions of in-place edited files
352 * yield an error when running -i on terminals or other non-regular files
354 * do not interpret - as stdin when using in-place editing mode
356 * fix bug that prevented 's' command modifiers from working
358 ----------------------------------------------------------------------------
361 * // matches the last regular expression even in POSIXLY_CORRECT mode.
363 * change the way we treat lines which are not terminated by a newline.
364 Such lines are printed without the terminating newline (as before)
365 but as soon as more text is sent to the same output stream, the
366 missing newline is printed, so that the two lines don't concatenate.
367 The behavior is now independent from POSIXLY_CORRECT because POSIX
368 actually has undefined behavior in this case, and the new implementation
369 arguably gives the ``least expected surprise''. Thanks to Stepan
370 Kasal for the implementation.
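The newline handling described above can be sketched as follows (file names
under `tmp` and the variables `out` and `bytes` are illustrative):

```shell
tmp=$(mktemp -d)
printf 'a' > "$tmp/f1"       # last line lacks a terminating newline
printf 'b\n' > "$tmp/f2"
# When more text follows on the same stream, the missing newline is
# supplied first, so "a" and "b" end up on separate lines.
out=$(sed '' "$tmp/f1" "$tmp/f2")
# A lone unterminated line is still printed without a newline (1 byte).
bytes=$(( $(printf 'a' | sed '' | wc -c) ))
printf '%s\n%s\n' "$out" "$bytes"
rm -rf "$tmp"
```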
372 * documentation improvements, with updated references to the POSIX.2
375 * error messages on I/O errors are better, and -i does not leave temporary
376 files around (e.g. when running ``sed -i'' on a directory).
378 * escapes are accepted in the y command (for example: y/o/\n/ transforms
381 * -i option tries to set the owner and group to the same as the input file
383 * `L' command is deprecated and will be removed in sed 4.2.
385 * line number addresses are processed differently -- this is supposedly
386 conformant to POSIX and surely more idiot-proof. Line number addresses
387 are not affected by jumping around them: they are activated and
388 deactivated exactly where the script says, while previously
391 would actually delete lines 1,2,3,4 and 9 (!).
393 * multibyte characters are taken into consideration to compute the
394 operands of s and y, provided you set LC_CTYPE correctly. They are
395 also considered by \l, \L, \u, \U, \E.
397 * [\n] matches either backslash or 'n' when POSIXLY_CORRECT.
399 * new option --posix, disables all GNU extensions. POSIXLY_CORRECT only
400 disables GNU extensions that violate the POSIX standard.
402 * options -h and -V are not supported anymore, use --help and --version.
404 * removed documentation for \s and \S which worked incorrectly
406 * restored correct behavior for \w and \W: match [[:alnum:]_] and
407 [^[:alnum:]_] (they used to match [[:alpha:]_] and [^[:alpha:]_]
409 * the special address 0 can only be used in 0,/RE/ or 0~STEP addresses;
410 other cases give an error (you are hindering portability for no reason
411 if specifying 0,N and you are giving a dead command if specifying 0
414 * when a \ is used to escape the character that would terminate an operand
415 of the s or y commands, the backslash is removed before the regex is
416 compiled. This is left undefined by POSIX; this behavior makes `s+x\+++g'
417 remove occurrences of `x+', consistently with `s/x\///g'. (However, if
418 you enjoy yourself trying `s*x\***g', sed will use the `x*' regex, and you
419 won't be able to pass down `x\*' while using * as the delimiter; ideas on
420 how to simplify the parser in this respect, and/or gain more coherent
421 semantics, are welcome).
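A sketch of the two consistent cases named above (`slash` and `plus` are
illustrative names). In both, the backslash only protects the delimiter and
is dropped before the regex is compiled:

```shell
# '/' as delimiter: \/ becomes a literal '/'.
slash=$(echo 'a/b' | sed 's/\///g')
# '+' as delimiter: \+ becomes a plain '+', so the regex is 'x+',
# which in a basic regular expression matches the two characters "x+".
plus=$(echo 'ax+b' | sed 's+x\+++g')
printf '%s %s\n' "$slash" "$plus"    # prints: ab ab
```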
424 ----------------------------------------------------------------------------
427 * 0 address behaves correctly in single-file (-i and -s) mode.
429 * documentation improvements.
431 * tested with many hosts and compilers.
433 * updated regex matcher from upstream, with many bugfixes and speedups.
435 * the `N' command's feature that is detailed in the BUGS file was disabled
436 by the first change below in sed 4.0.8. The behavior has now been
437 restored, and is only enabled if POSIXLY_CORRECT behavior is not
440 ----------------------------------------------------------------------------
443 * fix `sed n' printing the last line twice.
445 * fix incorrect error message for invalid character classes.
447 * fix segmentation violation with repeated empty subexpressions.
449 * fix incorrect parsing of ^ after escaped (.
451 * more comprehensive test suite (and with many expected failures...)
453 ----------------------------------------------------------------------------
456 * VPATH builds working on non-glibc machines
458 * fixed bug in s///Np: was printing even if less than N matches were
461 * fixed infinite loop on s///N when LHS matched a null string and
462 there were not enough matches in pattern space
464 * behavior of s///N is consistent with s///g when the LHS can match
465 a null string (and the infinite loop did not happen :-)
467 * updated some translations
469 ----------------------------------------------------------------------------
472 * added parameter to `v' for the version of sed that is expected.
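A hedged sketch of the `v' parameter. The version numbers below are
arbitrary examples, and the names `out` and `too_new` are illustrative:

```shell
# Requesting an old version succeeds and the script runs normally.
out=$(echo x | sed 'v 4.2')
# Requesting a version newer than the installed sed makes it fail.
if echo x | sed 'v 99.0' >/dev/null 2>&1; then
  too_new=accepted
else
  too_new=rejected
fi
printf '%s %s\n' "$out" "$too_new"    # prints: x rejected
```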
474 * configure switch --without-included-regex to use the system regex matcher
476 * fix for -i option under Cygwin
478 ----------------------------------------------------------------------------
483 * improvements to some error messages (e.g. y/abc/defg/ incorrectly said
484 `excess characters after command' instead of `y arguments have different
487 * `a', `i', `l', `L', `r' accept two addresses except in POSIXLY_CORRECT
488 mode. Only `q' and `Q' do not accept two addresses in standard (GNU) mode.
490 ----------------------------------------------------------------------------
493 * documentation fixes
495 * update regex matcher
497 ----------------------------------------------------------------------------
500 * fix packaging problem (two missing translation catalogs)
502 ----------------------------------------------------------------------------
507 * fix build problems (vpath builds and bootstrap builds)
509 ----------------------------------------------------------------------------
512 * Remove last vestiges of super-sed
514 * man page automatically built
516 * more translations provided
518 * portability improvements
520 ----------------------------------------------------------------------------
523 * Update regex matcher
525 ----------------------------------------------------------------------------
528 * `y' command supports multibyte character sets
530 * Update regex matcher
532 ----------------------------------------------------------------------------
535 * `R' command reads a single line from a file.
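A short sketch of `R' (`tmp` and `out` are illustrative names). Each
execution of the command takes the next line of the file and appends it
after the current input line:

```shell
tmp=$(mktemp)
printf 'A\nB\n' > "$tmp"
# One line of the file is consumed per input line, interleaving the two.
out=$(printf '1\n2\n' | sed "R $tmp")
printf '%s\n' "$out"
# prints:
# 1
# A
# 2
# B
rm -f "$tmp"
```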
537 * CR-LF pairs are always ignored under Windows, even if (under Cygwin)
538 a disk is mounted as binary.
540 * More attention to errors on stdout
542 * New `W' command to write first line of pattern space to a file
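A short sketch of `W' (`tmp` and `out` are illustrative names):

```shell
tmp=$(mktemp)
# N joins the two input lines into pattern space ("a\nb");
# W writes only the part up to the first newline to the file.
printf 'a\nb\n' | sed -n "N;W $tmp"
out=$(cat "$tmp")
printf '%s\n' "$out"    # prints a
rm -f "$tmp"
```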
544 * Can customize line wrap width on single `l' commands
546 * `L' command formats and reflows paragraphs like `fmt' does.
548 * The test suite makefiles are better organized (this change is
549 transparent however).
551 * Compiles and bootstraps out-of-the-box under MinGW32 and Cygwin.
553 * Optimizes cases when pattern space is truncated at its start or at
554 its end by `D' or by a substitution command with an empty RHS.
555 For example scripts like this,
557 seq 1 10000 | tr \\n \ | ./sed ':a; s/^[0-9][0-9]* //; ta'
559 whose behavior was quadratic with previous versions of sed, have
562 * New command `e' to pipe the output of a command into the output
565 * New option `e' to pass the output of the `s' command through the
566 Bourne shell and get the result into pattern space.
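A minimal sketch of the `e' option of the `s' command (`out` is an
illustrative name). Since the pattern space is executed as a shell command,
this should only be used with trusted input:

```shell
# The substitution builds the command line "echo got: hello"; the 'e'
# option runs it through the shell and replaces pattern space with the
# command's output.
out=$(echo hello | sed 's/^/echo got: /e')
printf '%s\n' "$out"    # prints got: hello
```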
568 * Switched to obstacks in the parser -- fewer memory-related bugs
569 (there were none AFAIK but you never know) and less memory usage.
571 * New option -i, to support in-place editing a la Perl. Usually one
572 had to use ed or, for more complex tasks, resort to Perl; this is
573 not necessary anymore.
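A minimal sketch of in-place editing with -i (`tmp` and `out` are
illustrative names):

```shell
tmp=$(mktemp)
printf 'foo\n' > "$tmp"
# Edit the file itself instead of writing the result to standard output.
sed -i 's/foo/bar/' "$tmp"
out=$(cat "$tmp")
printf '%s\n' "$out"    # prints bar
rm -f "$tmp"
```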
575 * Dumped buffering code. The performance loss is 10%, but it caused
576 bugs in systems with CRLF termination. The current solution is
577 not definitive, though.
579 * Bug fix: Made the behavior of s/A*/x/g (i.e. `s' command with a
580 possibly empty LHS) more consistent:
582 pattern GNU sed 3.x GNU sed 4.x
588 * Bug fix: the empty regular expression // now refers to the last
589 regular expression that was matched, rather than to the last
590 regular expression that was compiled. This richer behavior seems
591 to be the correct one (albeit neither one is POSIXLY_CORRECT).
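A common use of this behavior, sketched with an illustrative variable name:

```shell
# The address /b/ is the last regex matched, so the empty regex in
# s//X/ reuses it and replaces "b".
out=$(echo abc | sed '/b/s//X/')
printf '%s\n' "$out"    # prints aXc
```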
593 * Check for invalid backreferences in the RHS of the `s' command
596 * Support for \[lLuUE] in the RHS of the `s' command like in Perl.
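A small sketch of the case-conversion escapes (`out` is an illustrative
name):

```shell
# \U upper-cases until \E; \u upper-cases only the next character.
out=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\U\1\E \u\2/')
printf '%s\n' "$out"    # prints HELLO World
```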
598 * New regular expression matcher
600 * Bug fix: if a file was redirected to be stdin, sed did not consume
602 (sed d; sed G) < TESTFILE
604 double-spaced TESTFILE, while the equivalent `useless use of cat'
605 cat TESTFILE | (sed d; sed G)
607 printed nothing (which is the correct behavior). A test for this
608 bug was added to the test suite.
610 * The documentation is now much better, with a few examples provided,
611 and a thorough description of regular expressions. The manual often
612 refers to "GNU extensions", but if they are described here they are
613 specific to this version.
615 * Documented command-line option:
616 -r, --regexp-extended
617 Use extended regexps -- e.g. (abc+) instead of \(abc\+\)
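The same substitution written both ways, as a sketch (`ere` and `bre` are
illustrative names):

```shell
# Extended syntax with -r: grouping and '+' need no backslashes.
ere=$(echo abcc | sed -r 's/(abc+)/[\1]/')
# Equivalent basic syntax using GNU's \( \) and \+.
bre=$(echo abcc | sed 's/\(abc\+\)/[\1]/')
printf '%s %s\n' "$ere" "$bre"    # prints: [abcc] [abcc]
```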
619 * Added feature to the `w' command and to the `w' option of the `s'
620 command: if the file name is /dev/stderr, it means the standard
621 error (inspired by awk); and similarly for /dev/stdout. This is
622 disabled if POSIXLY_CORRECT is set.
624 * Added `m' and `M' modifiers to `s' command for multi-line
625 matching (Perl-style); in addresses, only `M' works.
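A minimal sketch of the `M' modifier (`out` is an illustrative name):

```shell
# N builds the pattern space "a\nb". With M, ^ and $ also match next
# to the embedded newline, so ^b$ matches the second line only.
out=$(printf 'a\nb\n' | sed 'N;s/^b$/X/M')
printf '%s\n' "$out"
# prints:
# a
# X
```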
627 * Added `Q' command for `silent quit'; added ability to pass
628 an exit code from a sed script to the caller.
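A sketch of both features. The exit codes 5 and 7 are arbitrary, and the
variable names are illustrative:

```shell
status=0; qstatus=0
# 'q5' prints the current line, then exits with status 5.
out=$(printf '1\n2\n' | sed '1q5') || status=$?
# 'Q7' exits with status 7 without printing the current line.
quiet=$(printf '1\n2\n' | sed '1Q7') || qstatus=$?
printf '[%s] %s [%s] %s\n' "$out" "$status" "$quiet" "$qstatus"
# prints: [1] 5 [] 7
```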
630 * Added `T' command for `branch if failed'.
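A small sketch of `T' (`out` is an illustrative name). With no label, the
branch goes to the end of the script:

```shell
# For "a" the first s fails, so T skips the second s.
# For "x" the first s succeeds, so the second s appends "!".
out=$(printf 'a\nx\n' | sed 's/x/y/;T;s/$/!/')
printf '%s\n' "$out"
# prints:
# a
# y!
```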
632 * Added `v' command, which is a do-nothing intended to fail on
633 seds that do not support GNU sed 4.0's extensions.
635 ----------------------------------------------------------------------------
638 * Started new version nomenclature for pre-3.03 releases. (I'm being
639 pessimistic in assuming that .90 won't give me enough breathing room.)
641 * Bug fixes: the regncomp()/regnexec() interfaces proved to be inadequate to
642 properly handle expressions such as "s/\</#/g". Re-abstracted the regex
643 code in the sed/ tree, and now use the re_search_2() interface to the GNU
644 regex routines. This change also fixed a bug where /./ did not match the
645 NUL character. Had the glibc folk fix a bug in lib/regex.c where
646 's/0*\([0-9][0-9]\)/X\1X/' failed to match on input "002".
648 * Added new command-line options:
650 Do not attempt to read-ahead more than required; do not buffer stdout.
651 -l N, --line-length=N
652 Specify the desired line-wrap length for the `l' command.
653 A length of "0" means "never wrap".
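A sketch of the effect of -l on the `l' command, assuming the default wrap
length of 70 found in current GNU sed (variable names are illustrative):

```shell
long=$(printf '%0100d' 0)    # a line of 100 '0' characters
# With the default wrap length the listing spans more than one line.
wrapped=$(( $(echo "$long" | sed -n l | wc -l) ))
# With -l 0 the listing is never wrapped.
unwrapped=$(( $(echo "$long" | sed -n -l 0 l | wc -l) ))
printf '%s %s\n' "$wrapped" "$unwrapped"
```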
655 * New internationalization translations added: fr ru de it el sk pt_BR sv
656 (plus nl from 3.02a).
658 * The s/// command now understands the following escapes
666 \oNNN a character with the octal value NNN
667 \dNNN a character with the decimal value NNN
668 \xNN a character with the hexadecimal value NN
669 This behavior is disabled if POSIXLY_CORRECT is set, at least for the
670 time being (until I can be convinced that this behavior does not violate
671 the POSIX standard). (Incidentally, \b (backspace) was omitted because
672 of the conflict with the existing "word boundary" meaning. \ooo octal
673 format was omitted because of the conflict with backreference syntax.)
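A brief sketch of the three numeric escapes in a replacement (`out` is an
illustrative name):

```shell
# \x41 is hexadecimal 41 ('A'), \d066 is decimal 66 ('B'),
# \o103 is octal 103 ('C').
out=$(echo a | sed 's/a/\x41\d066\o103/')
printf '%s\n' "$out"    # prints ABC
```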
675 * If POSIXLY_CORRECT is set, the empty RE // now is the null match
676 instead of "repeat the last RE match". As far as I can tell
677 this behavior is mandated by POSIX, but it would break too many
678 legacy sed scripts to blithely change GNU sed's default behavior.
680 ----------------------------------------------------------------------------
683 * Added internationalization support, and an initial (already out of date)
684 set of Dutch message translations (both provided by Erick Branderhorst).
686 * Added support for scripts like:
687 sed -e 1ifoo -e '$abar'
688 (note no need for \ <newline> after a, i, and c commands).
689 Also, conditionally (on NO_INPUT_INDENT) added
690 experimental support for skipping leading whitespace on
691 each {a,i,c} input line.
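The one-line form above can be tried as follows (`out` is an illustrative
name):

```shell
# Insert "foo" before line 1 and append "bar" after the last line.
out=$(echo x | sed -e 1ifoo -e '$abar')
printf '%s\n' "$out"
# prints:
# foo
# x
# bar
```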
693 * Added addressing of the form:
694 /foo/,+5 p (print from foo to 5th line following)
695 /foo/,~5 p (print from foo to next line whose line number is a multiple of 5)
696 The first address of these can be any of the previously existing
697 addressing types; the +N and ~N forms are only allowed as the
698 second address of a range.
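A sketch of both range forms on a ten-line input (`plus` and `tilde` are
illustrative names):

```shell
# From the line matching /3/ through the 2 lines that follow it.
plus=$(seq 10 | sed -n '/3/,+2p')
# From the line matching /3/ to the next line number divisible by 4.
tilde=$(seq 10 | sed -n '/3/,~4p')
printf '%s\n--\n%s\n' "$plus" "$tilde"
# prints 3, 4, 5, then "--", then 3, 4 (one value per line)
```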
700 * Added support for pseudo-address "0" as the first address in an
701 address-range, simplifying scripts which happen to match the end
702 address on the first line of input. For example, a script
703 which deletes all lines from the beginning of the file to the
704 first line which contains "foo" is now simply "sed 0,/foo/d",
705 whereas before one had to go through contortions to deal with
706 the possibility that "foo" might appear on the first line of input.
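A sketch contrasting address 0 with address 1 when "foo" is on the first
line (`with_zero` and `with_one` are illustrative names):

```shell
# 0,/foo/ may end on line 1, so only the first "foo" is deleted.
with_zero=$(printf 'foo\nbar\nfoo\n' | sed '0,/foo/d')
# 1,/foo/ starts looking for the end at line 2, so the range runs to
# the second "foo" and every line is deleted.
with_one=$(printf 'foo\nbar\nfoo\n' | sed '1,/foo/d')
printf '[%s] [%s]\n' "$with_zero" "$with_one"
```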
709 * Made NUL characters in regexps work "correctly" --- i.e., a NUL
710 in a RE matches a NUL; it does not prematurely terminate the RE.
711 (This only works in -f scripts, as the POSIX.1 exec*() interface
712 only passes NUL-terminated strings, and so sed will only be able
713 to see up to the first NUL in any -e scriptlet.)
715 * Wherever a `;' is accepted as a command terminator, also allow a `}'
716 or a `#' to appear. (This allows for less cluttered-looking scripts.)
718 * Lots of internal changes that are only relevant to source junkies
719 and development testing. Some of which might cause imperceptible
720 performance improvements.
722 ----------------------------------------------------------------------------
725 * Fixed a bug in the parsing of character classes (e.g., /[[:space:]]/).
726 Corrected an omission in djgpp/Makefile.am and an improper dependency
727 in testsuite/Makefile.am.
729 ----------------------------------------------------------------------------
732 * This version of sed mainly contains bug fixes and portability
733 enhancements, plus performance enhancements related to sed's handling
734 of input files. Due to excess performance penalties, I have reverted
735 (relative to 3.00) to using regex.c instead of the rx package for
736 regular expression handling, at the expense of losing true POSIX.2
737 BRE compatibility. However, performance related to regular expression
738 handling *still* needs a fair bit of work.
740 * One new feature has been added: regular expressions may be followed
741 with an "I" directive ("i" was taken [the "i"nsert command]) to
742 indicate that the regexp should be matched in a case-insensitive
743 manner. Also of note are a new organization of the source code,
744 new documentation, and a new maintainer.
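A minimal sketch of the "I" directive (`out` is an illustrative name):

```shell
# /foo/I matches "Foo" regardless of case; "bar" is not printed.
out=$(printf 'Foo\nbar\n' | sed -n '/foo/Ip')
printf '%s\n' "$out"    # prints Foo
```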
746 ----------------------------------------------------------------------------
749 * This version of sed passes the new test-suite donated by
752 * Overall performance has been improved in the following sense: Sed 3.0
753 is often slightly slower than sed 2.05. On a few scripts, though, sed
754 2.05 was so slow as to be nearly useless or to use up unreasonable
755 amounts of memory. These problems have been fixed and in such cases,
756 sed 3.0 should have acceptable performance.