Son-of-RFC 1036
News Article Format and Transmission

A. Archeological Notes

A.1. A-News Article Format

The obsolete "A News" article format consisted of exactly five lines of header information, followed by the body. For example:

     Aeagle.642
     news.misc
     cbosgd!mhuxj!mhuxt!eagle!jerry
     Fri Nov 19 16:14:55 1982
     Usenet Etiquette - Please Read
     body
     body
     body

The first line consisted of an "A" followed by an article ID (analogous to a message ID and used for similar purposes). The second line was the list of newsgroups. The third line was the path. The fourth was the date, in the format above (all fields fixed width), resembling an Internet date but not quite the same. The fifth was the subject.
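For readers who need to interpret archived articles, the layout above can be read with a few lines of code. The following is a minimal sketch for reading only, not for generating such articles. The names `ANewsArticle` and `parse_a_news` are illustrative, and the comma used to split the newsgroup list is an assumption, since the text does not state the separator.

```python
from typing import List, NamedTuple


class ANewsArticle(NamedTuple):
    """Fields of an obsolete A News article (field names are illustrative)."""
    article_id: str
    newsgroups: List[str]
    path: str
    date: str
    subject: str
    body: str


def parse_a_news(text: str) -> ANewsArticle:
    """Split an A News article into its five header lines and the body."""
    lines = text.split("\n")
    if len(lines) < 5:
        raise ValueError("an A News article needs five header lines")
    if not lines[0].startswith("A"):
        raise ValueError("the first line must start with 'A'")
    return ANewsArticle(
        article_id=lines[0][1:],         # drop the leading "A"
        newsgroups=lines[1].split(","),  # assumption: comma-separated list
        path=lines[2],
        date=lines[3],                   # kept as text, not interpreted
        subject=lines[4],
        body="\n".join(lines[5:]),
    )


sample = "\n".join([
    "Aeagle.642",
    "news.misc",
    "cbosgd!mhuxj!mhuxt!eagle!jerry",
    "Fri Nov 19 16:14:55 1982",
    "Usenet Etiquette - Please Read",
    "body",
    "body",
    "body",
])
article = parse_a_news(sample)
print(article.article_id, "|", article.subject)
```

The date is deliberately left as a string, because the format only resembles an Internet date.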

This format is documented for archeological purposes only. Do not generate articles in this format.

A.2. Early B-News Article Format

The obsolete pseudo-Internet article format, used briefly during the transition between the A News format and the modern format, followed the general outline of a MAIL message but with some non-standard headers. For example:

     From: cbosgd!mhuxj!mhuxt!eagle!jerry (Jerry Schwarz)
     Newsgroups: news.misc
     Title: Usenet Etiquette -- Please Read
     Article-I.D.: eagle.642
     Posted: Fri Nov 19 16:14:55 1982
     Received: Fri Nov 19 16:59:30 1982
     Expires: Mon Jan 1 00:00:00 1990

     body
     body
     body

The From header contained the information now found in the Path header, plus possibly the full name now typically found in the From header. The Title header contained what is now the Subject content. The Posted header contained what is now the Date content. The Article-I.D. header contained an article ID, analogous to a message ID and used for similar purposes. The Newsgroups and Expires headers were approximately as now. The Received header contained the date when the latest relayer to process the article first saw it. All dates were in the above format, with all fields fixed width, resembling an Internet date but not quite the same.
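The correspondence between the old and modern header names can be sketched as a small reader for archived articles. This is an illustration only, under the assumption that each header occupies one line. The function name `read_early_b_news` and the keys `full_name` and `article_id` are hypothetical labels, not header names, and no attempt is made to turn the article ID into a modern message ID.

```python
import re

# Early B News header name -> rough modern counterpart, per the text above.
MODERN_NAME = {
    "Title": "Subject",
    "Posted": "Date",
    "Newsgroups": "Newsgroups",
    "Expires": "Expires",
}


def read_early_b_news(text):
    """Return (fields, body) for an archived Early B News article."""
    head, _, body = text.partition("\n\n")
    old = {}
    for line in head.split("\n"):
        name, _, value = line.partition(":")  # split at the first colon only
        old[name.strip()] = value.strip()

    fields = {MODERN_NAME[k]: v for k, v in old.items() if k in MODERN_NAME}

    # The old From held the path, optionally followed by "(Full Name)".
    match = re.fullmatch(r"(\S+)(?:\s+\((.*)\))?", old.get("From", ""))
    if match:
        fields["Path"] = match.group(1)
        if match.group(2):
            fields["full_name"] = match.group(2)  # illustrative key
    if "Article-I.D." in old:
        fields["article_id"] = old["Article-I.D."]  # illustrative key
    # The Received header is dropped in this sketch.
    return fields, body


sample = (
    "From: cbosgd!mhuxj!mhuxt!eagle!jerry (Jerry Schwarz)\n"
    "Newsgroups: news.misc\n"
    "Title: Usenet Etiquette -- Please Read\n"
    "Article-I.D.: eagle.642\n"
    "Posted: Fri Nov 19 16:14:55 1982\n"
    "Received: Fri Nov 19 16:59:30 1982\n"
    "Expires: Mon Jan 1 00:00:00 1990\n"
    "\n"
    "body\nbody\nbody"
)
fields, body = read_early_b_news(sample)
print(fields["Subject"], "|", fields["Path"])
```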

This format is documented for archeological purposes only. Do not generate articles in this format.

A.3. Obsolete Headers

Early versions of news software following the modern format sometimes generated headers like the following:

     Relay-Version: version B 2.10 2/13/83; site cbosgd.UUCP
     Posting-Version: version B 2.10 2/13/83; site eagle.UUCP
     Date-Received: Friday, 19-Nov-82 16:59:30 EST

Relay-Version contained version information about the relayer that last processed the article. Posting-Version contained version information about the posting agent that posted the article. Date-Received contained the date when the last relayer to process the article first saw it (in a slightly nonstandard format).

These headers are documented for archeological purposes only. Do not generate articles using them.

A.4. Obsolete Control Messages

There once was a senduuname control message, resembling sendsys but requesting transmission of the list of hosts that the receiving host had UUCP connections to. This rapidly ceased to be of much use, and many organizations consider information about their internal connectivity to be confidential.

Historically, a checkgroups body consisting of one or two lines, the first of the form "-n newsgroup", caused checkgroups to apply to only that single newsgroup. This form is documented for archeological purposes only; do not use it.

Historically, an article posted to a newsgroup whose name had exactly three components of which the third was "ctl" signified that article was to be taken as a control message. The Subject header specified the actions, in the same way the Control header does now. This form is documented for archeological purposes only; do not use it; do not implement it.


B. A Quick Tour Of MIME

(The editor wishes to thank Luc Rooijakkers; most of this appendix is a lightly-edited version of a summary he kindly supplied.)

MIME (Multipurpose Internet Mail Extensions) is an upward-compatible set of extensions to RFC 822, currently documented in RFCs 1341 and 1342. This appendix summarizes these documents. See the MIME RFCs for more information; they are very readable.

UNRESOLVED ISSUE: These RFC numbers (here and elsewhere in this Draft) need updating when the new MIME RFCs come out.

MIME defines the following new headers:

     MIME-Version
     Content-Type
     Content-Transfer-Encoding
     Content-ID
     Content-Description

The MIME-Version header is mandatory for all messages conforming to the MIME specification and carries the version number of the MIME specification. Example:

     MIME-Version: 1.0

The Content-Type header indicates the content type of the message. Content types are split into a top-level type and a subtype, separated by a slash. Auxiliary information can also be supplied, using an attribute-value notation. Example:

     Content-Type: text/plain; charset=us-ascii

(In the absence of a Content-Type header this is in fact the default content type.)
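As an illustration of the type/subtype split and the default, the following sketch uses the `email` package from the Python standard library, which is one possible parser and not part of this Draft. The two sample messages are invented for the example.

```python
from email import message_from_string

with_header = message_from_string(
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "hello\n"
)
without_header = message_from_string("Subject: test\n\nhello\n")

print(with_header.get_content_maintype())  # top-level type
print(with_header.get_content_subtype())   # subtype
print(with_header.get_param("charset"))    # auxiliary attribute
print(without_header.get_content_type())   # falls back to the default
```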

Important type/subtype combinations are

text/plain
     Plain text, possibly in a non-ASCII character set.
text/enriched
     A very simple wordprocessor-like language supporting character attributes (e.g., underlining), justification control, and multiple character sets. (This proposal has gone through several iterations and has recently split off from the main MIME RFCs into a separate document.)
message/rfc822
     A mail message conforming to a slightly-relaxed version of RFC 822.
message/partial
     Part of a message (supporting the transparent splitting and joining of messages when they are too large to be handled by some transport agent).
message/external-body
     A message whose body is external. Possible access methods include via mail, FTP, local file, etc.
multipart/mixed
     A message whose body consists of multiple parts, possibly of different types, intended to be viewed in serial order. Each part looks like an RFC 822 message, consisting of headers and a body. Most of the RFC 822 headers have no defined semantics for body parts.
multipart/parallel
     Likewise, except that the parts are intended to be viewed in parallel (on user agents that support it).
multipart/alternative
     Likewise, except that the parts are intended to be semantically equivalent such that the part that best matches the capabilities of the environment should be displayed. For example, a message may include plain-text, enriched-text, and postscript versions of some document.
multipart/digest
     A variant of multipart/mixed especially intended for message digests (the default type of the parts is message/rfc822 instead of text/plain, saving on the number of headers for the parts).
application/postscript
     A PostScript document. (PostScript is a trademark of Adobe.)
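To make the multipart structure concrete, here is a small sketch that builds a multipart/alternative message with a plain-text and an enriched-text part, then parses it back. It relies on Python's standard `email` package as an assumed tool. The part contents are invented.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Two renderings of the same content.
outer = MIMEMultipart("alternative")
outer.attach(MIMEText("Plain version\n", "plain", "us-ascii"))
outer.attach(MIMEText("<bold>Enriched</bold> version\n", "enriched", "us-ascii"))

wire = outer.as_string()            # headers, boundary lines, and both parts
parsed = message_from_string(wire)

# Each part carries its own headers and body, like a small message.
part_types = [part.get_content_type() for part in parsed.get_payload()]
print(parsed.get_content_type(), part_types)
```

A reading agent would then pick the part that best matches what it can display.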

Other top-level types exist for still images, audio, and video samples.

Some of the above types require the ability to transport binary data. Since the existing message systems usually do not support this, MIME provides a Content-Transfer-Encoding header to indicate the kind of encoding used. The possible encodings are:

7bit
     No encoding; the data consists of short (less than 1000 characters) lines of 7-bit ASCII data, delimited by EOL sequences. This is the default encoding.
8bit
     Like 7bit, except that bytes with the high-order bit set may be present. Many transmission paths are incapable of carrying messages which use this encoding.
binary
     No encoding; any sequence of bytes may be present. Many transmission paths are incapable of carrying messages which use this encoding.
base64
     The data is encoded by representing every group of 3 bytes as 4 characters from the alphabet "A-Za-z0-9+/", which was chosen for its high robustness through mail gateways (the alphabet used by uuencode does not survive ASCII-EBCDIC-ASCII translations). In the final group of 4 characters, "=" is used for those characters not representing data bytes. Line length is limited and EOLs in the encoded form are ignored.
quoted-printable
     Any byte can be represented by a three-character "=XX" sequence where the X's are upper case hexadecimal digits. Bytes representing printable 7-bit US-ASCII characters except "=" may be represented literally. Tabs and blanks may be represented literally if not at the end of a line. Line length is limited, and an EOL preceded by "=" was inserted for this purpose and is not present in the original.

The base64 and quoted-printable encodings are applied to data in Internet canonical form, which means that any EOL encoded as anything but EOL must be an Internet canonical EOL: CR followed by LF.
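The two encodings and the canonical-form rule can be tried out with Python's standard `base64` and `quopri` modules. This is a sketch with invented sample data, not a complete encoder for articles.

```python
import base64
import quopri

# Every 3 input bytes become 4 output characters; "=" pads the last group.
print(base64.b64encode(b"abc"))  # b'YWJj'
print(base64.b64encode(b"ab"))   # b'YWI='
print(base64.b64encode(b"a"))    # b'YQ=='

# Convert local line endings to the canonical CR LF before encoding.
local_text = "caf\u00e9 = coffee\nsecond line\n"
canonical = local_text.replace("\n", "\r\n").encode("iso-8859-1")

# encodebytes() limits line length; b64decode() skips the inserted EOLs.
wrapped = base64.encodebytes(canonical * 10)
restored = base64.b64decode(wrapped)

# Quoted-printable: "=" and 8-bit bytes become "=XX", the rest stays readable.
qp_sample = b"caf\xe9 = coffee"
qp = quopri.encodestring(qp_sample)
print(qp)
```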

The Content-Description header allows further description of a body part, analogous to the use of Subject for messages.

Finally, the Content-ID header can be used to assign an identification to body parts, analogous to the assignment of identifications to messages by Message-ID.

Note that most of these headers are structured header fields, as defined in RFC 822. Consequently, comments are allowed in their values. The following is a legal MIME header:

     Content-Type: (a comment) text (yeah)   /
             plain    (and now some params:) ; charset= (guess what)
        iso-8859-1 (we don't have iso-10646 yet, pity)
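A reader of such a header has to discard the comments before looking at the tokens. The following is a simplified sketch, not a full RFC 822 parser. The helper names `strip_comments` and `parse_content_type` are illustrative, and removing all whitespace afterwards is a shortcut that would damage quoted parameter values containing spaces.

```python
def strip_comments(value):
    """Remove parenthesized comments, honouring nesting, quotes, backslashes."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # comments are not recognised inside quoted strings
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # Quoted pair: the next character is taken literally.
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if in_quotes:
            out.append(ch)
            if ch == '"':
                in_quotes = False
        elif ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
            if ch == '"':
                in_quotes = True
        i += 1
    return "".join(out)


def parse_content_type(header):
    """Return (type/subtype, parameters) for a Content-Type header line."""
    _, _, value = header.partition(":")
    # Simplification: drop all whitespace, including the folded line breaks.
    compact = "".join(strip_comments(value).split())
    media_type, *raw_params = compact.split(";")
    params = dict(p.split("=", 1) for p in raw_params if "=" in p)
    return media_type.lower(), params


header = (
    "Content-Type: (a comment) text (yeah)   /\n"
    "        plain    (and now some params:) ; charset= (guess what)\n"
    "   iso-8859-1 (we don't have iso-10646 yet, pity)"
)
media_type, params = parse_content_type(header)
print(media_type, params)
```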

NOTE: Although the MIME specification was developed for mail, there is nothing precluding its use for news as well. While it might simplify implementation to restrict the MIME headers somewhat, in the same way that other news headers (e.g. From) are restricted subsets of the RFC-822 originals, this would add yet another divergence between two formats that ought to be as compatible as possible. In the case of the MIME headers, there is no body of existing code posing compatibility concerns. A full-featured MIME reading agent needs a full RFC-822 parser anyway, to properly handle body parts of types like message/rfc822, so there is little gain from restricting MIME headers. Adopting the MIME specification unchanged seems best. However, article-level MIME headers must still comply with the overall news header syntax given in section 4, so that news software which is NOT interested in MIME need not contain a full RFC-822 parser.

The second part of MIME, RFC 1342 (Representation of Non-ASCII Text in Internet Message Headers), addresses the problem of non-ASCII characters in headers. An example of a header using the RFC 1342 mechanism is

     From: =?ISO-8859-1?Q?Andr=E9_?= Pirard <PIRARD@vm1.ulg.ac.be>

Such encodings are allowed in selected headers, subject to the restrictions listed in RFC 1342.
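The example header above can be decoded with the `email.header` module of the Python standard library, used here as an assumed tool. The exact spacing of the decoded result depends on the library, so the sketch only prints it.

```python
from email.header import decode_header, make_header

raw = "=?ISO-8859-1?Q?Andr=E9_?= Pirard <PIRARD@vm1.ulg.ac.be>"

# decode_header() yields (bytes, charset) chunks; make_header() joins them.
chunks = decode_header(raw)
decoded = str(make_header(chunks))
print(decoded)
```

In the "Q" form, "=E9" stands for the byte E9 (hexadecimal) in the named character set and "_" stands for a space.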

The MIME effort has also produced an RFC defining a Content-MD5 header [rrr 1544], containing an MD5-based "checksum" of the contents of an article or body part, giving high confidence of detecting accidental modifications to the contents.
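As a rough illustration, a Content-MD5 value is the base64 form of the 16-byte MD5 digest of the content. The sketch below assumes the body has already been put into canonical form and uses an illustrative helper name, `content_md5`. Consult the Content-MD5 RFC for the exact rules on what is hashed.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return a Content-MD5 header value for an already-canonical body."""
    digest = hashlib.md5(body).digest()  # 16 raw bytes
    return base64.b64encode(digest).decode("ascii")


original = b"Hello, world.\r\n"
damaged = b"Hello, world!\r\n"

print("Content-MD5:", content_md5(original))
print(content_md5(original) == content_md5(damaged))  # accidental change shows
```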

The "metamail" software package [rrr] helps provide MIME support with minimal changes to mailers, and may also be relevant to news reading agents.

The PEM (Privacy Enhanced Mail) effort is pursuing analogous facilities to offer stronger guarantees against malicious modifications, unauthorized eavesdropping, and forgery. This work too may be applicable to news, once it is reconciled with MIME (by efforts now underway).


C. Summary of Changes Since RFC 1036

This Draft is much longer than RFC 1036, so there is obviously much change in content. Much of this is just increased precision and rigor. Noteworthy changes and additions include:

  • section 4.3's restrictions on article bodies
  • all references to MIME facilities
  • size limits on articles
  • precise specification of Date-content syntax
  • message IDs must never be re-used, ever
  • "!" is the only Path delimiter
  • multiple moderators in the Approved header
  • rules on References trimming, and the _-_ mechanism
  • generalization of the Xref rules
  • multiple message IDs in Cancel and Supersedes
  • Also-Control
  • See-Also
  • Article-Names
  • Article-Replacing
  • more precise rules for cancellation
  • cancellation authorization based on From, not Sender
  • "unmoderated" and descriptors in newgroup messages
  • restrictive rules on handling of sendsys and version messages
  • the whogets control message
  • precise specification of checkgroups messages
  • compression type preferably specified out-of-band
  • rules for encapsulating news in MIME mail
  • tighter specification of relayer functioning (section 9.1)
  • the "newsmaster" contact address
  • rules for gatewaying (section 10)
  • discussion of security issues (section 11)

D. Summary of Completely New Features

Most of this Draft merely documents existing practice, but there are a few attempts to extend it. These are:

TBW


E. Summary of Differences From RFC 822+1123

The following are noteworthy differences between this Draft's articles and MAIL messages:

  • generally less-permissive header syntax
  • notably, limited From syntax
  • MAIL header comments allowed in only a few contexts
  • slightly more restricted message-ID syntax
  • several more mandatory headers
  • duplicate headers forbidden
  • References/See-Also versus In-Reply-To/References (section 6.5)
  • case sensitivity in some contexts
  • point-to-point headers, e.g. To and Cc, forbidden (section 6)
  • several new headers

References

[Sanderson] "Smileys", David Sanderson, O'Reilly & Associates Ltd., 1993.

TBW


Security Considerations

Section 11 discusses security considerations in detail.


Author's Address

Henry Spencer
henry@zoo.toronto.edu

SP Systems
Box 280 Stn. A
Toronto, Ont. M5W1B2 Canada

