[ANSI-Smalltalk] Smalltalk file streams

Paolo Bonzini bonzini at gnu.org
Thu Oct 16 08:16:51 BST 2008


> (1) #contents is defined for file streams.
>     There is no permission granted for implementations to give
>     up and say "this is too hard" ("Errors" is "None" in section
>     5.10.1.1).
> 
>     -- How do you handle #contents for /dev/tty?  Read and preserve
>        every character until the machine is powered off, then send
>        the results back in a time machine?

Simple: PEBKAC.

>     -- I have a 500 GB data file.  My virtual memory is much smaller.
>        How is #contents supposed to work in that case?

It would probably fail in the same way that "Array new: 100 factorial"
would fail.  So, PEBKAC again? :-)

>     -- The same for serial ports, sockets, and of course pipes.

Most of the sockets my web browser opens are closed very soon.  Same for
the pipes my shell opens.  If one asks for #contents on the `chargen'
service... you guessed it, PEBKAC.

>     -- Historically, Smalltalk-80 could open a file read-only or
>        it could open it read-write.  The Smalltalk standard doesn't
>        really seem aware of the possibility of the existence of
>        write-only streams (such as streams sent to serial ports,
>        sockets, pipes, the screen, &c) and operating systems with
>        such things as "append-only" file permissions.  How do you
>        implement #contents for a file you can write to but not read?

The erratum here is that #contents belongs in <gettableStream> and
<WriteStream>, not in <collectionStream> and <FileStream>.  I sent
another email on the subject.

> (2) #position is defined for file streams.
>     It is defined for general sequence streams to be the number of
>     elements in the past sequence values.
> 
>     There appears to be a tacit assumption that
>     one internal character = one external byte.

No, there's no such assumption.  It's just that file streams do not hide
that the underlying storage is bytes, not characters.  That's the same
contract that Windows makes when you open CRLF-terminated files.  So,
you cannot assume that

   a := s next: 2.
   s position: s position - 2.
   b := s next: 2.
   a = b

but you can assume that

   p := s position.
   a := s next: 2.
   s position: p.
   b := s next: 2.
   a = b

>     In old systems using ISO 2022, Shift-JIS, EUC, XNS, or
>     other variable-width encodings, and in modern systems using
>     UTF-8, "number of characters" and "number of bytes" are
>     linked by a rather floppy rubber ruler.

ISO-2022 has stateful encodings, which is much worse.  But for UTF-8, or
even for SJIS and others, you can assume that the second snippet I gave
above works even if #position returns the number of bytes.  I believe
returning bytes is the right thing to do, because that's what the OS
gives you (with lseek or an equivalent system call).  Anything else
should be implemented with some kind of decorator.
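
To make the byte-offset contract concrete, here is a minimal sketch in
Python rather than Smalltalk.  It is not taken from any Smalltalk
implementation: `next_chars' is a hypothetical helper that stands in for
#next: on a UTF-8 file stream, and an in-memory byte buffer stands in
for the file.  The position is a byte offset throughout.

```python
import io

# "é" and "ü" take two bytes each in UTF-8, so these 4 characters occupy 6 bytes.
data = "éüab".encode("utf-8")


def next_chars(stream, n):
    """Hypothetical stand-in for #next: -- read n UTF-8 characters from a byte stream."""
    out = []
    for _ in range(n):
        first = stream.read(1)
        lead = first[0]
        # The lead byte says how many continuation bytes follow it.
        extra = 0 if lead < 0x80 else 1 if lead < 0xE0 else 2 if lead < 0xF0 else 3
        out.append((first + stream.read(extra)).decode("utf-8"))
    return "".join(out)


s = io.BytesIO(data)

# Second idiom: remember the position, read, restore, read again.
p = s.tell()
a = next_chars(s, 2)
s.seek(p)
b = next_chars(s, 2)      # same two characters as a

# First idiom: back up by the number of characters read.
s.seek(s.tell() - 2)      # moves back 2 bytes, i.e. only one character here
c = next_chars(s, 2)      # starts at "ü", so it differs from a
```

Here a and b are equal, while c is not, because two characters took
four bytes.  Saving and restoring an opaque position works regardless of
how many bytes each character needs, as long as the encoding is not
stateful.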

> (3) #position: is defined for file streams.
>     The only permission for it to fail is that given in 5.9.1.5,
>     where we're in trouble if the argument is not an integer or
>     is not in the range 0..size.

The only problem I see with the standard is that there is no standard
exception for OS errors (errno in Unix parlance).  A failed #position:
on a non-seekable stream falls under that case (ESPIPE under Unix, and
there are surely similar errors for Win32 system calls).
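
As an illustration of the OS-level failure, the following Python sketch
asks the kernel for the current offset of a pipe.  It assumes a
Unix-like system; `seek_errno_on_pipe' is a name made up for this
example.  How a Smalltalk implementation would surface the error is
exactly what the standard leaves open.

```python
import errno
import os


def seek_errno_on_pipe():
    """Return the errno raised by lseek on the read end of a pipe, or None if it succeeds."""
    r, w = os.pipe()
    try:
        os.lseek(r, 0, os.SEEK_CUR)   # query the current offset
        return None
    except OSError as e:
        return e.errno
    finally:
        os.close(r)
        os.close(w)


result = seek_errno_on_pipe()
print(errno.errorcode.get(result, result))
```

On Linux and other POSIX systems the call fails with ESPIPE, which is
the error a file stream's #position or #position: would have to report
somehow.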

>     I note that discussions before I joined this mailing list
>     identified a need for Sockets in the revised standard.  So
>     dealing with #position[:] is timely.

Not necessarily.  For example, GNU Smalltalk does implement low-level
sockets using a subclass of FileDescriptor (which would support
#position and friends, except that they obviously fail for sockets), but
the actual classes meant for the user are direct subclasses of Stream.

Paolo


