[ANSI-Smalltalk] Smalltalk file streams
Richard O'Keefe
ok at cs.otago.ac.nz
Fri Oct 17 06:48:15 BST 2008
On 16 Oct 2008, at 8:16 pm, Paolo Bonzini wrote:
>
>> (1) #contents is defined for file streams.
>> There is no permission granted for implementations to give
>> up and say "this is too hard" ("Errors" is "None" in section
>> 5.10.1.1).
>>
>> -- How do you handle #contents for /dev/tty? Read and preserve
>> every character until the machine is powered off, then send
>> the results back in a time machine?
>
> Simple: PEBKAC.
I am sorry, but this is an exceptionally unhelpful response.
In the situation we are discussing, there IS NO KEYBOARD.
There IS NO CHAIR. There is no user to blame. We are at
this point discussing
WHAT THE STANDARD SAYS.
The standard says that #contents and #position and #position:
are defined on *ALL* file streams without exception and that
with the exception of positions that are out of range, there
are *NO* error conditions.
So we have
- some source, possibly sitting between a chair and a keyboard,
possibly another program, provides
- a legal file name, which
- is successfully opened by a Smalltalk program, which
- performs an operation which the standard *DEFINES*
and does not allow to fail.
It's not going to work.
But when it fails to work, HOW is it going to fail?
The heart of the problem here is that
THERE IS NO WAY FOR AN ANSI STANDARD PROGRAM TO *FIND OUT*
whether the operation will work or did work.
You cannot write a program that validates the input
here. Rejecting /dev/tty won't do, because there are
operating systems (like Windows) where that is a
perfectly sensible file name. Rejecting KBD won't do,
because there are operating systems (like Unix) where
that is a perfectly sensible file name. And having a
platform-dependent check isn't possible because there is
no standard way to find out what the platform *is*.
You can chant PEBKAC until you attain union with the One,
and it won't alter the fundamental situation: there is an
operation which the standard unconditionally demands full
support for, which CANNOT be implemented as stated, and
whose failure cannot be prevented or detected by code
written to the standard.
The problem, in short, exists IN THE STANDARD.
Or at any rate, >A< problem.
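To make that concrete, here is roughly the best an ANSI program can attempt. This is a sketch: 'SomeFile' is a placeholder name, and it assumes that a failing #contents would signal some subclass of Error, which is precisely what the standard does not promise. It uses only protocols the standard does define (#read:, #contents, #on:do:, #return:, #close), yet nothing obliges an implementation ever to take the handler branch. On /dev/tty it may just as well block indefinitely or put up a dialogue.

```smalltalk
| s c |
"Placeholder file name; the argument is the same for /dev/tty, a FIFO, or KBD."
s := FileStream read: 'SomeFile'.
"The standard lists no error conditions for #contents, so this
 handler is only reached if the implementation chooses to signal."
c := [s contents]
        on: Error
        do: [:ex | ex return: nil].
s close.
c isNil
    ifTrue: [Transcript nextPutAll: 'could not read contents'; cr]
    ifFalse: [Transcript nextPutAll: c size printString; cr].
```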
>> -- I have a 500 GB data file. My virtual memory is much smaller.
>> How is #contents supposed to work in that case?
>
> Probably fail in the same way as "Array new: 100 factorial" would
> fail.
> So, PEBKAC again? :-)
Again, an exceedingly unhelpful response.
Take Squeak as an example:
(Array new: 1000000*1000000) size
The result is a "Space is low" popup. Click "proceed"
and the result is another one. You never ever get told
"There is _never_ going to be that much memory available
in this incarnation of Squeak." VisualWorks is much
more helpful: the popup says "Unhandled exception: size
exceeds implementation limit of 2^28 elements."
Recast it as
((1 to: 1000000) collect: [:i | Array new: 1000000]) size
and VW thrashes for a while trying to collect enough garbage
to make room, but eventually notices and there's a popup saying
there isn't enough space.
Conclusion: there is no "THE ... way" that the Array new: call
would fail; one system raises an exception (and produces a
popup dialogue when that exception isn't caught), the other does
not. This means that in one system a program *can* catch and
recover from that problem; in the other, it can't.
Again, the standard does not admit that a call to (Array new: n)
with n a non-negative integer *can* fail, and there is accordingly
no standard way to detect or recover.
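The same guarded pattern applied to the allocation example shows the difference. This is again a sketch that assumes, without any support from the standard, that running out of space is signalled as an Error. On the behaviour described above, the handler would plausibly be reached in VisualWorks, whereas Squeak puts up its "Space is low" notifier and hands the program nothing it can catch.

```smalltalk
| a |
"Assumption: an over-large allocation signals some Error subclass.
 VisualWorks reports an implementation-limit exception (assumed here
 to be an Error); Squeak shows a 'Space is low' notifier instead."
a := [Array new: 1000000 * 1000000]
        on: Error
        do: [:ex | ex return: nil].
a isNil
    ifTrue: [Transcript nextPutAll: 'allocation refused'; cr]
    ifFalse: [Transcript nextPutAll: a size printString; cr].
```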
It is embarrassingly futile to chant PEBKAC about the 500GB
file example, because with today's 64-bit machines and the
availability of terabyte disc drives and file systems like
ZFS it isn't a *mistake* for the user to ask for the contents
of a 500GB file, it's just beyond the capacity of the machine
he happens to have this year. Maybe next year's machine will
be up to it, without the program or the file changing at all.
Indeed, in a Smalltalk with ReadOnlyByteArray on an operating
system with memory-mapped I/O, you have a fighting chance of
getting the contents of a 500GB file as bytes right now.
(My own library has read-only memory-mapped byte arrays.)
So
s := FileStream read: 'BigFile' type: #'binary'.
b := s contents.
might well work, while
s := FileStream read: 'BigFile' type: #'text'.
b := s contents.
might fail, because of the need to do CRLF mapping. Do you
still want to chant PEBKAC now? Is the *user* at fault for
imagining that 500GB of characters should be no worse than
500GB of bytes?
I don't really care if the response to asking for #contents
that won't fit is the same as the response to asking for an
array (or a quantity of arrays) that won't fit, but I cannot
think it unreasonable to hope for *some* standardised
response.
>> -- The same for serial ports, sockets, and of course pipes.
>
> Most of the sockets my web browser opens are closed very soon.
Which is irrelevant.
> Same for
> the pipes my shell opens. If one asks for #contents on the `chargen'
> service... you guessed it, PEBKAC.
Chanting PEBKAC is about as unhelpful as it can be.
Blaming the user doesn't actually SOLVE anything.
For one thing, chargen isn't all that apropos, because I
wasn't concerned so much with the size of the part still to
be read as with the physical impossibility of recovering the
values already read.
I'm sure we agree that it isn't reasonable to seek or ask
for contents on a whole lot of file system objects.
The problem is that the standard says it must work.
Of course, one problem with the standard is how little
influence it has had. Squeak has a FileStream
class, but it does not support the ANSI file opening
methods, at least not out of the box. So
s := FileStream read: '/dev/tty'
does not work. Change it to the historic
s := FileStream oldFileNamed: '/dev/tty'
and Squeak on my Mac solemnly informs me that
/dev/tty does not exist. Feh! VisualWorks tells
me "device not configured". Double Feh! Nor can
I open a FIFO.
It appears that Squeak and VisualWorks take the
view that their equivalents of ANSI file streams
are for disc files and disc files only and that
anyone trying to open any other kind of file system
object is in a state of terminal confusion and
should be slapped about the head with a dead fish
until they confess.
Sigh. I suppose that's one way to satisfy the requirements
of the standard. If it is the intent of the standard that
file streams should only be used for disc files, not for
any other kind of file system object, then that should be
explicit. With the examples of C, C++, Python, Perl, TCL,
Lisp, Haskell, Prolog, Ada, Fortran, &c before me, it
never occurred to me that Smalltalk file streams might be
restricted this way, and I do think it is important enough
to document it.
So which is it:
- the standard defines file stream operations that don't
make sense on a lot of file system objects, so it ought
to allow those operations to fail in a way that a program
can detect using only portable means,
or
- the standard defines file stream operations that don't
make sense on a lot of file system objects, so despite
all the other operations, systems should not allow
such objects to be opened as file streams.
> The erratum here is that #contents belongs in <gettableStream> and
> <WriteStream>, not in <collectionStream> and <FileStream>. I sent
> another email on the subject.
Except that for smallish files, #contents is quite useful.