[ANSI-Smalltalk] Smalltalk file streams
Richard O'Keefe
ok at cs.otago.ac.nz
Tue Oct 21 00:16:07 BST 2008
My apologies for my previous posting.
Word of honour, I didn't try to send it.
On 20 Oct 2008, at 3:00 pm, Paolo Bonzini wrote:
>
>> The standard needs to say something about this.
>> If there's a section of the ANSI standard that talks about
>> it, I've been unable to find it.
>>
>> Note that running out of memory is an implementation limit.
>
> Sure -- but running out of memory because you try to slurp the entire
> contents of a file in memory without first checking its size, is a
> programmer bug.
Sorry, wrong answer.
It *would* be a programmer bug *if*

(1) there were a standard way to find out how big the file is.
    As it happens, there is:

        p := stream position.
        stream setToEnd.
        size := stream position.
        stream position: p.

    Of course the snag here is that the idiom I see people
    using is 'stream contents size'...
    For a text stream, #setToEnd has to read every byte
    along the way; otherwise, thanks to CRLF mapping, it would
    not know how many characters there were.

(2) there were a standard way to find out how much memory
    the contents of the stream would take.
    THERE IS NOT.

(3) there were a standard way to find out if something that
    size could fit.
    THERE IS NOT.
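
A minimal sketch of the idiom in (1), assuming only the ANSI
messages #position, #position:, #setToEnd and #ensure:.  The
selector #sizeOf: is made up for illustration; it is not in the
standard.  The #ensure: block puts the position back even if
#setToEnd fails part-way.

    sizeOf: aStream
        "Hypothetical helper, not part of ANSI Smalltalk.
         Answer the number of elements in aStream, leaving
         its position where it was."
        | saved |
        saved := aStream position.
        ^[aStream setToEnd.
          aStream position]
            ensure: [aStream position: saved]

This still says nothing about (2) or (3).
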
There is no way for a Smalltalk programmer to ask the question
"would the contents of this file fit into memory" using only
ANSI Smalltalk facilities. This is precisely the kind of setup
that I classified yesterday as "programmer incapacity".
In effect, the system is withholding information from the
programmer so that it can go "Nyah, nyah" and shrug off the blame.
Remember, what counts as "too big" does not depend solely on the
file. It depends on the file, the hardware, the Smalltalk
implementation, and what else is in Smalltalk memory, at least.
The obvious way to implement #contents is something like
    p := stream position.
    w := WriteStream on: (stream species new).
    stream position: 0.     "Why sockets can't do it"
    stream do: [:each | w nextPut: each].
    stream position: p.     "Also why sockets can't do it"
    ^w contents
If <readFileStream> >> contents did not exist, people would
presumably write something like this to do the job.
With such an implementation, if the file is too big, some
attempt to extend the WriteStream will fail. In fact, any
attempt to extend a WriteStream may fail, and for that matter
may fail when there is sufficient free memory to hold the new
version (thanks to fragmentation).
Fine. I am not saying that this cannot or should not happen.
I _am_ saying that because we KNOW that this kind of thing
can and does happen, the standard should not leave it not
only undefined but unmentioned!
It will also not do to chant PEBKAC or to say that it is a
programmer bug not to check, because there is *never* any way
for an ANSI Smalltalk programmer to find out whether there
would be enough room for anything. We are not even entitled
to assume that 1+1 will not run out of memory.
Even a plain statement that
(A) Sending any message may result in the exhaustion of
some system resource such as time, memory or disc space.
(B) What happens when this occurs is implementation-defined.
would be better than trying to pretend it can't happen,
which is what the ANSI standard does.
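
Until the standard says something like (B), about the best a
portable program can do is guess.  A sketch, assuming the
implementation signals some subclass of Error when #contents
cannot get the memory it needs; nothing in ANSI promises that
it will, and #contentsOrNil: is a made-up selector.

    contentsOrNil: aStream
        "Hypothetical helper, not part of ANSI Smalltalk.
         Answer the contents of aStream, or nil if fetching
         them signalled an Error.  Whether resource exhaustion
         is reported as an Error at all is up to the
         implementation, so this is a hope, not a guarantee."
        ^[aStream contents]
            on: Error
            do: [:ex | ex return: nil]
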
Reverting to (Array new: 100 factorial) or whatever it was,
there is a strange omission in section 3.6 "Implementation limits".
Section 3.6 tells us "The values of the following implementation
parameters are implementation defined and must be
documented by conforming implementations".
The sizes of arrays, byte arrays, and strings are not listed.
> I see now one difference between what you expected and what I
> expected: for some historical reason, on GNU Smalltalk #contents
> will always return only the *future* contents of the file; only
> ReadStream and ReadWriteStream include the past contents of the
> file.  Probably it was done to minimize the differences between the
> handling of pipes and regular files, I don't remember.
That is certainly a conflict between GNU Smalltalk and the standard.
So we have
    A use case for #futureContents in GNU Smalltalk.
    A use case for #pastContents in WriteStream of Squeak, VW, and VA.
    A use case for #fullContents in ReadWriteStream and the standard.