[ANSI-Smalltalk] Behaviour of #collect:

Richard A. O'Keefe ok at cs.otago.ac.nz
Tue Sep 23 04:30:24 BST 2008


On 22 Sep 2008, at 11:35 pm, Ralph Johnson wrote:

> Interval>>collect: always returns an array.  In particular (1 to: 100)
> collect: [:each | each + 1] does not return (2 to: 101).

The point is that it doesn't return an Interval.

5.7.1.10 says of #collect: that

     "Unless specifically refined, this message is defined to answer an
      object conforming to the same protocol as the receiver."

5.7.9.2 is the special case for Interval.
5.7.17.4 is the special case for SortedCollection (the result is a
sequence, but not a sorted one).
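
A small illustration of those two refinements (a sketch only; the
exact class of each result varies between dialects):

     | a b |
     a := (1 to: 100) collect: [:each | each + 1].
     "a holds 2 through 101, but it is not an Interval"
     a first.     "2"
     a last.      "101"
     b := #(3 1 2) asSortedCollection collect: [:each | each negated].
     "b holds -1, -2, -3 in that order; a SortedCollection with the
      default ordering would have held them as -3, -2, -1"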

My library has copied from Squeak, so Set has the variants IdentitySet
and PluggableSet, and Bag has IdentityBag and PluggableBag.  Just
because the *inputs* to the transformer should be compared one
way, that doesn't mean the *outputs* should, so #collect: on these
answers a plain Set or Bag.  When you do know what class the result
should be, there are
     <class> withAll: collection collect: block
and
     (<object creation>) addAll: collection collect: block
to use instead.
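
In a dialect that lacks these selectors, the first form might be
sketched as a class-side method like the one below.  This is an
illustrative guess at the shape, not the library's actual definition;
it assumes the receiving class responds to #new and that its instances
respond to #add:, so it suits Set, Bag and OrderedCollection but not
Array.

     withAll: aCollection collect: aBlock
       "Hypothetical sketch: answer a new instance of the receiver
        holding the transformed elements of aCollection."
       | result |
       result := self new.
       aCollection do: [:each | result add: (aBlock value: each)].
       ^result

With something like that in place,
     IdentitySet withAll: #(1 2 3) collect: [:each | each * 2]
would answer an IdentitySet rather than a plain Set.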

The tricky case is array-like collections, because those are the ones
with specialised element types: notably characters and bytes, though
most Smalltalks also support some kind of numeric array.

Here's the actual code from SequencedReadableCollection:

     collect: aBlock
       "ANSI 5.7.1.10 says 'Unless specifically refined, this message is
        defined to answer an object conforming to the same protocol as
        the receiver.'  #collect: is overridden for Dictionary, Bag, and
        Set, and I believe we are compatible with its definitions for
        those classes.  It's also overridden for Interval, where the
        definition inherited from Collection would have worked.  It
        is also overridden for SortedCollection.  But there is no
        overriding for String, Symbol, or ByteArray.  Now consider
        'abc' collect: [:each | each codePoint]
        #[255 1 2] collect: [:each | each + 1].
        These go bang in actual Smalltalks that follow the ANSI rules.
        But that's not very useful.  The simplest thing that could
        possibly work would be to always return an Array.  This is the
        next best:  return the ANSI result if that is possible, or
        an Array if it is not."
       |r s t|
       s := self species.
       t := s elementType.
       (t == #Object or: [s == OrderedCollection])
         ifTrue: [^s withAll: self collect: aBlock].
       r := Array withAll: self collect: aBlock.
       t == #none ifTrue: [^r].
       (t == #byte and: [
         r allSatisfy: [:each | each integer_between: 0 and: 255]
       ]) ifTrue: [^s withAll: r].
       (t == #short and: [
         r allSatisfy: [:each | each integer_between: -32768 and: 32767]
       ]) ifTrue: [^s withAll: r].
       (t == #Character and: [
         r allSatisfy: [:each | each class == Character]
       ]) ifTrue: [^s withAll: r].
       (t == #FloatE and: [
         r allSatisfy: [:each | each class == FloatE]
       ]) ifTrue: [^s withAll: r].
       (t == #FloatD and: [
         r allSatisfy: [:each | each class == FloatD]
       ]) ifTrue: [^s withAll: r].
       (t == #FloatQ and: [
         r allSatisfy: [:each | each class == FloatQ]
       ]) ifTrue: [^s withAll: r].
       ^r
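
Tracing that method by hand on a few receivers gives the results
noted in the comments below.  This assumes that elementType answers
#Character for the species of a String and #byte for the species of a
ByteArray; those answers are not shown in the code above.

     'abc' collect: [:each | each asUppercase].
       "every result is a Character, so a String: 'ABC'"
     'abc' collect: [:each | each codePoint].
       "97 98 99 are not Characters, so an Array"
     #[1 2 3] collect: [:each | each + 1].
       "2 3 4 all fit in a byte, so a ByteArray"
     #[255 1 2] collect: [:each | each + 1].
       "256 does not fit in a byte, so an Array holding 256 2 3"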

I'm sure people can improve on this code in various ways.
The revision I would like would go under <sequencedReadableCollection>
and would read something like

	If the transformer answers a result that
	an object conforming to the same protocol as the receiver
	would be unable to store,
	the return type is generalized to <sequencedReadableCollection>.

An alternative would be

	If the transformer answers a result that
	an object conforming to the same protocol as the receiver
	would be unable to store,
	an exception will be raised.
	In this case it is not defined whether the exception will
	be raised immediately or after all invocations of the
	transformer have completed.

I wouldn't like that, but at least it would be fair warning.

> A good point about your proposal is that you are only changing the
> implementation of collect: for cases that now cause an error.  So, it
> is unlikely that this change will break any programs that now work.

One obvious way to improve my code above would be to just let the
exception happen and return an array in the handler.

One rather annoying thing about the current ANSI Smalltalk standard
is that it has an elaborate standardised exception handling
mechanism, and almost no standardised exceptions.  One reason for
the code above *not* using exception handling is that there does
not seem to be any portable way for an ANSI Smalltalk program to
catch a "this kind of collection can't hold that kind of value"
exception.
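
For comparison, the handler-based variant might look like the sketch
below.  It reuses the withAll:collect: and withAll: selectors from the
code above, and it catches Error only because there is no portable,
more specific class to name; that choice is an assumption, and it is
broader than one would want.

     collect: aBlock
       "Sketch only: build the general result first, then try to
        narrow it to the receiver's species.  Catching Error is an
        assumption made for want of a portable, more specific class."
       | r |
       r := Array withAll: self collect: aBlock.
       ^[self species withAll: r]
         on: Error
         do: [:ex | ex return: r]

Because aBlock is evaluated only while r is being built, an Error
raised inside aBlock itself is not swallowed; only a failure of
withAll: triggers the fallback to the Array.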

Standardising exceptions is likely to be a political minefield because
no-one is going to want their customers' code breaking.  Maybe that is
why it wasn't done before.  But it really does need doing.




