## **Note: this wiki is now retired and will no longer be updated!**

**The static final versions of the pages are left as a convenience for readers. Note that meta-pages such as "discussion," "history," etc., will not work.**

# SICP exercise 2.60

## Problem

We specified that a set would be represented as a list with no duplicates. Now suppose we
allow duplicates. For instance, the set {1,2,3} could be represented as the list `(2 3 2 1 3 2 2)`.

Design procedures `element-of-set?`, `adjoin-set`, `union-set`, and `intersection-set` that
operate on this representation. How does the efficiency of each compare with the corresponding procedure
for the non-duplicate representation? Are there applications for which you would use this representation in
preference to the non-duplicate one?

## Solution

The implementation of the `element-of-set?` procedure doesn't change: it returns `true` when it finds the first element that matches the input element, otherwise it returns `false`. Nothing in the functional behavior of that definition depends on the "no duplicates" invariant of the original set representation.

(define true #t) (define false #f) (define (element-of-set? x set) (cond ((null? set) false) ((equal? x (car set)) true) (else (element-of-set? x (cdr set)))))

The new `adjoin-set` is simple: it just `cons`es the input element onto the existing set.

(define (adjoin-set x set) (cons x set))

Note that we could also implement it by `append`ing the new element to the existing set, but that implementation would probably be slower than the `cons`ing one, at least for sets with more than a few elements, because it would need to find the end of the list representing the set. (The `append` version of `adjoin-set` would also cause a copy to be made of the existing set, whereas the `cons` version can probably avoid making a copy by sharing list tail structure.)

Like `element-of-set?`, the implementations of `intersection-set` and `union-set` work as-is. The functional behavior of their implementations doesn't depend on the "no duplicates" invariant of the original set representation.

(define (intersection-set set1 set2) (cond ((or (null? set1) (null? set2)) '()) ((element-of-set? (car set1) set2) (cons (car set1) (intersection-set (cdr set1) set2))) (else (intersection-set (cdr set1) set2)))) (define (union-set set1 set2) (cond ((null? set1) set2) ((null? set2) set1) ((element-of-set? (car set1) set2) (union-set (cdr set1) set2)) (else (cons (car set1) (union-set (cdr set1) set2)))))

Tests:

(define s1 '(a b c)) (define s2 (adjoin-set 'd s1)) s2

*Output:*

`
`

(d a b c)

(define s3 (adjoin-set 'a s2)) s3

*Output:*

`
`

(a d a b c)

(element-of-set? 'a s3)

*Output:*

`
`

#t

(element-of-set? 'z s3)

*Output:*

`
`

#f

(union-set '(a z) s3)

*Output:*

`
`

(z a d a b c)

(intersection-set '(a z d) s3)

*Output:*

`
`

(a d)

(intersection-set '(a z d a) s3)

*Output:*

`
`

(a d a)

Note that `intersection-set` removes duplicates that appear in its second set argument, but not from its first; this behavior is a consequence of which list `intersection-set` `cdr`s down to select individual elements.

The efficiency implications of this new representation are as follows:

- The number of steps required by procedure
`adjoin-set`has gone from θ(*n*) to θ(1), i.e., it now runs in constant time.

- Sets are still represented as lists and the definitions of the remaining procedures are unchanged, so we can expect their orders of growth to be the same as well (θ(
*n*) for`element-of-set?`, θ(*n*^{2}) for`intersection-set`and`union-set`).

- Storage for sets will require at least as much space in this representation as they did in the "no duplicates" representation, probably more for most applications. Therefore, the
*n*in the growth estimates given above will likely be larger by some constant factor that varies from application to application. Because only`intersection-set`can ever produce a set smaller than its inputs, this factor is possibly unbounded for many applications.

- In general, then, when using this "duplicates OK" representation, all procedures except
`adjoin-set`will be slower by some constant factor (possibly a quite large constant factor), and sets will require additional storage proportional to the same factor. Therefore, we would probably only want to use this representation for applications in which`adjoin-set`is a frequent operation, compared to the other procedures, and in which the size of our sets is not an issue.