Note: this wiki is now retired and will no longer be updated!

The static final versions of the pages are left as a convenience for readers. Note that meta-pages such as "discussion," "history," etc., will not work.

SICP exercise 2.60

From Drewiki
Jump to: navigation, search

Problem

We specified that a set would be represented as a list with no duplicates. Now suppose we allow duplicates. For instance, the set {1,2,3} could be represented as the list (2 3 2 1 3 2 2).

Design procedures element-of-set?, adjoin-set, union-set, and intersection-set that operate on this representation. How does the efficiency of each compare with the corresponding procedure for the non-duplicate representation? Are there applications for which you would use this representation in preference to the non-duplicate one?

Solution

The implementation of the element-of-set? procedure doesn't change: it returns true when it finds the first element that matches the input element, otherwise it returns false. Nothing in the functional behavior of that definition depends on the "no duplicates" invariant of the original set representation.

(define true #t)
 
(define false #f)
 
(define (element-of-set? x set) 
  (cond ((null? set) false) 
        ((equal? x (car set)) true) 
        (else (element-of-set? x (cdr set)))))

The new adjoin-set is simple: it just conses the input element onto the existing set.

(define (adjoin-set x set)
  (cons x set))

Note that we could also implement it by appending the new element to the existing set, but that implementation would probably be slower than the consing one, at least for sets with more than a few elements, because it would need to find the end of the list representing the set. (The append version of adjoin-set would also cause a copy to be made of the existing set, whereas the cons version can probably avoid making a copy by sharing list tail structure.)

Like element-of-set?, the implementations of intersection-set and union-set work as-is. The functional behavior of their implementations doesn't depend on the "no duplicates" invariant of the original set representation.

(define (intersection-set set1 set2) 
  (cond ((or (null? set1) (null? set2)) '()) 
        ((element-of-set? (car set1) set2)        
         (cons (car set1) 
               (intersection-set (cdr set1) set2))) 
        (else (intersection-set (cdr set1) set2)))) 
 
(define (union-set set1 set2)
  (cond ((null? set1) set2)
        ((null? set2) set1)
        ((element-of-set? (car set1) set2)
         (union-set (cdr set1) set2))
        (else (cons (car set1) (union-set (cdr set1) set2)))))

Tests:

(define s1 '(a b c))
(define s2 (adjoin-set 'd s1))
s2

Output:

(d a b c)
(define s3 (adjoin-set 'a s2))
s3

Output:

(a d a b c)
(element-of-set? 'a s3)

Output:

#t
(element-of-set? 'z s3)

Output:

#f
(union-set '(a z) s3)

Output:

(z a d a b c)
 
(intersection-set '(a z d) s3)

Output:

(a d)
(intersection-set '(a z d a) s3)

Output:

(a d a)


Note that intersection-set removes duplicates that appear in its second set argument, but not from its first; this behavior is a consequence of which list intersection-set cdrs down to select individual elements.

The efficiency implications of this new representation are as follows:

  • The number of steps required by procedure adjoin-set has gone from θ(n) to θ(1), i.e., it now runs in constant time.
  • Sets are still represented as lists and the definitions of the remaining procedures are unchanged, so we can expect their orders of growth to be the same as well (θ(n) for element-of-set?, θ(n2) for intersection-set and union-set).
  • Storage for sets will require at least as much space in this representation as they did in the "no duplicates" representation, probably more for most applications. Therefore, the n in the growth estimates given above will likely be larger by some constant factor that varies from application to application. Because only intersection-set can ever produce a set smaller than its inputs, this factor is possibly unbounded for many applications.
  • In general, then, when using this "duplicates OK" representation, all procedures except adjoin-set will be slower by some constant factor (possibly a quite large constant factor), and sets will require additional storage proportional to the same factor. Therefore, we would probably only want to use this representation for applications in which adjoin-set is a frequent operation, compared to the other procedures, and in which the size of our sets is not an issue.
Personal tools