In Python, a set is a built-in data structure that stores data in the same way as a mathematical set. What makes it particularly useful is that it supports the usual set operations: union, intersection, set difference, and so on.
All items stored in a set must be unique (i.e., no two elements can have the same value), but they can be of different data types, provided each item is hashable (numbers, strings, and tuples of such values are hashable; lists are not).
Sets are unordered, so you cannot access their elements by indexing.
To create a set, use the built-in function set():
myset1 = set() # creates an empty set
print(myset1) # print the set
type(myset1) # check type of variable myset1
# create 3 new sets:
myset2 = set(['a', 'b', 'c'])
myset3 = set('abc')
myset4 = set('aaaabc')
# check out what's inside....:
print('myset2 = ', myset2)
print('myset3 = ', myset3)
print('myset4 = ', myset4)
These simple examples illustrate a number of important points. To create a non-empty set, use the pattern:
<NameOfYourSet> = set(<argument>)
The 'argument' can be a list, a string, or a tuple (more generally, any iterable). If the 'argument' is a string, each character becomes a set element. If you want your set to contain whole strings (rather than characters), you must pass them to the built-in 'set()' function inside a list or a tuple. If the 'argument' contains duplicates (as was the case for 'myset4' above), each value appears only once in the set. The 'argument' can be omitted, in which case you end up with an empty set; elements can be added to this set later on, as we shall find out shortly.
Let's see some of these remarks in action. Assume that you want a set that contains three words (i.e., strings): 'one', 'two', and 'three'.
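# 'set()' accepts at most one argument, so passing three strings
# raises a TypeError instead of creating a set:
try:
    myset5 = set('one', 'two', 'three')
    print('myset5 =', myset5)
except TypeError as err:
    print('TypeError:', err)
# a single string does not work either: it is split into
# characters (including the spaces)
myset6 = set('one two three')
print('myset6 =', myset6)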
# To make this work the way we intend
# we must pass the strings to 'set()' as a list:
myset7 = set(['one', 'two', 'three'])
print('myset7 = ', myset7)
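The remarks above can also be checked with a tuple and with elements of mixed data types. The following is a minimal sketch; the names 'from_tuple', 'mixed' and 'indexing_failed' are chosen here purely for illustration.

```python
# a tuple works just as well as a list
from_tuple = set(('one', 'two', 'two', 'three'))
print('from_tuple =', from_tuple)  # the duplicate 'two' is kept only once
print(len(from_tuple))             # 3

# elements of different data types can live in the same set
mixed = set([1, 'one', 2.5, (3, 4)])
print('mixed =', mixed)

# sets are unordered, so indexing is not supported
try:
    mixed[0]
    indexing_failed = False
except TypeError as err:
    indexing_failed = True
    print('Indexing failed:', err)
```

Because sets are unordered, the printed order of the elements may differ from the order in which they were written.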
Sets are mutable data structures; once you have created a set, you can delete or add elements as you wish. Python provides a range of built-in set methods to help with that.
The len function, which we have seen before, returns the number of elements in a set.
len(myset7) # will give the number of elements
The add method adds a single element to a set, while the update method allows you to add several elements at once. Here are some examples:
myset = set([1, 2, 3, 4, 5]) # create a set; the numbers are passed as a list
print(myset) # check out what's inside
myset = set() # reset the initial set to 'empty'
print('\nBefore the loop:', myset)
myset.add(1) # adds '1'
for i in range(2, 10): # add the numbers 2 to 9, one at a time
    myset.add(i)
print('\nAfter the loop:', myset)
myset = set()
print('Before update:', myset)
myset.update([1, 2, 3]) # three elements are added in one go
# 'update' is the name of the method that allows you to do that
print('\nAfter update:', myset)
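Two details of 'add' and 'update' are worth trying out: adding an element that is already present changes nothing, and 'update' accepts any iterable, not only a list. The sketch below illustrates this; the set name 'letters' is just an example.

```python
letters = set(['a', 'b'])

letters.add('b')  # 'b' is already in the set: nothing changes
print(len(letters))  # 2

letters.add('c')  # a new element is added
print(len(letters))  # 3

# 'update' takes any iterable: a tuple, a string, another set, ...
letters.update(('d', 'e'))       # tuple: two elements are added
letters.update('fg')             # string: each character is added separately
letters.update(set(['a', 'h']))  # set: only 'h' is new

# sorted() returns the elements as a list in alphabetical order
print(sorted(letters))
```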
Deleting elements from a set is just as easy. There are three different methods for doing this, depending on what you want to accomplish.
The remove and discard methods each remove a specified element from a set. The item to be removed must be passed to the method as an argument. The two methods differ in how they behave when the specified element is not found in the set: remove raises a KeyError and the execution of the code stops, while discard does nothing and raises no error.
If you want to remove all the elements of a set, use the clear method. Here are some examples that illustrate all three methods:
myset = set([1, 2, 3, 4]) # create a set
print('Original set is: ', myset) # display its contents
myset.remove(1)
print('\nAfter the 1st removal: ', myset) # check again
myset.discard(4)
print('\nAfter the 2nd removal: ', myset) # another check...
myset.discard(88) # element does not exist
print(myset)
# no error messages are displayed on the screen
try:
    myset.remove(88) # element does not exist: raises a KeyError
except KeyError:
    print("'remove' raised a KeyError: 88 is not in the set")
myset.clear() # clears the set/no elements left after this
print('After clearing: ', myset)
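The choice between 'discard' and 'remove' matters most when you delete several values and cannot be sure that all of them are in the set. Here is a short sketch of both approaches; the list 'unwanted' and the list 'not_found' are made up for this illustration.

```python
unwanted = [2, 4, 88]  # note that 88 is not in the set below

# with 'discard', missing values are simply skipped
myset = set([1, 2, 3, 4, 5])
for value in unwanted:
    myset.discard(value)
print('After discard:', myset)

# with 'remove', we have to handle the missing values ourselves
myset = set([1, 2, 3, 4, 5])
not_found = []
for value in unwanted:
    try:
        myset.remove(value)
    except KeyError:
        not_found.append(value)  # keep track of what was missing
print('After remove: ', myset)
print('Not found:    ', not_found)
```

Use 'remove' when a missing element indicates a mistake that you want to know about, and 'discard' when a missing element is harmless.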
As in the case of dictionaries, you can use for-loops with sets (we say that sets are iterable). The syntax is very similar to that used in the previous unit.
The in operator tests whether a value exists in a set; the not in operator tests whether it does not.
Here are some examples:
myset = set(['a', 'b', 'c', 'd'])
for item in myset:
    print(item)
# Note that the elements may not be printed in the order
# in which they were listed when the set was created
myset = set(range(1,20, 2)) # quick way to define a set
print('Original set: ', myset)
L = [4, 11, 19] # some list of values
for i in range(len(L)):
    if L[i] in myset:
        print('\nThe value', L[i], 'is in my set')
    else:
        print('\nThe value', L[i], 'is NOT in my set')
# NOTE:
# if 'a in S' evaluates to True, then 'a not in S' evaluates to False
# if 'a not in S' evaluates to True, then 'a in S' evaluates to False
# the output of this code is the same as that of the above code
# the difference: instead of 'in' I used 'not in'....
myset = set(range(1,20, 2)) # quick way to define a set
print('Original set: ', myset)
L = [4, 11, 19] # some list of values
for i in range(len(L)):
    if L[i] not in myset:
        print('\nThe value', L[i], 'is NOT in my set')
    else:
        print('\nThe value', L[i], 'is in my set')
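The same membership test can be written by looping over the values of the list directly, rather than over their positions. The sketch below also collects the results in two lists; the names 'values', 'found' and 'missing' are chosen only for this example.

```python
myset = set(range(1, 20, 2))  # the odd numbers from 1 to 19
values = [4, 11, 19]          # some list of values

found = []
missing = []
for value in values:  # loop over the values themselves
    if value in myset:
        found.append(value)
    else:
        missing.append(value)

print('In the set:    ', found)    # [11, 19]
print('Not in the set:', missing)  # [4]
```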
The union of two sets is a new set that contains all the elements of both sets. We can define the union of an arbitrary number of sets in the same way.
Python comes with the built-in method 'union()' that allows us to find the union of two (or more) sets:
# create a couple of sets:
set1 = set([1,2,3,4])
set2 = set([3,4,5,6])
print('set1 = ', set1)
print('set2 = ', set2)
# calculate their union:
set3 = set1.union(set2)
print('\n(1st) Union is: ', set3)
# alternative syntax:
set4 = set1 | set2
print('\n(2nd) Union is: ', set4)
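Since the union can be defined for an arbitrary number of sets, here is a brief sketch with three sets. The third set, called 'extra', is introduced only for this illustration.

```python
set1 = set([1, 2, 3, 4])
set2 = set([3, 4, 5, 6])
extra = set([6, 7, 8])

# 'union()' accepts several arguments...
all_method = set1.union(set2, extra)
# ...and the '|' operator can be chained
all_operator = set1 | set2 | extra

print(sorted(all_method))          # [1, 2, 3, 4, 5, 6, 7, 8]
print(all_method == all_operator)  # True

# the original sets are left unchanged
print(sorted(set1))                # [1, 2, 3, 4]
```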
The intersection of two sets is a set that contains only the elements found in both sets.
Python comes with the built-in method 'intersection()':
# create a couple of sets:
set1 = set([1,2,3,4])
set2 = set([3,4,5,6])
print('set1 = ', set1)
print('set2 = ', set2)
# calculate their intersection:
set3 = set1.intersection(set2)
print('\n(1st) Intersection is: ', set3)
# alternative syntax:
set4 = set1 & set2
print('\n(2nd) Intersection is: ', set4)
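The method and the operator are not completely interchangeable: 'intersection()' accepts any iterable as its argument, whereas '&' requires a set on both sides. A small sketch of this difference follows; the names 'some_list', 'common' and 'operator_failed' are made up for the example.

```python
set1 = set([1, 2, 3, 4])
some_list = [3, 4, 5, 6]  # a list, not a set

# the method converts its argument for us
common = set1.intersection(some_list)
print(sorted(common))  # [3, 4]

# the operator insists on two sets
try:
    set1 & some_list
    operator_failed = False
except TypeError:
    operator_failed = True
    print("'&' needs a set on both sides")

# if the sets share no elements, the intersection is the empty set
print(set1.intersection(set([10, 20])))  # set()
```

The same holds for 'union()' and '|', and for the other set operations discussed in this unit.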
The difference of two sets is a set that contains the elements that appear in the first set but do not appear in the second set.
Python comes with the built-in 'difference()' method:
# create a couple of sets:
set1 = set([1,2,3,4])
set2 = set([3,4,5,6])
print('set1 = ', set1)
print('set2 = ', set2)
# calculate their (set) difference:
set3 = set1.difference(set2)
print('\n(1st) Set difference is: ', set3)
# alternative syntax:
set4 = set1 - set2
print('\n(2nd) Set difference is: ', set4)
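Unlike union and intersection, the set difference depends on the order of the two sets. The following sketch makes this explicit; the variable names are illustrative only.

```python
set1 = set([1, 2, 3, 4])
set2 = set([3, 4, 5, 6])

first_minus_second = set1 - set2  # elements of set1 that are not in set2
second_minus_first = set2 - set1  # elements of set2 that are not in set1

print(sorted(first_minus_second))  # [1, 2]
print(sorted(second_minus_first))  # [5, 6]
print(first_minus_second == second_minus_first)  # False
```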
The symmetric difference of two sets is a set that contains the elements that appear in exactly one of the two sets (i.e., the elements that the two sets do not share).
Python comes with the built-in 'symmetric_difference()' method:
# create a couple of sets:
set1 = set([1,2,3,4])
set2 = set([3,4,5,6])
print('set1 = ', set1)
print('set2 = ', set2)
# calculate their symmetric difference:
set3 = set1.symmetric_difference(set2)
print('\n(1st) Symmetric difference is: ', set3)
# alternative syntax:
set4 = set1 ^ set2
print('\n(2nd) Symmetric difference is: ', set4)
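The symmetric difference can also be built from the operations introduced earlier, which is a useful way to check your understanding of all of them. In the sketch below, the names 'sym', 'via_union' and 'via_difference' are chosen for illustration.

```python
set1 = set([1, 2, 3, 4])
set2 = set([3, 4, 5, 6])

sym = set1 ^ set2

# everything in the union, except what lies in the intersection
via_union = (set1 | set2) - (set1 & set2)

# the two set differences, put together
via_difference = (set1 - set2) | (set2 - set1)

print(sorted(sym))            # [1, 2, 5, 6]
print(sym == via_union)       # True
print(sym == via_difference)  # True
```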
The set A is a subset of set B if all the elements in set A are included in set B.
Python allows you to check for subsets in two different ways. The first is the 'issubset()' method: 'A.issubset(B)' returns 'True' if A is a subset of B and 'False' otherwise. Alternatively, you can write 'A <= B' to achieve the same result.
# create two simple sets
# note that 'set2' is a subset of 'set1'
set1 = set([1,2,3,4])
set2 = set([2,3])
a = set2.issubset(set1) # boolean variable
b = set1.issubset(set2) # boolean variable
# display on the screen the values of 'a' and 'b':
print('a = ', a)
print('b= ', b)
# alternative way of achieving the same
# as the above statements:
a1 = (set2 <= set1) # boolean variable
print('\na1 = ', a1)
b1 = (set1 <= set2) # boolean variable
print('b1 = ', b1)
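To connect 'issubset()' with the definition given above, the check can be spelled out by hand with a loop and the 'not in' operator. This is only a sketch for comparison (the variable 'manual' is made up here); in practice you would use 'issubset()' or '<='.

```python
set1 = set([1, 2, 3, 4])
set2 = set([2, 3])

# set2 is a subset of set1 if every element of set2 is also in set1
manual = True
for item in set2:
    if item not in set1:
        manual = False

print(manual)                # True
print(set2.issubset(set1))   # True

# two special cases that follow from the definition:
print(set1 <= set1)          # True: every set is a subset of itself
print(set() <= set1)         # True: the empty set is a subset of every set
```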
Finally, we close this unit with a longer example (taken from the textbook listed in the reference below).
The program creates two sets: one that holds the names of the students on the baseball team, and another one that holds the names of the students on the basketball team.
The program performs several operations that should be self-explanatory (see the comments inserted in the code). You are invited to study the code and run it several times.
# sets.py
# This program demonstrates various set operations.
#
# Create two sets:
baseball = set(['Jodi', 'Carmen', 'Aida', 'Rachel'])
basketball = set(['Eva', 'Carmen', 'Rachel', 'Sarah'])
# Display members of the baseball set.
print('The following students are on the baseball team:')
for name in baseball:
    print(name)
# Display members of the basketball set.
print()
print('The following students are on the basketball team:')
for name in basketball:
    print(name)
# Demonstrate intersection
print()
print('The following students play both baseball and basketball:')
for name in baseball.intersection(basketball):
    print(name)
# Demonstrate union
print()
print('The following students play either baseball or basketball:')
for name in baseball.union(basketball):
    print(name)
# Demonstrate difference of baseball and basketball
print()
print('The following students play baseball, but not basketball:')
for name in baseball.difference(basketball):
    print(name)
# Demonstrate difference of basketball and baseball
print()
print('The following students play basketball, but not baseball:')
for name in basketball.difference(baseball):
    print(name)
# Demonstrate symmetric difference
print()
print('The following students play one sport, but not both:')
for name in baseball.symmetric_difference(basketball):
    print(name)
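Because sets are unordered, the names in the program above may be displayed in a different order from one run to the next. If you need a predictable order, one option (not part of the textbook program) is to pass the result to the built-in 'sorted()' function, which returns a list in alphabetical order. A minimal sketch, using the same two teams:

```python
baseball = set(['Jodi', 'Carmen', 'Aida', 'Rachel'])
basketball = set(['Eva', 'Carmen', 'Rachel', 'Sarah'])

# sorted() turns each resulting set into an alphabetically ordered list
both = sorted(baseball & basketball)
either = sorted(baseball | basketball)
one_sport = sorted(baseball ^ basketball)

print('Both sports:   ', both)       # ['Carmen', 'Rachel']
print('Either sport:  ', either)
print('Only one sport:', one_sport)  # ['Aida', 'Eva', 'Jodi', 'Sarah']
```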
REFERENCE:
T. Gaddis, Starting Out with Python (Fourth Edition), Pearson Education Ltd., 2018.