note.nkmk.me

Set operations in Python (union, intersection, symmetric difference, etc.)

Posted: 2021-09-11 / Tags: Python, Mathematics

Python provides a built-in data type set for handling sets.

A set is an unordered collection of unique elements (no two elements have the same value). It supports set operations such as union, intersection, difference, and symmetric difference.

This article covers the following topics.

Basic operations:

  • Create a set object: {}, set()
  • Set comprehensions
  • Get the number of elements in the set: len()
  • Add an element to the set: add()
  • Remove an element from the set: discard(), remove(), pop(), clear()

Mathematical operations:

  • Union: | operator, union()
  • Intersection: & operator, intersection()
  • Difference: - operator, difference()
  • Symmetric difference: ^ operator, symmetric_difference()
  • Test if A is a subset of B: <= operator, issubset()
  • Test if A is a superset of B: >= operator, issuperset()
  • Test if A and B are disjoint: isdisjoint()

The set type is mutable: elements can be added and removed. There is also a frozenset type, which supports the same set operations but is immutable (elements cannot be added or removed after creation).

Create a set object: {}, set()

Create a set object with curly brackets {}

set objects can be created by enclosing elements in curly brackets {}.

If there are duplicate values, they are ignored and only the unique values remain as elements.

s = {1, 2, 2, 3, 1, 4}

print(s)
print(type(s))
# {1, 2, 3, 4}
# <class 'set'>
source: set.py

A set can hold elements of different types, but every element must be hashable. Mutable objects such as lists cannot be elements.

Sets are unordered, so the order in which the elements were written is not preserved.

s = {1.23, 'abc', (0, 1, 2), 'abc'}

print(s)
# {(0, 1, 2), 1.23, 'abc'}

# s = {[0, 1, 2]}
# TypeError: unhashable type: 'list'
source: set.py
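
As a minimal sketch of the hashability rule, the following shows that immutable collections such as tuples and frozensets can be stored in a set, while adding a list fails. The variable names are only illustrative.

```python
# Tuples and frozensets are hashable, so they can be set elements.
nested = {(0, 1), frozenset({2, 3})}

print((0, 1) in nested)
# True

print(frozenset({3, 2}) in nested)
# True

# Lists are unhashable; trying to add one raises TypeError.
try:
    nested.add([4, 5])
except TypeError as e:
    error_name = type(e).__name__
    print(error_name)
# TypeError
```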

Even if the types are different, such as int and float, they are considered duplicates if the values are equal.

s = {100, 100.0}

print(s)
# {100}
source: set.py
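
The reason is that values which compare equal also have equal hashes, so a set treats them as the same element. A short sketch, which also shows that bool values behave the same way because True == 1:

```python
print(100 == 100.0)
# True

print(hash(100) == hash(100.0))
# True

# True == 1 and 1 == 1.0, so all three count as one element.
s_mixed = {1, 1.0, True}
print(len(s_mixed))
# 1
```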

An empty {} creates a dictionary (dict), not a set. To create an empty set, use set(), described next.

s = {}

print(s)
print(type(s))
# {}
# <class 'dict'>
source: set.py

Create a set object with set()

set objects can also be created with set().

By specifying an iterable object such as a list or a tuple as an argument, a set object is created in which duplicate elements are excluded and only unique values remain.

l = [1, 2, 2, 3, 1, 4]

print(l)
print(type(l))
# [1, 2, 2, 3, 1, 4]
# <class 'list'>

s_l = set(l)

print(s_l)
print(type(s_l))
# {1, 2, 3, 4}
# <class 'set'>
source: set.py

For an immutable frozenset, use frozenset().

fs_l = frozenset(l)

print(fs_l)
print(type(fs_l))
# frozenset({1, 2, 3, 4})
# <class 'frozenset'>
source: set.py
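
The following sketch illustrates what immutability means in practice: set operations on a frozenset return a new frozenset, mutating methods such as add() are not available, and because a frozenset is hashable it can be used as a dictionary key. The names fs, fs_union, and d are only illustrative.

```python
fs = frozenset([1, 2, 2, 3])

# Set operations work and return a new frozenset.
fs_union = fs | {4}
print(fs_union)
# frozenset({1, 2, 3, 4})

# Mutating methods such as add() do not exist on frozenset.
print(hasattr(fs, 'add'))
# False

# A frozenset is hashable, so it can be a dict key (or a set element).
d = {fs: 'value'}
print(d[frozenset({3, 2, 1})])
# value
```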

If the argument is omitted, an empty set is generated.

s = set()

print(s)
print(type(s))
# set()
# <class 'set'>
source: set.py

You can use set() to remove duplicate elements from a list or tuple, but the original order is not preserved.

Use list() and tuple() to convert a set to a list or tuple.

l = [2, 2, 3, 1, 3, 4]

l_unique = list(set(l))
print(l_unique)
# [1, 2, 3, 4]
source: set.py

Removing duplicates while keeping the original order, or extracting only the duplicated elements, requires a different approach, because a set does not preserve order.
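
One common way to do both, shown here as a sketch rather than the only approach, relies on dict.fromkeys() (dictionaries keep insertion order in Python 3.7 and later) and on a helper set that records the values already seen.

```python
l = [2, 2, 3, 1, 3, 4]

# dict keys are unique and keep insertion order (Python 3.7+).
l_unique_ordered = list(dict.fromkeys(l))
print(l_unique_ordered)
# [2, 3, 1, 4]

# Collect values that occur more than once, in first-seen order.
seen = set()
dups = []
for x in l:
    if x in seen and x not in dups:
        dups.append(x)
    seen.add(x)

print(dups)
# [2, 3]
```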

Set comprehensions

Set comprehensions are available, just like list comprehensions. Use curly brackets {} instead of square brackets [].

s = {i**2 for i in range(5)}

print(s)
# {0, 1, 4, 9, 16}
source: set.py
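
As with list comprehensions, an if condition can be added, and duplicates produced by the expression are collapsed automatically. A small sketch with made-up sample data (sorted() is used only to make the printed order predictable):

```python
# Keep only the squares of even numbers.
even_squares = {i**2 for i in range(10) if i % 2 == 0}
print(sorted(even_squares))
# [0, 4, 16, 36, 64]

# Values that become equal after transformation are stored once.
words = ['apple', 'Banana', 'APPLE', 'cherry', 'banana']
lower_words = {w.lower() for w in words}
print(sorted(lower_words))
# ['apple', 'banana', 'cherry']
```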

Apart from the brackets, the syntax is the same as for list comprehensions.

Get the number of elements in the set: len()

The number of elements of the set can be obtained with the built-in function len().

s = {1, 2, 2, 3, 1, 4}

print(s)
print(len(s))
# {1, 2, 3, 4}
# 4
source: set.py

Note that len() on a set counts unique elements only. It does not tell you how many times each value occurred in a list with duplicate elements.
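
To count how many times each value occurs in a list with duplicates, collections.Counter from the standard library is one option. A minimal sketch:

```python
from collections import Counter

l = [1, 2, 2, 3, 1, 4]

c = Counter(l)

# Number of occurrences of each value.
print(c[1])
# 2

print(c[4])
# 1

# Number of distinct values, the same as len(set(l)).
print(len(c))
# 4
```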

Add an element to the set: add()

Use the add() method to add an element to the set.

s = {0, 1, 2}

s.add(3)
print(s)
# {0, 1, 2, 3}
source: set.py
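
Two details of add() are worth checking in a short sketch: adding a value that is already present changes nothing and raises no error, and a tuple passed to add() is stored as a single element rather than being unpacked. The name s_add is only illustrative.

```python
s_add = {0, 1, 2}

# Already present: no change, no error.
s_add.add(1)
print(len(s_add))
# 3

# The tuple is added as one element.
s_add.add((3, 4))
print((3, 4) in s_add)
# True

print(len(s_add))
# 4
```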

Remove an element from the set: discard(), remove(), pop(), clear()

Use the discard(), remove(), pop(), and clear() methods to remove elements from a set.

The discard() method deletes the element specified by the argument. If a value that does not exist in the set is specified, no action is taken.

s = {0, 1, 2}

s.discard(1)
print(s)
# {0, 2}

s = {0, 1, 2}

s.discard(10)
print(s)
# {0, 1, 2}
source: set.py

The remove() method also removes the element specified by the argument, but raises a KeyError if the value does not exist in the set.

s = {0, 1, 2}

s.remove(1)
print(s)
# {0, 2}

# s = {0, 1, 2}

# s.remove(10)
# KeyError: 10
source: set.py
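
If a missing value should be handled explicitly rather than silently ignored, remove() can be wrapped in try/except. The sketch below contrasts this with discard(); the names s_rm and missing are only illustrative.

```python
s_rm = {0, 1, 2}

# remove() raises KeyError for a missing value; the value is in e.args[0].
try:
    s_rm.remove(10)
except KeyError as e:
    missing = e.args[0]
    print(missing)
# 10

# discard() does nothing for a missing value.
s_rm.discard(10)
print(s_rm)
# {0, 1, 2}
```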

The pop() method removes an arbitrary element from the set and returns it. You cannot choose which element is removed, and the order shown below should not be relied on. It raises a KeyError if the set is empty.

s = {2, 1, 0}

v = s.pop()

print(s)
print(v)
# {1, 2}
# 0

s = {2, 1, 0}

print(s.pop())
# 0

print(s.pop())
# 1

print(s.pop())
# 2

# print(s.pop())
# KeyError: 'pop from an empty set'
source: set.py
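
Because an empty set is falsy, a while loop can consume a set with pop() without triggering the KeyError. A sketch, using sorted() because the order in which pop() returns elements is not specified:

```python
s_pop = {'a', 'b', 'c'}
popped = []

# The loop ends when the set is empty, so pop() never raises KeyError.
while s_pop:
    popped.append(s_pop.pop())

print(sorted(popped))
# ['a', 'b', 'c']

print(s_pop)
# set()
```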

The clear() method removes all elements from the set and makes it empty.

s = {0, 1, 2}

s.clear()
print(s)
# set()
source: set.py

Union: | operator, union()

The union can be obtained with the | operator or the union() method.

s1 = {0, 1, 2}
s2 = {1, 2, 3}
s3 = {2, 3, 4}

s_union = s1 | s2
print(s_union)
# {0, 1, 2, 3}

s_union = s1.union(s2)
print(s_union)
# {0, 1, 2, 3}
source: set.py

Multiple arguments can be specified for union().

The arguments do not have to be sets: any iterable, such as a list or tuple, can be specified. The same applies to the other set-operation methods below. The operators, on the other hand, require both operands to be sets (set or frozenset).

s_union = s1.union(s2, s3)
print(s_union)
# {0, 1, 2, 3, 4}

s_union = s1.union(s2, [5, 6, 5, 7, 5])
print(s_union)
# {0, 1, 2, 3, 5, 6, 7}
source: set.py
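
The difference between the method and the operator can be seen in the following sketch: union() accepts arbitrary iterables, while | raises a TypeError unless the other operand is converted to a set first. The name a is only illustrative.

```python
a = {0, 1, 2}

# The method accepts any iterables: list, tuple, range, ...
print(sorted(a.union([2, 3], (3, 4), range(5, 7))))
# [0, 1, 2, 3, 4, 5, 6]

# The operator requires a set on both sides.
try:
    a | [2, 3]
except TypeError:
    operator_failed = True
    print('TypeError')
# TypeError

# Convert explicitly to use the operator.
print(a | set([2, 3]))
# {0, 1, 2, 3}
```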

Intersection: & operator, intersection()

The intersection can be obtained with the & operator or the intersection() method.

s_intersection = s1 & s2
print(s_intersection)
# {1, 2}

s_intersection = s1.intersection(s2)
print(s_intersection)
# {1, 2}

s_intersection = s1.intersection(s2, s3)
print(s_intersection)
# {2}
source: set.py

Difference: - operator, difference()

The difference can be obtained with the - operator or the difference() method.

s_difference = s1 - s2
print(s_difference)
# {0}

s_difference = s1.difference(s2)
print(s_difference)
# {0}

s_difference = s1.difference(s2, s3)
print(s_difference)
# {0}
source: set.py
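
Unlike union and intersection, difference depends on the order of the operands. A short sketch with illustrative names a and b:

```python
a = {0, 1, 2}
b = {1, 2, 3}

# Elements of a that are not in b.
print(a - b)
# {0}

# Elements of b that are not in a.
print(b - a)
# {3}

# With several arguments, elements found in any of them are removed.
print(a.difference([1], (2,)))
# {0}
```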

Symmetric difference: ^ operator, symmetric_difference()

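The symmetric difference (the elements contained in exactly one of the two sets) can be obtained with the ^ operator or the symmetric_difference() method. Unlike union(), intersection(), and difference(), symmetric_difference() accepts only one argument.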

s_symmetric_difference = s1 ^ s2
print(s_symmetric_difference)
# {0, 3}

s_symmetric_difference = s1.symmetric_difference(s2)
print(s_symmetric_difference)
# {0, 3}
source: set.py
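
The symmetric difference can also be expressed with the other operations, which the following sketch checks. For more than two sets the ^ operator can be chained; the result then contains the elements that appear in an odd number of the sets. The names a, b, and c are only illustrative.

```python
a = {0, 1, 2}
b = {1, 2, 3}
c = {2, 3, 4}

# Union minus intersection.
print((a ^ b) == (a | b) - (a & b))
# True

# Union of the two one-sided differences.
print((a ^ b) == (a - b) | (b - a))
# True

# Chained ^ keeps elements that occur in an odd number of sets.
print(sorted(a ^ b ^ c))
# [0, 2, 4]
```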

Test if A is a subset of B: <= operator, issubset()

To test whether A is a subset of B, i.e., whether all elements of A are contained in B, use the <= operator or the issubset() method.

s1 = {0, 1}
s2 = {0, 1, 2, 3}

print(s1 <= s2)
# True

print(s1.issubset(s2))
# True
source: set.py

Both the <= operator and the issubset() method return True for equal sets.

To test whether a set is a proper subset, use the < operator, which returns False for equal sets.

print(s1 <= s1)
# True

print(s1.issubset(s1))
# True

print(s1 < s1)
# False
source: set.py
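
A typical use of the subset test is checking that all required items are present. The sketch below uses hypothetical data (required, record, incomplete) and relies on issubset() accepting any iterable; iterating over a dict yields its keys.

```python
required = {'name', 'email'}
record = {'name': 'Alice', 'email': 'alice@example.com', 'age': 30}
incomplete = {'name': 'Bob'}

# All required keys are present.
print(required.issubset(record))
# True

# 'email' is missing.
print(required.issubset(incomplete))
# False

# The difference shows which keys are missing.
missing_keys = required - set(incomplete)
print(missing_keys)
# {'email'}
```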

Test if A is a superset of B: >= operator, issuperset()

To test whether A is a superset of B, i.e., whether all elements of B are contained in A, use the >= operator or the issuperset() method.

s1 = {0, 1}
s2 = {0, 1, 2, 3}

print(s2 >= s1)
# True

print(s2.issuperset(s1))
# True
source: set.py

Both the >= operator and the issuperset() method return True for equal sets.

To test whether a set is a proper superset, use the > operator, which returns False for equal sets.

print(s1 >= s1)
# True

print(s1.issuperset(s1))
# True

print(s1 > s1)
# False
source: set.py

Test if A and B are disjoint: isdisjoint()

To test whether A and B are disjoint, i.e., whether A and B have no common elements, use the isdisjoint() method.

s1 = {0, 1}
s2 = {1, 2}
s3 = {2, 3}

print(s1.isdisjoint(s2))
# False

print(s1.isdisjoint(s3))
# True
source: set.py
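
isdisjoint() gives the same answer as checking whether the intersection is empty, and like the other methods it accepts any iterable. A sketch with illustrative names a, b, and c:

```python
a = {0, 1}
b = {1, 2}
c = {2, 3}

# Same result as testing for an empty intersection.
print(a.isdisjoint(b) == (len(a & b) == 0))
# True

print(a.isdisjoint(c) == (len(a & c) == 0))
# True

# The argument can be any iterable.
print(a.isdisjoint([2, 3]))
# True

# Two empty sets have no common elements, so they are disjoint.
print(set().isdisjoint(set()))
# True
```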