Convert Between Unicode Code Point and Character: chr, ord

Modified: 2024-01-31 | Tags: Python, String

In Python, the built-in chr() and ord() functions allow you to convert between Unicode code points and characters.

Additionally, in string literals, characters can be represented by their hexadecimal Unicode code points using \x, \u, or \U.

Contents

Convert characters to Unicode code points: ord()
Convert Unicode code points to characters: chr()
Use Unicode code points in string literals: \x, \u, \U

You can check the correspondence between Unicode code points and characters on the Unicode Consortium's page below.

Unicode Utilities: Character Properties

Convert characters to Unicode code points: `ord()`

ord() returns the Unicode code point as an integer (int) when given a single-character string.

Built-in Functions - ord() — Python 3.12.1 documentation

i = ord('A')
print(i)
# 65

print(type(i))
# <class 'int'>

source: chr_ord.py

Specifying a string of two or more characters will result in an error.

# ord('abc')
# TypeError: ord() expected a character, but string of length 3 found

source: chr_ord.py

Unicode code points are often represented in hexadecimal notation. To convert an integer to a hexadecimal string, use the built-in hex() function.

s = hex(i)
print(s)
# 0x41

print(type(s))
# <class 'str'>

source: chr_ord.py

The built-in format() function can be used for more detailed formatting, such as zero-padding and including or excluding 0x.

Format strings and numbers with format() in Python

print(format(i, '04x'))
# 0041

print(format(i, '#06x'))
# 0x0041

source: chr_ord.py

In summary, the hexadecimal Unicode code point for a specific character can be obtained as follows.

print(format(ord('X'), '#08x'))
# 0x000058

print(format(ord('💯'), '#08x'))
# 0x01f4af

source: chr_ord.py

Some emojis, including flags, are represented by multiple Unicode code points, known as emoji sequences.

Unicode Utilities: Character Properties - 🇯🇵

As of Python 3.11.7, ord() cannot process emoji sequences, resulting in an error. Also, using the built-in len() function to check the length of these emojis will return the number of Unicode code points.

# ord('🇯🇵')
# TypeError: ord() expected a character, but string of length 2 found

print(len('🇯🇵'))
# 2

source: chr_ord.py

Convert Unicode code points to characters: `chr()`

chr() converts a specified integer (int) to its corresponding Unicode character string (str).

Built-in Functions - chr() — Python 3.12.1 documentation

print(chr(65))
# A

print(type(chr(65)))
# <class 'str'>

source: chr_ord.py

Since integers can be written in hexadecimal by prefixing them with 0x in Python, you can directly specify a hexadecimal Unicode code point as an argument in chr(), regardless of zero-padding.

print(65 == 0x41)
# True

print(chr(0x41))
# A

print(chr(0x000041))
# A

source: chr_ord.py

To convert a hexadecimal string to a Unicode character, use int() to convert the string to an integer with base 16, and then pass it to chr().

s = '0x0041'

print(int(s, 16))
# 65

print(chr(int(s, 16)))
# A

source: chr_ord.py

When converting a hexadecimal string to an integer using int(), the second argument can be 0 if the string is prefixed with 0x. For more details on handling hexadecimal numbers and strings, refer to the following article.

Convert binary, octal, decimal, and hexadecimal in Python

Unicode code points are often represented as U+XXXX. To convert such a string to its corresponding character, use slicing to extract the numeric part.

How to slice a list, string, tuple in Python

s = 'U+0041'

print(s[2:])
# 0041

print(chr(int(s[2:], 16)))
# A

source: chr_ord.py

Use Unicode code points in string literals: `\x`, `\u`, `\U`

In string literals, characters can be represented by their hexadecimal Unicode code points using \x, \u, or \U.

Unicode HOWTO - Python’s Unicode Support — Python 3.12.1 documentation

The sequences \xXX, \uXXXX, and \UXXXXXXXX require 2, 4, and 8 hexadecimal digits respectively. An error occurs if the number of digits is incorrect.

print('\x41')
# A

print('\u0041')
# A

print('\U00000041')
# A

print('\U0001f4af')
# 💯

# print('\u041')
# SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-4: truncated \uXXXX escape

# print('\U0000041')
# SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-8: truncated \UXXXXXXXX escape

source: chr_ord.py

Each code is treated as one character.

print('\u0041\u0042\u0043')
# ABC

print(len('\u0041\u0042\u0043'))
# 3

source: chr_ord.py

In raw strings, where escape sequences are ignored, these sequences are treated as regular text.

Raw strings in Python

print(r'\u0041\u0042\u0043')
# \u0041\u0042\u0043

print(len(r'\u0041\u0042\u0043'))
# 18

source: chr_ord.py

Convert Between Unicode Code Point and Character: chr, ord

Convert characters to Unicode code points: `ord()`

Convert Unicode code points to characters: `chr()`

Use Unicode code points in string literals: `\x`, `\u`, `\U`

Related Categories

Related Articles

Convert Between Unicode Code Point and Character: chr, ord

Convert characters to Unicode code points: ord()

Convert Unicode code points to characters: chr()

Use Unicode code points in string literals: \x, \u, \U

Related Categories

Related Articles

Convert characters to Unicode code points: `ord()`

Convert Unicode code points to characters: `chr()`

Use Unicode code points in string literals: `\x`, `\u`, `\U`