note.nkmk.me

Get the length of a string (number of characters) in Python

Posted: 2021-09-05 / Tags: Python, String

In Python, you can get the length of a string str (the number of characters) with the built-in function len(), the same function used to get the number of elements in a list.

This article covers the following topics.

  • Get the length of a string (number of characters) with len()
  • Full-width and half-width characters
  • Escape sequences and special characters
  • Line breaks

See the following article for the usage of len() for other types.


Get the length of a string (number of characters) with len()

Passing a string to the built-in function len() returns its length (number of characters) as an integer.

s = 'abcde'

print(len(s))
# 5
source: str_len.py
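As a small supplementary sketch, the following checks a few edge cases: the return value is an int, an empty string has length 0, and spaces count as characters. The variable names are illustrative and are not taken from str_len.py.

```python
# Illustrative values; not part of the original sample file.
empty = ''
spaced = 'a b c'

# len() always returns an int.
print(type(len(spaced)))
# <class 'int'>

# An empty string has length 0.
print(len(empty))
# 0

# Spaces are counted like any other character.
print(len(spaced))
# 5
```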

Full-width and half-width characters

Full-width and half-width characters are each counted as one character (length: 1).

s = 'あいうえお'

print(len(s))
# 5

s = 'abcdeあいうえお'

print(len(s))
# 10
source: str_len.py
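For contrast, here is a minimal sketch showing that len() on a str counts characters rather than bytes. It assumes UTF-8 as the encoding, which the original text does not discuss; the name utf8_bytes is illustrative.

```python
s = 'abcdeあいうえお'

# len() on a str counts characters (Unicode code points).
print(len(s))
# 10

# len() on the encoded bytes counts bytes instead.
# In UTF-8, each ASCII letter takes 1 byte and each hiragana here takes 3.
utf8_bytes = s.encode('utf-8')
print(len(utf8_bytes))
# 20
```

The character count stays the same regardless of encoding; only the byte count depends on it.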

Escape sequences and special characters

In Python, special characters such as TAB are represented with a backslash, like \t. The backslash itself is represented by \\.

Each of these special characters, such as \t and \\, is counted as a single character.

s = 'a\tb\\c'
print(s)
# a b\c

print(len(s))
# 5
source: str_len.py
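One way to see where the count of 5 comes from is to split the string into its individual characters with list(). This is only an illustrative sketch; the name chars is not from the original sample.

```python
s = 'a\tb\\c'

# list() splits a string into its individual characters.
# The list is printed with repr(), so the TAB shows as '\t'
# and the single backslash shows as '\\'.
chars = list(s)
print(chars)
# ['a', '\t', 'b', '\\', 'c']

print(len(chars))
# 5
```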

In raw strings (string literals prefixed with r), escape sequences are not interpreted as special characters, so the backslash and the character after it are kept as written. Each of them is counted as a separate character.

s = r'a\tb\\c'
print(s)
# a\tb\\c

print(len(s))
# 7
source: str_len.py

Also, the Unicode escape sequence \uXXXX is treated as a single character.

s = '\u3042\u3044\u3046'
print(s)
# あいう

print(len(s))
# 3
source: str_len.py

Unicode escape sequences are also not treated specially in raw strings.

s = r'\u3042\u3044\u3046'
print(s)
# \u3042\u3044\u3046

print(len(s))
# 18
source: str_len.py
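The result of 18 can be checked by hand: in a raw string, one \uXXXX sequence is six literal characters. The following sketch verifies that arithmetic; the name one is illustrative.

```python
s = r'\u3042\u3044\u3046'

# In a raw string, one \uXXXX sequence is six literal characters:
# a backslash, the letter u, and four hexadecimal digits.
one = r'\u3042'
print(len(one))
# 6

# Three such sequences give 3 * 6 = 18 characters.
print(len(s) == 3 * len(one))
# True
```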

Line breaks

\n (LF: Line Feed) is also treated as a single character.

s = 'a\nb'
print(s)
# a
# b

print(len(s))
# 3
source: str_len.py

Note that if \r\n (CR: Carriage Return + LF: Line Feed) is used, it is counted as two characters, \r and \n.

s = 'a\r\nb'
print(s)
# a
# b

print(len(s))
# 4
source: str_len.py
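Because print() renders \r\n as an ordinary line break, the extra character is not visible in the output. As a small sketch, repr() can be used to show the string with its escape sequences intact.

```python
s = 'a\r\nb'

# repr() shows escape sequences instead of rendering them,
# so both \r and \n are visible.
print(repr(s))
# 'a\r\nb'

print(len(s))
# 4
```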

If \n and \r\n are mixed, the line breaks contribute different numbers of characters to the total: one for each \n and two for each \r\n.

s = 'abc\nabcd\r\nab'
print(s)
# abc
# abcd
# ab

print(len(s))
# 12
source: str_len.py
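The total of 12 breaks down as 9 visible characters, 1 for the \n, and 2 for the \r\n. The sketch below counts each kind of line break with str.count(); the names crlf, lf_only, and visible are illustrative.

```python
s = 'abc\nabcd\r\nab'

# Number of \r\n pairs.
crlf = s.count('\r\n')

# count('\n') also matches the \n inside each \r\n, so subtract those.
lf_only = s.count('\n') - crlf

print(crlf, lf_only)
# 1 1

# Remove 1 character per \n and 2 per \r\n from the total.
visible = len(s) - lf_only - 2 * crlf
print(visible)
# 9
```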

If \n and \r\n are mixed, or if you don't know which one is used, use the splitlines() method, which returns a list split by lines.

print(s.splitlines())
# ['abc', 'abcd', 'ab']
source: str_len.py
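To illustrate why splitlines() is convenient here, the following sketch compares it with split('\n'), which leaves a stray \r behind, and with splitlines(keepends=True), which keeps the line break at the end of each line.

```python
s = 'abc\nabcd\r\nab'

# Splitting on '\n' alone leaves the '\r' of '\r\n' attached to the line.
print(s.split('\n'))
# ['abc', 'abcd\r', 'ab']

# keepends=True keeps each line break, so the lengths include it.
kept = s.splitlines(keepends=True)
print(kept)
# ['abc\n', 'abcd\r\n', 'ab']

print([len(line) for line in kept])
# [4, 6, 2]
```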

The number of elements in the list retrieved with splitlines() is equal to the number of lines.

print(len(s.splitlines()))
# 3
source: str_len.py

The number of characters in each line can be obtained using list comprehensions.

print([len(line) for line in s.splitlines()])
# [3, 4, 2]
source: str_len.py

The total number of characters, excluding line breaks, can be calculated with sum().

A generator expression, the generator version of a list comprehension, is used here. Generator expressions are enclosed in () instead of [], but when one is the sole argument of a function call, as in this example, its own parentheses can be omitted.

print(sum(len(line) for line in s.splitlines()))
# 9
source: str_len.py
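The three results above can be gathered in one place. The helper below is a hypothetical convenience function (line_stats is not part of Python or of the original sample) that returns the number of lines, the length of each line, and the total excluding line breaks.

```python
def line_stats(text):
    """Return (line count, per-line lengths, total length without line breaks)."""
    lengths = [len(line) for line in text.splitlines()]
    return len(lengths), lengths, sum(lengths)


print(line_stats('abc\nabcd\r\nab'))
# (3, [3, 4, 2], 9)
```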

For more information about line breaks, see the following article.
