Character Encoding
RIPE Database
The format and content of an RPSL attribute value are determined by the syntax definition for that attribute within that object type.
Most RPSL attributes in the RIPE database only accept valid characters from the ASCII character set.
Some RPSL attributes also accept characters from the Latin-1 (ISO/IEC 8859-1) character set, which extends ASCII with additional (extended ASCII) characters. These attributes include:
- address
- reg-nr (Registration Number)
Some RPSL attributes allow valid Unicode code points using the UTF-8 encoding, including:
- descr
- remarks
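As a rough illustration of these three character-set classes, the following Python sketch checks whether a value can be represented in the character set an attribute accepts. The `ATTRIBUTE_CHARSETS` mapping and the `charset_for` and `fits_charset` helpers are hypothetical: they list only the attributes named above and assume that every other attribute is ASCII-only. This is not the RIPE Database's own validation code.

```python
# Hypothetical mapping, for illustration only: lists just the attributes named above.
ATTRIBUTE_CHARSETS = {
    "address": "latin-1",
    "reg-nr": "latin-1",
    "descr": "utf-8",
    "remarks": "utf-8",
}
# Assumption: any attribute not listed accepts ASCII only.
DEFAULT_CHARSET = "ascii"


def charset_for(attribute: str) -> str:
    """Return the (assumed) character set accepted by an attribute."""
    return ATTRIBUTE_CHARSETS.get(attribute, DEFAULT_CHARSET)


def fits_charset(attribute: str, value: str) -> bool:
    """Return True if every character of value is encodable in the attribute's charset."""
    try:
        value.encode(charset_for(attribute))
    except UnicodeEncodeError:
        return False
    return True
```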
Normalisation
Normalisation of ASCII characters is performed as follows:
- Control characters (apart from tab, linefeed, carriage return) are replaced with a '?' character
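A minimal sketch of the ASCII rule follows. It assumes that "control characters" means the C0 range U+0000 to U+001F plus DEL (U+007F); whether DEL is included is an assumption, and `normalise_ascii_controls` is a hypothetical name.

```python
# Assumption: "control characters" means the C0 range U+0000..U+001F plus DEL (U+007F).
ALLOWED_CONTROLS = {"\t", "\n", "\r"}  # tab, line feed and carriage return are kept


def is_ascii_control(ch: str) -> bool:
    """True for C0 control characters and DEL (assumed definition)."""
    cp = ord(ch)
    return cp < 0x20 or cp == 0x7F


def normalise_ascii_controls(value: str) -> str:
    """Replace disallowed ASCII control characters with '?' (hypothetical helper)."""
    return "".join(
        "?" if is_ascii_control(ch) and ch not in ALLOWED_CONTROLS else ch
        for ch in value
    )
```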
Normalisation of extended ASCII characters (as in the Latin-1 encoding) is performed as follows:
- Extended control characters are replaced with a '?' character
- A non-break space (NBSP) is replaced with a normal space (SP) character
- A soft hyphen (SHY) is replaced with a normal hyphen (-) character
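The extended ASCII rules can be sketched the same way. This sketch assumes that "extended control characters" refers to the C1 control range of Latin-1 (0x80 to 0x9F); the function name is hypothetical.

```python
def normalise_latin1_extras(value: str) -> str:
    """Apply the extended ASCII rules above (hypothetical helper)."""
    out = []
    for ch in value:
        cp = ord(ch)
        if 0x80 <= cp <= 0x9F:  # assumed: C1 control characters -> '?'
            out.append("?")
        elif cp == 0xA0:  # no-break space (NBSP) -> normal space
            out.append(" ")
        elif cp == 0xAD:  # soft hyphen (SHY) -> normal hyphen
            out.append("-")
        else:
            out.append(ch)
    return "".join(out)
```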
Normalisation of UTF-8 (Unicode) values is performed as follows:
- Only Unicode code points permitted by IDNA 2008 and Unicode IDNA Compatibility Processing (UTS #46) are valid. Any other Unicode code point is replaced with a '?' character
- Values are normalised using Normalization Form Canonical Composition (NFC)
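A sketch of Unicode normalisation using Python's standard `unicodedata` module is shown below. The NFC step uses the real `unicodedata.normalize` function, but the standard library does not ship the IDNA 2008 / UTS #46 tables, so `is_allowed` is a deliberately simplified stand-in (it only rejects unassigned, private-use and surrogate code points) and must not be mistaken for the real validity check. Both function names are hypothetical.

```python
import unicodedata

# Stand-in only: NOT the IDNA 2008 / UTS #46 validity tables.
REJECTED_CATEGORIES = {"Cn", "Co", "Cs"}  # unassigned, private use, surrogate


def is_allowed(ch: str) -> bool:
    """Simplified placeholder for the real UTS #46 code point check."""
    return unicodedata.category(ch) not in REJECTED_CATEGORIES


def normalise_unicode(value: str) -> str:
    """Compose to NFC, then replace code points failing the stand-in check with '?'."""
    composed = unicodedata.normalize("NFC", value)
    return "".join(ch if is_allowed(ch) else "?" for ch in composed)
```

For example, a decomposed "e" followed by a combining acute accent becomes the single precomposed character "é".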
Interfaces
The Latin-1 character encoding is used by default on the following interfaces:
- Whois (port 43); the default can be changed using the charset query flag (-Z / --charset)
- NRTMv3 (port 4444)
- Daily database dump and split files (.gz extension)
If a character cannot be represented in the character encoding used by an interface, it is replaced with a '?' character.
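This '?' substitution maps naturally onto Python's `replace` error handler, which, when encoding, substitutes `?` for each character the target encoding cannot represent. The sketch below assumes each value is encoded independently; `encode_for_interface` is a hypothetical helper, not part of any RIPE tooling.

```python
def encode_for_interface(value: str, charset: str) -> bytes:
    """Encode a value for an interface, replacing unrepresentable characters.

    Python's "replace" error handler emits b"?" for every code point that
    the target charset (e.g. "latin-1") cannot encode.
    """
    return value.encode(charset, errors="replace")
```

A decomposed character (a base letter plus a combining mark) would produce the base letter followed by '?', which is one reason the NFC normalisation described above matters before output.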
UTF-8 encoding is used by default on the following interfaces:
- DB web application
- Whois REST API
- RDAP
- NRTMv4
- Syncupdates
- Daily database dump and split files (.utf8.gz extension)
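When processing the daily dump and split files offline, the file extension indicates the encoding. The sketch below assumes that every `.utf8.gz` file is UTF-8 and every other `.gz` file is Latin-1, as listed above; `open_dump` is a hypothetical helper.

```python
import gzip
from typing import TextIO


def open_dump(path: str) -> TextIO:
    """Open a daily dump or split file as text (hypothetical helper).

    Assumption: ".utf8.gz" files are UTF-8; every other ".gz" file is Latin-1.
    """
    encoding = "utf-8" if path.endswith(".utf8.gz") else "latin-1"
    return gzip.open(path, "rt", encoding=encoding)
```

Decoding a Latin-1 file as Latin-1 never fails, because every byte value maps to a character; decoding it as UTF-8 by mistake would raise errors on extended characters.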
Internationalised Domains in E-mail Addresses
Internationalised Domain Names (IDNs) in any attribute containing an email address are automatically converted to Punycode (see RFC 3492), as specified by NWI-11. Punycode represents Unicode characters using only ASCII characters, so the converted addresses are compatible with Whois (port 43) as well as with mail clients and servers.
This applies to the following attributes:
- abuse-mailbox
- irt-nfy
- mnt-nfy
- notify
- ref-nfy
- upd-to
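The domain conversion can be approximated with Python's built-in `punycode` codec, which implements RFC 3492. This is a simplified sketch: it lower-cases each non-ASCII label and adds the `xn--` ACE prefix, but it does not perform the full IDNA 2008 / UTS #46 mapping and validation a production implementation would need (Python's built-in `idna` codec implements the older IDNA 2003 instead). `email_to_ascii` is a hypothetical helper, and the local part of the address is left unchanged.

```python
def email_to_ascii(address: str) -> str:
    """Convert non-ASCII domain labels of an email address to Punycode (hypothetical helper).

    Simplification: lower-cases each non-ASCII label and applies raw RFC 3492
    Punycode with the "xn--" prefix; no full IDNA 2008 / UTS #46 processing.
    """
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address  # no "@": not an email address, leave unchanged
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)  # ASCII labels need no conversion
        else:
            encoded = label.lower().encode("punycode").decode("ascii")
            labels.append("xn--" + encoded)
    return f"{local}@{'.'.join(labels)}"
```

For example, `email_to_ascii("info@münchen.de")` yields `info@xn--mnchen-3ya.de`.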