I’ve been playing around with SMS PDU encodings and recently observed some curious things about the way cellphones treat text messages that include special characters. First, look at these two tables of the GSM alphabet:

GSM7 Alphabet

GSM7 Alphabet (special)
The tables above show the GSM Alphabet using a 7-bit encoding which implies a maximum number of 160 characters per text message. However, sometimes this is a little bit more tricky and the count is not so straightforward:
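The 160 figure follows from simple arithmetic. The sketch below assumes the usual 140-octet user data field of a single SMS PDU (that number is not stated above, and `chars_per_message` is just an illustrative name):

```python
# Assumption: a single SMS PDU carries at most 140 octets of user data.
PAYLOAD_OCTETS = 140

def chars_per_message(bits_per_char):
    """Return how many fixed-width characters fit in one PDU payload."""
    return (PAYLOAD_OCTETS * 8) // bits_per_char

print(chars_per_message(7))   # GSM 7-bit alphabet -> 160
print(chars_per_message(16))  # 16-bit UCS2        -> 70
```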
If, for instance, you type a character from the second table, it needs to be preceded by the escape character (0x1B), so it takes two characters instead of one. What happens if your text message contains 160 characters and one of them is a ‘]’?
Simple: You will get charged for two text messages because you’re exceeding the maximum length of a simple PDU.
Usually your cellphone won’t warn you about this, and you will send it anyway without knowing that this text message will cost twice as much as you think. The same happens with the ‘€’ symbol and some others not showing up in the table above. If you take a closer look at the tables, you might notice that the ‘é’ symbol appears, but where are ‘á’, ‘í’, ‘ó’ and ‘ú’? The GSM alphabet only includes a handful of accented letters (such as ‘é’, ‘è’, ‘à’, ‘ì’, ‘ò’ and ‘ù’) and those acute-accented vowels are not among them, so there’s no way to send them using this encoding.
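To make the counting concrete, here is a rough sketch of how the real length of a message could be computed. The two character sets are transcribed from the GSM 03.38 default alphabet and its extension table; the names `GSM7_BASIC`, `GSM7_EXTENDED` and `septet_count` are my own and not part of any library.

```python
# GSM 03.38 default alphabet (basic table), 128 entries in code order.
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each of these is sent as ESC (0x1B) plus one more septet.
GSM7_EXTENDED = "\f^{}\\[~]|€"

def septet_count(text):
    """Count the 7-bit units a text needs; raise if GSM7 cannot encode it."""
    total = 0
    for ch in text:
        if ch in GSM7_BASIC:
            total += 1
        elif ch in GSM7_EXTENDED:
            total += 2  # escape character + the character itself
        else:
            raise ValueError(f"{ch!r} is not in the GSM 7-bit alphabet")
    return total

print(septet_count("a" * 160))        # 160, fits in one PDU
print(septet_count("a" * 159 + "]"))  # 161, one too many for a single PDU
```

Calling `septet_count("á")` raises `ValueError`, which mirrors the missing accents described above.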
I have tested some cellphones and they behave in two different ways:

  1. Stripping the accents: accented characters (except ‘é’) are replaced by the same character without the accent.
  2. Using a 16-bit Unicode (UCS2) encoding, which allows sending every character (in this case, the maximum length of the text message is 70 characters).

In case 1, the only ‘side-effect’ is that the recipient of the text message won’t get your accents and you can send up to 160 characters.
In case 2, your cellphone won’t warn you and you might send up to 160 characters thinking that it will take just one text message. However, again, you will be charged for up to 3 text messages without knowing it! The recipient will get a multi-part message showing all the characters you sent with no modification.
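The "up to 3" figure can be checked with a small calculation. This is only a sketch: it assumes the common concatenation header of 6 octets, which leaves 153 characters per part in GSM7 and 67 per part in UCS2, and it assumes every UCS2 character takes a single 16-bit unit. The function name `sms_segments` is hypothetical.

```python
# (limit for a single PDU, limit per part of a multi-part message)
# Assumption: a 6-octet concatenation header is used in multi-part messages.
LIMITS = {"gsm7": (160, 153), "ucs2": (70, 67)}

def sms_segments(units, encoding):
    """Return how many PDUs are needed for `units` characters/septets."""
    single, per_part = LIMITS[encoding]
    if units <= single:
        return 1
    return -(-units // per_part)  # ceiling division

print(sms_segments(160, "gsm7"))  # 1
print(sms_segments(161, "gsm7"))  # 2
print(sms_segments(160, "ucs2"))  # 3
```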
Here you can see the decoding of a PDU (using PDUSpy) of a text message sent with accents and encoded using UCS2:

  • PROTOCOL IDENTIFIER (0x00)
  • MESSAGE ENTITIES : SME-to-SME
  • PROTOCOL USED : Implicit / SC-specific
  • DATA CODING SCHEME (0x08)
  • AUTO-DELETION : OFF
  • COMPRESSION : OFF
  • MESSAGE CLASS : NONE
  • ALPHABET USED : 16bit UCS2
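The data coding scheme byte can also be decoded by hand. The sketch below covers only the general data coding group (top two bits set to 00), which is what 0x08 falls into; `decode_dcs` is an illustrative helper of mine, not how PDUSpy does it.

```python
ALPHABETS = ["GSM 7-bit default", "8-bit data", "16-bit UCS2", "reserved"]

def decode_dcs(dcs):
    """Decode a TP-DCS octet from the general data coding group."""
    if dcs & 0xC0 != 0x00:
        raise ValueError("only the general data coding group is handled here")
    has_class = bool(dcs & 0x10)  # bit 4: do bits 1..0 carry a message class?
    return {
        "compressed": bool(dcs & 0x20),                     # bit 5
        "message_class": (dcs & 0x03) if has_class else None,
        "alphabet": ALPHABETS[(dcs >> 2) & 0x03],           # bits 3..2
    }

print(decode_dcs(0x08))
# {'compressed': False, 'message_class': None, 'alphabet': '16-bit UCS2'}
```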

If the accents are removed from the original text message, the cellphone will automatically use the GSM7 alphabet and you will be allowed to send up to 160 characters in just one PDU (you will get charged once). All in all, be careful and, if possible, do some research to figure out what your cellphone does and check it against your bill, because you will probably save some (or a lot of) money.
Cheers,
D.