On Erlang and Strings…
May 11th, 2007
I said in a recent comment that the way Erlang handles strings is wonky, and I would like to clarify. Take [97, 97] and "aa". My brain says these are two different values, but in Erlang they are the same. Type either one into an erl shell and the result will be "aa".
This works because it is internally consistent, and as the developer you can easily reason about whether a given list of bytes is actually a string of characters. In a closed Erlang system, none of the code cares whether a value is a string of characters or a list of bytes, so it just works.
The problem occurs when you break out of Erlang and communicate with a language that distinguishes between the two. In Ruby, [?a, ?a] is different from "aa". Now here is the tricky part: Erlang can encode a list in one of three ways:
- NIL_EXT: a single byte denoting an empty list, []
- STRING_EXT: lists are encoded this way when they are less than 65535 elements in length and contain only SMALL_INT values (0-255)
- LIST_EXT: any other list of elements
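The three rules above can be sketched as a small Ruby predicate. This is only an illustration: `ext_tag_for` is a hypothetical helper, not part of Erlang or Erlectricity, the tag numbers (106, 107, 108) are the ones the external term format assigns to these encodings, and the length limit simply mirrors the wording of the list above.

```ruby
# Hypothetical helper: predicts which external-format tag Erlang would
# choose for a list, following the three rules listed above.
NIL_EXT    = 106
STRING_EXT = 107
LIST_EXT   = 108

def ext_tag_for(list)
  return NIL_EXT if list.empty?

  # "SMALL_INT values" means every element is an integer in 0..255.
  bytes_only = list.all? { |e| e.is_a?(Integer) && e.between?(0, 255) }

  # Length limit taken from the text above, not verified against the emulator.
  return STRING_EXT if bytes_only && list.length < 65535

  LIST_EXT
end

ext_tag_for([97, 97])   # => 107, the same tag Erlang uses for "aa"
ext_tag_for([1, 2, 3])  # => 107, even though nobody meant it as text
ext_tag_for([1, 256])   # => 108, because 256 does not fit in a byte
```

The point of the sketch is that the tag depends only on the contents of the list, never on what the programmer intended it to mean.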
This info can be found in the OTP source distribution in the file "erts/internal_doc/erl_ext_dist.txt". With these semantics, [1,2,3] would be encoded as a string, and it would be totally surprising to a Ruby programmer to have that value pop out as "\001\002\003" on the Ruby side. So, it's a bit of a juggling act, and with Erlectricity I decided to take a consistent, simple road. Any Erlang list, regardless of how Erlang encodes it, gets spit out as an Array on the Ruby side. Likewise, any binary value in Erlang will come out as a String on the Ruby side. This makes it easy to use unpack if you need to extract values, and if you just want a string, nothing else is needed.
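To make that policy concrete, here is a minimal decoding sketch. It is not Erlectricity's actual code: `decode` and `decode_term` are hypothetical names, and only a handful of tags from the external term format are handled (small integers, the three list encodings, and binaries). What it illustrates is that STRING_EXT and LIST_EXT both land as an Array, while a binary lands as a String.

```ruby
require 'stringio'

# Hypothetical decoder illustrating the list -> Array, binary -> String policy.
def decode_term(io)
  tag = io.readbyte
  case tag
  when 97                          # SMALL_INTEGER_EXT: one unsigned byte
    io.readbyte
  when 106                         # NIL_EXT: the empty list
    []
  when 107                         # STRING_EXT: 2-byte length, then raw bytes
    len = io.read(2).unpack1('n')
    io.read(len).unpack('C*')      # deliberately an Array, never a String
  when 108                         # LIST_EXT: 4-byte length, elements, tail
    len = io.read(4).unpack1('N')
    items = Array.new(len) { decode_term(io) }
    decode_term(io)                # consume the tail (NIL_EXT for proper lists)
    items
  when 109                         # BINARY_EXT: 4-byte length, then raw bytes
    len = io.read(4).unpack1('N')
    io.read(len)
  else
    raise ArgumentError, "unsupported tag #{tag}"
  end
end

def decode(bytes)
  io = StringIO.new(bytes.b)
  raise ArgumentError, 'bad version byte' unless io.readbyte == 131
  decode_term(io)
end

# term_to_binary([1,2,3]) in Erlang yields <<131,107,0,3,1,2,3>>
decode([131, 107, 0, 3, 1, 2, 3].pack('C*'))            # => [1, 2, 3]
# term_to_binary(<<"aa">>) yields <<131,109,0,0,0,2,97,97>>
decode([131, 109, 0, 0, 0, 2, 97, 97].pack('C*'))       # => "aa"
```

If a list really was meant as text, the caller can convert it explicitly with `pack('C*')`, so the decision stays with the code that knows the intent.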
This situation is less than ideal, and it is one of the confusing pieces that I would like to eliminate. To do this, a framework to ease conversion and guide understanding on the Erlang side is needed. Good thing that such a framework was already planned! By putting a thin veneer over the Erlang ports, we can build upon the Erlang distribution format (or build away from it if needed) to find a happy equilibrium between the two differing type systems. Stay tuned for more discussion on how I approach a solution to the problem!











