Archive for February, 2008

ThoughtBlog: More on XMPP Apps. Writing-into-the-app edition - 2/21

Thursday, February 21st, 2008

Yesterday I went to sleep having talked about one method I propose of how to write into a data store that is federated using xmpp. Today I’ll continue the trend talking about other methods that jump out of mind.

using PubSub

XMPP has Publish-Subscribe functionality layered on top of itself (Relevant proposal). By itself pubsub doesn’t give us write scalability, but rather provides the extra layer of indirection between app and data node that will let us get tricky.

Similar to how we segment writes among data nodes in the direct messaging model, we would segment writes into the system into a number of topics into which messages would be published. We could slice things along several lines: One example would be along class (all Topics get published to /topics, Products to /products). Taking this approach we are bound by the choice of using class names for topics: if the write needs of a single class exceeds the capabilities of a single box we are stuck.

Like I mentioned before, I would probably want to run experiments on each method to determine the best course for partitioning: My intuition says that the appropriate choice would be different for each write pattern exposed throughout the system.

The benefit is that behind each topic would be any number of data nodes waiting to write the published messages into the system. With that, we could get quicker redundancy by having each redundant node subscribe to the appropriate topic directly, rather than relying on replication to push around changes.

Fault tolerance could be achieved in the same manner as with the system that directly messages data nodes, but we can do something even neater. Since we can forgo replication and have each redundant data node receive write messages by having each subscribe to the same topic, we can have a missed message or failed write be requested from a peer data node instead of relying on repititive messaging from the application node. Not a silver bullet (every data node in the topic could fail to write the message successfully), but leveraging such a method could reduce the ‘length’ that a given data node would need to reach out to get a resend. Whether that is helpful or not for the system is beyond me. Now that I think about it, I just described replication…

Next time i’ll get into my ideas for using Mult-user chat (MUC) to achieve scalability, fault-tolerance, and redundancy.

ThoughtBlog: More on XMPP and Clustering - 2/20

Wednesday, February 20th, 2008

Crazy goals and dreams

  • Horizontally scalable at the application server. The system works with 1 mongrel as well as 1000. Since shared-nothing lets us do this very easily, maintaining or improving this area basically boils to down can the XMPP server handle the same or a greater number of active connections than a mysql server. I think GSFN will be quite gigantic when we run into a bottleneck at this level with either choice.
  • Horizontally scalable reads from and queries of, the data store. I should be able to add boxes into the mix and have the number of users that can access the store simultaneously grow accordingly.
  • Horizontally scalable writes into the data store. As above.
  • Fault tolerance. build the system such the failures are seen by the end user locally: The whole site only goes down if every node in the system goes down. Pieces of the system can wink in and out of existence as failures happen, but the system should be able to recover from such faults.
  • Redundant (which is, of course, different from fault tolerant). Given a node whose functionality/data is redundantly maintained across several nodes in the system, any single node should be able to fail and it should at most affect a single user request. If we make idempotent as many of the system’s functions as possible, we can get this easily: just re-run the request.
  • Easily-commandable. I should be able to shrink/grow the machines in the system, assess system health (and act accordingly), and quickly query data from one place.

Now, achieving these goals can happen through any number of paths, but for the purposes of this exercise I’ll think through how I use XMPP as a tool to achieve these goals.

Crazy assumptions and assertions of how XMPP will relate into the system

So, lets list out things to which I am assuming I want this system to be. To be added to, changed, revised, and scrapped in the future.

  • Data is sharded amongst N data nodes and User requests are served by M application nodes. Each of the nodes are represented in the system as a Jabber client.
  • Nodes join/leave the system dynamically at runtime, and use presence notifications to broadcast health and availability to the general system. Other nodes that desire to know the health/availability of a given node will subscribe to the presence of the given node (add to contacts list)
  • All Modifications to the data store (insert, update, delete) happen by sending a XMPP message. This can happen in one of several different ways, but each method starts with an application node forming an XMPP message and sending it off into the jabber server.

A Choice of Jabber Server

ejabberd. Given that I’m a fan of, and can hack on Erlang it seems a sane choice. Efforts for 2.0 (in release candidate ATM) has apparently have been centered around re-architecture with an eye towards scalability. DJabberd was another possibility, but apparently it hasn’t seen much development over the past 6 months.

A Choice of Development Languages and Jabber Libraries

JRuby and Smack. Rather, JRuby because of Smack. The graphical debugger bundled with Smack is fucking sexy.

hotness

I can see that being a major boon when bumbling my way through this implementation. Not to mention it seems more mature than XMPP4R, as well as having a better license (Apache for Smack, GPL for XMPP4R).

Also, JRuby offers a number of benefits that are very appealing: Having access to the entire body of libraries available in Java, and having a community process that I feel I can participate in opens the possibilities for me to contribute to jruby if I run into any problems. Just to name a few.

Ideas for performing writes

We need to partition our objects amongst the various data nodes. Certain classes within the Get Satisfaction class create objects that are the roots of a tree of objects. We can partition upon these objects, distributing these objects across the entire data store. When one root object refers to another root object we store it as a weak reference through which we can load the referenced object from its own node.

There are various ways to actually partition these roots, and from what I’ve seen you really want to choose this algorithm based on experimental results by testing your application: each choice has tradeoffs. For now, I’m going to assume the system will employ some continuum-based consistent hashing algorithm (like libketama or the hashing strategies described in The Amazon Dynamo paper) from which a location to write data is chosen.

So, writes into the system need to be scalable, fault-tolerant, and redundant. Let’s think about how we can leverage XMPP to achieve this.

directly messaging nodes

When an object is to be persisted:

  1. The ‘owning’ node is calculated using the hashing algorithm.
  2. A message is sent directly to that node
  3. There is no step 3.

With this message we achieve a measure of scalability easily: A given data node could exists anywhere in the federated XMPP network. We can handle as many data nodes as our Jabber network can handle connected clients, and we can handle as many writes as our Jabber network can handle messages.

It’s worth noting that the scalability I’m talking about above is when dealing with reads/writes for multiple objects. Partitioning doesn’t give us scalability when one object is getting hammered: If a object only exists on one node you can only handle as many writes to that one object as the box it lives on can handle. To crack that nut we would need some (more) complicated system like what Amazon’s Dynamo employs to correct conflicts when writes across two nodes clash.

Fault-tolerance is a bit tougher (but costs the same across any of the possible methods I’ve though of so far). Since a message sent in the system is not guaranteed to be responded to, the application node requesting the write would need to check back to ensure that a write was successful. The simplest, but not the best, method is to query the data store to see if the object has been stored successfully. If not, retry. We could also leverage the event notifications extension to XMPP to request a message back from the data node when it has successfully received a message. We post the message, start a wait for the delivery notification with a timeout (if the timeout expires, we retry the write).

Since messaging a data node directly doesn’t have any redundancy built into it, we need to build some in on the backside. We could, for example, have a data node replicate writes around to its peers. Such a system would require that each application node could figure out where those replicants exist in the cloud.

Okay, that’s enough for today. Tomorrow we continue with this line of thinking, exploring more possibilities for writing into the store.

ThoughtBlog: On XMPP and building distributed applications - 2/19

Tuesday, February 19th, 2008

So, i’ve been spending a lot of time thinking, prototyping, and just generally mucking around with what a federated Get Satisfaction would look like. I’ve got a Curio document bursting at the seems with possibilities, problems, questions, etc. and felt like it might help the process to try and put this exploration into prose. In general these ThoughtBlog posts will be out-loud wonderings… feel free to comment and contribute, most of the stuff I’ll by writing about will be musings and questions (to myself and to the world).

I would expect that these entries will end and start at odd places… don’t expect any grand conclusions :)

So, I’ve got it stuck in my head that I want XMPP to be the glue that binds a sharded Get Satisfaction. Whether it’s the right choice or just mental masturbation, I don’t know yet… but I’m having a blast.

So why XMPP?

Besides the enthusiasm that has built around Jabber over the past couple years, when looking at it objectively, there seems to be many parallels to how I think of clusters and what IM is. One of the reasons, in my mind, that Erlang is so successful at building distributed systems is that it makes very few guarantees with the messaging system: As an actor, the only way you can be assured that a message has been delivered is if the recipient sends a message back (more specific details at [http://www.erlang.org/faq/faq.html#AEN1189]).

Now, it may be that XMPP provides a more lossy delivery mechanism than what erlang messaging does, but hopefully it maintains the same general idea as erlang: Delivery is guaranteed as long as nothing breaks. When things break, we should be able to detect those from a system outside of core messaging system (Erlang uses the linked processes construct). I still haven’t figured out how best such a system would be implemented using XMPP.

Another appealing reason is toolchain support. I imagine having a XMPP MUC (Multi-user chat, think IRC) room in which resides every node in the Get Satisfaction system. From that room I can command the entire cluster (reboot, report statistics, re-balance?). Starting a chat with data-1@getsatisfaction.lan/console would pull up an IRB session over XMPP. Now, you might say “Why not SSH? it works already dumbass”, which is a fair point. The wow-factor for me is that the nodes in the cluster now have the ability to contact me directly. It’s one thing to log into the server and check to see that mongrels are running, it’s a whole other thing to have a server tell you it lost a mongrel process. Now, obviously you should have a monitoring system in place to notify you when a server goes down, but it is very appealing to me aggregate all of those various webapps, emails, and console sessions into a single mechanism.

One day, I think it would really fucking cool if Get Satisfaction API provided two faces that worked in concert: A RESTful HTTP service like we have now for pulling from the data store, and an XMPP service that lets you receive push notifications of actions as they happen in the system. I think that would provide for a pretty elegant and complete set of methods to interact with GS data.

What not to use XMPP for

Among the exercises I’m working through, I think it’s important to weigh each decision against what is currently available. By remembering to augment decisions with the consideration, I hope to avoid the problem of getting caught up in my own hype and making a poor decision based on my excitement. To avoid the “When all you have is a hammer, everything looks like a nail” problem.

One specific case so far, is with regards to data-retrieval and querying. As you may or may not know, XMPP defines an IQ type message that is an analog to HTTP: You send an IQ stanza and you expect a response.

For querying the data store, using HTTP seems much nicer and is something I’m much more familiar with. I’m specifically talking here just about the GET & HEAD verbs. Stateless-ness, idempotence, ubiquity. There isn’t really any benefit to using IQ stanzas to re-implement what would be done today in HTTP.

The problems so far

Getting my head wrapped around what is available in the XMPP RFCs as well as all of the various extensions is surely quite an undertaking. I always run into this problem. When presented with a hundred different ways to solve a problem, I have a super hard time choosing one. I usually try to break things down to make the decision easier, but usually I get over things by jumping in feet first.

Part of the issues I find is that it seems that all of the people out there that have built applications on top of XMPP are tight-lipped; I spent a good deal of time wading through google to find a “lessons learned” type post. This has been done before, right? I’m hoping that one of the side-effects of these musings is that some guru comes out of the woodwork and learns it to me good.