17 Jun 2009

Udi Dahan: Reliability, Availability, and Scalability - How to have your cake and eat it too

Reliability - How to manage this in an SOA world?

Reliability for the system as a whole is the key to keep smiling.

The core of Adaptability, Reliability and Scalability: use one-way messaging

Reliability
- Don't lose data
- Don't cause global inconsistency

Scenarios
- App server goes down
- Database goes down
- Database deadlocks (most common)

How does messaging help in not losing data by accident?
- Introduce a queue before processing
- Include the queue in the transaction
- A rollback (or exception) is detected and puts the message back on the queue
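
A minimal sketch of the idea above, assuming the queue lives in the same transactional store as the business data. Here an in-memory SQLite database stands in for both; the table names and the helpers `make_db`, `enqueue`, `process_next`, `save_order` and `deadlocking_handler` are made up for illustration and are not from the talk.

```python
import sqlite3

def make_db():
    # isolation_level=None: no implicit transactions, we issue BEGIN/COMMIT/ROLLBACK ourselves.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, body TEXT)")
    db.execute("CREATE TABLE orders (body TEXT)")
    return db

def enqueue(db, body):
    db.execute("INSERT INTO queue (body) VALUES (?)", (body,))

def process_next(db, handler):
    """Dequeue one message and run the handler inside a single transaction."""
    db.execute("BEGIN")
    try:
        row = db.execute("SELECT id, body FROM queue ORDER BY id LIMIT 1").fetchone()
        if row is None:
            db.execute("COMMIT")
            return False
        msg_id, body = row
        db.execute("DELETE FROM queue WHERE id = ?", (msg_id,))
        handler(db, body)
        db.execute("COMMIT")
        return True
    except Exception:
        # The rollback undoes the handler's work AND the dequeue,
        # so the message is back on the queue for a retry.
        db.execute("ROLLBACK")
        return False

def save_order(db, body):
    db.execute("INSERT INTO orders (body) VALUES (?)", (body,))

def deadlocking_handler(db, body):
    save_order(db, body)
    raise RuntimeError("simulated database deadlock")

def count(db, table):
    return db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

Calling `process_next(db, deadlocking_handler)` leaves the message on the queue and no row in `orders`; a later call with `save_order` moves it across.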

Calling external webservices
- Web services don't roll back (unless you apply WS-AtomicTransaction)
- Introduce a messaging gateway between your app and the external web service. The gateway is part of the transaction.
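
One way to picture the gateway, as a sketch under assumptions: the application never calls the web service directly, it only writes an outgoing message in the same transaction as its own data, and a separate gateway step drains those messages and makes the external call. SQLite again stands in for the transactional store; `place_order`, `run_gateway` and the table names are hypothetical.

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute("CREATE TABLE orders (body TEXT)")
    db.execute("CREATE TABLE gateway_queue (id INTEGER PRIMARY KEY, body TEXT)")
    return db

def place_order(db, body, fail=False):
    """Store the order and the outgoing message in one transaction."""
    db.execute("BEGIN")
    try:
        db.execute("INSERT INTO orders (body) VALUES (?)", (body,))
        db.execute("INSERT INTO gateway_queue (body) VALUES (?)", (body,))
        if fail:
            raise RuntimeError("simulated failure after queuing the call")
        db.execute("COMMIT")
        return True
    except Exception:
        # The outgoing message disappears together with the order,
        # so the external service never hears about work that was rolled back.
        db.execute("ROLLBACK")
        return False

def run_gateway(db, call_external):
    """Gateway side: forward committed messages to the external service."""
    sent = 0
    rows = db.execute("SELECT id, body FROM gateway_queue ORDER BY id").fetchall()
    for msg_id, body in rows:
        call_external(body)  # stand-in for the real web service call
        db.execute("DELETE FROM gateway_queue WHERE id = ?", (msg_id,))
        sent += 1
    return sent
```

In this sketch a crash between the external call and the delete would resend the message, so the external service would need to tolerate duplicates.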

Availability
Is it just a question of nines? (99.99999...%)
Response time is also an issue in this context

Command/Response pattern
- What does it mean if I did not get a response?

How does messaging help:
- Moving responsibilities
1. Accept the command - put it in the queue - respond with "I have received the request and will get back to you"
2. Process the request from the queue and respond on a different channel
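
The two steps above can be sketched with two in-process queues, assuming `commands` is the inbound queue and `replies` is the "different channel". The names `accept` and `process_one` and the ticket format are illustrative only.

```python
import queue
import uuid

commands = queue.Queue()  # step 1 puts work here
replies = queue.Queue()   # step 2 answers here, not on the original call

def accept(command):
    """Step 1: queue the command and acknowledge receipt immediately."""
    ticket = str(uuid.uuid4())
    commands.put((ticket, command))
    return {"ticket": ticket, "status": "received"}

def process_one(handler):
    """Step 2: take one command off the queue and reply on the other channel."""
    try:
        ticket, command = commands.get_nowait()
    except queue.Empty:
        return False
    replies.put((ticket, handler(command)))
    return True
```

The caller of `accept` gets its acknowledgement even if no processor is running yet; the real answer shows up on `replies` whenever `process_one` gets to it.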

This changes the contracts (in WCF: callback contracts)

Messaging over HTTP
- Request message - respond with a GUID and an "expected response time"
- Client asks: Is this GUID finished? (if no, repeat as above; if yes, return the response)
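
A rough sketch of that polling exchange, leaving out the actual HTTP layer. The class `PollingEndpoint`, its method names and the `expected_seconds` field are assumptions made for the example.

```python
import uuid

class PollingEndpoint:
    def __init__(self, expected_seconds=5):
        self.expected_seconds = expected_seconds
        self.pending = {}   # guid -> request still waiting to be processed
        self.results = {}   # guid -> finished response

    def submit(self, request):
        """First call: hand back a GUID and an expected response time."""
        guid = str(uuid.uuid4())
        self.pending[guid] = request
        return {"guid": guid, "expected_seconds": self.expected_seconds}

    def poll(self, guid):
        """Follow-up call: is this GUID finished?"""
        if guid in self.results:
            return {"done": True, "response": self.results[guid]}
        if guid in self.pending:
            # Not finished: same answer as the first call, so the client waits and asks again.
            return {"done": False, "guid": guid,
                    "expected_seconds": self.expected_seconds}
        raise KeyError(guid)

    def work(self, handler):
        """Back end: process everything currently pending."""
        for guid in list(self.pending):
            self.results[guid] = handler(self.pending.pop(guid))
```

A client would call `submit`, wait roughly `expected_seconds`, then call `poll` until `done` is true.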

Query/response interaction
Get response right away

Move to Pub/Sub - We will tell you when a significant change has occurred. Allows for caching.
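
To illustrate why pub/sub allows caching: a subscriber can keep a local copy that is refreshed by change notifications, and answer queries from that copy without a round trip. This is an in-process sketch; `Publisher` and `CachingClient` are made-up names.

```python
class Publisher:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, key, value):
        # "We will tell you when a significant change has occurred."
        for callback in self.subscribers:
            callback(key, value)

class CachingClient:
    def __init__(self, publisher):
        self.cache = {}
        publisher.subscribe(self.on_change)

    def on_change(self, key, value):
        self.cache[key] = value

    def query(self, key):
        # Answered locally; returns None if no change for this key was ever published.
        return self.cache.get(key)
```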

Scalability

Putting data in a queue is a scalable and "simple" approach
- More http servers to put messages in the queue
- More servers to manage queues
- More servers to process messages (pulling) from the queues and provide the responses (on different channels)
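
The last bullet, more processors pulling from the same queue, can be sketched with threads standing in for servers. `run_workers` is a hypothetical helper; the point is only that the number of workers can change without touching the producer.

```python
import queue
import threading

def run_workers(messages, handler, worker_count):
    work = queue.Queue()
    results = queue.Queue()
    for message in messages:
        work.put(message)

    def worker():
        # Each worker pulls until the queue is empty, then exits.
        while True:
            try:
                message = work.get_nowait()
            except queue.Empty:
                return
            results.put(handler(message))

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # All workers have finished, so draining here is safe.
    out = []
    while not results.empty():
        out.append(results.get())
    return sorted(out)
```

Calling it with `worker_count=1` or `worker_count=8` yields the same set of results; only the throughput differs.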

This approach allows different SLAs for different customers/orders, each with its own scaling mechanism.



Issue: Scaling to large messages (e.g. 50MB sized messages)

Solution: Represent the big message as a set of smaller messages
Big messages originate from interactions. Make this protocol explicit. How???
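
The purely mechanical part, splitting one payload into small messages and putting them back together, might look like the sketch below. It does not answer the harder question of how to make the underlying interaction explicit; the chunk fields (`message_id`, `index`, `total`, `data`) and the names `split` and `Reassembler` are assumptions for the example.

```python
import uuid

def split(payload: bytes, chunk_size: int):
    """Turn one big payload into a list of small, self-describing messages."""
    message_id = str(uuid.uuid4())
    parts = [payload[i:i + chunk_size]
             for i in range(0, len(payload), chunk_size)] or [b""]
    return [{"message_id": message_id, "index": i, "total": len(parts), "data": part}
            for i, part in enumerate(parts)]

class Reassembler:
    def __init__(self):
        self.partial = {}  # message_id -> {index: data}

    def add(self, chunk):
        """Accept chunks in any order; return the payload once all have arrived."""
        parts = self.partial.setdefault(chunk["message_id"], {})
        parts[chunk["index"]] = chunk["data"]
        if len(parts) < chunk["total"]:
            return None
        del self.partial[chunk["message_id"]]
        return b"".join(parts[i] for i in range(chunk["total"]))
```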
