Wednesday, November 20, 2013

Rollback Recovery for Non-Deterministic failed events for web services hosted in Axis2

Rollback techniques in message passing systems have been one of the primary areas in message recovering techniques.One significant area where Rollback recovery has played a crucial role is in the area of web services. Web services are now most popular and widely used mechanism in enterprise business, telecommunication, banking world.In many cases, catastrophic failure is a result of network failures,transport errors and web services are down. If we consider underlying transport is stable and server is stable, we could consider the failures in mainly two categories,which are Deterministic failures and Non-deterministic failures. Recovering from Deterministic failures are much more simpler than Non-deterministic events, since services are aware about the errors. But recovering from Non-deterministic events failure is a challenging part, because it is non determined event failure to the service point of view.
                   System downtime  makes loss to an Enterprise and causes other organizational issues. Over the years, there has been a great amount of effort to improve the recovery techniques in messaging systems. With the increase of recovery techniques, IT systems are able to reduce the cost which occurs due to system errors. Failures due to downtime or incorrect results in an end to end system might have considerable monetary penalties. Failures can occur in several levels and can be categorized in different aspects. For example, client side failures, server side failures, transport level failures, processing time errors
                    From the service level, we could categorize the failures based on two factors, which are known failures to that service and unknown failures to that service. Further we could say, that known failures occur when the service sending out messages. Those can be called as Deterministic events for that service. Unknown failures can occur when the service receives requests and that particular service is not in a state to process those requests. Service is not aware, when the request will be arrived. So, if the service is down when client sends request, service won't be able to receive them. Such errors are called Non-deterministic errors to that service.
                   In case of failures, web services can track , recover failures from Deterministic events easily, since those failures are known to web services. Because, in the Deterministic event errors, service is up and running. But when a failure occurs in Non-deterministic events, web services can not track them or unaware about those failures. Such Non-deterministic errors occur when service is down. That is, when service is not up and running, service can not receive requests. So, those requests will be lost. In this case, we make an assumption that, requests are lost because of the service failure.Those failure events are currently omitted under category , such ,”Requests made to the service within the period of service down time”
                For Non-deterministic events, we can apply the Rollback recovery techniques to Rollback the system/service to a consistent state and can replay the failed events. In case of Non-deterministic events, if we can record all Non-deterministic events happen to a web service we can recreate lost state by replaying those events from a known initial condition. The reconstructed consistent state is not necessarily one that has occurred before the failure. It is sufficient that the reconstructed state be one that could have occurred before the failure in a failure-free, correct execution, provided that it be consistent with the interactions that the system had with the outside world.[1]
                       As a solution for Rollback-Recovery for Non-deterministic events failure in web services hosted in Axis2 , Log-based recovery mechanism contains more advantages than Check-point based recovery technique.

Figure:In a failure situation how log based system performs

Assumptions are, 

  • All Non-deterministic events can be identified and their determinants logged to stable storage 
  • Failures occur after the messages are logged to the stable storage. 
  • Underlying transport channel is reliable. 
  • No orphan processes.
  • Assuming single service domain. 
  • There is no inter process connection in the incoming messages. 
  • Server is reliable. 

A proof of concept design for adding "Log based Roll back Recovery" for Non-deterministic failed events in Axis2.:

           The determinant of each Non-deterministic event is needed to log to a stable storage before the event is allowed to affect the computation. An axis2 message log handler is kept at the Inflow of the message. When clients sends request to a service, before service receive those  requests, this handler receives them and log them to the a stable storage(database). All these incoming requests are non-deterministic events for services, which are hosted in a server.
             At initial phase, all incoming requests are logged to the database. At that time , system does not aware about which will be the success message and which will be the failed message. To differentiate both message another Axis2 handler is placed at the outflow of the Message and identified the Non-deterministic failed events .
         When the service becomes to Active state from InActive state, the Observer Process has to monitor it and needs to check the database for any non-deterministic failure messages for that particular service and replay them. If failed requests are found in the database, those are replayed to the service. To replay the failed Non-deterministic events to service, a Serviceclient is used. The service client will construct message payload back from the saved message and service, operation details are set to the message. Using Service client the failed Non-deterministic message can be replayed back to the service successfully.

[1] E.N. Elnozahy, Lorenzo Alvisi , Yi-Min Wng , David B. Johnson “A Survey of Rollback-
Recovery Protocols in Message-Passing Systems” ACM Computing Surveys (CSUR) ,Vol.
34, Issue 3,pp. 375-408 , sept 2002

No comments:

Post a Comment