Service Discovery and Mesh

With the rise of microservices, it has become hard to manage how services find and communicate with each other.

Traditionally, each microservice keeps a config file listing how to reach the other services it talks to. This is error-prone, because those files are maintained by hand. The most common solutions are DNS-based lookups, load balancers, or a combination of both.

  • Load balancers add cost and latency, and introduce a single point of failure. They also have to be updated every time we add or remove servers.

  • DNS changes take time to propagate, and clients must refresh their cached records themselves (governed by the record's TTL). DNS is eventually consistent, which is a poor fit for microservices whose instances come and go frequently.

In the cloud, we have the following requirements for inter-service communication:

  • The list of machines running a given service must be updated dynamically.
  • There should be no single point of failure and no added latency.
  • When a service fails, other services should be informed.

Service discovery

Service discovery solves this with a service registry and a service discovery protocol (SDP) through which services register themselves and look each other up. In short, the protocol provides the following:

  • When a new service instance comes up, it registers itself with the registry, along with its location (IP and port).
  • Any service can query the registry to find out how to reach any other service.
  • The registry continuously health-checks all registered services and deregisters any that go offline.
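To make these three responsibilities concrete, here is a minimal in-memory registry sketch. It is not how any particular product works: the names `ServiceRegistry`, `register`, `heartbeat`, `deregister_stale` and `lookup` are hypothetical, health is inferred from heartbeats the instances send, and the 10-second timeout is an arbitrary assumption. Time is passed in explicitly so the behaviour is deterministic.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One running copy of a service and where to reach it."""
    service: str
    instance_id: str
    ip: str
    port: int
    last_heartbeat: float = field(default_factory=time.monotonic)


class ServiceRegistry:
    """Toy in-memory registry: self-registration, lookup and health-based expiry."""

    def __init__(self, heartbeat_timeout: float = 10.0):
        # Instances silent for longer than this (seconds) are treated as offline.
        self.heartbeat_timeout = heartbeat_timeout
        self._instances: dict[str, Instance] = {}

    def register(self, service, instance_id, ip, port, now=None):
        """Called by an instance when it starts up."""
        now = time.monotonic() if now is None else now
        self._instances[instance_id] = Instance(service, instance_id, ip, port, now)

    def heartbeat(self, instance_id, now=None):
        """Called periodically by an instance to prove it is still alive."""
        now = time.monotonic() if now is None else now
        if instance_id in self._instances:
            self._instances[instance_id].last_heartbeat = now

    def deregister_stale(self, now=None):
        """Health check: remove every instance whose last heartbeat is too old."""
        now = time.monotonic() if now is None else now
        stale = [iid for iid, inst in self._instances.items()
                 if now - inst.last_heartbeat > self.heartbeat_timeout]
        for iid in stale:
            del self._instances[iid]
        return stale

    def lookup(self, service):
        """Return (ip, port) for every live instance of the given service."""
        return [(i.ip, i.port) for i in self._instances.values() if i.service == service]


if __name__ == "__main__":
    reg = ServiceRegistry(heartbeat_timeout=10.0)
    reg.register("payments", "payments-1", "10.0.0.5", 8080, now=0.0)
    reg.register("payments", "payments-2", "10.0.0.6", 8080, now=0.0)
    reg.heartbeat("payments-1", now=8.0)
    print(reg.deregister_stale(now=12.0))  # ['payments-2']
    print(reg.lookup("payments"))          # [('10.0.0.5', 8080)]
```

A production registry also replicates this state across several servers, so the registry itself does not become the single point of failure we set out to avoid.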

Service mesh

Service discovery tells services how to reach each other, but because the mapping of services to machines keeps changing, a new problem arises: how do we secure that communication?

Traditionally, this is achieved with firewall rules, proxy configurations and other perimeter-based network controls, which are usually tied to IP addresses. In a cloud environment those addresses change too frequently for such rules to keep up.

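A service mesh solves this by securing the services themselves, using service control policies (SCPs) that state which services may talk to which. Because these policies refer to service identities rather than IP addresses, they work above the network layer and give services a dedicated communication layer.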
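The key point is that rules name services, not addresses. The sketch below assumes a default-deny model in which the most specific matching rule wins; it is loosely modeled on Consul's "intentions", but `Rule`, `PolicyEngine` and the specificity scheme are hypothetical. In a real mesh the sidecar proxy first verifies the caller's identity (typically from its mutual TLS certificate) before evaluating rules like these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    source: str       # calling service name, or "*" for any service
    destination: str  # called service name, or "*" for any service
    allow: bool


class PolicyEngine:
    """Default-deny authorization keyed on service identity, never on IP address."""

    def __init__(self, rules):
        self.rules = list(rules)

    @staticmethod
    def _specificity(rule):
        # Exact names outrank wildcards, so a narrow deny can override a broad allow.
        return (rule.source != "*") + (rule.destination != "*")

    def is_allowed(self, source, destination):
        matching = [r for r in self.rules
                    if r.source in (source, "*") and r.destination in (destination, "*")]
        if not matching:
            return False  # no rule matched: deny by default
        # max() keeps the first rule among equally specific ones.
        return max(matching, key=self._specificity).allow


if __name__ == "__main__":
    engine = PolicyEngine([
        Rule("web", "*", True),    # web may call anything...
        Rule("web", "db", False),  # ...except the database directly
        Rule("api", "db", True),
    ])
    print(engine.is_allowed("web", "payments"))  # True
    print(engine.is_allowed("web", "db"))        # False
    print(engine.is_allowed("batch", "db"))      # False
```

Since nothing here mentions an IP address, an instance can move to another machine without any rule changing.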

A service mesh also improves monitoring and observability into what is happening between services, and centralizes traffic rules such as maximum retries, backoff, throttling, rate limiting, SSL/TLS termination, etc.
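In a mesh these rules are enforced by the proxy rather than by each service's own code. To illustrate just one of them, here is a sketch of retries with a maximum count and exponential backoff with full jitter. `call_with_retries`, its parameters and its defaults are hypothetical; throttling, rate limiting and TLS termination are not covered.

```python
import random
import time


def call_with_retries(fn, max_retries=3, base_delay=0.1, max_delay=2.0,
                      retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call fn(). On a retryable error, retry up to max_retries more times,
    waiting with exponential backoff plus full jitter. Re-raises the last error.
    Errors not listed in retry_on propagate immediately."""
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= max_retries:
                raise
            # Ceiling doubles each attempt (base, 2*base, 4*base, ...) up to max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: a random wait in [0, ceiling] avoids synchronized retry storms.
            sleep(random.uniform(0, ceiling))
            attempt += 1
```

Passing `sleep` as a parameter keeps the function testable without real waiting. In a mesh, the equivalent knobs (retry count, base delay, cap) would be set once in proxy configuration instead of being reimplemented in every service.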

How to implement:

HashiCorp Consul addresses both of the above use cases (service discovery and service mesh) with an extensive feature set.
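As a starting point, Consul exposes its service catalog through an HTTP API. The sketch below builds a registration with an HTTP health check for `PUT /v1/agent/service/register` and reads healthy instances from `GET /v1/health/service/<name>?passing`. It assumes a Consul agent on its default address `http://127.0.0.1:8500`; the service name, IP, port and `/health` path are placeholders, and the helper function names are hypothetical.

```python
import json
import urllib.request

CONSUL = "http://127.0.0.1:8500"  # assumption: local Consul agent on its default HTTP port


def registration_payload(name, instance_id, address, port, health_path="/health"):
    """Service definition for PUT /v1/agent/service/register with an HTTP health check."""
    return {
        "ID": instance_id,
        "Name": name,
        "Address": address,
        "Port": port,
        "Check": {
            "HTTP": f"http://{address}:{port}{health_path}",  # placeholder health endpoint
            "Interval": "10s",
            # Let Consul remove the instance if its check stays critical this long.
            "DeregisterCriticalServiceAfter": "1m",
        },
    }


def register(payload, agent=CONSUL):
    """Self-registration: send the service definition to the local agent."""
    req = urllib.request.Request(
        f"{agent}/v1/agent/service/register",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def parse_healthy(entries):
    """Turn a /v1/health/service response into (address, port) pairs.
    Service.Address may be empty, in which case the node's address applies."""
    return [(e["Service"]["Address"] or e["Node"]["Address"], e["Service"]["Port"])
            for e in entries]


def healthy_instances(name, agent=CONSUL):
    """Discovery: ?passing returns only instances whose checks are all passing."""
    with urllib.request.urlopen(f"{agent}/v1/health/service/{name}?passing") as resp:
        return parse_healthy(json.load(resp))
```

For example, `register(registration_payload("payments", "payments-1", "10.0.0.5", 8080))` followed by `healthy_instances("payments")` would return that instance once its health check passes.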