Services Runtime Security — Part 4

number40
11 min read · Oct 2, 2023

An Alternate Approach

Thanks for reading the previous installment in this series (https://number40.medium.com/services-runtime-security-part-3-8b9641753e31). Our initial plunge into establishing robust services runtime security measures has been commendable, though not without its areas for improvement. At a technical level, our approach appears sound. We’ve successfully implemented a multi-layered defense strategy, aptly termed ‘Defense in Depth’. This ensures that, even if one security mechanism falters, others remain active to thwart potential breaches. We’ve also established authorization protocols, ensuring that every entity seeking resource access is thoroughly vetted. Furthermore, our dedication to safeguarding service-to-service communications is evident in our encryption measures, and our vigilant traffic analysis continually scans for anomalies, highlighting our proactive stance on security.

However, as with most initial implementations, ours isn’t devoid of challenges. A pressing concern is the introduction of new dependencies during our security setup. Such dependencies can become potential bottlenecks or even security vulnerabilities if one component were to fail or be compromised. Additionally, the complexity introduced by the tooling and the intertwining of multiple security measures has made management, troubleshooting, and updates a more intricate task. Although we can achieve authorization between services in the API Gateway model, traffic must leave Kubernetes and all authorization checks become North-South. This limits some of the at-scale advantages of Kubernetes, where our scenarios under load may require true East-West communication between services. Our development teams have also expressed that they would prefer to have the security either embedded or as close to the services as possible. They have pointed out that not all of our web services are fronted by an API gateway today. They are willing to make this change and have inquired about ways to put authorization in place without changing these older services.

Given these insights, we need to consider a re-evaluation. This isn’t to suggest an overhaul of the areas we secure, but rather a critical examination of the tools and solutions we’ve employed. Could there be alternatives in the market that promise equivalent or enhanced security with a more streamlined approach? As we delve into this re-evaluation, it’s also imperative to introduce new requirements. These would be reflective of our learnings, feedback from our teams, and the ever-evolving landscape of cloud security best practices.

Additional Requirements

  1. The solution must support both development and operational teams at their respective stages in the cloud journey.
  2. The selected security tooling solution must minimize dependencies and, where possible, implement security transparently.

Services Authorization

Let’s start with services authorization and introduce Open Policy Agent!

Open Policy Agent (OPA) is an open-source, general-purpose policy engine that unifies policy enforcement across the technology stack. It is a Cloud Native Computing Foundation (CNCF) incubating project designed to enable policy-based control and governance for various systems like Kubernetes, microservices, API management, and other cloud-native environments.

OPA provides a high-level declarative language called Rego that allows users to write policies as code. Rego is designed to express complex policies that can be easily understood and maintained by both humans and machines. OPA policies can be used to enforce fine-grained access control, security, and compliance rules across your infrastructure and applications.
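To make this concrete, here is a minimal sketch of what a Rego policy might look like. The package name, input fields (`method`, `path`, `role`), and role values are illustrative assumptions, not an existing BlueMyst policy:

```rego
# Hypothetical authorization policy -- package name and input
# document shape are illustrative assumptions.
package httpapi.authz

# Deny by default; every request must match an explicit rule.
default allow = false

# Anyone may read the person resource.
allow {
    input.method == "GET"
    input.path == ["person"]
}

# Only admins may modify it.
allow {
    input.method == "POST"
    input.path == ["person"]
    input.role == "admin"
}
```

The deny-by-default pattern is worth noting: a request is only allowed when some `allow` rule evaluates to true, which keeps the policy auditable at a glance.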

OPA is highly extensible and can be integrated with various platforms and tools to achieve consistent policy enforcement. Let’s take a look at some of the features we get by using OPA.

To learn more about OPA and get started, you can visit the official documentation at https://www.openpolicyagent.org/docs/latest/.

Declarative Policy Language (Rego)

OPA utilizes a high-level declarative language called Rego to express policies. Rego allows policy authors to write concise, expressive, and easy-to-understand policies without having to deal with the intricacies of imperative languages. This results in reduced complexity and improved maintainability of policy code, helping our development teams with adoption.

Decoupling Policy Decision-making

OPA decouples policy decision-making from policy enforcement, allowing policies to be managed and updated independently from the services they govern. This separation of concerns simplifies policy management, reduces the risk of introducing errors, and enables a more agile policy lifecycle.

Extensible and Integratable

OPA is designed to be easily integrated into a wide range of systems, including APIs, microservices, CI/CD pipelines, and infrastructure management tools. This flexibility makes OPA a versatile policy engine that can be adopted by multiple teams in our organization and our various technology stacks.

Scalable and Performant

OPA is built to scale horizontally and vertically, providing high performance even in large-scale environments. It can handle thousands of policy evaluations per second with low latency, making it suitable for our organization’s demanding workloads.

Policy-as-Code

OPA treats policies as code, which means they can be version-controlled, tested, and deployed using the same processes and tools as application code. This enables our organization to apply best practices like continuous integration, continuous deployment, and code reviews to policy management workflows.
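Treating policies as code also means they can carry unit tests that run in CI via `opa test`. As a hedged sketch, assuming a policy package that defines an `allow` rule (the package and file names are illustrative):

```rego
# Hypothetical test file (e.g. authz_test.rego) for a policy package
# named httpapi.authz -- both names are illustrative assumptions.
package httpapi.authz

# `opa test` runs any rule whose name starts with test_.
test_get_person_allowed {
    allow with input as {"method": "GET", "path": ["person"]}
}

test_post_person_denied_for_non_admin {
    not allow with input as {"method": "POST", "path": ["person"], "role": "reader"}
}
```

A pipeline step such as `opa test ./policies` can then gate policy merges the same way application test suites gate code merges.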

So this is just an authorization microservice? Yep. Pretty cool, huh?

Advantages

OPA is designed with microservices architectures in mind, allowing easy integration into existing microservices environments without tight coupling. OPA can operate decentralized, providing local policy enforcement at each microservice instance, reducing the need for a central, monolithic authorization service which can become a bottleneck. While policy enforcement can be decentralized, policy management can be centralized, ensuring consistent policy application across all services. This can be accomplished through the use of the Styra DAS SaaS platform. With policies defined in Styra DAS and written in Rego, it’s straightforward to audit and review the policies across all services, ensuring compliance and security standards are met. By offloading authorization to OPA, our service developers can focus on business logic, maintaining the microservice principle of single responsibility.

Disadvantages

Introducing any external authorization check introduces potential latency. While OPA is fast, network hops between services and OPA can add up in high-throughput scenarios. Adding another component to your architecture means additional monitoring, logging, and alerting configurations to ensure OPA is operating correctly. In distributed systems, failures will happen. Strategies need to be in place for what occurs if OPA cannot be reached. Does the system fail-open or fail-closed?
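The fail-open/fail-closed decision can be made explicit in calling code rather than left implicit. The sketch below is framework-free Python; the function names are illustrative, not from a real SDK. It wraps any policy check and defaults to fail-closed, denying when OPA is unreachable:

```python
def authorize(check, *, fail_open=False):
    """Run a policy check callable and apply an explicit failure mode.

    `check` is any zero-argument callable returning True/False (e.g. a
    wrapper around an OPA HTTP query). If the check raises a connection
    error, fall back to the configured failure mode: deny (fail-closed)
    by default, allow (fail-open) only when explicitly requested.
    """
    try:
        return bool(check())
    except (ConnectionError, TimeoutError):
        return fail_open


# A stub standing in for an unreachable OPA instance.
def opa_is_down():
    raise ConnectionError("OPA sidecar not responding")

print(authorize(opa_is_down))                  # fail-closed: deny
print(authorize(opa_is_down, fail_open=True))  # fail-open: allow
```

Making the failure mode a deliberate parameter forces each integration to choose its posture instead of inheriting whatever the transport layer happens to do.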

From an architectural perspective, OPA offers a flexible and scalable solution for policy-based authorization in microservices environments. However, careful design considerations are necessary to avoid potential pitfalls and ensure that the authorization system is both resilient and efficient.

Note: I think one of the biggest advantages of using OPA is the many deployment options. Depending on which one is chosen, it can make local development extremely simple.

Deployment Options

The BlueMyst security architecture team has evaluated the different deployment options for OPA and we have come up with a preferred ranking. These really break down into two categories: Code Integrated and Non-code Integrated.

Code Integrated

Centralized Server

  • Why: It offers a straightforward approach where multiple applications or services send policy checks to a single OPA instance. No need to adjust current application containers or infrastructure immediately.
  • Description: Deploy OPA as a standalone server, where multiple services/applications send their policy queries to this centralized instance. This is usually accomplished via some form of SDK or a custom library.
  • Use Case: Scenarios where centralizing policy decisions makes sense or where you want to minimize the number of OPA instances.
  • Advantages: Easier to manage and update as there’s a single instance or a few instances.
  • Disadvantages: Becomes a critical point of failure, potential network latency, and could become a bottleneck if not scaled properly.

Embedded

  • Why: This is about integrating OPA directly into your application as a library. While this sounds straightforward, it does require some code-level changes. However, it doesn’t involve changing deployment patterns or infrastructure. This can be accomplished through the use of frameworks such as Dapr.
  • Description: Embed OPA as a library within your application. This means the application directly links to OPA and calls it in the same process space.
  • Use Case: Useful for situations where network overhead is unacceptable or you need the tightest integration possible.
  • Advantages: Low latency, simplified deployment since it’s part of the application.
  • Disadvantages: Tighter coupling with the application. Any update to OPA would potentially require redeploying the application.

Serverless

  • Why: If you’re already using serverless infrastructure, deploying OPA as a function can be pretty straightforward. Most cloud platforms offer easy deployment for serverless functions.
  • Description: Deploy OPA as a serverless function in platforms like AWS Lambda, Azure Functions, or Google Cloud Functions.
  • Use Case: Situations where OPA’s workload is event-driven and doesn’t need a continuously running instance.
  • Advantages: Cost-effective for sporadic workloads, scales with demand.
  • Disadvantages: Cold start times can introduce latency, and there are potential limitations based on the serverless platform’s constraints.

Non-code Integrated

Built-in Service Integration

  • Why: The complexity here is variable. Some services might offer easy integrations with OPA, while others might require a deeper understanding of both the service and OPA.
  • Description: Integrate OPA directly with services that have built-in support for it, such as some API gateways or service meshes.
  • Use Case: Enforce policies at the API gateway level or within a service mesh without a separate OPA deployment.
  • Advantages: Takes advantage of native service features for policy enforcement, potentially reducing overhead.
  • Disadvantages: Depends on the level of support and features offered by the integrated service.

Host-level Daemon

  • Why: For teams familiar with host-level operations, deploying a new daemon might be straightforward. However, it might require configuration and management at the host level.
  • Description: Run OPA as a daemon on each node/host in your infrastructure.
  • Use Case: Policy enforcement for host-level operations or for applications running on the host outside of containers.
  • Advantages: Centralized policy enforcement for all applications on a given host.
  • Disadvantages: Requires managing another process on each host, potentially more resource overhead.

Sidecar

  • Why: While sidecar patterns are standard in microservices architectures, they require changes to deployment configurations and a deeper understanding of container orchestration.
  • Description: Deploy OPA as a sidecar container alongside your service in container orchestration platforms like Kubernetes.
  • Use Case: Microservices architectures, especially in Kubernetes environments.
  • Advantages: Clear separation of concerns, scalable, easier updates without touching the application, and works well with service mesh architectures.
  • Disadvantages: Adds an additional container to manage and potentially more network overhead due to local communication between the application and OPA.

Selection

So which one do we choose? Well, it depends. You would need to analyze the technology and then choose the one that best fits your needs. In our case, our group of security architects wanted maximum adoption: they wanted to commit to helping the development teams, and they wanted to meet our requirements.

We have chosen to build out a centralized server that contains application-specific Open Policy Agents. This allows us to use multiple integration approaches. Our teams will be able to use the power of API Gateway inbound policies to call out to the agents and authorize requests to services built on older technology that has not been modernized. For our more modernized services, the teams have decided to create an enterprise library that integrates with the agents via RESTful calls. They plan to transition this very soon to an embedded architecture, as they intend to use Dapr as an additional cloud application framework.

In the catch-all phase of this architecture, OPA will be employed to handle authorization for API access using a policy-based method. When an access request is made, it will be relayed to OPA, which assesses the request against its set policy. OPA then issues a decision, generally in a format similar to {“allow”: <true/false>}. This decision instructs whether the call to the primary API should proceed or be halted. API Gateways such as Kong and Azure API Management have the ability to either directly integrate with or call out to OPA. This is a solid approach for our teams that cannot include the enterprise library/SDK in their service. These services are generally not deployed in Kubernetes and have not been modernized, which reduces the need for true East-West communication, as almost all of their traffic ends up being North-South anyway. This approach opens the door for us to standardize authorization and acts as a catch-all. It is not a daunting change, and the actual authorization logic is deployed as a Rego policy, which helps us decouple security logic from the application. Examples: https://www.styra.com/blog/policy-as-code-with-azure-api-management-apim-and-opa/ and https://docs.konghq.com/hub/kong-inc/opa/
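The callout itself uses OPA's v1 Data API: a `POST` to `/v1/data/<policy path>` with an `{"input": ...}` document, answered by `{"result": ...}`. As a hedged sketch of a caller, where the server URL and policy path are illustrative assumptions:

```python
import json
import urllib.request

# Base URL for the centralized OPA server -- an illustrative assumption.
OPA_URL = "http://opa.internal:8181"


def parse_decision(response_body):
    """Interpret an OPA v1 Data API response body.

    OPA returns {"result": <value>}; an undefined decision has no
    "result" key at all, which we conservatively treat as a deny.
    """
    return bool(json.loads(response_body).get("result", False))


def query_opa(policy_path, input_doc):
    """POST an input document to OPA and return the allow/deny decision."""
    request = urllib.request.Request(
        f"{OPA_URL}/v1/data/{policy_path}",
        data=json.dumps({"input": input_doc}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=2) as response:
        return parse_decision(response.read())
```

A gateway policy or the enterprise library would then gate the call on something like `query_opa("httpapi/authz/allow", {"method": "GET", "path": ["person"]})`.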

Our second method of introducing standardization allows us to use the exact same policies and agents, again decoupling security from the application. In languages such as Java, Node.js, and C#, when creating a RESTful API, controllers often handle the incoming HTTP requests. Consider an endpoint like GET /person. When an HTTP request is directed towards this endpoint, it doesn't go straight to the logic that fetches a person's data. Instead, it typically navigates through a series of predefined steps or middleware, each responsible for a different aspect of the request's lifecycle.

First, the request might hit a load balancer or a reverse proxy, which directs the request to the appropriate server instance. Once the request reaches the API server, a routing mechanism determines which controller or method should handle this particular request based on the HTTP verb (e.g., GET, POST, PUT, DELETE) and the URL path.

Before the request reaches the controller’s logic, there can be a series of middleware functions or filters that inspect or modify the request. Middleware is a powerful tool that can be used to implement a variety of functionalities:

  1. Middleware can validate authentication and authorization.
  2. It can document each incoming request for auditing or debugging purposes.
  3. It can ensure that the data coming in with the request (like a POST body) matches the expected format and contains valid information.

Once middleware processes have been executed, the request makes its way to the controller method specifically designated for that endpoint. After the controller completes its operations, the response starts its journey back to the client. Along the way, it may again pass through middleware functions, for instance, to format the response or add any necessary headers.

Finally, the HTTP response is returned to the client, containing the requested data, status codes, and any other relevant information.

Architecting this flow is crucial for our design and for security in general, as it ensures efficient and secure processing of requests, enhances the scalability of the API, and provides a structured approach to building our web services. This is where our enterprise library, or a solid SDK if a vendor is chosen to help fulfill this need, comes into play. It can be plugged in here, ensuring all of our services have an authorization handler by default and helping us achieve not only North-South authorization but also East-West authorization. The effect is that we have met our requirements around authorizing each request to our services and even some of the service-to-service requirements (but not all). We have also achieved one of our biggest goals: we are meeting our development teams where they are in their journey. They can validate requests at the API Gateway, or they can make a minor change and validate a request upon entry to their service.
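The middleware hook can be sketched framework-free as a decorator that vets every request before the controller logic runs. The `is_authorized` stub stands in for the enterprise library or an OPA query; its name and signature are illustrative assumptions, not a real SDK:

```python
import functools


def is_authorized(user, action, resource):
    """Stub policy check standing in for the enterprise library / OPA call."""
    return user.get("role") == "admin" or action == "read"


def authorize(action, resource):
    """Middleware-style decorator: vet the request before the handler runs."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(request):
            if not is_authorized(request.get("user", {}), action, resource):
                return {"status": 403, "body": "Forbidden"}
            return handler(request)
        return wrapper
    return decorator


@authorize(action="read", resource="person")
def get_person(request):
    return {"status": 200, "body": "person record"}


@authorize(action="write", resource="person")
def update_person(request):
    return {"status": 200, "body": "person updated"}
```

With this in place, a reader role can call `get_person` but receives a 403 from `update_person`; swapping the stub for a real policy query changes nothing in the handlers themselves, which is the decoupling the section describes.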

Note: Some of the BlueMyst development teams are looking to innovate. They would like to go beyond enterprise libraries and use even more cloud-native techniques. The great thing about OPA is its extensibility, which means our teams can accomplish their goal fairly seamlessly via Dapr. Dapr supports custom middleware, allowing users to plug additional processing or functionality into the request/response path, and it ships middleware that consults OPA for policy decisions before processing a request. The documentation states that a sidecar approach is used: Dapr runs a sidecar for each service, and OPA policy enforcement happens in that sidecar before requests reach the actual service. “I thought we needed a service mesh for sidecars?” Well, in this case the Dapr framework is managing the sidecar. As a pod is instantiated, an OPA sidecar is delivered with the application. https://docs.dapr.io/reference/components-reference/supported-middleware/middleware-opa/
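Based on the linked Dapr documentation, the OPA middleware is declared as a Dapr component of type `middleware.http.opa` with the Rego policy supplied inline. The component name and policy body below are illustrative sketches, not a production configuration:

```yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: opa-authz   # illustrative component name
spec:
  type: middleware.http.opa
  version: v1
  metadata:
    - name: rego
      value: |
        package http
        default allow = false
        allow {
          input.request.method == "GET"
        }
```

Once this component is referenced in the application's Dapr configuration, every request passing through the sidecar is evaluated against the policy before it reaches the service.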

The architectural thought process for the authorization mechanism, as with our main theme, is adoption. As an architectural team, we know that authorization requirements are no good if they are not adopted. By choosing a policy-based approach with OPA and a common language like Rego to express our policies, we have achieved standardization. The policies can be written either by security teams or development teams. Deployment and delivery could not be simpler, as the policies are code: they can be stored and maintained in common DevOps tooling and delivered either through configuration or via a control plane platform such as Styra DAS.

By demonstrating the many options for deployment, we have shown that we understand our developers’ needs and have achieved “meeting them where they are in their cloud journey.” We believe that these efforts will lead to adoption.

In our next installment we will take on our next challenge: securing service-to-service communications…
