February 27, 2019

KillrVideo Python Pt. 2 — App Dev with ProtoBuf and GRPC

In the first post in this series, I described the KillrVideo reference application and my motivation for creating a new implementation of the microservices tier for KillrVideo in Python.

In this post, I’d like to start tracing the development of this Python application in somewhat chronological order. So, here goes…

Where to start?

Remember that this Python implementation covers just the microservices tier: there is an existing, already implemented web application that sits on top of these services (killrvideo-web), and there is an existing database schema that the services use (killrvideo-data).

So I decided to build the application from the “outside in”, starting with the code that defines the interfaces through which the services are invoked; in this context, that code is the GRPC stubs. One advantage of starting with stubs is that I could begin plugging the Python services into the rest of the KillrVideo system as quickly as possible, including the killrvideo-integration-tests, which we’ll dive into in a future post.

Why GRPC?

Taking a step back, I should explain why we chose GRPC, the open source remote procedure call framework originally developed at Google. The simple answer is that we’ve defined the standard service interfaces for KillrVideo using GRPC.

There are certainly many options for exposing service interfaces, and the history of the IT industry is littered with many approaches that have gained and then lost popularity over the years (I remember CORBA with a lot more sympathy than the awfulness of defining “web services” with XML-based formats such as WSDL and SOAP).

For modern cloud applications, the leaders these days seem to be:

  • REST
  • GRPC
  • GraphQL

RESTful APIs have the advantage of being ubiquitous, and they probably have the lowest barrier to entry thanks to frameworks such as Spring. Unfortunately, I find it challenging to enforce consistent semantics when you have more than a few RESTful APIs, and I’ve been a part of quite a few arguments about how to properly structure URL resource paths that probably weren’t worth the mental energy (“should the path be hotels/inventory, or inventory/hotels?” or “should we put the API version number in the path?”).

The main reason we didn’t define RESTful APIs for the KillrVideo services is that we want to provide service implementations in multiple languages, and it is difficult to describe or enforce common semantics across implementations written in different languages.

We ended up choosing GRPC as the way of defining service interfaces for KillrVideo. GRPC is more of a successor to the CORBA-style model, in which there’s an interface definition language (IDL), Protocol Buffers in GRPC’s case, from which programming-language-specific stubs can be generated. This is a better fit for our polyglot implementation approach and has served us well so far.

GraphQL is an emerging standard for developing APIs that has a number of promising characteristics, most notably the ability for clients to specify, with fine-grained control, exactly the information they are interested in. Finer control such as joining and filtering data was certainly possible previously in RESTful APIs using techniques like XPath, but the approach was pretty unwieldy and difficult to implement consistently.

The emerging best practice seems to be using GraphQL within a web application tier as a way of exposing a server-side API that represents data in the way that is most useful to the client application. So GraphQL services are often higher-order services built on top of services that expose APIs using the techniques described above (REST, GRPC, etc.). We’ve actually started rewriting killrvideo-web using GraphQL, so that will be a fun topic to discuss in the future.

Defining the GRPC APIs

We define an interface for each service in the killrvideo-service-protos repository. The interfaces are defined in a common repo so they can be included or referenced by the various language implementations.

For example, here’s the interface for the UserManagementService:

// The service responsible for managing user information
service UserManagementService {
  // Creates a new user
  rpc CreateUser(CreateUserRequest) returns (CreateUserResponse);

  // Verify a user's username and password
  rpc VerifyCredentials(VerifyCredentialsRequest)
      returns (VerifyCredentialsResponse);

  // Gets a user or group of user's profiles
  rpc GetUserProfile(GetUserProfileRequest)
      returns (GetUserProfileResponse);
}

Notice that each service operation is a remote procedure call (rpc) that takes a single request message and returns a single response message. These are unary operations: one request in, one response out. (GRPC also supports streaming RPCs, and clients can still choose to invoke unary calls asynchronously.) Here’s an example of what the request/response messages look like for the CreateUser operation:

// Request to create a new user
message CreateUserRequest {
  killrvideo.common.Uuid user_id = 1;
  string first_name = 2;
  string last_name = 3;
  string email = 4;
  string password = 5;
}

// Response when creating a new user
message CreateUserResponse {}

The CreateUserRequest looks straightforward: it includes simple types like string as well as a special type we’ve defined to represent a unique identifier, killrvideo.common.Uuid. We’ve defined this Uuid type within a common killrvideo namespace for use in our service interfaces, since Protocol Buffers (strangely) doesn’t include a standard UUID type among its well-known types.

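You’ll also notice that although the CreateUser operation doesn’t return a specific value, we still define an empty response message, CreateUserResponse. Every GRPC operation has to declare a response message type, and giving each operation its own message (rather than sharing a generic empty one) leaves room to add fields later without changing the operation’s signature.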

It’s also worth noting that Protocol Buffers messages aren’t limited to GRPC request/response calls; they can also describe events for asynchronous messaging. You can see an example “event” definition in the user_management_events.proto file. We’ll dig more into messaging in an upcoming post.

Generating GRPC stubs

Given service interfaces defined in GRPC, the next step is to create bindings in the chosen implementation language, in our case, Python. Leveraging some of the great tutorials available, I figured out how to generate the stubs for multiple services. First I had to install the grpcio-tools package:

$ pip install grpcio-tools

Then I ran the generator to create the stubs. Note that the generator creates both client and server stubs.

$ python -m grpc_tools.protoc -I. \
    --python_out=../../../killrvideo --grpc_python_out=../../../killrvideo \
    comments/*.proto common/*.proto ratings/*.proto search/*.proto \
    statistics/*.proto suggested-videos/*.proto uploads/*.proto \
    user-management/*.proto video-catalog/*.proto

Getting the generator to work took a bit of a trick because the .proto files are spread across multiple directories: I ended up having to name each directory explicitly, including the directory containing the common type definitions.

The command above is preserved in the generate-grpc-stubs.sh script which I’ve included in the repository in case anyone should need to regenerate the stubs (for example, on an interface change).
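For anyone who would rather drive the generator from Python than from a shell script, here’s a minimal sketch (hypothetical, not the script in the repository) that applies the same trick: it expands each directory’s *.proto files explicitly, including common, and then runs grpc_tools.protoc as a subprocess. The directory list and output path are copied from the command above; build_protoc_args and generate are illustrative names.

```python
"""Sketch of a Python equivalent to generate-grpc-stubs.sh (hypothetical)."""
import glob
import os
import subprocess
import sys

# Directories from the protoc command above. "common" has to be listed too,
# because the other .proto files import the shared type definitions from it.
PROTO_DIRS = [
    "comments", "common", "ratings", "search", "statistics",
    "suggested-videos", "uploads", "user-management", "video-catalog",
]
# Same output directory as the shell command; resolved relative to proto_root.
OUT_DIR = "../../../killrvideo"


def build_protoc_args(proto_root=".", out_dir=OUT_DIR, proto_dirs=PROTO_DIRS):
    """Build the protoc command line, with .proto paths relative to proto_root."""
    proto_files = []
    for d in proto_dirs:
        # Expand each directory's *.proto here instead of relying on the shell;
        # sorting keeps the argument order stable from run to run.
        matches = glob.glob(os.path.join(proto_root, d, "*.proto"))
        proto_files.extend(sorted(os.path.relpath(m, proto_root) for m in matches))
    return [
        sys.executable, "-m", "grpc_tools.protoc", "-I.",
        "--python_out=" + out_dir,
        "--grpc_python_out=" + out_dir,
        *proto_files,
    ]


def generate(proto_root="."):
    """Run the generator from proto_root; check=True raises if protoc fails."""
    subprocess.run(build_protoc_args(proto_root), cwd=proto_root, check=True)
```

You would call generate("path/to/killrvideo-service-protos") with grpcio-tools installed in the same interpreter. Invoking python -m grpc_tools.protoc, rather than calling into the module directly, keeps the behavior as close as possible to the shell command.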

Implementing the services

The next step is to implement each of the services. For the initial implementation, I chose to follow the pattern used in the other KillrVideo language implementations, which is to incorporate all of the service interfaces in a single application. This approach makes running and deploying the services very simple, but of course it limits our ability to scale services independently, or to mix and match service implementations in different languages, something that is on the KillrVideo roadmap.

The servicer

If you examine one of the generated files such as user_management_service_pb2_grpc.py, you’ll see the server stub, which is a Python class that looks like this:

class UserManagementServiceServicer(object):
    """The service responsible for managing user information
    """

    def CreateUser(self, request, context):
        """Creates a new user
        """
        context.set_code(grpc.StatusCode.UNIMPLEMENTED)
        context.set_details('Method not implemented!')
        raise NotImplementedError('Method not implemented!')
    ...

A subclass for GRPC server implementation

The generated code marks each method as not implemented, so the task left for us is to implement the methods in the server stub. The best practice is to treat the generated stub as a base class and create a subclass that overrides the method implementations, rather than editing the generated file, which would be overwritten the next time the stubs are regenerated.

You can find my implementation of the User Management Service in the user_management_service_grpc.py file.
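To see the subclassing pattern in isolation, here’s a minimal, self-contained sketch that doesn’t need GRPC installed. GeneratedServicerStandIn is a hypothetical stand-in for a generated base class like the one above (plain strings replace grpc.StatusCode, and FakeContext replaces the real servicer context), and PartialUserManagementServicer is an illustrative name. Operations the subclass overrides run normally, while the rest still fall through to the generated UNIMPLEMENTED behavior.

```python
class FakeContext:
    """Hypothetical stand-in for grpc.ServicerContext that records the status."""

    def __init__(self):
        self.code = None
        self.details = None

    def set_code(self, code):
        self.code = code

    def set_details(self, details):
        self.details = details


class CreateUserResponse:
    """Stand-in for the generated (empty) CreateUserResponse message."""


class GeneratedServicerStandIn:
    """Mimics the generated base class: every operation reports UNIMPLEMENTED."""

    def _unimplemented(self, context):
        context.set_code("UNIMPLEMENTED")  # generated code uses grpc.StatusCode.UNIMPLEMENTED
        context.set_details("Method not implemented!")
        raise NotImplementedError("Method not implemented!")

    def CreateUser(self, request, context):
        self._unimplemented(context)

    def VerifyCredentials(self, request, context):
        self._unimplemented(context)


class PartialUserManagementServicer(GeneratedServicerStandIn):
    """Hypothetical subclass: overrides CreateUser, inherits the other stubs."""

    def __init__(self):
        self.requests = []  # illustration only; a real servicer calls business logic

    def CreateUser(self, request, context):
        self.requests.append(request)
        return CreateUserResponse()
```

Operations you haven’t gotten to yet keep failing loudly with UNIMPLEMENTED, which makes it easy to wire a partially finished service into the rest of the system early.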

Registering services with the GRPC server

The final step is to start a GRPC server and register our services with it. While I wish the GRPC generator had an option to generate a nice main() function for us to do this, it appears that you just need to copy some example code. You can find the code that initiates all of the KillrVideo services in the main application file at the root of the package (__init__.py):

import time
from concurrent import futures

import grpc

# Defined or imported elsewhere in killrvideo/__init__.py: the *ServiceServicer
# and *Service classes, _SERVICE_PORT (a string) and _ONE_DAY_IN_SECONDS.


def serve():
    # Initialize GRPC server
    grpc_server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))

    # Initialize services; each servicer registers itself with grpc_server
    CommentsServiceServicer(grpc_server, CommentsService())
    RatingsServiceServicer(grpc_server, RatingsService())
    SearchServiceServicer(grpc_server, SearchService())
    StatisticsServiceServicer(grpc_server, StatisticsService())
    SuggestedVideosServiceServicer(grpc_server, SuggestedVideosService())
    UserManagementServiceServicer(grpc_server, UserManagementService())
    VideoCatalogServiceServicer(grpc_server, VideoCatalogService())

    # Start GRPC server (start() returns immediately; it does not block)
    grpc_server.add_insecure_port('[::]:' + _SERVICE_PORT)
    grpc_server.start()

    # Keep application alive until interrupted
    try:
        while True:
            time.sleep(_ONE_DAY_IN_SECONDS)
    except KeyboardInterrupt:
        grpc_server.stop(0)

This code is a bit different from the example code referenced above in that it exposes multiple services on the same port, which is a perfectly acceptable approach for a simple reference application.

One of the things you might have noticed in the code above is that I’m delegating the work of registering with the GRPC server to the constructor of each servicer class. For example, in the user_management_service_grpc.py file we find the constructor that actually performs the registration:

class UserManagementServiceServicer(
        user_management_service_pb2_grpc.UserManagementServiceServicer):
    """Provides methods that implement functionality of the UserManagement Service."""

    def __init__(self, grpc_server, user_management_service):
        self.user_management_service = user_management_service
        user_management_service_pb2_grpc.add_UserManagementServiceServicer_to_server(
            self, grpc_server)

Isolating GRPC-specific details from business logic

The other task that is performed by the constructor above is storing a reference to a user_management_service, which is the implementation of the actual service business logic. That’s right, I’ve separated out GRPC-related code from the business logic.

The reason for this separation is that if we ever decide to expose service interfaces via REST or some other approach, we can reuse the business logic. In fact, I’ve tried to keep service implementations such as the UserManagementService class free from references to GRPC types, using types that are more idiomatic to Python where available.
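Here’s a minimal sketch of what that boundary can look like. It assumes the common Uuid message carries the identifier as a single string field named value (check common_types.proto before relying on that); the dataclasses are hypothetical stand-ins for the generated message classes, and UserService and create_user_from_request are illustrative names rather than the ones used in killrvideo-python. Only the adapter function touches message types; the business-logic class sees uuid.UUID and plain strings.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Uuid:
    """Stand-in for the generated killrvideo.common.Uuid message.

    Assumption: the identifier is carried as one string field named `value`.
    """
    value: str = ""


@dataclass
class CreateUserRequest:
    """Stand-in for the generated CreateUserRequest message (same five fields)."""
    user_id: Uuid
    first_name: str
    last_name: str
    email: str
    password: str


def uuid_message_to_python(message):
    """Uuid message -> uuid.UUID (raises ValueError if the string is malformed)."""
    return uuid.UUID(message.value)


def python_uuid_to_message(value):
    """uuid.UUID -> Uuid message, for building responses."""
    return Uuid(value=str(value))


class UserService:
    """Hypothetical business logic: plain Python types only, no GRPC imports."""

    def __init__(self):
        self.users = {}  # in-memory dict standing in for the database

    def create_user(self, user_id, first_name, last_name, email, password):
        if not isinstance(user_id, uuid.UUID):
            raise TypeError("user_id must be a uuid.UUID")
        # Password hashing and persistence are out of scope for this sketch.
        self.users[user_id] = (first_name, last_name, email)


def create_user_from_request(service, request):
    """Adapter: the only place that knows about message types."""
    service.create_user(
        user_id=uuid_message_to_python(request.user_id),
        first_name=request.first_name,
        last_name=request.last_name,
        email=request.email,
        password=request.password,
    )
```

In the layout described here, the adapter logic would live in the servicer subclass’s CreateUser method, and the business-logic class would be the one talking to the database.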

A prime example of this is converting the Uuid type defined in the common package to Python’s standard uuid.UUID. The resulting type conversions turned out to be one of the most difficult portions of the GRPC-related code for me to implement. Which leads me to…

The hardest part of GRPC in Python

At the risk of going into too much detail in a post that is already on the long side, I wanted to highlight a special case that was very non-intuitive for me.

If you have operations in your services that can return a list of objects, you’ll want to educate yourself on the intricacies of generating the response object by reading this portion of the protobuf reference. Basically, list values in a response object are repeated fields, which can’t be assigned directly, so you will need to populate them with the extend() operation.

Here’s an example from user_management_service_grpc.py:

def UserModelList_to_GetUserProfileResponse(users):
    response = GetUserProfileResponse()
    if isinstance(users, (list,)):
        response.profiles.extend(map(UserModel_to_UserProfile, users))
    elif users is not None:  # single result
        response.profiles.extend([UserModel_to_UserProfile(users)])
    return response
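If the isinstance branching feels fragile, one alternative is to normalize the input to a list first so that a single extend() call covers every case. Here’s a minimal sketch: GetUserProfileResponse is a hypothetical stand-in whose profiles attribute is a plain list (a real repeated message field is a container that supports add() and extend() but rejects assignment such as response.profiles = [...]), and to_profile stands in for UserModel_to_UserProfile.

```python
def as_list(value):
    """Normalize None, a single item, or a list/tuple of items into a list."""
    if value is None:
        return []  # no result
    if isinstance(value, (list, tuple)):
        return list(value)  # several results
    return [value]  # a single result


class GetUserProfileResponse:
    """Hypothetical stand-in for the generated response message."""

    def __init__(self):
        self.profiles = []  # real protobuf repeated fields are containers, not lists


def to_profile(user):
    """Stand-in for UserModel_to_UserProfile: pick out the fields to return."""
    return {"user_id": user["user_id"], "email": user["email"]}


def users_to_response(users):
    response = GetUserProfileResponse()
    # One extend() call handles the list, single-result and no-result cases.
    response.profiles.extend(to_profile(u) for u in as_list(users))
    return response
```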

Wrapping up

To summarize, the setup of the User Management Service is as follows:

  • killrvideo/user_management/user_management_service_pb2_grpc.py: contains the generated GRPC stubs.
  • killrvideo/user_management/user_management_service_grpc.py: contains the subclass of the generated server stub, which overrides the operation definitions, converts GRPC types to more natural Python types, and calls the actual business logic.
  • killrvideo/user_management/user_management_service.py: contains the actual business logic.

Coming up next

That’s it for this time. In the next post, we’ll examine some of our options for service discovery and how we’ve implemented this capability in killrvideo-python using etcd. Read Part 3 of the KillrVideo Python series, Advertising Python Services via etcd, here.

This article is cross-posted from Jeff's personal blog on Medium.
