

Google Cloud Platform is a suite of cloud computing services that brings together computing, data storage, data analytics and machine learning capabilities to enable organizations to build data pipelines, secure data workloads, and carry out analytics. In this article, we will walk through a simple demonstration of how to build a data pipeline using services from the Google Cloud Platform (GCP).

For this particular use case, let's pretend to be an imaginary company named LRC, Inc. Our sales department wants to provide a CSV file of recent sales. The file will come from our sales management system, and the data will be used for tracking sales and providing aggregated data to an upstream system. The file will contain a customer name, a product id, an item count, and an item cost. The data should be stored over time and new data appended. As part of the process, a new file should be generated that will contain all order data aggregated by customer and product. The new file will be stored in a shared location so that another team can access it anytime. When the file is delivered, an automated email should be sent to a specific address that will open a ticket alerting the other team that a file is available.

Our imaginary company is a GCP user, so we will be using GCP services for this pipeline. Even with restricting ourselves to GCP, there are still many ways to implement these requirements. As part of our technical requirements, we want to keep this as serverless as possible and write as little code as possible.

Implementation

We will use Google Cloud Storage (GCS) to store the files. Our database will be Google's big data distributed SQL database, BigQuery, and we will use Google's managed Airflow, Google Cloud Composer. We will also use a Google Cloud Function (GCF) to watch for a file arrival. We could also use GCF to send the email at the final step, but we will put that in our Composer orchestration to keep things as contained as possible. This article does not contain a tutorial for any of these services but will explain how the pieces fit together to create a pipeline. Describing each service in detail would require many articles. Google provides excellent tutorials, and I will reference some of those as we go through the process.
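To make the file-arrival step more concrete, here is a minimal sketch of what the Cloud Function that watches the GCS bucket might look like. It assumes a background function triggered by the bucket's object-finalize event, which receives an `event` dictionary carrying the `bucket` and `name` of the new object. The DAG name `sales_pipeline` and the `trigger_dag` callable are hypothetical. In a real deployment, `trigger_dag` would most likely be an authenticated POST to the Airflow REST API of the Composer environment (`/api/v1/dags/{dag_id}/dagRuns` in Airflow 2), which is left out here to keep the sketch self-contained.

```python
from typing import Callable, Optional

# Hypothetical DAG name; use whatever the Composer environment defines.
DAG_ID = "sales_pipeline"


def build_dag_run_conf(event: dict) -> Optional[dict]:
    """Return a DAG-run payload for a newly arrived CSV object, else None."""
    name = event.get("name", "")
    bucket = event.get("bucket", "")
    # Ignore anything that is not a CSV file (an assumption about the bucket).
    if not name.lower().endswith(".csv"):
        return None
    # Airflow exposes the "conf" mapping to the tasks of the triggered run.
    return {"conf": {"bucket": bucket, "object": name}}


def _not_configured(dag_id: str, payload: dict) -> None:
    # Placeholder: replace with an authenticated call to the Airflow REST API.
    raise NotImplementedError("wire up the Composer/Airflow trigger here")


def on_file_arrival(
    event: dict,
    context: object,
    trigger_dag: Callable[[str, dict], None] = _not_configured,
) -> bool:
    """Entry point for the storage trigger; returns True if a run was requested."""
    payload = build_dag_run_conf(event)
    if payload is None:
        return False
    trigger_dag(DAG_ID, payload)
    return True
```

Passing the trigger in as a callable keeps the function short and lets the filtering logic be exercised locally without any GCP credentials, which fits the goal of writing as little code as possible.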

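The aggregation requirement for LRC's sales file (order data grouped by customer and product) can also be illustrated with a small sketch. In the pipeline itself this work would be done in BigQuery, roughly a `GROUP BY` on customer and product. The plain Python below only shows the grouping logic. The column names `customer_name`, `product_id`, `item_count` and `item_cost` are hypothetical, and the sketch assumes `item_cost` is a per-unit cost, so the total for a row is count times cost.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List


def aggregate_sales(csv_text: str) -> List[Dict]:
    """Sum item counts and total cost per (customer, product) pair."""
    totals = defaultdict(lambda: {"item_count": 0, "total_cost": Decimal("0")})
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["customer_name"], row["product_id"])
        count = int(row["item_count"])
        totals[key]["item_count"] += count
        # Assumption: item_cost is the price of a single item.
        totals[key]["total_cost"] += count * Decimal(row["item_cost"])
    # Sort by customer, then product, so the output file is stable.
    return [
        {
            "customer_name": customer,
            "product_id": product,
            "item_count": agg["item_count"],
            "total_cost": agg["total_cost"],
        }
        for (customer, product), agg in sorted(totals.items())
    ]


sample = (
    "customer_name,product_id,item_count,item_cost\n"
    "Acme,P1,2,1.50\n"
    "Acme,P1,1,1.50\n"
    "Beta,P2,3,2.00\n"
)
for line in aggregate_sales(sample):
    print(line)
```

`Decimal` is used instead of `float` so that currency amounts do not pick up binary rounding errors when summed.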