GoCargo Logistics

Case Study

Problem Statement

GoCargo’s operations team struggled to analyze near real-time shipment data because of a manual, inefficient data workflow. New shipping logs arrived daily as JSON files, and each new file had to be identified and processed by hand. This approach was slow, error-prone, and could not scale with growing data volumes, delaying insights and reducing operational efficiency. The goal was to design an automated, scalable data ingestion process to streamline analysis and support timely decision-making.

Solution

An incremental data ingestion pipeline was designed and implemented using Microsoft Fabric to automate shipment data processing. The solution leverages watermarking logic to identify and load only new or modified shipment files into a Delta Lake table in the Bronze layer of the Lakehouse. This automated approach eliminates manual effort, enhances scalability, and enables near real-time visibility into shipment patterns, supporting faster and more reliable operational insights.

[Image: Fabric pipeline overview]
[Image: watermark table creation]

Using Spark SQL, a watermark table was created in the Lakehouse and seeded with an initial timestamp. During pipeline execution, a Lookup activity reads the current watermark value from this table to determine which shipment files are new or have been modified, enabling incremental processing.
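The watermark itself is just a single stored timestamp that the pipeline reads before each run. As a minimal sketch of that step (outside Fabric, using a hypothetical in-memory stand-in for the Lakehouse watermark table that Spark SQL actually creates and seeds):

```python
from datetime import datetime, timezone

# Hypothetical stand-in for the Lakehouse watermark table.
# In the actual solution this table is created and seeded with
# an initial timestamp via Spark SQL.
watermark_table = {"last_load_time": datetime(2024, 1, 1, tzinfo=timezone.utc)}

def lookup_watermark(table):
    """Mimic the pipeline's Lookup activity: read the current watermark value."""
    return table["last_load_time"]

watermark = lookup_watermark(watermark_table)
print(watermark.isoformat())
```

The seed date and table shape here are illustrative only; the real table lives in the Lakehouse and is queried by the Lookup activity at the start of every pipeline run.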

[Image: uploading shipment data]

A Copy activity within the pipeline ingests shipment files in JSON format whose creation or modification timestamps fall between the stored watermark value and the current pipeline trigger time. The ingested files are loaded into the Lakehouse as Delta tables in the Bronze layer, ensuring that only new or updated data is processed.
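The Copy activity's file filter reduces to a simple time-window check. The sketch below shows that selection logic in plain Python, with hypothetical file names and timestamps (the real filtering is configured on the Copy activity's source settings, not written as code):

```python
from datetime import datetime, timezone

def select_incremental_files(files, watermark, trigger_time):
    """Keep only JSON shipment files created or modified after the stored
    watermark and no later than the current pipeline trigger time."""
    return [
        f for f in files
        if f["name"].endswith(".json") and watermark < f["modified"] <= trigger_time
    ]

# Illustrative file listing; names and times are invented for the example.
files = [
    {"name": "shipments_0101.json",
     "modified": datetime(2024, 1, 1, 8, tzinfo=timezone.utc)},
    {"name": "shipments_0102.json",
     "modified": datetime(2024, 1, 2, 8, tzinfo=timezone.utc)},
]
watermark = datetime(2024, 1, 1, 12, tzinfo=timezone.utc)
trigger = datetime(2024, 1, 3, tzinfo=timezone.utc)

selected = select_incremental_files(files, watermark, trigger)
print([f["name"] for f in selected])  # only the file modified after the watermark
```

Only `shipments_0102.json` falls inside the window, so a run never re-ingests files that an earlier execution already processed.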

[Image: Copy activity configuration]

[Image: watermark table update]

Once the ingestion completes successfully, the watermark table is updated with the latest pipeline trigger time. This ensures that subsequent runs process only data added or modified after the previous execution, enabling a fully incremental and automated data ingestion workflow.
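Putting the three steps together, one pipeline run is: look up the watermark, copy the files inside the window, then advance the watermark. A minimal end-to-end sketch, again with a hypothetical in-memory watermark table rather than the real Lakehouse table:

```python
from datetime import datetime, timezone

def run_ingestion(watermark_table, files, trigger_time, copy_fn):
    """One incremental pipeline run: Lookup -> Copy -> watermark update."""
    watermark = watermark_table["last_load_time"]
    new_files = [f for f in files if watermark < f["modified"] <= trigger_time]
    copy_fn(new_files)  # stand-in for the Copy activity writing to Bronze
    # Advance the watermark only after the copy succeeds, so a failed run
    # leaves the window open and the files are retried next time.
    watermark_table["last_load_time"] = trigger_time
    return new_files

watermark_table = {"last_load_time": datetime(2024, 1, 1, tzinfo=timezone.utc)}
files = [{"name": "shipments_0102.json",
          "modified": datetime(2024, 1, 2, tzinfo=timezone.utc)}]
trigger = datetime(2024, 1, 3, tzinfo=timezone.utc)

first = run_ingestion(watermark_table, files, trigger, copy_fn=lambda fs: None)
second = run_ingestion(watermark_table, files, trigger, copy_fn=lambda fs: None)
print(len(first), len(second))  # the second run finds nothing new
```

Because the watermark is advanced to the trigger time only on success, each subsequent run processes exactly the data added or modified since the previous execution.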

Business Impact

  • Enabled near real-time shipment analysis, accelerating decision-making and response to logistics issues.

  • Eliminated manual effort in data identification and ingestion, improving efficiency and reducing errors.

  • Delivered a scalable and automated pipeline capable of handling growing data volumes.

  • Ensured data consistency and traceability through structured watermark-based processing.

  • Strengthened the foundation for advanced analytics and predictive modeling across operations.