Apache Hudi at Yields.io

We are proud to announce that Yields.io is officially included among Hudi users, along with AWS, Uber, Alibaba, Tencent, and others.

At Yields.io, clients use our solution to validate and monitor models. Hudi is one of the key parts of the solution.

Hudi was first developed at Uber and was recently incorporated into the Apache Software Foundation, which secures its future as an open-source project.

Hudi is a layer over the Hadoop file system. Beyond plain data ingestion, it offers many powerful features, such as incremental upserts, snapshots, rollbacks, and a timeline.

Upsert means “update or insert”: a record is inserted if no record with the same key exists, and updated otherwise. This allows our clients to easily merge new data with existing data, with minimal input from the user. One scenario for upserting is correcting existing data without restarting the ingestion from scratch.
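The upsert semantics described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Hudi's API: the `upsert` function, the in-memory `table` dictionary, and the `id` key field are all hypothetical names chosen for the example.

```python
from typing import Any, Dict, Iterable, Tuple

Record = Dict[str, Any]


def upsert(table: Dict[Any, Record], batch: Iterable[Record], key: str = "id") -> Tuple[int, int]:
    """Merge `batch` into `table` in place; return (inserted, updated) counts."""
    inserted = updated = 0
    for record in batch:
        record_key = record[key]
        if record_key in table:
            # Key already present: the incoming record replaces the stored one.
            updated += 1
        else:
            # Key not seen before: the record is added as a new row.
            inserted += 1
        table[record_key] = dict(record)
    return inserted, updated


# Initial load, followed by a batch that corrects "b" and adds "c".
table: Dict[Any, Record] = {}
upsert(table, [{"id": "a", "score": 0.91}, {"id": "b", "score": 0.40}])
counts = upsert(table, [{"id": "b", "score": 0.45}, {"id": "c", "score": 0.77}])
```

The second call corrects one existing record and adds one new record, without reloading the record that did not change.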

Incremental upserts provided by Hudi are a way of combining low-latency processing, similar to streaming, with the efficiency of batch processing and the flexibility of columnar data processing.

Thanks to Hudi, our Chiron platform provides the user with a timeline feature to easily browse through the data history. If new data is wrong, the user can roll it back to the previous version. Snapshots allow the user to “bookmark” the data at any point in time.
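To make the timeline, rollback, and snapshot ideas concrete, here is a minimal toy model in Python. It is a sketch under simplifying assumptions: every commit keeps a full copy of the data, which is convenient for illustration and is not a description of how Hudi or Chiron store data. The `VersionedTable` class and its method names are hypothetical.

```python
from copy import deepcopy
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]
State = Dict[Any, Record]


class VersionedTable:
    """Toy table that remembers every committed version of its data."""

    def __init__(self) -> None:
        self._timeline: List[State] = [{}]  # version 0 is the empty table
        self._bookmarks: Dict[str, State] = {}

    @property
    def current(self) -> State:
        return self._timeline[-1]

    def commit(self, batch: Iterable[Record], key: str = "id") -> int:
        """Upsert `batch` on top of the current version; return the new version number."""
        new_state = deepcopy(self.current)
        for record in batch:
            new_state[record[key]] = dict(record)
        self._timeline.append(new_state)
        return len(self._timeline) - 1

    def rollback(self) -> int:
        """Discard the latest commit, if any; return the version now current."""
        if len(self._timeline) > 1:
            self._timeline.pop()
        return len(self._timeline) - 1

    def bookmark(self, name: str) -> None:
        """Keep a named copy of the current data that survives later rollbacks."""
        self._bookmarks[name] = deepcopy(self.current)

    def read_bookmark(self, name: str) -> State:
        return self._bookmarks[name]


history = VersionedTable()
history.commit([{"id": "a", "score": 0.91}])
history.bookmark("validated")
history.commit([{"id": "a", "score": -1.0}])  # a faulty batch arrives
version_after_rollback = history.rollback()   # back to the previous version
```

After the rollback, the table again holds the data from the first commit, and the "validated" bookmark still returns the state it captured.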

The users of our Chiron platform benefit seamlessly from all of those features. We provide an easy-to-use and powerful front-end, where users can manage and run various analyses (exploration, validation, monitoring, etc.) on both model and dataset inventories. Hudi is an important part of how the Chiron platform achieves those goals.

Interested in learning more? Watch a demo of Chiron, our flagship product.