Skip to main content
Anthony Cavin
Data Scientist - ML/AI, Python, TypeScript

A data scientist specializing in machine learning, AI, Python, and TypeScript, with a strong interest in applying these technologies to data-driven projects and innovative AI solutions.

View all authors

Key Component of a Manufacturing Data Lakehouse

· 12 min read
Anthony Cavin
Data Scientist - ML/AI, Python, TypeScript

Manufacturing Data Lakehouse Building Block

In today's data landscape, flexibility in terms of performance, cost, and storage availability is at a premium. Data warehouses provide structured, analytical capabilities to process large amounts of data. Data lakes, on the other hand, are known for scalability, and the flexibility to handle vast amounts of unstructured data.

In recent years, however, demand has grown for a robust marriage of these two concepts, leveraging cloud-based technologies and advanced data processing frameworks to store massive volumes of data in raw form (data lake), while also supporting structured querying and analytics (data warehouse). This combined data solution is referred to as a data lakehouse.

The data lakehouse concept is particularly useful for manufacturing, as manufacturing requires fast processing of large amounts of data from numerous sources, including sensors (vibration, temperature, power), employee productivity data (files, spreadsheets, documents), logs, cameras, GPS, and more, creating an unstructured jumble of formats that is difficult to process in a traditional data warehouse.

A data lakehouse is an IT infrastructure that provides a unified solution to handle multiple data formats while still providing the capacity to make sense of this data. It supports intelligent query-based analytics and applies structure to the chaos.

At the same time, ReductStore Cloud Solution is a special type of data storage solution that combines the flexibility of a time series database with the capacity of an object storage. In this article, we will explain these strengths in detail and why we think ReductStore has many advantages as a building block to create a data lakehouse for manufacturing.

In order to present these strengths, we will tie our case to the core components of a strong data lakehouse solution.

The MinIO alternative for Time-Series Based Data

· 10 min read
Anthony Cavin
Data Scientist - ML/AI, Python, TypeScript

MinIO vs ReductStore

The amount of data generated world-wide is expanding exponentially, and will only increase further in coming years. In fact, over 90% of the data worldwide has been generated in the last two years, and 40% of data in 2020 was generated by machines. Not to mention that 80 to 90 percent of data is unstructured. Not only is timely processing of said data ever more important, the data itself is often time-stamped and must be handled in a time-based structure. Due to the rise of AI/ML, Robotics, IoT, and edge-computing, solutions that can efficiently leverage much cheaper and plentiful unstructured object/blob storage while maintaining the ability to organize, read, and transmit time-series based data from multiple sources and in multiple formats are in great demand. ReductStore and MinIO are two solutions designed to meet this demand.

How to Store Vibration Sensor Data

· 19 min read
Anthony Cavin
Data Scientist - ML/AI, Python, TypeScript

Vibration Data Flow Intro

This is a complete guide to storing vibration sensor data efficiently and effectively. We'll cover everything from the basics of vibration data to best practices for managing it as well as setting up a robust and scalable environment to store, query, and replicate vibration sensor data.

Vibration data is typically collected from sensors attached to machinery or equipment to monitor its health and performance. This data can be used to detect anomalies, predict failures, and optimize maintenance schedules.

However, effectively managing vibration data can be challenging due to its high frequency, large volume, and complex nature. To address these challenges, we must implement efficient storage strategies that balance data retention with storage constraints.

After covering the basics of vibration data, we'll explore the best practices for managing this data, including storing both raw and pre-processed metrics to take advantage of their benefits. We'll also look at the differences between traditional time series databases and a time series object store such as ReductStore, which is designed to efficiently handle time series unstructured data, making it an excellent choice for storing high-frequency vibration sensor measurements.

We'll then cover a real-world example of storing vibration sensor data using Python and ReductStore. This example will show you step-by-step how to store raw sensor data, calculate key metrics, and query and retrieve this data for analysis.

Finally, we'll discuss strategies for preventing data loss through volume-based retention policies and automated replication to ensure that valuable information is always available for diagnosis and analysis.