3 docs tagged with "big-data"

Alluxio: Data Orchestration for Analytics and AI

Alluxio is a data orchestration technology for analytics and machine learning in the cloud. It bridges the gap between data-driven applications and storage systems, bringing data closer to compute for faster processing while providing a unified namespace for data access across different storage systems.

Apache Hive Overview

Apache Hive is a data warehousing system built on top of Apache Hadoop that provides data query and analysis. It offers an SQL-like interface (HiveQL) for querying data stored in the Hadoop Distributed File System (HDFS) or other compatible storage systems. Hive translates these queries into jobs that run on MapReduce or on other execution engines such as Apache Tez or Apache Spark, allowing users to work with massive datasets using familiar SQL syntax.

Apache Spark: Unified Analytics Engine

Apache Spark is a unified analytics engine for large-scale data processing with built-in modules for streaming, SQL, machine learning, and graph processing. It provides high-level APIs in Java, Scala, Python, and R, and an optimized engine that supports general execution graphs.