MySQL HeatWave dives into object storage data lakes

Oracle joins the analytics anywhere bandwagon, promises future access to AWS S3


Oracle has launched MySQL HeatWave Lakehouse, an extension to its proprietary analytics platform that lets it query data held in object storage outside the database.

The analytics system, which was built on top of the open source MySQL database, can query data in the object store in a variety of file formats as well as combine it with data in MySQL. Meanwhile, files in the object store are queried directly by HeatWave without copying the data into the MySQL database, Oracle told us.

The data lake technology supports file formats including CSV, Parquet, and export files from other databases. At the same time, MySQL Autopilot promises to improve performance and scalability without requiring database tuning expertise.

On a 500TB TPC-H benchmark, Oracle claims queries took nine times longer on AWS's data warehouse and 17 times longer on Snowflake and Databricks compared with the new HeatWave data lake. Google's BigQuery would be 36 times slower, Oracle reckons, though it did not publish comparisons with Teradata, the data warehouse vendor founded in 1979.

The system is only available on Oracle Cloud Infrastructure (OCI), but Nipun Agarwal, senior vice president of MySQL HeatWave, told The Register that Oracle planned to extend the system to query data held in object storage in other clouds including AWS, Azure and GCP.

"One of the important things to note over here is that data in the object store remains in the object store," he said. "We do not copy data from the object store into the MySQL database. Secondly, the processing of this data, whether it's loading or queried, is done by HeatWave not by the MySQL engine. That's what gives it extreme scalability because the HeatWave cluster can scale up to 500 nodes."

Using analytics engines to query data outside their home database is not new. The approach has been used by Snowflake, Cloudera and Google's BigQuery with their support for the Apache Iceberg table format. Similarly, Databricks, Microsoft and SAP have endorsed the Delta Lake table format, an open source format created by Databricks and now hosted by the Linux Foundation.

Commentators and vendors alike have suggested that most suppliers will eventually support most table formats, including Hudi.

Agarwal said Oracle intends HeatWave to support these formats in the future, starting with Iceberg and Delta Lake.

The Autopilot feature offers schema inference, which helps users determine the data types of files in object storage before the data is analyzed by the query engine.

"We can come up with this mapping, even for files which don't have metadata," Agarwal said. "Autopilot can make these predictions in less than one minute. We invented this technique called adaptive data sampling, which very intelligently scans and samples the file without compromising on the accuracy."
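To give a rough feel for what sampling-based schema inference involves, here is a minimal Python sketch. It is not Oracle's adaptive data sampling, whose details were not disclosed; it simply looks at every Nth row of a header-bearing CSV file and widens each column's guessed type as it goes. The function names, the fixed stride, and the SQL-style type labels are all assumptions made for illustration.

```python
import csv
import io
from datetime import date


def infer_type(value: str) -> str:
    """Guess the narrowest type label for a single CSV field (illustrative only)."""
    if value == "":
        return "NULL"
    try:
        int(value)
        return "BIGINT"
    except ValueError:
        pass
    try:
        float(value)
        return "DOUBLE"
    except ValueError:
        pass
    try:
        date.fromisoformat(value)
        return "DATE"
    except ValueError:
        return "VARCHAR"


def widen(current: str, seen: str) -> str:
    """Merge two type guesses into one that can hold both."""
    if current == seen:
        return current
    if current == "NULL":
        return seen
    if seen == "NULL":
        return current
    if {current, seen} == {"BIGINT", "DOUBLE"}:
        return "DOUBLE"
    # Any other mismatch falls back to text.
    return "VARCHAR"


def infer_schema(text: str, stride: int = 10) -> dict:
    """Infer column types from every `stride`-th data row (hypothetical helper).

    A fixed stride is a stand-in for smarter sampling: it trades accuracy
    for speed by never reading the values in the skipped rows.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    types = ["NULL"] * len(header)
    for i, row in enumerate(reader):
        if i % stride != 0:
            continue
        for col, value in enumerate(row[: len(header)]):
            types[col] = widen(types[col], infer_type(value))
    return dict(zip(header, types))


sample = (
    "id,price,sold_on,note\n"
    "1,9.5,2023-06-21,ok\n"
    "2,10,2023-06-22,\n"
)
schema = infer_schema(sample, stride=1)
```

The obvious weakness of a fixed stride is that a rare value in a skipped row, say a stray string in an otherwise numeric column, goes unseen, which is presumably the sort of accuracy loss a more adaptive scheme aims to avoid.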

Autopilot also predicts the in-memory representation for a specific data source, the optimal size of the cluster that is needed to compute the data and how long it's going to take to load the data, he said.

Holger Mueller, vice president and principal analyst at Constellation Research, said Oracle had introduced new features to HeatWave in the last three years at a rapid pace. "The HeatWave team has out-innovated all other cloud databases," he claimed.

The move into object storage was "huge," he added, because it "allows users to bring all the data of the enterprise together – into one single query. It is something enterprises have long waited for."

Meanwhile, the ability to query data in AWS, Azure and GCP object storage would appeal to users who want to work across all their enterprise data using HeatWave, he said.

Like any suite model, Oracle HeatWave has the downside of competing with specialist players on each of its features. "But, at this point, Oracle is more than good enough," Mueller said. ®
