To further strengthen our commitment to providing industry-leading coverage of data technology, VentureBeat is pleased to welcome Andrew Brust and Tony Baer as regular contributors. Watch for their articles in the Data Pipeline.
Oracle has owned MySQL ever since it acquired Sun Microsystems well over a decade ago. Under Oracle’s watch, MySQL has remained self-contained. But unless you were MariaDB, until a few years ago few gave much thought to Oracle’s stewardship. And with each of the major cloud providers rolling out its own managed MySQL database service, Oracle offered customers relatively few reasons to sign up for an Oracle-run MySQL.
Well, that’s no longer the case. Fifteen months ago, Oracle introduced MySQL HeatWave, its own optimized implementation of MySQL running on Oracle Cloud Infrastructure (OCI, Oracle’s public cloud platform). These optimizations should be transparent to the application. Now Oracle is releasing version 3.0 of HeatWave, which increases the amount of data each node can hold, lowering costs for a range of workloads, and introduces in-database machine learning, which could benefit from the higher-density data nodes.
HeatWave is not plain open source MySQL, as it is differentiated by extensions developed by Oracle (see below). This is not uncommon with open source: Amazon Aurora and Azure PostgreSQL Hyperscale, as well as the myriad other PostgreSQL flavors on the market, show that open source databases provide a clean slate for differentiation.
On its way to becoming a serious contender in the MySQL space, Oracle has taken the database in a unique direction with HeatWave: it has been optimized for analytics, in addition to transaction processing, by leveraging MySQL’s support for pluggable storage engines. In this case, Oracle plugged in an in-memory columnar storage engine that works side by side with the row store and includes optimizations tailored for analytical query processing.
Plugging in a columnar storage engine to work side by side with a row-oriented engine is not unusual; MariaDB did it, and Oracle itself took a similar path a few years ago, with a different technology, for its flagship database. But to date, Oracle is the only company that has developed an analytics-optimized engine for MySQL.
In the latest version, Oracle introduced new improvements to reduce computational costs and integrate machine learning into the database.
Let’s start with the operating costs. HeatWave version 3.0 doubles the data density of each compute node without changing the price of the node. So now you need to consume (and pay for) only half the number of nodes to run the same workload. Incidentally, Oracle set the stage for all of this in the previous version, HeatWave 2.0, when it doubled the maximum size of a HeatWave cluster to 64 nodes.
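The arithmetic behind that claim can be sketched in a few lines of Python. The data volume, per-node capacity, and unit price below are hypothetical figures chosen only to show the ratio; they are not Oracle's published numbers.

```python
import math


def nodes_needed(data_gb: float, gb_per_node: float) -> int:
    """Smallest whole number of nodes whose combined capacity holds data_gb."""
    return math.ceil(data_gb / gb_per_node)


# Hypothetical figures, for illustration only.
DATA_GB = 6_400
OLD_GB_PER_NODE = 400                    # assumed capacity before the change
NEW_GB_PER_NODE = 2 * OLD_GB_PER_NODE    # density doubled
PRICE_PER_NODE_HOUR = 1.0                # arbitrary unit; unchanged per node

old_nodes = nodes_needed(DATA_GB, OLD_GB_PER_NODE)
new_nodes = nodes_needed(DATA_GB, NEW_GB_PER_NODE)

old_cost = old_nodes * PRICE_PER_NODE_HOUR
new_cost = new_nodes * PRICE_PER_NODE_HOUR

print(old_nodes, new_nodes)   # 16 8
print(new_cost / old_cost)    # 0.5
```

Because node counts are whole numbers, the saving is exactly half only when the old node count is even; otherwise it is slightly less than half.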
The combination of cost efficiency and scalability should come in handy now that machine learning models can be run in the database. Hold that thought.
Beyond data density, HeatWave 3.0 makes scaling more economical because you can add any number of nodes (up to a maximum of 64) in any increment. This is in line with what Oracle has rolled out for its Autonomous Database cloud service, doing away with the so-called standard “T-shirt sizes.” So elasticity with HeatWave doesn’t mean you have to double the number of active nodes every time your workload bursts. HeatWave also improves availability during resizing, with queries paused for a few microseconds at most.
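A short Python sketch makes the difference concrete. The doubling rule here is a simplified, hypothetical model of fixed size tiers, not a description of how any particular service is provisioned or billed; only the 64-node ceiling comes from the article.

```python
MAX_NODES = 64  # cluster ceiling cited above


def nodes_with_doubling(required: int) -> int:
    """Next power of two at or above `required`, capped at MAX_NODES."""
    if required < 1:
        raise ValueError("required must be at least 1")
    return min(MAX_NODES, 1 << (required - 1).bit_length())


def nodes_with_any_increment(required: int) -> int:
    """Exactly the number of nodes required, capped at MAX_NODES."""
    if required < 1:
        raise ValueError("required must be at least 1")
    return min(MAX_NODES, required)


burst = 17  # hypothetical: a workload burst that needs 17 nodes
doubled = nodes_with_doubling(burst)
exact = nodes_with_any_increment(burst)

print(doubled, exact, doubled - exact)  # 32 17 15
```

Under the doubling model, a burst that needs one node more than 16 pays for 32; with arbitrary increments it pays for 17.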
HeatWave 3.0 adds some tricks to further speed up processing. Like any columnar storage engine, HeatWave makes extensive use of data compression. It also applies common techniques such as Bloom filters, which reduce the amount of memory required for query processing. Specifically, HeatWave implements blocked Bloom filters, which can perform the necessary lookups with much less overhead, significantly reducing the buffer memory required.
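The variant in question is generally known as a blocked Bloom filter: the bit array is split into small fixed-size blocks, one hash picks a block, and all of a key's probe bits fall inside that block, so a lookup touches one small region of memory. The following is a minimal Python sketch of the general technique, not Oracle's implementation; the class name, block size, and hash choices are assumptions made for illustration.

```python
import hashlib


class BlockedBloomFilter:
    """Bloom filter whose probe bits for a key all land in one block."""

    def __init__(self, num_blocks=1024, block_bits=512, num_hashes=4):
        self.num_blocks = num_blocks
        self.block_bits = block_bits      # 512 bits = one 64-byte cache line
        self.num_hashes = num_hashes
        self.blocks = [0] * num_blocks    # one int per block acts as its bitmap

    def _probe(self, key):
        """Return (block index, bitmask of the bits to set or test)."""
        digest = hashlib.blake2b(key.encode("utf-8"), digest_size=24).digest()
        block_index = int.from_bytes(digest[0:8], "little") % self.num_blocks
        h1 = int.from_bytes(digest[8:16], "little")
        # Forced odd so successive probes differ when block_bits is a power of two.
        h2 = int.from_bytes(digest[16:24], "little") | 1
        mask = 0
        for i in range(self.num_hashes):
            mask |= 1 << ((h1 + i * h2) % self.block_bits)
        return block_index, mask

    def add(self, key):
        block_index, mask = self._probe(key)
        self.blocks[block_index] |= mask

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        block_index, mask = self._probe(key)
        return (self.blocks[block_index] & mask) == mask


bf = BlockedBloomFilter()
for key in ("cust-17", "cust-42", "cust-99"):
    bf.add(key)

print(bf.might_contain("cust-42"))  # True: added keys are never reported absent
```

In a join, a filter like this can be built from the keys on one side and used to discard non-matching rows on the other side before they are buffered, which is where the memory saving comes from.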
These capabilities, in turn, pave the way for Oracle to introduce the ability to process machine learning models within the database without the need for an external ETL engine or machine learning execution environment. With that, Oracle follows a trend that also includes AWS (Amazon Redshift ML), Google (BigQuery ML), Microsoft (SQL Server with in-database R and Python capabilities), Snowflake (with Snowpark), and Teradata (via advanced SQL). Comparing these approaches is an apples-to-oranges exercise, however, as each vendor takes a different path: some rely on models developed externally, some provide limited, curated choices for running ML, and others extend SQL itself.
HeatWave goes the curated route. It’s an approach suited to business analysts or “citizen data scientists,” aiming to democratize machine learning in the same way that self-service visualization put BI in the hands of the average user. In contrast, the external path is aimed at data scientists in organizations that compete on their ability to develop their own unique, highly complex models.
An advantage of the curated approach is that no external tools are required, which means that selecting, configuring, training and running ML models is done entirely within the database. This eliminates the overhead and cost of moving data to tools or ML services running on separate nodes. Oracle also notes that keeping all data in the database reduces potential attack surfaces and consequently reduces security risk.
This is how HeatWave’s AutoML approach works. The user chooses the table, columns, and algorithm type (e.g. regression or classification) and then specifies where to store the model artifacts. The system automatically determines the best algorithm, appropriate features, and optimal hyperparameters and generates a tuned model.
It streamlines important steps. For example, when testing a candidate model, it separates the individual tasks or steps that the model performs, with each step evaluated using proxies or stubs that simulate the algorithm on a representative sample of hyperparameters. It then automatically documents the choice of data, algorithms, and hyperparameters to make the model explainable, as shown in the figure below.
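The general idea of screening candidates cheaply before tuning the winner can be illustrated with a toy Python sketch. This is not HeatWave's AutoML code: the two stand-in models, the hyperparameter grids, the subsampling rule, and all function names are hypothetical, chosen only to show a proxy stage followed by a full tuning stage, with the choices recorded in the result.

```python
from statistics import mean


def fit_ridge(xs, ys, lam):
    """1-D least squares with an L2 penalty `lam` on the slope."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / (sxx + lam)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_knn(xs, ys, k):
    """k-nearest-neighbour regression on one feature."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


# Candidate algorithms with their hyperparameter grids (all hypothetical).
CANDIDATES = {
    "ridge": (fit_ridge, [0.0, 0.1, 1.0, 10.0]),
    "knn": (fit_knn, [1, 3, 5, 9]),
}


def split(rows):
    """Deterministic train/validation split by alternating rows."""
    return rows[0::2], rows[1::2]


def validation_error(fit, param, train, valid):
    model = fit([x for x, _ in train], [y for _, y in train], param)
    return mean((model(x) - y) ** 2 for x, y in valid)


def auto_select(rows, sample_step=4):
    # Stage 1 (proxy): subsample the rows and try only the two ends of each grid.
    proxy_train, proxy_valid = split(rows[::sample_step])
    proxy_scores = {
        name: min(validation_error(fit, p, proxy_train, proxy_valid)
                  for p in (grid[0], grid[-1]))
        for name, (fit, grid) in CANDIDATES.items()
    }
    best_name = min(proxy_scores, key=proxy_scores.get)

    # Stage 2: tune only the winning algorithm on all rows and the full grid.
    fit, grid = CANDIDATES[best_name]
    train, valid = split(rows)
    errors = {p: validation_error(fit, p, train, valid) for p in grid}
    best_param = min(errors, key=errors.get)

    # The returned record documents what was chosen and why.
    return {
        "algorithm": best_name,
        "hyperparameter": best_param,
        "validation_mse": errors[best_param],
        "proxy_scores": proxy_scores,
    }


rows = [(x, 2 * x + 1) for x in range(40)]  # synthetic, exactly linear data
result = auto_select(rows)
print(result["algorithm"], result["hyperparameter"])  # ridge 0.0
```

On this synthetic linear data the proxy stage rules out the nearest-neighbour model after four cheap fits, so the full grid is only searched for one algorithm.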
The benefit of in-database ML processing is a flatter architecture and the elimination of data movement overhead. While the downside of putting application processing in the database is more processing overhead, several design features make this issue moot.
The cloud-native architecture, which allows computing power to be scaled as needed, eliminates the problem of competing for limited resources. Additionally, most cloud analytics platforms that support in-database ML either streamline optimization or support only limited libraries of models to prevent the AI equivalent of the workload from hell, especially for training runs, which tend to be the most time-consuming and computationally intensive. Oracle has published ML benchmarks for HeatWave 3.0, which are available on GitHub for customers and prospects to run and verify for themselves.
Oracle’s introduction of ML processing in HeatWave complements an ML-related feature from its previous release, version 2.0, last summer. That release included MySQL Autopilot, which uses internalized machine learning to help customers run the database, for example by suggesting how to provision and load the database, while providing closed-loop automation for failure handling, error recovery, and query execution.
With version 3.0, MySQL HeatWave comes full circle, using ML to help run the database and supporting the execution of ML models within it. This is another example of a prediction we made for this year: that machine learning will take center stage, both to optimize database operations and to allow customers to develop and/or run models in the database.
https://venturebeat.com/2022/04/01/oracle-cranks-up-mysql-heatwaves-thermostat-for-in-database-machine-learning/ Oracle is cranking up the MySQL HeatWave thermostat for database machine learning