Varada, the data lake query acceleration innovator, today announced a new capability of its flagship platform designed to support text analytics workloads and help data teams deliver faster time-to-insight on exabytes of string-based data. Varada’s solution for interactive text analytics, integrated with the popular open source search engine library Apache Lucene, works directly on the customer’s data lake and serves SQL data consumers out of the box. As a result, data teams can achieve maximum performance without moving, duplicating or modeling data.
Most text analytics solutions are deployed as a bolt-on addition to existing data analytics stacks, which presents problems for agility, cost, time-to-market and scaling. Varada’s addition of Lucene support delivers an integrated stack that performs and scales to exabytes of data on data lakes, making richer business insights possible.
Today’s announcement means that Varada’s technology can give companies actionable business insights by leveraging 10 times more data and delivering results up to 100 times faster. Varada’s text analytics feature is easily deployed in the organization’s own environment, so data is not duplicated and never leaves that environment. It also incorporates data from any source without modeling, which means data teams get “zero time to market” with results that are both thorough and precise. Varada’s dynamic and adaptive indexing technology enables text analytics workloads to run with near-zero response latency, especially on latency-sensitive queries.
“Text analytics has been evolving from on-premises solutions to cloud-based solutions,” said Eran Vanounou, CEO at Varada. “These approaches were innovative when introduced, but they have become complex and expensive, especially given the wide range of analytics platforms and stacks. At Varada, we’re introducing the next era in text analytics with a solution that runs directly on top of the customer’s data lake and alongside other analytics workloads. For the first time, users can deploy a text analytics solution without having to move data to expensive systems and complex, proprietary data schemas.”
Text Analytics Challenges Are Best Addressed on the Data Lake
As the volume of data and the number of text analytics applications grow exponentially, data teams are increasingly challenged to optimize cost and performance. Large-scale text analytics requires customized optimizations for the LIKE '%text%' operator and regular expressions, which often leads teams to turn to disparate data silos that specialize in text.
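To make the two query shapes concrete, the following is a minimal sketch using Python's built-in sqlite3 and re modules. The table name, column name and sample rows are hypothetical, and SQLite merely stands in for any SQL engine; nothing here reflects Varada's implementation. Without a suitable index, both functions have to examine every row, which is the cost that text-specific optimizations aim to avoid.

```python
import re
import sqlite3

# Hypothetical sample table; the name, column and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, message TEXT)")
conn.executemany(
    "INSERT INTO events (message) VALUES (?)",
    [
        ("login failed for user alice",),
        ("user bob logged in",),
        ("payment failed: card declined",),
        ("disk usage at 91 percent",),
    ],
)


def like_contains(term):
    """Return ids of rows whose message contains `term`, via LIKE '%term%'."""
    rows = conn.execute(
        "SELECT id FROM events WHERE message LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


def regex_match(pattern):
    """Return ids of rows whose message matches `pattern`, checked row by row."""
    compiled = re.compile(pattern)
    rows = conn.execute("SELECT id, message FROM events ORDER BY id")
    return [row_id for row_id, message in rows if compiled.search(message)]


contains_failed = like_contains("failed")
percent_rows = regex_match(r"\d+ percent")
```

Note that `like_contains` treats `%` and `_` inside `term` as wildcards; a production query would escape them.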
“More often than not, organizations use complex and high-end text analytics solutions for simple SQL text search, such as ‘prefix,’ ‘suffix’ and ‘contains’ functions,” explains Ori Reshef, Varada’s vice president of products. “There is no need to build and maintain a standalone text analytics solution that will over-index each string and comes with a hefty price tag on both license and maintenance. An example here would be n-grams. With Varada, which integrates Lucene index within our data lake query acceleration engine, we are using minimal indexing to get the job done.”
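The difference in index size between n-gram indexing and word-level indexing can be illustrated with a small Python sketch. It builds a trigram index and a word-level inverted index over the same hypothetical documents and counts the postings in each. The helper names and sample strings are assumptions made for illustration; this is not Lucene's or Varada's implementation.

```python
import re
from collections import defaultdict


def words(text):
    """Return the set of lowercase word tokens in `text`."""
    return set(re.findall(r"\w+", text.lower()))


def trigrams(text):
    """Return the set of lowercase 3-character substrings of `text`."""
    lowered = text.lower()
    return {lowered[i:i + 3] for i in range(len(lowered) - 2)}


def build_index(docs, keys_of):
    """Map each key from `keys_of(doc)` to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for key in keys_of(doc):
            index[key].add(doc_id)
    return index


def posting_count(index):
    """Total number of (key, document id) pairs stored in the index."""
    return sum(len(doc_ids) for doc_ids in index.values())


def trigram_candidates(index, term):
    """Ids of documents containing every trigram of `term`.

    These are candidates only: each one still has to be checked against
    the original text, because trigrams can co-occur without the term.
    """
    keys = trigrams(term)
    if not keys:
        return set()  # terms shorter than 3 characters cannot use the index
    return set.intersection(*(index.get(key, set()) for key in keys))


# Hypothetical sample documents.
docs = [
    "login failed for user alice",
    "user bob logged in",
    "payment failed: card declined",
]

token_index = build_index(docs, words)
trigram_index = build_index(docs, trigrams)
```

On these three short strings the trigram index already holds several times more postings than the word-level index, which is the trade-off behind the over-indexing remark: n-grams support arbitrary substring search, but at a much larger index size.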
Varada’s Adaptive Indexing Technology
Varada’s adaptive and autonomous indexing technology leverages machine learning capabilities to dynamically accelerate queries to meet evolving business requirements. Varada indexes data directly from the data lake across any column. Based on the data type, structure, and distribution of the data, Varada automatically creates an optimal index from a set of indexing algorithms, including text-optimized search and indexing (based on Apache Lucene) as well as bitmap, dictionary and tree indexes. Indexes also adapt to changes in data over time, which is critical for effective analytics and anomaly detection across vast datasets.
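As a rough illustration of choosing an index from a column's type and distribution, the Python sketch below applies a few hand-written rules. The rules, the thresholds and the `choose_index` name are all assumptions invented for this example; the announcement does not describe Varada's actual selection logic, which it says is driven by machine learning.

```python
from enum import Enum


class IndexKind(Enum):
    TEXT = "text"              # e.g. a Lucene-style text index
    BITMAP = "bitmap"
    DICTIONARY = "dictionary"
    TREE = "tree"


def choose_index(values):
    """Pick an index kind for one column using made-up heuristics.

    All thresholds below are illustrative assumptions, not Varada's rules.
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        return IndexKind.BITMAP  # nothing to index beyond null/not-null

    distinct_ratio = len(set(non_null)) / len(non_null)

    if all(isinstance(v, str) for v in non_null):
        avg_words = sum(len(v.split()) for v in non_null) / len(non_null)
        if avg_words > 1.5:
            return IndexKind.TEXT        # free text: multi-word strings
        if distinct_ratio < 0.5:
            return IndexKind.DICTIONARY  # few repeated labels
        return IndexKind.TREE            # many distinct short strings

    # Numeric and other comparable values.
    return IndexKind.BITMAP if distinct_ratio < 0.1 else IndexKind.TREE
```

For example, a column of log messages would be classified as `IndexKind.TEXT`, while a column of repeated country codes would be classified as `IndexKind.DICTIONARY`.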
Varada’s smart engine detects bottlenecks automatically and adjusts the cluster and acceleration techniques to ensure business requirements are met within the allocated budget. Key features include:
- Works atop the customer data lake, enabling access to new data as it becomes available.
- Works directly on raw behavior data, without any need to model data to improve performance; any new data can be analyzed immediately, delivering fast results without losing the full dimensionality of the data.
- Continuously monitors queries to identify which data is used and how workloads use it; this observability is then leveraged to dynamically and automatically accelerate text analytics workloads with adaptive indexing and caching of data or intermediate results.
- Is completely decoupled from the storage layer and can easily scale to serve fluctuating demand.
- Gives data teams full control to prioritize analytics projects and define budgets and performance requirements.
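The query-monitoring feature in the list above can be pictured with a short Python sketch: count how often each column appears in query filters, then select the most frequently filtered columns up to a budget. The log format, the column names and the `columns_to_accelerate` function are hypothetical and serve only to illustrate the idea; they do not describe Varada's engine.

```python
from collections import Counter


def columns_to_accelerate(query_log, budget):
    """Return up to `budget` column names, most frequently filtered first.

    `query_log` is a list of queries, each given as the list of columns
    used in its filters. Ties are broken alphabetically for determinism.
    """
    usage = Counter()
    for filtered_columns in query_log:
        usage.update(set(filtered_columns))  # count each column once per query
    ranked = sorted(usage.items(), key=lambda item: (-item[1], item[0]))
    return [column for column, _ in ranked[:budget]]


# Hypothetical observed workload: columns filtered on by four queries.
query_log = [
    ["message", "timestamp"],
    ["message"],
    ["user_id", "message"],
    ["timestamp"],
]

selected = columns_to_accelerate(query_log, budget=2)
```

With this log, `message` is filtered on by three queries and `timestamp` by two, so those two columns are selected when the budget is 2.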
The Varada mission is to enable data practitioners to go beyond the traditional limitations imposed by data infrastructure and instead zero in on the data and answers they need, with complete control over performance, cost and flexibility. In Varada’s world of big data, every query can find its optimal plan, with no prior preparation and no bottlenecks, providing consistent performance at petabyte scale. Varada was founded by veterans of the Dell EMC XtremIO core team and is dedicated to leveraging the data lake architecture to take on the challenge of data and business agility. Varada has been recognized in the Cool Vendors in Data Management report by Gartner, Inc. For more information, visit https://varada.io/