Innovations in the cloud and the rise of more efficient ways to collect,
access, and analyze big data have rapidly increased the value
enterprises are getting from their data. In 2020, enterprises will
evolve in how they approach data maturity and strategize cloud
According to Tomer Shiran, co-founder and CEO of Dremio,
the new year will bring compelling reasons to focus on modern cloud
data lakes; increased efficiency of cloud services to remarkably
reduce cloud computing costs; easier ways to make IoT data a valuable
business asset; and open source innovations to accelerate analytics
results. The following five major trends guide his predictions for 2020.
Cloud data warehouses turn out to be a big data detour.
Given the tremendous cost and complexity associated with traditional
on-premises data warehouses, it wasn’t surprising that a new generation
of cloud-native enterprise data warehouses emerged. But savvy enterprises
have figured out that cloud data warehouses are just a better
implementation of a legacy architecture, and so they’re avoiding the
detour and moving directly to a next-generation architecture built
around cloud data lakes. In this new architecture, data doesn’t get moved
or copied, and there is no data warehouse and no associated ETL, cubes, or
other workarounds. We predict 75 percent of the Global 2000 will be in
production or in pilot with a cloud data lake in 2020, using multiple
best-of-breed engines for different use cases across data science, data
pipelines, BI, and interactive/ad hoc analysis.
Enterprises say goodbye to performance benchmarks, hello to
Escalating public cloud costs have forced enterprises to re-prioritize
the evaluation criteria for their cloud services, with higher efficiency
and lower costs now front and center. The highly elastic nature of the
public cloud means that cloud services can (but don’t always) release
resources when not in use. And services which deliver the same unit of
work with higher performance are in effect more efficient and cost less.
In the on-premises world of over-provisioned assets, such gains are hard
to reclaim. But in the public cloud, time really is money. This has
created a new battleground where cloud services are competing on the
dimension of service efficiency to achieve the lowest cost per unit of
compute, and 2020 will see that battle heat up.
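The efficiency argument comes down to simple arithmetic. The sketch below assumes per-second billing and a service that releases its resources as soon as a job finishes. The hourly rate, node count, and runtimes are hypothetical figures chosen only for illustration.

```python
def job_cost(nodes: int, runtime_seconds: float, rate_per_node_hour: float) -> float:
    """Cost of one job when resources are billed per second and released afterwards."""
    return nodes * (runtime_seconds / 3600.0) * rate_per_node_hour


# Hypothetical figures: same workload, same cluster size, same hourly rate.
RATE = 2.0  # dollars per node-hour (assumed)
slow_engine = job_cost(nodes=10, runtime_seconds=1800, rate_per_node_hour=RATE)
fast_engine = job_cost(nodes=10, runtime_seconds=450, rate_per_node_hour=RATE)

print(f"slow engine: ${slow_engine:.2f}")
print(f"fast engine: ${fast_engine:.2f}")
print(f"savings:     {1 - fast_engine / slow_engine:.0%}")
```

Under these assumptions, an engine that finishes the same work four times faster costs a quarter as much. The saving disappears if the service keeps its resources allocated after the job ends.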
IoT data finally becomes queryable.
The explosion of IoT devices has created a flood of data typically
landing in data lake storage such as AWS S3 and Microsoft ADLS as the
system of record. But while capturing and storing IoT data is easy, the
semi-structured nature of IoT data makes it difficult to process and
use: data engineers are forced to build and maintain complex, and often
brittle, data pipelines to enrich IoT data, add context to it, and
accelerate it. Software AG has stepped in to tackle this problem head-on
with its Cumulocity IoT Data Hub, and we predict that in 2020 IoT data will
be directly queryable at high performance via business intelligence,
self-service analytics, machine learning, or SQL-based tools.
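To make the pipeline burden concrete, here is a minimal sketch of one step such pipelines typically perform: flattening nested, irregular JSON messages into uniform rows that a SQL-style tool can read. The device names, field names, and message shapes are hypothetical, and this is not how Cumulocity IoT Data Hub or Dremio works internally.

```python
import json
from typing import Any, Dict

# Hypothetical device messages; note the second one has different fields.
RAW_MESSAGES = [
    '{"device": "pump-1", "ts": 1577836800, "reading": {"temp_c": 71.5, "rpm": 1200}}',
    '{"device": "pump-2", "ts": 1577836805, "reading": {"temp_c": 68.0}, "alarm": true}',
]


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested objects into dotted column names: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


flat_records = [flatten(json.loads(line)) for line in RAW_MESSAGES]

# The table schema is the union of all columns seen; missing values become None.
columns = sorted({name for rec in flat_records for name in rec})
rows = [[rec.get(col) for col in columns] for rec in flat_records]

print(columns)
for row in rows:
    print(row)
```

Even this toy version has to handle fields that appear in some messages and not others, which is where hand-built pipelines tend to become brittle as devices and firmware change.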
The rise of data microservices for bulk analytics.
Traditional operational microservices have been designed and optimized
for processing small numbers of records, primarily due to bandwidth
constraints with existing protocols and transports. But now this
long-standing bottleneck issue has been solved with the arrival of Apache
Arrow Flight, which provides a high-performance, massively parallel
protocol for big data transfer across different applications and
platforms. We predict that in 2020 Arrow Flight will unleash a new
category of data microservices focused on bulk analytical operations
with high volumes of records, and in turn these data microservices will
enable loosely coupled analytical architectures which can evolve much
faster than traditional monolithic analytical architectures.
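The access pattern behind a bulk-oriented data microservice can be sketched in plain Python: the dataset is split into partitions, each partition is streamed as batches of records instead of one record per request, and the client drains all streams in parallel. This is an illustration of the pattern only. It does not use the Arrow Flight API, and every function name and size below is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List

BATCH_SIZE = 1000  # records per batch (assumed)


def stream_partition(partition_id: int, num_records: int) -> Iterator[List[int]]:
    """Hypothetical service endpoint: yields one partition as batches of records."""
    start = partition_id * num_records
    for offset in range(0, num_records, BATCH_SIZE):
        stop = min(offset + BATCH_SIZE, num_records)
        yield list(range(start + offset, start + stop))


def fetch_partition(partition_id: int, num_records: int) -> int:
    """Client side: drain one stream and count the records received."""
    return sum(len(batch) for batch in stream_partition(partition_id, num_records))


def fetch_all(num_partitions: int, records_per_partition: int) -> int:
    """Pull every partition concurrently, one worker per stream."""
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        counts = pool.map(
            fetch_partition,
            range(num_partitions),
            [records_per_partition] * num_partitions,
        )
        return sum(counts)


total = fetch_all(num_partitions=4, records_per_partition=2500)
print(total)
```

In a real deployment the streams would cross the network and carry columnar record batches; the in-process generators here stand in for that transport.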
Apache Arrow becomes fastest project to reach 10M downloads/month.
Arrow (co-created by Dremio) has firmly established itself as the
industry standard for columnar, in-memory data representation and
sharing, powering dozens of open source and commercial technologies and
making data science 100x to 1,000x faster. Arrow has already achieved over
6M monthly downloads in the three years since release, with downloads
continuing to grow exponentially. As a result, we predict Arrow will
reach 10M downloads/month in 2020, faster than any other Apache project.
And with the release of Apache Arrow Flight (also co-created by Dremio)
this past October, the performance benefits of Arrow are being extended
to the Remote Procedure Call (RPC) layer, further increasing data
interoperability. While Arrow Flight is just getting started, we predict
that by 2025 it will replace decades-old ODBC/JDBC as the de facto way
in which all modern data systems communicate.
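For readers unfamiliar with the term, the following sketch shows what "columnar, in-memory" means using only the Python standard library. It is not Arrow's actual format or API; the `array` module simply stands in for a typed, contiguous buffer, and the sensor fields are hypothetical.

```python
from array import array

# Row-oriented: one Python object per record.
rows = [
    {"sensor_id": 1, "temp_c": 20.5},
    {"sensor_id": 2, "temp_c": 21.0},
    {"sensor_id": 3, "temp_c": 19.5},
]

# Column-oriented: one contiguous, typed buffer per field.
columns = {
    "sensor_id": array("q", (r["sensor_id"] for r in rows)),  # 64-bit signed ints
    "temp_c": array("d", (r["temp_c"] for r in rows)),  # 64-bit floats
}

# An aggregate over one field touches only that field's buffer.
mean_temp = sum(columns["temp_c"]) / len(columns["temp_c"])

# The buffer can be handed over as raw bytes, with no per-record serialization.
payload = columns["temp_c"].tobytes()
restored = array("d")
restored.frombytes(payload)

print(mean_temp, len(payload), restored == columns["temp_c"])
```

Sharing data as typed buffers, so that producers and consumers agree on the layout and skip conversion, is the general idea that a standard columnar representation builds on.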
Dremio’s Data Lake Engine delivers fast query speed and a self-service
semantic layer operating directly against data lake storage. Dremio
eliminates the need to copy and move data to proprietary data warehouses
or create cubes, aggregation tables and BI extracts, providing
flexibility and control for Data Architects, and self-service for Data
Consumers. For more information, visit www.dremio.com.
Founded in 2015, Dremio is headquartered in Santa Clara, CA. Investors
include Lightspeed Venture Partners, Redpoint, Norwest Venture
Partners, and Cisco Investments. Connect with Dremio on GitHub, LinkedIn,
Twitter, and Facebook.