Dremio, the innovation leader in data lake transformation, today announced support for Apache Arrow Flight, an open source data connectivity technology co-developed by Dremio that radically improves data transfer rates. As a result, client applications can now communicate with Dremio’s data lake service more than 10 times faster than with decades-old technologies such as Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC).
This press release features multimedia. View the full release here: https://www.businesswire.com/news/home/20210209005394/en/
Query speed on a 500,000-row Apache Parquet dataset: pyodbc vs. Apache Arrow Flight (Graphic: Business Wire)
The implementation comes as data scientists, engineers and architects scale their applications and need to exchange data across process boundaries quickly and efficiently, without making copies. As companies continue to implement machine learning models and become more data-driven, they require high-speed access to data to be successful. Apache Arrow, an open source project co-created by Dremio engineers in 2016, is now downloaded over 20 million times per month. Arrow Flight enables Arrow-powered technologies, such as Dremio and Python data science libraries, to exchange data at network speeds without serialization/deserialization overhead, because the data stays in Arrow’s columnar format from sender to receiver.
“Even as data volumes have increased by orders of magnitude, companies have had to continue to rely upon such archaic 25-year-old technologies like ODBC and JDBC for data transfer. While these technologies are fine for applications that require small datasets, they are a bottleneck for modern applications, such as machine learning, where millions of records are retrieved over the wire. Today we are announcing the availability of Arrow Flight in Dremio, which will open the door for new applications of data and set the performance standard for high-speed data transfer in the modern enterprise,” said Tomer Shiran, founder and chief product officer at Dremio.
In addition to superior performance, Arrow Flight offers many other benefits. Arrow Flight is cross-platform and has multi-language support including Python, Java and C++, with others to come. As an example, data scientists can retrieve data directly from a Flight-enabled database like Dremio into a Python dataframe without having to extract the data into local files on the client.
The ability to avoid data extracts, combined with Arrow Flight’s wire-level encryption and authentication capabilities, enables companies to overcome data governance and security challenges. Since data is being consumed directly from the centralized IT-controlled database or data lake service, data teams can control and monitor access to the data and delete records when necessary to comply with GDPR and CCPA requirements, such as “the right to be forgotten.”
Arrow Flight is now available as part of the Apache Arrow 3.0 release. To learn more, join the webinar Dremio is hosting, “Eliminate Data Transfer Bottlenecks with Apache Arrow Flight,” on Thursday, Feb. 25, at 10 a.m. PT / 1 p.m. ET.
Dremio reimagines the cloud data lake to deliver faster time to analytics by eliminating the need to copy and move data to proprietary data warehouses, or create cubes, aggregation tables and BI extracts. A self-service semantic layer provides flexibility and control for data architects, and self-service for data consumers. Founded in 2015, Dremio is headquartered in Santa Clara, CA. Investors include Cisco Investments, Insight Partners, Lightspeed Venture Partners, Norwest Venture Partners, Redpoint Ventures and Sapphire Ventures. For more information, visit www.dremio.com. Connect with Dremio on GitHub, LinkedIn, Twitter and Facebook.