O’Reilly, the premier source for insight-driven learning on technology and business, today announced the results of research into the state of data quality in 2020. The “O’Reilly State of Data Quality in 2020” report reveals concerns around data quality and uncertainty about how best to address those concerns in the enterprise.
Key findings include:
- There are too many data sources – and little consistency: when asked to share the primary data quality issues they face, more than 60% of respondents cited “too many data sources and inconsistent data.” This was followed by “disorganized data stores and lack of metadata” (selected by 50%) and “poor data quality controls at data entry” (selected by 47%).
- Organizations are dealing with several data quality problems at the same time: a majority of respondents reported facing multiple data quality issues at once. 71% reported at least three data quality issues, and 56% reported at least four.
- Data governance best practices are not being adhered to: 80% of respondents say their organizations do not publish information about data provenance or data lineage, which – along with robust metadata – are essential tools for correctly diagnosing and resolving data quality issues.
- Few resources are currently available: 44% of respondents said that they had “too few resources available to address data quality issues.”
- Use of Machine Learning (ML) and Artificial Intelligence (AI) to address data quality issues is growing: almost half (48%) of respondents say they are now using data analysis, ML, or AI tools to address data quality issues. This should help ease the resource shortage, as ML and AI can help simplify and automate the tasks involved in discovering, profiling, and indexing data.
“These findings show the need for both better education and better data management and cataloging tools – those that generate metadata and capture/manage data provenance and lineage,” said Rachel Roumeliotis, Vice President, Content Strategy for O’Reilly, and conference co-chair. “While the research indicates a growing understanding from the C-level of the importance of data quality, there still needs to be a push to educate organizations about data quality, data governance, and general data literacy.”
O’Reilly’s Strata Data and AI Conference is being held March 15-18 at the San Jose McEnery Convention Center in San Jose, CA. The event brings together the most influential business decision makers, strategists, architects, developers, and analysts in data and AI to shape the future of their industry. Key speakers include Jin Hyuk Chang, software engineer at Lyft; Minal Mishra, engineering manager at Netflix; and Paige Roberts, open source relations manager at Vertica.
Registration for the upcoming Strata Data and AI Conference is now open, and a limited number of media passes are available for qualified journalists and analysts. Please follow @strataconf or #StrataDataAI on Twitter for the latest news and updates.
Conducted in late 2019, the “O’Reilly State of Data Quality in 2020” report surveyed more than 1,900 professionals in the data industry. To download the full report, please visit: https://www.oreilly.com/radar/the-state-of-data-quality-in-2020/.
For 40 years, O’Reilly has provided technology and business training, knowledge, and insight to help companies succeed. Our unique network of experts and innovators shares its knowledge and expertise at O’Reilly conferences and through the company’s SaaS-based training and learning solution, O’Reilly online learning. O’Reilly delivers highly topical and comprehensive technology and business learning solutions to millions of users across enterprise, consumer, and university channels. For more information, visit www.oreilly.com.