If you are launching Hadoop in your enterprise, you’ll want to get the most out of its rich data. Users across the enterprise have important roles to play in your eventual Hadoop user community, and building that community doesn’t end with the first user or user group. As with data warehouses, users will eventually fall into multiple categories. Although the Hadoop user base is often heavy on data science at first, Hadoop builders should cultivate users across the enterprise so that the valuable data they are making available gets used.
William McKnight, Consultant, McKnight Consulting Group
William will be presenting the following sessions at our Enterprise Data and BI Conference Europe 2016, 7-10 November, London: Master Data Management; Big Data Technology and Use Cases; and Strategies for Consolidating Enterprise Data Warehouses and Data Marts into a Single Platform. This article was previously published elsewhere.
Four categories will make up your Hadoop user community, and each one interacts with Hadoop in a specific way.
Scientists with the statistical and applied mathematical expertise to analyze data for insights — the ability to extract signal from noise — are more critical than ever in Hadoop environments. Most Hadoop projects, of course, involve processing big data to find a relatively minute amount of signal.
Hadoop’s ability to rapidly crunch through enormous amounts of data makes it economically feasible to extract these insights. In addition, data scientists may discover patterns that aren’t evident in smaller data sets.
These members of your team investigate the value of various big data sources, which in Hadoop environments means mastering a wider range of tools and analytic techniques.
Creating queries and guiding machine-learning algorithms, they discover data patterns and relationships that could potentially be useful for BI or for building predictive or descriptive analytic models. They determine which data looks interesting enough to justify further analysis and build logical views (e.g., Hive tables) on top of the data to facilitate queries by themselves and other users.
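To make the idea of a logical view concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Hive so that it runs anywhere; HiveQL offers a comparable CREATE VIEW statement. The table, column, and view names (raw_clicks, daily_visits) and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# sqlite3 stands in for Hive here (an assumption made so the sketch is runnable).
# The raw_clicks table and its rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_clicks (user_id TEXT, url TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO raw_clicks VALUES (?, ?, ?)",
    [
        ("u1", "/home", "2016-11-07"),
        ("u1", "/pricing", "2016-11-07"),
        ("u2", "/home", "2016-11-08"),
    ],
)

# A logical view copies no data; it stores a query under a friendly name
# so that other users can select from it without knowing the raw layout.
conn.execute(
    """
    CREATE VIEW daily_visits AS
    SELECT ts AS visit_date, COUNT(DISTINCT user_id) AS visitors
    FROM raw_clicks
    GROUP BY ts
    """
)


def visitors_by_day(connection):
    """Return (visit_date, visitors) rows from the view, ordered by date."""
    return connection.execute(
        "SELECT visit_date, visitors FROM daily_visits ORDER BY visit_date"
    ).fetchall()


print(visitors_by_day(conn))
```

Analysts and other users can then query daily_visits directly, while the underlying raw data stays untouched.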
As in the traditional data warehouse environment, data analysts run queries to respond to inquiries or produce reports. Thanks to today’s SQL-on-Hadoop solutions, most of their skills are transferable, and with the processing performance of Hadoop, they’ll enjoy a nice bump up in productivity.
What’s different for Hadoop is that the role of data analyst tilts more toward internal consulting.
You need these data experts to be able to point business analysts and other business users to the right sources and guide less technical staff in accessing, integrating, and analyzing data for specific business aims. Data analysts can also help by creating data visualizations that can be accessed from libraries and reused in different contexts — BI tools, mobile apps, Web pages, etc.
In Hadoop environments, business analysts still fulfill the critical functions of looking for ways to improve business processes, posing questions that can lead to discovering strategies for competitive advantage and helping to specify requirements for new products and services.
They’re empowered in all of these responsibilities by an increasing range of role-based, self-service graphical tools that greatly expand their abilities to access, integrate, and analyze big data. In fact, some data integration solutions on Hadoop include hundreds of prebuilt connectors to big data sources.
In addition, these tools enable the role of the business analyst to encompass data preparation, including contributing to data profiling, cleansing, and validation processes. They may also be able to help specify and maintain data quality rules under the oversight of the data steward.
Hadoop is ushering in a new era of ubiquitous analytics, where nearly every job in the enterprise involves working with data in some respect — ideally via self-service tools or using familiar apps.
Forward-looking organizations understand the competitive potential of infusing everyday tasks with evidence, insights, and predictions. They’re doing everything they can to put big data analytics in the hands of enterprise citizens.
The Bottom Line
As you set up or build out your own Hadoop environment, keep end users — and the tremendous leverage they can exert as big data consumers — clearly in your sights.
William McKnight is President of McKnight Consulting Group (www.mcknightcg.com). He is an internationally recognized authority in information management. His consulting work has included many of the Global 2000 and numerous midmarket companies. His teams have won several best-practice competitions for their implementations, and many of his clients have gone public with their success stories. His strategies form the information management plan for leading companies in various industries. William is the author of the book “Information Management: Strategies for Gaining a Competitive Advantage with Data.” He is a very popular speaker worldwide and a prolific writer with hundreds of articles and white papers published. He is a distinguished entrepreneur and a former Fortune 50 technology executive and software engineer. He provides clients with strategies, architectures, platform and tool selection, and complete programs to manage information. Follow William on Twitter: @williammcknight.
Copyright: William McKnight, McKnight Consulting Group