The Power of Data

You purchase toothpaste and milk at a store and the cashier hands you an arm’s length of printed coupons, many of which will be quite useful on your next shopping trip. How does the store know that you like specialty coffee, Greek yogurt and a particular brand of paper towels? How did it figure out that you have a large dog or that you suffer from seasonal allergies? It all comes down to data.

Data drives just about everything these days: how people work, travel, shop, eat and exercise. Today, individuals can make decisions big and small armed with a wealth of information they have never had access to before. Stuck in traffic? A mobile app can identify a better, faster route in real time. Planning a vacation on a budget? A website can pinpoint the best time to book a flight. Need some exercise? An app can track workouts, chart progress over time and even compare your performance with that of others.

Everyone uses data. And for many of the decisions people make, data-driven or otherwise, there’s a business using data to influence those decisions.

"There’s no business that can ignore the use of data," says Georgette Chapman Phillips, the Kevin L. and Lisa A. Clayton Dean of the College of Business and Economics. "I don’t care if you’re a retailer, if you are in financial services or accounting, whatever you’re doing. You have a bank of data, and if you’re not using that data in your business, you’re squandering so many opportunities."

Companies have long used business insights and customer analytics to solve complex problems and personalize the consumer experience. The coupons customers receive with their receipts aren’t accidental; nor are the well-timed newsletters in their inboxes or the items online retailers suggest they might like. A simple scan of a loyalty card provides information that allows a business to plan for demand, stock shelves accordingly and even offer the personalized coupons that encourage return visits. The more a business can learn about its customers, the better it can serve them, and the more it can sell.

Today, marketers have access to an unprecedented amount of data and the innovative technologies that allow them to manage and analyze it.

"All this data is changing the way people around the globe do business—and more importantly, it’s changing the way Lehigh prepares students for life after graduation," says David A. Griffith, professor and chair of the Department of Marketing.

Enter Lehigh’s new Data X initiative.

Not a major, minor or center, Data X is a university-wide initiative that will equip students with the skills not only to collect and analyze data but also to approach it from the perspectives of different fields of study. Data X will develop in students the computational thinking skills they need to navigate the technical, intellectual and ethical challenges of an increasingly digital, social media-focused world. The initiative, announced earlier this year, is much bigger than the data that lies at its foundation.

"Students don’t all need to be crack programmers," says Daniel Lopresti, director of the Data X initiative and professor and chair of the Department of Computer Science and Engineering. "It probably behooves a lot of them to learn programming skills, but there are other kinds of skills that allow students to work with and manipulate data in powerful ways. [We also will develop] an awareness of data science and what the opportunities are, what the tools are, and then how to feed the system and interpret the results."

Through a large hiring initiative, Data X will expand the faculty working in computer science and related fields, making broadly accessible courses that serve these goals available to students of all majors.

The imperfect data of an imperfect world

In the early days of data analysis, businesses would collect structured data, such as units sold and monthly sales, and enter it into Microsoft Excel. Examining that data took time: an analyst would run functions, consider possibilities and try to recognize patterns.

"If you had perfect data in your spreadsheet," says Lopresti, "you could make some conclusions."

Today, much of the data businesses collect is "big" and unstructured.

"We’re collecting megabytes, gigabytes, terabytes of data in very short periods of time about everything from industrial processes to business experiences to your personal experience, your entertainment experience," says Lopresti. "We’ve gone from basically a very data-poor environment to a very data-rich environment."

Much of this data is too large and complex for traditional applications to process. IBM breaks this "big data" into four elements: volume, velocity, variety and veracity. Volume refers to the sheer quantity of data available; velocity, to the speed at which data is collected, organized and analyzed, and at which decisions are made. Variety refers to the range of forms and sources the data takes, and it has exploded in recent years. According to an IBM infographic, for example, 400 million tweets are sent daily by approximately 200 million Twitter users. This is in addition to the more than 4 billion hours of video viewed on YouTube each month and the 30 billion pieces of content shared on Facebook. The final "v," veracity, has to do with how far the data on which decisions are based can be trusted. How reliable is the information?

Big data is noisy and incomplete, says Lopresti. "So you need computational techniques that can deal with the volume, the velocity, the variety and the noise and the real-world aspects of [big data]."

People—intelligent, thoughtful, well-trained people—are helping the world get better and better at developing these techniques.

Today’s computational infrastructure is constantly evolving. People have built, and continue to build, increasingly innovative systems to collect, disseminate, store and manage massive amounts of data. This infrastructure includes high-performance, mobile and cloud computing, alongside analytical methods such as machine learning, pattern recognition and data mining that extract useful information from complex sources. Techniques for dealing with the volume, velocity and variety of data are improving over time, thanks in part to the vast quantities of data available today.

As a result, an online retailer can now see not only what customers bought, but also everything they browsed and how much time they spent viewing a particular item. Social media platforms such as Facebook and Twitter provide a massive amount of data about users’ likes and dislikes and, on a larger scale, about societal trends. More data, however, doesn’t necessarily mean more answers.

"I think sometimes we can lose sight of the question because we get enamored with the data. Thoughtful questions and careful consideration are key," says Griffith.

"You can do web-based analysis [of unstructured data]," says Lopresti. "You can put in numbers, press a button and you’ll get an answer out of it. That’s dangerous because, unless you really understand the nuances of the way these things work, it’s not as simple, unfortunately, as using your toaster—you put the toast in, press a button, and either the toast comes out warm or it doesn’t. It’s not like that."

All the data in the world won’t do much if you’re not smart in how you use it, says Lopresti.

A deep connection and a strong foundation

Lehigh’s Department of Marketing, which represents one of the Data X initiative’s three initial thrust areas (the others are digital media and bioengineering), is already invested in computer and data science. Many members of the marketing faculty conduct related research and provide students with meaningful hands-on experiences, such as opportunities to work with real datasets to solve problems for actual businesses. Data X will take these efforts even further by giving students opportunities to consider those same problems from different perspectives.

"While we have been doing things across the business school pretty well, I think [Data X] moves us outside of the business school and will move us to interacting a whole lot more with computer science, journalism, psychology, sociology, mathematics, et cetera," says Griffith.

Interdisciplinary study has long been one of Lehigh’s strengths. The program in Computer Science and Business, a popular joint initiative between the College of Business and Economics and the P.C. Rossin College of Engineering and Applied Science, already allows students to enroll in both colleges and complete a unique dual degree, bridging the gap between business and computer science. Through its hiring efforts, Data X will increase the variety and expand the availability of computer and data science courses—already in high demand—to all students, regardless of their major.

Through Data X, the marketing department is currently conducting a dual search for new faculty members: one marketing expert studying computer science and one computer scientist looking at marketing. Rather than compartmentalizing areas of study, efforts like these will create unprecedented connections across campus, leveraging and complementing the university’s existing strengths.

"The Data X initiative places a spotlight on the importance of interdisciplinary understanding and highlights the fact that we’re all using data and we’re all using different aspects of analysis, tools, techniques, computational sets—you name it—to draw on these things. I think that what it’s going to do is get us to all think more broadly," says Griffith.
