Diving into data

Mountaintop mentors support students in their discovery during the ten-week summer session atop South Mountain. They use their expertise and experience to ask key questions and encourage exploration. In some cases, however, Mountaintop is as much a learning experience for mentors as it is for their mentees.

This was the case when two undergraduates and two graduate students joined faculty members from Lehigh’s College of Arts and Sciences and the College of Education to explore the Early Childhood Longitudinal Study (ECLS) and other large-scale national datasets that help researchers learn more about early childhood development.

If you want to learn, for example, whether a young child’s narrative abilities influence his or her eventual reading abilities, you need data. Longitudinal studies, which track individuals or groups at different points in time, are necessary for this type of research, as they allow researchers to measure change over time. Unfortunately, longitudinal studies are often challenging for graduate students to conduct.

“In most studies, especially in developmental studies, we have limited sample size because we can’t really conduct a longitudinal study during a master’s or Ph.D.; it’s a long process,” says Burcu Ünlütabak, a graduate student in psychology. “[The national] datasets are already collected. We thought we could ask some questions and find answers to those questions [with] those datasets.”

In three studies, the ECLS followed one cohort of children from birth to kindergarten, another from kindergarten to fifth grade, and a third from kindergarten to eighth grade. The ECLS provides data that allows researchers to study the relations between a variety of individual, family, school and community variables in child development, learning and school performance. 

The trick is learning how to access and analyze all that data.

“Our question was figuring out how to use the datasets. ... [We thought] if we got a group together, with the graduate students and the undergraduates helping us, we can figure it out,” says mentor Ageliki Nicolopoulou, professor of psychology and global studies.

“Learning how to use national datasets is a skill that really takes some time to develop,” says Laura Wallace, a graduate student in school psychology. “[We thought] this summer project would be a unique training experience to be able to spend some time focusing on learning how to develop the skills and to work with these datasets.”

And so four faculty members and four students began working together to learn how to do just that.

‘A process of exploration’

Ünlütabak, Wallace, Kelsey Konopka ’16 and Katie MacLachlan ’16 joined mentors Nicolopoulou; Amanda Brandone, assistant professor of psychology; Patti Manz, associate professor of school psychology; and Brook Sawyer, assistant professor of teaching, learning and technology, and dove into the ECLS and other large-scale national datasets. All sought exposure and experience. For undergraduates, the project was a good exercise in exploration. For faculty members and graduate students, it was a method of determining whether or not these large datasets might be useful to their work.

“This is really setting us up for a much longer research agenda,” says Sawyer. “Once you know these datasets, then you can say, ‘Oh, here are four manuscripts that can come out of this one dataset.’ But it takes a long time to learn the datasets, so that’s what we were mucking around with this summer.”

“It’s very much a process of exploration,” says Nicolopoulou.

The team worked with a consultant from Temple University, who introduced them to some of the datasets and helped navigate the process. They examined published projects to get a sense of how others have used the datasets and created a webpage that compiles links to survey information, useful library databases and tips for using national datasets for research.

Group members negotiated and identified areas in which they were all interested.

“Then we said, ‘What datasets would be amenable to these interests?’” explains Sawyer. “And then you go to that dataset and think, ‘Well, which variables...?’ One of the key questions we’re looking at is narrative development, but when you have tens of thousands of children, then it’s not always measured in the way that you would go to the same degree of thoroughness.”

The team quickly discovered that using existing data can take a researcher’s initial question down an entirely different path.

“So then you look at the dataset and say, ‘I can’t answer the question that I was hoping to answer, [so] what other question can I answer?’” says Sawyer.

Students are trained to have control over everything in research, says Nicolopoulou. With these datasets, there is a loss of that control, as the data has already been collected.

“This is practice in learning how to have less control because when they develop these studies, the government and the head researchers picked the measures they were using,” says Wallace. “It’s hard to go into that—I can’t control which language measure they used, or [I wonder] why didn’t they ask parents this question. ... We like to measure things one way, but there are other ways, other tests that people use, other measures that people use. [It helps to start] thinking about the merits of doing something differently or uniquely.”

Making sense of a complicated process

Team members learned that the process of accessing the data isn’t always straightforward. Because the data deals with children, many restrictions, security measures and technology requirements are in place. As team members narrowed down questions they might ask using the data, they also had to investigate how they might access the data they’d need to use. Once they did access the data, they had to navigate how to interpret it—another significant challenge, but one they’ve become more comfortable tackling.   

“We have a few [ideas] that we’re interested in moving forward, but the statistics are overwhelming, so then you need to have someone on board to do the statistics piece,” says Sawyer. “So we’re starting to move in that process in terms of one of the narrative development questions.”

‘An excellent opportunity’

This Mountaintop experience is an ongoing process that has created an avenue for future collaboration—and a new set of skills.

“I’m really interested in home literacy experiences, and I have more knowledge about literacy if we’re talking about language and vocabulary, whereas Ageliki [Nicolopoulou] and Burcu [Ünlütabak] are very interested in narrative development,” says Sawyer. “That’s a niche that I don’t know very much about, so I’m really happy to be collaborating, because I would like to learn more about that area and explore that data.”

“I do feel like I have a good sense of what it would take and what the benefits might be [of using the datasets],” says Manz, who may not use the datasets but is happy to have the skills to do so if she decides to down the road. “Even if we don’t use the datasets, [this project] gave us information on what’s out there in terms of research.”

“As I go on into work experience, having the ability to actually look at data and see where to go from there is important,” says MacLachlan, a global studies student. “I think [this project has] given me a really good, different perspective for where I want to go in terms of research and understanding.”

“[The project allowed me to] explore how I can be a better teacher,” says Konopka, a psychology and environmental science student also in the general education program. “This was a huge opportunity for me to explore different areas that overlap education and psychology, so it’s awesome.”

“This has been an excellent opportunity for all of us,” says Nicolopoulou.

If this story interests you, you may also want to learn about Lehigh's new Data X initiative, which focuses on strengthening Lehigh's research and teaching capacity in computer and data science across multiple disciplines.

Photo by Christa Neu