Big data is coming. And if your first question is “What the hell is big data?” you are not alone.
Big data is the scary-sounding name for the torrent of data collected from traditional and digital sources that, according to Forbes, “represents a source for ongoing discovery and analysis” for businesses, educational institutions, healthcare, and whatever else. It is the amalgam of information out there — pretty much all of it, from Twitter feeds to image-based metrics that track consumer behaviors online — into, for lack of a better word, a single-source. Or at least a conglomeration of all this data into something businesses and institutions can use without having to sift through individual websites and databanks.
Kind of. You see, the thing about big data is, it’s not that easily defined. And it’s in that phase the Internet was in back in the early 1990s, when people referred to it as the information superhighway and the potential seemed limitless.
What we do know is this: The potential for big data seems limitless, and, according to Manish Parashar, the director of the Rutgers Discovery Informatics Institute in New Brunswick (dubbed RDI2), multi-layered. The logistics of implementing big data among New Jersey businesses and universities alone are staggering, as evidenced by the fact that a lot of very smart people are still scratching their heads over exactly what needs to be done.
However, what we also know is this: Earlier this month the state senate unanimously passed a bill requiring the state’s major universities to collaborate in a “Big Data Alliance” to create an advanced cyber infrastructure plan. The bill, sponsored by senators Bob Smith and Raymond Lesniak, aims to develop the universities’ networks to support collaboration in science, medicine, and mathematics research and to knock down the walls that keep information obscured within each institution.
The schools — Rutgers, Princeton, NJIT, Rowan, Richard Stockton College, Stevens Institute of Technology, and the University of Medicine and Dentistry — are major research centers that collect the kind of information that other schools and businesses could use for analytic purposes, technological advances, and a better understanding of how to reach customers and clients.
The thing about this information is that it is exceedingly valuable. As in, roughly $300 billion, which is why the state EDA and Office of Information Technology are banding together with these schools to develop the infrastructure and support for releasing the big data genie. In the final picture, says Parashar, the New Jersey Big Data Alliance will be a shared storehouse of data, education, and research between schools and businesses.
Rutgers has the lead on figuring out exactly how this singularity will be achieved, and Parashar is the main person at the university charged with seeing that all the moving parts come together as they should. Born in India to parents who were both medical doctors, Parashar earned his bachelor’s in electronics and telecommunications from Bombay University before moving to the United States. When he got here, the already computer-savvy Parashar studied computers formally for the first time. He earned his master’s and Ph.D. in computer engineering from Syracuse, the latter in 1994. His work in computers, in the early days of the World Wide Web, focused on the movement and use of large data.
Parashar came to Rutgers in 1997, where he serves as a professor of electrical and computer engineering. He is also the founding director of RDI2, the NSF Cloud and Autonomic Computing Center, and the Applied Software Systems Laboratory at Rutgers. He recently served as program director in the Office of Cyberinfrastructure at the National Science Foundation, where he managed a $150 million research portfolio in software sustainability, computational and data-enabled science and engineering, and cloud computing.
Part of the Big Data Alliance is providing the infrastructure that allows the transmission of data between the parties in the network. “I’m not really interested in the data, I’m interested more in having a collection of data and how it’s used,” Parashar says. “This could go in many directions.”
Bandwidth. All that information needs to travel fast, meaning that the bandwidth needs to increase for big data transmission to reach its real potential. This, Parashar says, needs to be done at the state level.
A bill in the state Senate would require coordination between the RDI2, the Office of Information Technology, NJEDge.net, and the New Jersey Big Data Alliance to create an advanced cyberinfrastructure plan. The bill, which unanimously passed in the Assembly, will likely pave the way for expanded bandwidth to accommodate the data flow.
Hardware and data sources. Apart from the computers themselves, there has to be an actual storehouse infrastructure for all the data collected. Rutgers, says Parashar, has the largest amount of data in the state, and the quite-simple necessity of being able to get to it is a major component.
The human element. Parashar admits that teaching people how to use big data is a thorny but necessary component of the overall alliance infrastructure. Consider how much information is out there on the Internet right now. You know it’s there. But do you know where it is? How to get to it?
Part of the training for the big data future, Parashar says, is teaching people how to think in ways that will lead them to this information. That training must also teach them how to think in new ways to use it. “Once the data exists, how can people interpret it?” Parashar asks. “That’s where universities can add a lot of value.”
How to achieve all this is still being considered, but the end result has the potential for immense breakthroughs, Parashar says. So long as it’s done wisely. “There’s a tremendous amount of data,” he says. “The trouble is, it’s not all good data. What’s the signal and what’s the noise?”
Whatever it is, it’s coming.