Caption: Ang Chen, left, and Eugene Ng. (Credit: Jeff Fitlow/Rice University)
Rice University scientists have been awarded a National Science Foundation grant to develop distributed programming methods to analyze streaming data.
According to Chen, principal investigator on the project, the work means the switches, routers and other components that stand between end users and data servers can play a more active part in managing and analyzing big data. It could make data networks faster and more efficient, which would be a boon for financial services, social networks, the “internet of things” and many other applications.
The researchers said the range of programmable elements in data networks has expanded to include not only servers but also interface components, field-programmable gate arrays, application-specific integrated circuits and network topology. “Today, all the processing is done at the server, without any processing or computation along the path. We’re going to try to change that,” said Ng, a professor of computer science and electrical and computer engineering.
“Our vision is to optimize all of these components to achieve a sweet spot in the design space for each application,” said Chen, an assistant professor of computer science and of electrical and computer engineering, who joined Rice in 2017. “We hope to have an approach that can work across different kinds of protocols.”
Ng said common examples of streaming data also include fraud detection, monitors and temperature and other environmental sensors that continuously generate data and send it at high speed to servers from all over the world. “Our challenge is to develop a scalable platform that allows programmers to derive real-time insight from data utilizing the technologies we propose,” he said.
One likely strategy is to intelligently process and reduce data before it reaches servers, Ng said. That could be accomplished by programming components along the path to handle as much computation as they’re able. “That can allow server clusters to pull down more data, because you’re not just moving data for the sake of moving it. You’re processing it and potentially generating a partial answer to your question.
“I think it’s safe to say that there is vast untapped potential in using this emerging hardware for big data processing – and the key word is ‘emerging,’” Ng said. “It’s new, so very few people have thought about what it can do.”
The researchers also plan to study how data flows through networks so they can optimize it on the fly. “Sometimes it matters which stuff you perform first,” Chen said. “It’s not just about where programming capabilities exist in the network but also about organization of the network itself.
“So we’re looking at how an underlying physical network can adapt itself and change the network flow to optimize latency,” he said.
Mike Williams, senior media specialist, Rice University's Office of Public Affairs