2012-2017 Special Focus on Information Sharing and Dynamic Data Analysis: Overview
If the 20th century was dominated by rapid advances in technology, most notably the development
and growth of computers, then the 21st will be defined by the enormous growth of data.
The ability to instrument, monitor and collect data on every action within society provides huge
potential to use this information to improve life: designing better systems (e.g. in healthcare),
better managing interactions between large complex systems (e.g. in urban planning and traffic
management), identifying subtle problems and bad outcomes (e.g. in homeland security),
and more.
Yet while there have been great strides made in building and designing systems which have
strong control over their data sources (think of large web-based applications, such as search engines
and social networks), systems which need to span multiple, noisy data sources have shown less
progress. Consider traffic management in a modern city: there are many complex inputs (historical
travel patterns, current and predicted weather activity, maintenance, accident reports, and current
observed traffic density). Various controls are possible (adjusting traffic signal timings, blocking or
opening lanes, reversible flows etc.), but it remains a hard problem to ensure smooth flow of traffic
at peak times. A key stumbling block is the difficulty inherent in bringing together multiple
diverse sources of information and using them to draw correct conclusions and make sound
decisions. Similar examples can be seen in other application areas: identifying disease spread from
signals as broad as pharmacy purchases and web search trends; finding potential terrorist plots
from a mixture of open and classified sources.
The goal of the Special Focus on Information Sharing and Dynamic Data Analysis is to address
the technical problems at the heart of these data challenges. No simple solution or single algorithm
can revolutionize this area. Instead, there needs to be protracted effort to understand, model and
make progress on these fundamental issues. Moreover, there are many embedded technical problems
that need to be solved: this is a matter of science, as well as engineering. Our approach is to
highlight the key technical problems that need to be solved in this area. This cannot be done
in clinical isolation, but requires the participation of practitioners and data users, in addition to
scientists and academics. The special focus aims to provide venues for these interactions to occur,
and to stimulate further progress via meetings, reports, and identification of open problems. The
focus will take place over four years, allowing time for some topics to be visited multiple times as
advances are made.
Cross-cutting themes throughout the Special Focus will be:
- Data Preparation and Quality. Real world data collection systems must deal with substantial
quality problems in the data: values are missing or corrupted, and the same real-world entity
may appear in the data under multiple identifiers. Detecting and correcting such problems is challenging:
there is the need to define (or learn) complex rules that define the expected behavior, and use these
to identify errors without erasing genuine outlier behavior. This is compounded when bringing
together data from multiple disparate sources (as in the traffic management example): sources
might appear self-consistent, but be mutually inconsistent. It is then a challenge to reconcile these
discrepancies.
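As a toy illustration of the record-linkage problem described above, the sketch below pairs records from two hypothetical traffic data sources whose location fields describe the same intersection in different ways. The field names, threshold, and similarity rule are illustrative assumptions, not part of any DIMACS system; real entity resolution uses far more sophisticated (often learned) matching rules.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Normalized edit-based similarity between two field values (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link_records(source_a, source_b, threshold=0.8):
    """Pair records from two sources whose 'name' fields are similar enough.
    A naive all-pairs comparison; real systems use blocking to avoid O(n*m)."""
    links = []
    for ra in source_a:
        for rb in source_b:
            if similarity(ra["name"], rb["name"]) >= threshold:
                links.append((ra["id"], rb["id"]))
    return links

# Two sources naming the same intersection differently (hypothetical data).
sensors = [{"id": "A1", "name": "Main St & 5th Ave"}]
reports = [{"id": "R9", "name": "Main Street & 5th Avenue"},
           {"id": "R10", "name": "Oak Blvd & 2nd St"}]
print(link_records(sensors, reports))  # links A1 to R9 only
```

Even this simple rule shows the core tension: the threshold must be loose enough to catch genuine variants but tight enough not to merge distinct entities.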
- Privacy and Security. When information is shared between different entities, there are inevitably
concerns about the privacy of individuals in the data, and the security of the information
against attackers. Recent years have seen renewed interest in these topics from both the research
community and the broader legislative/public policy domain, but these two efforts have only begun
to meet. In particular, while there has been much technical progress in methods to
provide privacy for data sharing and release, this has not been matched in practice or regulation:
law and policy have so far adhered to simplistic definitions of privacy (such as in HIPAA), but data
releases which meet these requirements, such as the AOL and Netflix releases, have nevertheless
failed to provide the protection needed. By running this SF concurrently with the Cybersecurity
SF, we will be able to bring together people with a broad range of expertise, including security, privacy,
public policy, and a number of practical domains in order to lay out and advance the research
agenda in data privacy and security for the next decade.
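One of the technical methods for privacy-preserving data release alluded to above is differential privacy, often implemented by adding calibrated Laplace noise to released statistics. The sketch below is a minimal, self-contained version for a single count query of sensitivity 1; the function names and parameters are illustrative, and production systems must also account for the privacy budget across repeated queries.

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a zero-mean Laplace distribution via inverse CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count: int, epsilon: float, rng=None) -> float:
    """Release a count with Laplace noise of scale 1/epsilon.
    Smaller epsilon means stronger privacy but noisier answers."""
    rng = rng or random.Random()
    return true_count + laplace_noise(1.0 / epsilon, rng)

# E.g., releasing a noisy count of patients with some condition.
rng = random.Random(42)
print(private_count(1000, epsilon=0.5, rng=rng))
```

The contrast with the simplistic anonymization criticized above is that the guarantee here is a property of the release mechanism itself, not of which identifying fields were stripped.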
- Continual and Distributed Processing. The historic model of data sharing is of receiving a
data set from another party. But rather than data sets, the current world works on data feeds:
streams of data arriving continually. This brings new challenges, not just in how
to combine multiple such feeds, but in how to do so efficiently: when streams arrive at high rates,
there are additional costs in terms of network bandwidth, local storage, and processing power to
manage. These push towards distributed computation in place of centralizing data, which leads
to new models of parallel computation: cloud computing, and paradigms such as MapReduce and
its open-source implementation, Hadoop.
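As a small sketch of the feed-combination problem described above, the code below merges two time-ordered feeds into a single stream and maintains a rolling average with bounded memory: only the readings inside the trailing window are kept. The feed contents and window size are made-up examples.

```python
import heapq
from collections import deque

def merge_feeds(*feeds):
    """Merge time-ordered (timestamp, value) feeds into one stream.
    heapq.merge holds only one pending item per feed in memory."""
    yield from heapq.merge(*feeds)

def rolling_average(stream, window=60.0):
    """Average of readings in the trailing `window` seconds, updated per item."""
    buf = deque()
    total = 0.0
    for t, v in stream:
        buf.append((t, v))
        total += v
        while buf and buf[0][0] < t - window:  # evict expired readings
            _, old = buf.popleft()
            total -= old
        yield t, total / len(buf)

# E.g., two traffic-sensor feeds reporting (seconds, vehicles/minute).
feed_a = [(0, 10.0), (30, 20.0), (90, 30.0)]
feed_b = [(15, 40.0), (75, 50.0)]
for t, avg in rolling_average(merge_feeds(feed_a, feed_b)):
    print(t, avg)
```

The point is the shape of the computation: state is proportional to the window, not the stream, which is exactly what lets such processing be pushed out toward the data sources instead of centralizing everything.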
- Fusion and Inference. Current techniques for data fusion and inference over data from multiple
sources are still somewhat crude. Learning techniques are being developed to identify optimal
ways to collect information in uncertain environments. It is possible to draw on existing
and developing techniques, such as Bayesian hierarchical modeling, Kalman filters, and so on.
But there is still much potential for new results on combining heterogeneous data from disparate
sources to draw conclusions and make decisions.
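To make the Kalman-filter idea mentioned above concrete, here is a minimal one-dimensional measurement-update sketch that fuses a prior estimate with two noisy sensor readings, weighting each by its variance. The traffic-speed scenario and numbers are invented for illustration; real applications use the full multivariate filter with a dynamics model.

```python
def kalman_update(mean, var, measurement, meas_var):
    """Fuse a Gaussian state estimate (mean, var) with one noisy measurement."""
    k = var / (var + meas_var)               # Kalman gain: trust in the measurement
    new_mean = mean + k * (measurement - mean)
    new_var = (1.0 - k) * var                # fused estimate is always less uncertain
    return new_mean, new_var

# Prior belief about traffic speed (mph), then two sensors of differing quality.
mean, var = 50.0, 100.0
for z, r in [(48.0, 4.0), (53.0, 9.0)]:     # (reading, measurement variance)
    mean, var = kalman_update(mean, var, z, r)
print(mean, var)
```

Note how the fused estimate lands between the readings, closer to the more reliable sensor, and its variance drops below that of either sensor alone: a simple instance of the principled combination of disparate sources the focus aims to advance.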
Opportunities to Participate:
- Workshops: A variety of workshops and mini-workshops are
being planned.
- Working Groups: Interdisciplinary "working groups" will
explore special focus research topics.
- Implementation Challenges: Implementation challenges focus on empirical aspects of algorithm design and evaluation.
- Tutorial: A tutorial will provide background knowledge to
those who wish to participate in the special focus or just get an
introduction to some of the fundamental issues in the field.
- Seminar Series: There will be a mix of research talks and
practitioner presentations.
- Visitor Programs: Applications for research and graduate
student visits to the center are invited. Some funds may be available
for travel and local support.
- Graduate Student Support: Funds will be set aside for graduate students interested in attending workshops. Students interested in visiting DIMACS during the special focus are encouraged to apply to the special focus organizers.
- Publications: We anticipate that a variety of
publications, including AMS-DIMACS volumes, technical reports,
abstracts and notes on the WWW, and DIMACS modules will result from
the special focus.
Document last modified on September 9, 2017.