Program | Hikone Data Science 2020
13-14 November 2020
This symposium will run as a virtual symposium
|JST: UTC+9:00||Session A|
|chair: Chisako Muramatsu|
"Accelerating AI Research – An Introduction to The Latest Technology from NVIDIA for Data Scientist"
Deep learning, which started with basic classification and recognition problems such as image and speech recognition, finds new applications every year, ranging from self-driving cars and diagnostic aids to large-scale simulations. At the same time, as problem settings grow in scale and complexity, more computing resources are needed. As a result, deep learning commonly relies on GPUs with high computational performance to accelerate training. In addition, not only the hardware but the whole ecosystem, including software, continues to improve in performance and usability across a variety of environments. In this talk, I will introduce the latest GPU technologies for deep learning, as well as software and tools from NVIDIA that accelerate research and development for data scientists.
"Causality diagnosis and Its applications in Industry"
Causal inference is a technique that tries to discover and quantify causal relations from historical observational and experimental data. With causal relations organized in a directed network, it gives insight into the data-generating mechanism of a system. Furthermore, with exact knowledge of causal effects, it can detect the root or key causes that make things happen and suggest optimal actions to steer outcomes. Causality has great potential in marketing and operations, new sales, advertising, manufacturing, medical care, finance, and other areas. Let us see what the opportunities and challenges of causality in industry are.
|chair: Xiaokang Zhou|
"Data Science Applications in Energy Efficiency for Smart Building Design"
Data-driven energy efficiency solutions that integrate artificial intelligence (AI) are in high demand and are among the most important topics in related fields such as smart building and city design, applied energy, electrical and electronic engineering, automation, and construction. In this talk, three energy efficiency solutions for smart building design will be introduced: 1) data-driven fault detection and diagnosis (FDD) of heating, ventilation and air-conditioning (HVAC) systems, 2) energy consumption forecasting for individual households, and 3) solar energy utilization. First, the most up-to-date energy efficiency problems will be introduced. Second, recently developed techniques will be described, including semi-supervised learning, generative adversarial networks, long short-term memory, and hybrid deep learning neural networks. Last, trends in the use of machine learning technology in the building sector will be summarized.
"Randomized learning algorithms under heavy-tailed feedback"
The PAC-Bayesian learning framework gives us a flexible toolbox for analyzing randomized learning algorithms and deriving new procedures which effectively balance under-fitting and over-fitting, regularized by a lucid form of prior knowledge. Unfortunately, this framework is limited to machine learning settings in which the loss distribution is known to be essentially Gaussian, which excludes many important real-world settings known to be heavy-tailed in nature. In this work, we make a first attempt at extending the scope of PAC-Bayes to heavy-tailed losses, keeping statistical guarantees tight and computational overhead minimal.
|JST: UTC+9:00||Session B|
|chair: Shohei Shimizu|
"Discovering Temporal Causal Relations from Low-Resolution Data"
Granger causal analysis has been an important tool for the causal analysis of time series in various fields, including neuroscience and economics. However, when the time resolution of the data is lower than the causal resolution, the original Granger causality cannot recover the underlying causal relations. In this talk, I will present our work that aims to answer the following question: can we estimate the temporal causal relations at the right causal frequency from low-resolution data? Traditionally, this suffers from an identifiability problem: under the Gaussianity assumption on the data, the solutions are generally not unique. However, we prove that if the noise terms are non-Gaussian, the underlying model for the high-frequency data is identifiable from low-resolution data under mild conditions. We then present estimation algorithms based on non-Gaussian latent variable modelling to discover causal relations from low-resolution data.
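The effect of low time resolution described in this abstract can be illustrated with a minimal sketch. It is not the speaker's algorithm. It assumes a two-variable VAR(1) process with a hypothetical transition matrix `A` and uniform (non-Gaussian) noise, and it fits a plain least-squares VAR(1) to the full series and to every second sample. For a VAR(1) process, subsampling by a factor of two replaces `A` with `A @ A`, so with the matrix chosen here the influence of the first variable on the second vanishes in the low-resolution fit.

```python
import numpy as np


def simulate_var1(A, n_steps, rng):
    """Simulate x_t = A x_{t-1} + e_t with uniform (non-Gaussian) noise."""
    d = A.shape[0]
    x = np.zeros((n_steps, d))
    noise = rng.uniform(-1.0, 1.0, size=(n_steps, d))
    for t in range(1, n_steps):
        x[t] = A @ x[t - 1] + noise[t]
    return x


def fit_var1(x):
    """Least-squares estimate of the VAR(1) transition matrix."""
    past, future = x[:-1], x[1:]
    # Solves past @ B = future, where B is the transpose of the transition matrix.
    coef, *_ = np.linalg.lstsq(past, future, rcond=None)
    return coef.T


rng = np.random.default_rng(0)

# Hypothetical transition matrix: variable 0 drives variable 1 (entry [1, 0] = 0.5).
A = np.array([[0.8, 0.0],
              [0.5, -0.8]])

x = simulate_var1(A, 20000, rng)

A_full = fit_var1(x)      # fit at the true causal resolution
A_sub = fit_var1(x[::2])  # fit after keeping every second sample

print("estimated at full resolution:\n", np.round(A_full, 2))
print("estimated after subsampling:\n", np.round(A_sub, 2))
print("A @ A:\n", A @ A)
```

In this construction the entry `A_sub[1, 0]` should be close to zero even though the true coefficient is 0.5, which is the kind of failure that motivates estimating the high-resolution model directly.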
"Recent results on estimating feature relevance and indirect causal contributions"
This talk is about some of our recent contributions on explainable AI using Shapley values to quantify feature relevance and indirect causal influences in a directed acyclic graph (DAG). By clarifying the difference between interventional and observational distributions – which is the reason for most misconceptions between statistical and causal reasoning – we derive the right notion of feature relevance via Shapley values and point out a common misconception in the literature. In a related work, we further utilize the concept of Shapley symmetrization to develop a general approach for estimating indirect causal influences via structure preserving interventions. This allows a quantification of indirect contributions of upstream variables to a target node in a causal DAG.
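As a rough illustration of Shapley values with an interventional value function, the sketch below computes exact Shapley values by enumerating coalitions. It is an assumption-laden toy, not the method of the talk: the linear model, its `weights`, the input `x`, and the feature means `baseline` are all hypothetical. For a linear model, averaging the out-of-coalition features over their marginal distribution is the same as plugging in their means, so each feature's attribution should equal `weights[i] * (x[i] - baseline[i])` up to rounding.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, value):
    """Exact Shapley values for n players and a coalition value function."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical linear model f(x) = sum_i weights[i] * x[i].
weights = [2.0, -1.0, 0.5]
x = [1.0, 3.0, -2.0]          # input to explain
baseline = [0.0, 1.0, 2.0]    # assumed feature means


def interventional_value(coalition):
    """Model output with in-coalition features fixed to x and the rest at their means."""
    return sum(
        w * (x[i] if i in coalition else baseline[i])
        for i, w in enumerate(weights)
    )


phi = shapley_values(len(weights), interventional_value)
print("Shapley values:", phi)
print("sum of attributions:", sum(phi))
```

The attributions sum to the difference between the model output at `x` and at `baseline`, which is the efficiency property of Shapley values.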
|chair: Ryo Nishide|
"Neural Graph Processing: an embedding-based approach"
Many fields, like physics, neuroscience, chemistry, and sociology, investigate phenomena by processing multivariate measurements that are advantageously represented as a sequence of attributed graphs. Graphs come in different forms, with variable attributes, topology, and ordering, making it difficult to perform a mathematical analysis in the graph space. Within this framework, we are interested in processing graph datastreams coming from sensor networks to address applications such as detecting time variance, anomalies, or events of interest, as well as designing sophisticated processing such as that required by predictors. The talk will focus on neural graph processing as seen from an application-independent perspective.
|JST: UTC+9:00||Session C|
|chair: Osamu Ichikawa|
"Semiparametric Inference for Non-monotone Missing-Not-at-Random Data: the No Self-Censoring Model"
We study the identification and estimation of statistical functionals of multivariate data missing non-monotonically and not-at-random, taking a semiparametric approach. Specifically, we assume that the missingness mechanism satisfies what has been previously called “no self-censoring” or “itemwise conditionally independent nonresponse,” which roughly corresponds to the assumption that no partially-observed variable directly determines its own missingness status. We show that this assumption, combined with an odds ratio parameterization of the joint density, enables identification of functionals of interest, and we establish the semiparametric efficiency bound for the nonparametric model satisfying this assumption. We propose a practical augmented inverse probability weighted estimator, and in the setting with a (possibly high-dimensional) always-observed subset of covariates, our proposed estimator enjoys a certain double-robustness property. We explore the performance of our estimator with simulation experiments and on a previously-studied data set of HIV-positive mothers in Botswana. This is joint work with Ilya Shpitser (Johns Hopkins University) and Eric J. Tchetgen Tchetgen (University of Pennsylvania).
"Causal Discovery with unobserved confounding and non-Gaussian data"
We consider data which arise from a linear structural equation model in which the idiosyncratic errors are allowed to be dependent in order to capture possible latent confounding. We show that under certain restrictions on the latent confounding and when the errors are non-Gaussian, the exact causal structure–not merely an equivalence class–can be consistently recovered from purely observational data when the graph corresponding to the SEM is bow-free and acyclic.
|chair: Akimichi Takemura|
"Cultural Color Keyword Analysis using Twitter Text Mining"
Colors continue to be an intricate part of societies and characterize cultural differences in the use of colors and color terms. Colors convey emotions and have semiotic capacity, and are therefore used in product design, architectural design, advertising, and other areas. The use of colors is also dependent on cultural background. Most research in this field has been done on small data sets or with experimental designs in the humanities. With the use of large online data sets like Twitter, new insights can be found. Data was collected in three languages, processed and analyzed. The HDS2020 keynote will cover some practical and theoretical methods for text mining. Cultural implications and difficulties will be part of the discussion for this segment.
"Experience of the Development of International Data Science Program at TNI"
When talking about data science, people have undoubtedly seen the Venn diagram of data science, consisting of three components: computer science, statistics, and domain knowledge. It is debatable whether these three are the correct representation, or even what combination of the three should be taught to produce graduates with the skills required by various industries, not only technology companies. Recognizing the rising demand in Thailand for a workforce with analytics skills, TNI offered a bachelor's degree in information technology in data science and analytics. Unlike other undergraduate programs, our DSA program offers courses that combine those three components while focusing more on the IT side and the application of statistics, owing to the nature of our faculty and their expertise. Data mining, machine learning, and introduction to quantitative analysis are examples of subjects rooted in statistical theory. Python, R, and SQL are the top three programming classes in our program, providing the skills required in data science job listings. Competencies in foreign languages such as Japanese and English are a bonus and an advantage for our students. In this talk, we would like to share the experience of starting a new, international undergraduate program in data science and analytics in Thailand.