The Applied Statistics Workshop 2024

Organizers: (Spring semester) 大森裕浩 (Yasuhiro Omori), 下津克己 (Katsumi Shimotsu); (Fall semester) 入江 薫 (Kaoru Irie), 奥井亮 (Ryo Okui), 久保川達也 (Tatsuya Kubokawa)

Sessions already held this academic year are listed below.

[Special session]
Date and time
Friday, April 12, 2024, 16:50-18:35
Venue
Seminar Room 1 on the 1st floor of the Economics Research Annex (Kojima Hall), Graduate School of Economics, University of Tokyo

Note: This session is held in person only. If you are from outside the University of Tokyo and wish to attend, please contact CIRJE (cirje[at mark]e.u-tokyo.ac.jp).

Presentation
Cun-Hui Zhang (Rutgers University)
"Tensor PCA in High Dimensional CP Models"
Abstract The CP decomposition for high dimensional non-orthogonal spike tensors is an important problem with broad applications across many disciplines. However, previous works with theoretical guarantees typically assume restrictive incoherence conditions on the basis vectors for the CP components. We propose new computationally efficient composite PCA and concurrent orthogonalization algorithms for tensor CP decomposition with theoretical guarantees under mild incoherence conditions. The composite PCA applies the principal component or singular value decompositions twice, first to a matrix unfolding of the tensor data to obtain singular vectors and then to the matrix folding of the singular vectors obtained in the first step. It can be used as an initialization for any iterative optimization scheme for the tensor CP decomposition. The concurrent orthogonalization algorithm iteratively estimates the basis vector in each mode of the tensor by simultaneously applying projections to the orthogonal complements of the spaces generated by other CP components in other modes. It is designed to improve the alternating least squares estimator and other forms of the high order orthogonal iteration for tensors with low or moderately high CP ranks. Our theoretical investigation provides estimation accuracy and convergence rates for the two proposed algorithms. Our implementations on synthetic data demonstrate significant practical superiority of our approach over existing methods. This talk is based on joint work with Yuefeng Han.
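The "unfold, then fold" idea in the abstract above can be illustrated with a minimal sketch. This is not the speaker's implementation: the function name `composite_pca_mode1`, the choice to unfold modes (1, 2) against mode 3, and the toy data with exactly orthonormal basis vectors and no noise are all assumptions made for illustration only.

```python
import numpy as np

def composite_pca_mode1(T, rank):
    """Simplified composite-PCA-style estimate of mode-1 basis vectors
    for a 3-way tensor (hypothetical helper, illustration only)."""
    d1, d2, d3 = T.shape
    # Step 1: unfold modes (1, 2) against mode 3 and take an SVD.
    U, s, _ = np.linalg.svd(T.reshape(d1 * d2, d3), full_matrices=False)
    A_hat = np.empty((d1, rank))
    for r in range(rank):
        # Step 2: fold the r-th left singular vector back into a d1 x d2
        # matrix and keep its leading left singular vector.
        Ur, _, _ = np.linalg.svd(U[:, r].reshape(d1, d2), full_matrices=False)
        A_hat[:, r] = Ur[:, 0]
    return A_hat, s[:rank]

rng = np.random.default_rng(0)
d, rank = 6, 2
# Idealized toy data: orthonormal basis vectors in every mode, no noise.
A, B, C = (np.linalg.qr(rng.standard_normal((d, rank)))[0] for _ in range(3))
lam = np.array([5.0, 2.0])
T = np.einsum("r,ir,jr,kr->ijk", lam, A, B, C)

A_hat, sing = composite_pca_mode1(T, rank)
# Absolute inner products between estimated and true mode-1 vectors
# (the sign of a singular vector is arbitrary).
alignment = np.abs(np.sum(A_hat * A, axis=0))
```

In this idealized setting the estimated vectors match the true ones up to sign. With non-orthogonal components or noise, the output would only serve as a starting point for an iterative scheme, as the abstract describes.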
Date and time
Friday, April 19, 2024, 16:50-18:35
Venue
Seminar Room 1 on the 1st floor of the Economics Research Annex (Kojima Hall), Graduate School of Economics, University of Tokyo

Note: This session is held in person only. If you are from outside the University of Tokyo and wish to attend, please contact CIRJE (cirje[at mark]e.u-tokyo.ac.jp).

Presentation
姫野哲人 (Tetsuto Himeno), Shiga University

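Recent Trends in High-Dimensional Data Analysis (近年の高次元データ解析の動向について)
Abstract As the use of big data (data in which both the sample size and the number of variables are enormous) grows in importance, it has also become important to handle high-dimensional data, in which the number of variables is enormous but the sample size is not sufficiently large. Applying conventional methods to such data causes problems: the accuracy of various approximations becomes inadequate, and the estimator of the covariance matrix becomes singular, so that the statistic itself can no longer be defined. High-dimensional data analysis has therefore produced methods that avoid the information loss caused by variable selection and similar steps, either by correcting the bias of conventional methods under some assumptions so that they remain usable in high dimensions, or by developing new methods. This talk surveys high-dimensional data analysis to date and presents the speaker's recent research on hypothesis testing in the GMANOVA model and on high-dimensional regression analysis.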

Date and time
Friday, May 24, 2024, 16:50-18:35
Venue
Seminar Room 1 on the 1st floor of the Economics Research Annex (Kojima Hall), Graduate School of Economics, University of Tokyo

Note: This session is held in person only. If you are from outside the University of Tokyo and wish to attend, please contact CIRJE (cirje[at mark]e.u-tokyo.ac.jp).

Presentation
Eric Yanchenko (Akita International University)
Core-periphery hypothesis testing in networks
Abstract
Core-periphery (CP) structure is an important network feature where nodes are grouped into a densely-connected core and a sparsely-connected periphery. While this structure has been observed in numerous real-world networks, there has been minimal statistical formalization of it. In this work, we investigate this feature through a statistical lens, and in particular, provide a statistical hypothesis test for its significance. Adopting the popular Borgatti and Everett (BE) (2000) metric to quantify the strength of CP structure, we rigorously study this metric, deriving the model parameter it estimates, and show it fits within the standard modularity framework. We then propose a computationally efficient algorithm to identify the feature and theoretically prove that it consistently detects the true core as the number of nodes goes to infinity. To our knowledge, this is the first result to prove the consistency of the BE metric. Additionally, we develop two theoretically rigorous asymptotic hypothesis tests for CP structure, for both the Erdos-Renyi and Chung-Lu null models. The proposed method shows excellent performance on synthetic data, and our applications show that statistically significant CP structure is somewhat rare in real-world networks.
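To make the metric in the abstract above concrete, here is a minimal sketch of a Borgatti-Everett-style score. It is assumed here, not stated in the abstract, that the score is the Pearson correlation between the observed adjacency matrix and an ideal pattern in which a pair of nodes is connected exactly when at least one of them is in the core. The function name `be_metric` and the toy network are hypothetical.

```python
import numpy as np

def be_metric(adj, core):
    """Correlation between the upper triangle of `adj` and an ideal
    core-periphery pattern (hypothetical helper, illustration only)."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    c = np.zeros(n)
    c[list(core)] = 1.0
    # Ideal pattern: entry (i, j) is 1 if i or j belongs to the core.
    ideal = np.maximum.outer(c, c)
    iu = np.triu_indices(n, k=1)  # each undirected pair counted once
    return np.corrcoef(adj[iu], ideal[iu])[0, 1]

# Toy network on 6 nodes whose true core is {0, 1}.
n = 6
true_core = np.zeros(n)
true_core[[0, 1]] = 1.0
ideal_adj = np.maximum.outer(true_core, true_core)
np.fill_diagonal(ideal_adj, 0.0)

perfect_score = be_metric(ideal_adj, [0, 1])  # correct core labels
wrong_score = be_metric(ideal_adj, [4, 5])    # periphery nodes labelled as core
```

A search over candidate core sets would maximize this score. The sketch only evaluates it for two fixed labelings and does not implement the hypothesis tests discussed in the talk.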
Date and time
Friday, May 31, 2024, 16:50-18:35
Venue
Seminar Room 1 on the 1st floor of the Economics Research Annex (Kojima Hall), Graduate School of Economics, University of Tokyo

Note: This session is held in person only. If you are from outside the University of Tokyo and wish to attend, please contact CIRJE (cirje[at mark]e.u-tokyo.ac.jp).

Presentation
Wenxin Zhou (University of Illinois Chicago)
Expected Shortfall Regression: A Unified Two-step Framework
Abstract

Expected Shortfall (ES), also known as superquantile or Conditional Value-at-Risk, has been recognized as an important measure in risk analysis and stochastic optimization. In finance, it refers to the conditional expected return of an asset given that the return is below some quantile of its distribution. In this talk, we consider a joint regression framework that simultaneously models the conditional quantile and ES of a response variable given a set of covariates, for which the state-of-the-art approach is based on minimizing a joint loss function that is non-differentiable and non-convex.

Motivated by the idea of using orthogonal scores to reduce sensitivity with respect to nuisance parameters, we study a unified two-step framework for fitting joint quantile and ES regression models under three settings: (i) linear models, (ii) sparse linear models in high dimensions, and (iii) nonparametric models in RKHS. We establish finite-sample properties for the proposed estimators along with their robust counterparts and propose different inference methods under various model structures. A new Python package, named quantes (https://pypi.org/project/quantes/), is developed to implement ES regressions.
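As a small illustration of the quantities in the abstract above, the sketch below computes an empirical expected shortfall and then reproduces it with a two-step calculation in the simplest possible case, an intercept-only model. The surrogate response `z` is an orthogonalized form that appears in the two-step literature; its use here is an assumption rather than a description of the talk or of the quantes package, and both function names are hypothetical.

```python
import numpy as np

def expected_shortfall(y, tau):
    """Empirical ES: mean of the observations at or below the tau-quantile."""
    q = np.quantile(y, tau)
    return y[y <= q].mean()

def two_step_es_intercept_only(y, tau):
    """Two-step ES estimate with no covariates (illustration only)."""
    # Step 1: quantile fit; with an intercept only, this is the sample quantile.
    q = np.quantile(y, tau)
    # Surrogate response built from the step-1 fit (assumed form).
    z = q + (y - q) * (y <= q) / tau
    # Step 2: least squares of z on an intercept reduces to the sample mean.
    return z.mean()

returns = np.arange(1.0, 101.0)  # toy "returns": 1, 2, ..., 100
es_direct = expected_shortfall(returns, 0.1)
es_two_step = two_step_es_intercept_only(returns, 0.1)
```

The two numbers agree in this toy example because exactly 10% of the observations fall at or below the sample quantile. With covariates, step 1 would be a quantile regression and step 2 a regression of the surrogate response on the covariates.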

Date and time
Friday, June 7, 2024, 16:50-18:35
Venue
Seminar Room 1 on the 1st floor of the Economics Research Annex (Kojima Hall), Graduate School of Economics, University of Tokyo

Note: This session is held in person only. If you are from outside the University of Tokyo and wish to attend, please contact CIRJE (cirje[at mark]e.u-tokyo.ac.jp).

Presentation
Hyunseok Jung (University of Arkansas)
Testing for Peer Effects without Specifying the Network Structure
Abstract
This paper proposes an Anderson-Rubin (AR) test for the presence of peer effects in panel data without the need to specify the network structure. The unrestricted model of our test is a linear panel data model of social interactions with dyad-specific peer effects. The proposed AR test evaluates if the peer effect coefficients are all zero. As the number of peer effect coefficients increases with the sample size, so does the number of instrumental variables (IVs) employed to test the restrictions under the null, rendering Bekker's many-IV environment. By extending existing many-IV asymptotic results to panel data, we establish the asymptotic validity of the proposed AR test. Our Monte Carlo simulations show the robustness and superior performance of the proposed test compared to some existing tests with misspecified networks. We provide two applications to demonstrate its empirical relevance.
Date and time
Friday, June 21, 2024, 16:50-18:35
Venue
Seminar Room 1 on the 1st floor of the Economics Research Annex (Kojima Hall), Graduate School of Economics, University of Tokyo

Note: This session is held in person only. If you are from outside the University of Tokyo and wish to attend, please contact CIRJE (cirje[at mark]e.u-tokyo.ac.jp).

Presentation
今井晋 (Susumu Imai), Hitotsubashi University

Estimating Cost Functions in Differentiated Product Oligopoly Models without (Valid) Instruments (joint with Neelam Jain, Hiroto Suzuki and Miyuki Taniguchi)
Abstract
We propose a methodology for estimating cost and market share functions of a differentiated products oligopoly model when both demand and cost data are available. The method deals with the endogeneity of prices to demand shocks and the endogeneity of outputs to cost shocks without any instruments by using cost data. In contrast to the indirect approach of Byrne et al. (2022), who recover the pseudo-cost function and then derive the cost function from it, we propose a method that directly estimates the cost function without the need for the semiparametric pseudo-cost function. We also propose a method to consistently estimate the coefficient of the observed product characteristic in the market share function when researchers do not know the validity of the instruments. We illustrate our methodology with Cobb-Douglas technology and logit demand structure, assuming a multiplicative cost shock. We also conduct Monte Carlo experiments and show that our method works well even when the conventional instruments are invalid.
Date and time
Friday, July 12, 2024, 16:50-18:35
Venue
Seminar Room 1 on the 1st floor of the Economics Research Annex (Kojima Hall), Graduate School of Economics, University of Tokyo

Note: This session is held in person only. If you are from outside the University of Tokyo and wish to attend, please contact CIRJE (cirje[at mark]e.u-tokyo.ac.jp).

Presentation
Soonwoo Kwon (Brown University)
Testing Mechanisms (joint with Jon Roth)
Abstract
Economists are often interested in the mechanisms by which a particular treatment affects an outcome. This paper develops tests for the "sharp null of full mediation" that the treatment D operates on the outcome Y only through a particular conjectured mechanism (or set of mechanisms) M. A key observation is that if D is randomly assigned and has a monotone effect on M, then D is a valid instrumental variable for the local average treatment effect (LATE) of M on Y. Existing tools for testing the validity of the LATE assumptions can thus be used to test the sharp null of full mediation when M and D are binary. We develop a more general framework that allows one to test whether the effect of D on Y is fully explained by a potentially multi-valued and multi-dimensional set of mechanisms M, allowing for relaxations of the monotonicity assumption. We further provide methods for lower-bounding the size of the alternative mechanisms when the sharp null is rejected. An advantage of our approach relative to existing tools for mediation analysis is that it does not require stringent assumptions about how M is assigned; on the other hand, our approach helps to answer different questions than traditional mediation analysis by focusing on the sharp null rather than estimating average direct and indirect effects. We illustrate the usefulness of the testable implications in two empirical applications.
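[Special session]
Date and time
Friday, August 23, 2024, 16:50-18:35
Venue
Seminar Room 1 on the 1st floor of the Economics Research Annex (Kojima Hall), Graduate School of Economics, University of Tokyo

Note: This session is held in person only. If you are from outside the University of Tokyo and wish to attend, please contact CIRJE (cirje[at mark]e.u-tokyo.ac.jp).

Presentation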
Víctor Peña (Universitat Politècnica de Catalunya (UPC))
Differentially private methods for managing model uncertainty in linear regression
Abstract
In this article, we propose differentially private methods for hypothesis testing, model averaging, and model selection for normal linear models. We propose Bayesian methods based on mixtures of g-priors and non-Bayesian methods based on likelihood-ratio statistics and information criteria. The procedures are asymptotically consistent and straightforward to implement with existing software. We focus on practical issues such as adjusting critical values so that hypothesis tests have adequate type I error rates and quantifying the uncertainty introduced by the privacy-ensuring mechanisms.
Date and time
Friday, October 4, 2024, 16:50-18:35
Venue
Seminar Room 1 on the 1st floor of the Economics Research Annex (Kojima Hall), Graduate School of Economics, University of Tokyo

Note: This session is held in person only. If you are from outside the University of Tokyo and wish to attend, please contact CIRJE (cirje[at mark]e.u-tokyo.ac.jp).

Presentation
Johannes Lederer (University of Hamburg)
"Sparsity in Data Science: Selected Trends"
Abstract Sparsity can avoid overfitting, speed up computations, and facilitate interpretations. This presentation recaps sparsity in the framework of "classical" high-dimensional statistics. It then introduces corresponding notions in modern data-science frameworks, such as deep learning and high-dimensional extremes. Along the way, we discuss different perspectives on data science and establish connections between these perspectives.
Date and time
Friday, November 1, 2024, 16:50-18:35 (presentation in English; Q&A in English or Japanese)
Venue
Seminar Room 1 on the 1st floor of the Economics Research Annex (Kojima Hall), Graduate School of Economics, University of Tokyo

Note: This session is held in person only. If you are from outside the University of Tokyo and wish to attend, please contact CIRJE (cirje[at mark]e.u-tokyo.ac.jp).

Presentation
山本鉄平 (Teppei Yamamoto), Waseda University
"Using Covariates to Improve Inference in the Preference-Incorporating Choice and Assignment (PICA) Design for Randomized Controlled Trials"
Abstract A key challenge in randomized controlled trials (RCTs) is to ensure external validity so that findings from a study can inform real-world policy decisions, where individual decision-makers may self-select into different treatments based on their own preferences about the treatment options. If the effects of treatments depend on subjects' treatment preferences, the average treatment effects (ATEs) estimated in a standard RCT will be biased for the conditional ATEs among those who actually prefer to take the treatment. Knox et al. (2019) proposed a new experimental design, later coined the preference-incorporating choice and assignment (PICA) design (de Benedictis-Kessner et al., 2019), which employs double randomization to estimate the ATE conditional on treatment choice. In this paper, we extend the PICA design to incorporate subjects' pre-treatment characteristics, which might confound effect heterogeneity even after conditioning on their stated preferences. This extension not only relaxes the key identification assumption in the original design to address possible bias but also potentially improves precision in the estimates. After establishing nonparametric identification results, we propose both frequentist and Bayesian approaches for inference and study their finite-sample performance via Monte Carlo simulations. We illustrate the proposed method with an empirical application to media exposure experiments.
Date and time
Friday, November 15, 2024, 16:50-18:35
Venue
Seminar Room 1 on the 1st floor of the Economics Research Annex (Kojima Hall), Graduate School of Economics, University of Tokyo

Note: This session is held in person only. If you are from outside the University of Tokyo and wish to attend, please contact CIRJE (cirje[at mark]e.u-tokyo.ac.jp).

Presentation
Yu-Chang Chen (National Taiwan University)
"Post Empirical Bayes Regression"
Abstract Empirical Bayes (EB) methods are widely utilized in economics for estimating individual and group-level fixed effects across diverse contexts, including teacher value-added, hospital quality, and neighborhood effects. While estimates generated by EB are often incorporated into downstream statistical analyses like regression models, the econometric properties of the two-step procedure have not been justified. This paper addresses this issue through two key contributions. First, we introduce a unified framework for post-EB regression that applies to both linear and non-linear models, offering frequentist properties and assessing their robustness against model misspecification. Second, we undertake a critical evaluation of the commonly used two-step EB methods in existing empirical research. Our analysis demonstrates that existing post-EB regression implementations, without proper adjustments, can introduce systematic bias, particularly in non-linear models.
Date and time
Friday, November 22, 2024, 16:50-18:35
Venue
Seminar Room 1 on the 1st floor of the Economics Research Annex (Kojima Hall), Graduate School of Economics, University of Tokyo

Note: This session is held in person only. If you are from outside the University of Tokyo and wish to attend, please contact CIRJE (cirje[at mark]e.u-tokyo.ac.jp).

Presentation
菅澤翔之助 (Shonosuke Sugasawa), Keio University
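"Spatially-dependent Indian Buffet Process and Its Application to Geospatial Data Analysis" (空間的インド料理店過程と地理空間データ解析への応用)
Abstract The Indian Buffet Process (IBP) is an effective method for extracting the latent factor structure underlying data, but the conventional IBP cannot incorporate spatial dependence. In this work, we develop the Spatially-dependent IBP (SIBP), which introduces spatial information into the IBP through Gaussian processes, and we provide its theoretical properties and an MCMC estimation algorithm. We further propose a model that uses the SIBP to extract factor information effectively from geospatial data, and present applications to dialect data and vegetation distribution data. Related research results are also introduced.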
Date and time
Friday, November 29, 2024, 16:50-18:35
Venue
Seminar Room 1 on the 1st floor of the Economics Research Annex (Kojima Hall), Graduate School of Economics, University of Tokyo

Note: This session is held in person only. If you are from outside the University of Tokyo and wish to attend, please contact CIRJE (cirje[at mark]e.u-tokyo.ac.jp).

Presentation
Jung Hyub Lee (CREPE, University of Tokyo)
"Causal Mediation Analysis in a Generalized Regression Model"
Abstract We consider a unifying framework to test for direct and indirect treatment effects in nonlinear models. Specifically, we extend a generalized linear-index model to incorporate endogenous treatments and endogenous mediators. We propose kernel-weighted Kendall's tau statistics to test the significance of the direct and indirect effects of endogenous treatments on the outcome variable mediated by endogenous mediators. The proposed semiparametric model allows for treatments and mediators to be discrete, continuous, and/or censored/truncated. For the indirect effect, we construct two distinct kernel-weighted Kendall's tau statistics that capture the effect of (i) the treatment on the mediator and (ii) the mediator on the outcome. Applying the testing approach of van Garderen and van Giersbergen (2020) avoids the problem of under-sized testing of the joint null hypothesis associated with the indirect effect. Monte Carlo simulations investigate the performance of the semiparametric testing approach.
Date and time
Friday, December 6, 2024, 16:50-18:35
Venue
Seminar Room 1 on the 1st floor of the Economics Research Annex (Kojima Hall), Graduate School of Economics, University of Tokyo

Note: This session is held in person only. If you are from outside the University of Tokyo and wish to attend, please contact CIRJE (cirje[at mark]e.u-tokyo.ac.jp).

Presentation
Zhengfei Yu (University of Tsukuba)
"Semiparametric Bayesian Difference-in-Differences"
Abstract We study semiparametric Bayesian methods for estimating the average treatment effect on the treated (ATT) within the difference-in-differences research design. We have two proposals. The first proposal places a Gaussian process prior on conditional mean functions of the control group, together with a Dirichlet process prior on the joint distribution of some transformed data. Our second proposal is a doubly robust Bayesian procedure that adjusts the prior distribution of the conditional mean functions of the control group and then corrects the posterior distribution of the resulting ATT. We prove the asymptotic equivalence of our Bayesian procedures and an efficient frequentist ATE estimator by establishing a semiparametric Bernstein-von Mises (BvM) theorem. For the doubly robust Bayesian procedure, the BvM result holds under doubly robust smoothness conditions; i.e., the lack of smoothness of conditional mean functions can be compensated by high regularity of the propensity score and vice versa.
Date and time
Friday, December 13, 2024, 16:50-18:35
Venue
Seminar Room 1 on the 1st floor of the Economics Research Annex (Kojima Hall), Graduate School of Economics, University of Tokyo

Note: This session is held in person only. If you are from outside the University of Tokyo and wish to attend, please contact CIRJE (cirje[at mark]e.u-tokyo.ac.jp).

Presentation
Qiyang Han (Rutgers University)
"Large Random General First-Order Methods: Mean-Field Theory and Statistical Applications"
Abstract General first-order methods (GFOMs), including various gradient descent variants and approximate message passing algorithms, constitute a broad class of iterative algorithms widely used in modern statistical learning problems. Some GFOMs also serve as constructive proof devices, iteratively characterizing the empirical distributions of statistical estimators in the asymptotic regime of large system limits for a fixed number of iterations.
This talk develops a non-asymptotic mean-field characterization of the dynamics of a general class of GFOMs. Our characterizations capture the precise stochastic behavior of each coordinate of the GFOM iterates and, more importantly, hold universally across a broad class of heterogeneous random matrix models. As a corollary, we provide the first non-asymptotic description of the empirical distributions of GFOM iterates beyond Gaussian ensembles.
We demonstrate the utility of these general results through two applications. In the first application, we characterize the mean-field behavior of gradient descent algorithms in a broad class of empirical risk minimization problems. Our theory also facilitates a generic iterative algorithm that consistently estimates key state evolution parameters, which can be used for statistical inference.
In the second application, we develop an algorithmic method for proving the universality of regularized regression estimators. Specifically, we systematically improve universality results for regularized regression estimators in the linear model and resolve the universality conjecture for (regularized) maximum likelihood estimators in the logistic regression model.
Date and time
Friday, December 20, 2024, 16:50-18:35
Venue
Seminar Room 1 on the 1st floor of the Economics Research Annex (Kojima Hall), Graduate School of Economics, University of Tokyo

Note: This session is held in person only. If you are from outside the University of Tokyo and wish to attend, please contact CIRJE (cirje[at mark]e.u-tokyo.ac.jp).

Presentation
白糸裕輝 (Yuki Shiraito), University of Michigan
"A Unified Model of Text and Citations for Topic-Specific Citation Networks: Application to the Supreme Court of the United States"
Abstract Social scientists analyze citation networks to study how documents influence subsequent works across various domains, including judicial politics and international relations. However, conventional approaches that summarize document attributes in citation networks often overlook the diverse semantic contexts in which citations occur. This paper develops the paragraph-citation topic model (PCTM), a Bayesian framework that jointly analyzes citation networks and document texts. The PCTM extends conventional topic models by assigning topics to paragraphs of citing documents, allowing citations to share topics with their embedding paragraphs. Our empirical analysis of U.S. Supreme Court opinions in the privacy issue domain demonstrates that citations within individual documents frequently span multiple substantive areas, and citations to individual documents show considerable topical diversity.