This workshop aims to foster collaborations between
researchers across multiple disciplines through a set of
central questions and techniques for algorithm design for large data. We will focus on topics such as sublinear
algorithms, randomized numerical
linear algebra, streaming and sketching, and learning and testing.
The workshop is now over. Thanks for your interest in attending WALDO21!
12:00 pm ET
Opening Remarks
12:05 pm ET
Robert Krauthgamer: Streaming Algorithms for Geometric Steiner Forest
[Abstract]
I will discuss the Steiner forest problem in the Euclidean plane, where the input is a multiset of points, partitioned into k color classes, and the goal is to find a minimum-cost
Euclidean graph G such that every color class is connected. We study this problem in dynamic streams, where the input is provided by a stream of insertions and
deletions of colored points from the discrete grid [\Delta]^2.
Our main result is a single-pass streaming algorithm that uses poly(k \log\Delta) space and time, and estimates the cost of an optimal Steiner forest solution
within ratio arbitrarily close to the famous Euclidean Steiner ratio $\alpha_2$ (currently $1.1547 \leq \alpha_2 \leq 1.214$). Our approach relies on a novel
combination of streaming techniques, like sampling and linear sketching, with the classical dynamic-programming framework for geometric optimization problems,
which usually requires large memory and has so far not been applied in the streaming setting.
I will also discuss possible directions for future work.
Joint work with Artur Czumaj, Shaofeng H.-C. Jiang, and Pavel Vesely.
12:30 pm ET
Sepehr Assadi: Multi-Pass Graph Streaming Lower Bounds for Parameter Estimation and Property Testing Problems
[Abstract]
There has been a rapidly growing interest in recent years in understanding various graph parameter estimation problems such as estimating
the size of maximum cuts or maximum matchings, weight of minimum spanning trees, or property testing problems such as connectivity,
cycle-freeness, or bipartiteness in the streaming model. However, from the lower bound perspective, almost all prior work has solely
focused on single-pass algorithms and not much is known for multi-pass algorithms, even in just two passes.
In this talk, we discuss the first multi-pass lower bounds for a wide family of these problems: for many problems of interest,
including all the above, obtaining a (1+eps)-approximation requires near-linear space or Omega(1/eps) passes, even on highly
restricted families of graphs such as bounded-degree planar graphs. A key ingredient of our proofs is a simple streaming XOR Lemma,
a generic hardness amplification result that might be of independent interest: informally speaking, if a p-pass s-space streaming
algorithm can only solve a decision problem with advantage delta > 0 over random guessing, then it cannot solve XOR of L independent
copies of the problem with advantage much better than delta^L.
We will also discuss open problems and possible directions for future work. Based on the following joint works:
Sepehr Assadi and Vishvajeet N: Graph Streaming Lower Bounds for Parameter Estimation and Property Testing via a Streaming XOR Lemma (arXiv: abs/2104.04908)
Sepehr Assadi, Gillat Kol, Raghuvansh Saxena, and Huacheng Yu: Multi-Pass Graph Streaming Lower Bounds for Cycle Counting, MAX-CUT, Matching Size, and Other Problems (arXiv: abs/2009.03038)
12:55 pm ET
Coffee Break (15 mins)
1:10 pm ET
Jelani Nelson: Optimal Bounds for Approximate Counting
[Abstract]
Counting up to N deterministically of course takes Theta(log N) bits. In the first ever streaming algorithm,
Morris in 1978 gave a randomized algorithm that improved upon this bound exponentially. The best known analysis
of his algorithm shows that it gives a 1+eps approximation to N with probability at least 1-delta using
O(loglog N + log(1/eps) + log(1/delta)) bits with high probability (the space usage is itself a random variable).
We show that a very slight (but necessary) tweak of his algorithm actually achieves the better bound
O(loglog N + log(1/eps) + loglog(1/delta)) bits, and we complement it with a new lower bound. The upper
and lower bounds match up to small explicit constant factors, establishing optimality. Joint work with Huacheng Yu.
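For illustration, here is a minimal Python sketch of the classical 1978 Morris counter (not the tweaked variant from the talk): the counter C is incremented with probability 2^(-C), and 2^C - 1 is an unbiased estimate of the count.

```python
import random

def morris_count(n_events, rng):
    # Store only C ~ loglog N bits: increment C with probability 2^(-C).
    c = 0
    for _ in range(n_events):
        if rng.random() < 2.0 ** (-c):
            c += 1
    return 2 ** c - 1  # unbiased estimator of n_events

# A single counter has large variance; averaging independent copies
# (or the talk's direct tweak) drives the error down.
rng = random.Random(0)
n = 10_000
avg = sum(morris_count(n, rng) for _ in range(300)) / 300
```

A single counter's standard deviation is on the order of N itself, which is why some form of averaging or tweaking is needed to get the (1+eps)-guarantee discussed in the talk.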
1:35 pm ET
Sumegha Garg: The Coin Problem with Applications to Data Streams
[Abstract]
Consider the problem of computing the majority of a stream of n i.i.d. uniformly random bits.
This problem, known as the coin problem, is central to a number of counting problems in different data stream models.
We show that any streaming algorithm for solving this problem with large constant advantage (over the uniform distribution)
must use Ω(log n) bits of space. Previously, it was known that computing the majority on every input with a constant probability
takes Ω(log n) space. We extend our lower bound to proving tight lower bounds for solving multiple, randomly interleaved copies
of the coin problem, as well as for solving the OR of multiple copies of a variant of the coin problem. Our proofs involve new measures
of information complexity that are well-suited for data streams.
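For contrast with the Ω(log n) lower bound, the matching upper bound for exact streaming majority is simply a signed counter; a minimal sketch:

```python
def stream_majority(bits):
    # One pass, one signed counter: O(log n) bits of state in total.
    c = 0
    for b in bits:
        c += 1 if b == 1 else -1
    return int(c > 0)  # 1 iff strictly more ones than zeros

maj = stream_majority([1, 0, 1, 1, 0])
```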
We use these lower bounds to obtain a number of new results for data streams. In each case there is an underlying d-dimensional
vector x with additive updates to its coordinates given in a stream of length m. The input streams arising from our coin lower bound
have nice distributional properties, and consequently for many problems for which we only had lower bounds in general turnstile streams,
we now obtain the same lower bounds in more natural models, such as the bounded deletion model, in which ||x||_2 never drops by a
constant fraction of what it was earlier, or in the random order model, in which the updates are ordered randomly.
Based on joint work with Mark Braverman and David P. Woodruff.
2:00 pm ET
Junior-Senior Lunch (50 mins)
3:00 pm ET
Madhu Sudan: Streaming Complexity of Constraint Satisfaction Problems
[Abstract]
In this talk we will describe some of our recent work giving new upper and lower bounds on the approximability of
constraint satisfaction problems (CSPs) in the (single-pass, often dynamic) streaming settings. In particular,
when streaming algorithms are constrained to sub-polynomial space in the input length and the constraints may be
inserted or removed we get a fine dichotomy: CSPs on n variables are either solvable in polylogarithmic space or require
at least sqrt(n) space. We also get Omega(n) lower bounds for a broad class of CSPs. Our positive results show the broad
applicability of what we call "bias-based algorithms", and our negative results work by abstracting and significantly
generalizing previous bounds for the Maximum Cut problem. In the talk we will also describe the many obvious and non-obvious
open questions and directions.
Based on the following joint works:
1) Chi-Ning Chou, Alexander Golovnev, Madhu Sudan, Santhoshini Velusamy: Approximability of all Boolean CSPs in the dynamic streaming setting. CoRR abs/2102.12351 (2021)
2) Chi-Ning Chou, Alexander Golovnev, Madhu Sudan, Santhoshini Velusamy: Approximability of all finite CSPs in the dynamic streaming setting. CoRR abs/2105.01161 (2021)
3) Noah Singer, Madhu Sudan, Santhoshini Velusamy: Streaming approximation resistance of every ordering CSP. CoRR abs/2105.01782 (2021)
4) Chi-Ning Chou, Alexander Golovnev, Madhu Sudan, Ameya Velingker, Santhoshini Velusamy: Linear Space Streaming Lower Bounds for Approximating CSPs. CoRR abs/2106.13078 (2021)
3:25 pm ET
Alina Ene: Adaptive Gradient Descent Methods for Constrained Optimization
[Abstract]
Adaptive gradient descent methods, such as the celebrated Adagrad algorithm (Duchi, Hazan, and Singer; McMahan and Streeter) and
ADAM algorithm (Kingma and Ba), are some of the most popular and influential iterative algorithms for optimizing modern machine learning
models. Algorithms in the Adagrad family use past gradients to set their step sizes and are remarkable due to their ability to
automatically adapt to unknown problem structures such as (local or global) smoothness and convexity. However, these methods achieve
suboptimal convergence guarantees even in the standard setting of minimizing a smooth convex function, and it has been a long-standing
open problem to develop an accelerated analogue of Adagrad in the constrained setting.
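As a toy illustration of the step-size rule described above, here is a scalar AdaGrad-norm sketch (the accelerated constrained algorithm in the talk is substantially more involved): the step size is set from the accumulated squared gradients, with no knowledge of the smoothness parameter.

```python
import math

def adagrad_norm(grad, x0, steps, eta=1.0, eps=1e-8):
    # Step size eta / sqrt(sum of past squared gradients): the method
    # adapts automatically, without knowing the objective's smoothness.
    x, g2 = x0, 0.0
    for _ in range(steps):
        g = grad(x)
        g2 += g * g
        x -= eta * g / (math.sqrt(g2) + eps)
    return x

# Minimize the smooth convex function f(x) = (x - 3)^2.
x_star = adagrad_norm(lambda x: 2.0 * (x - 3.0), x0=0.0, steps=2000)
```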
In this talk, we present one such accelerated adaptive algorithm for constrained convex optimization that simultaneously achieves
optimal convergence in the smooth, non-smooth, and stochastic setting, without any knowledge of the smoothness parameters or the
variance of the stochastic gradients.
The talk is based on joint work with Huy Nguyen (Northeastern University) and Adrian Vladu (CNRS & IRIF, Université de Paris).
3:50 pm ET
Coffee Break (15 mins)
4:05 pm ET
Eric Price: Simulating Random Walks in Random Streams
[Abstract]
The random order graph streaming model has received significant
attention recently, with problems such as matching size estimation,
component counting, and the evaluation of bounded degree constant
query testable properties shown to admit surprisingly space efficient
algorithms. The main result of this paper is a space efficient single
pass random order streaming algorithm for simulating nearly
independent random walks that start at uniformly random vertices. We
show that the distribution of k-step walks from a vertices chosen
uniformly at random can be approximated up to error epsilon per walk using
O_{k,epsilon}(a) words of space with a single pass over a randomly ordered
stream of edges, solving an open problem of Peng and Sohler [SODA '18]. Applications of our result include the estimation of the average
return probability of the k-step walk (the trace of the k'th power of
the random walk matrix) as well as the estimation of PageRank. We
complement our algorithm with a strong impossibility result for
directed graphs.
Based on joint work with John Kallaugher and Michael Kapralov.
4:30 pm ET
David Wajc: Streaming Submodular Matching Meets the Primal-Dual Method
[Abstract]
I will discuss the (semi-)streaming submodular matching problem, where the edges of an unknown n-node graph are presented one by one, and
the objective is to, using only \tilde{O}(n) bits of memory, output a matching of high value, as quantified by some submodular function.
This problem, which generalizes semi-streaming matching and weighted matching, is one of the first problems studied in the context of
streaming submodular optimization, and has spurred the development of key techniques in the area.
We present a number of improved upper and lower bounds on the approximability of this problem. I will outline the new techniques used to
obtain our upper bounds (applying the primal-dual method to a natural configuration LP for our streaming submodular problem) and our
lower bounds (combining conditional and unconditional hardness in a streaming context). I will leave time to discuss a number of intriguing
follow-up directions and open questions.
Based on joint work with Roie Levin.
12:00 pm ET
Opening Remarks
12:05 pm ET
Rasmus Kyng: Hardness Results for Structured Linear Equations and Programs
[Abstract]
In this talk, I'll discuss a number of recent works in fine-grained complexity that establish hardness results for many classes of
structured linear equations and linear programs. We will see that if the nearly-linear time solvers for Laplacian matrices and
their generalizations can be extended to solve just slightly larger families of linear systems, then they can be used to quickly solve
all systems of linear equations over the reals. This result can be viewed either positively or negatively: either it suggests a new route
to getting faster general linear equation solvers or it indicates that progress on solvers for structured linear equations may soon halt.
Along similar lines, if the accuracy of fast solvers for packing and covering linear programs can be substantially improved, this will
lead to faster solvers for general linear equations. And finally, a recent result shows that any algorithm that solves the 2-commodity
flow maximum flow problem can solve general linear programs in essentially the same time. This last result follows the strategy of
Itai's polynomial-time reduction of a linear program to a 2-commodity flow problem (JACM '78) but makes the reduction faster and more robust.
Finally, I'll note some open problems in fine-grained complexity and point out how you might spot more opportunities.
The talk is based on joint works with Ming Ding (ETH Zurich), Di Wang (Google Research), and Peng Zhang (Rutgers).
12:30 pm ET
Sofya Raskhodnikova: Isoperimetric Inequalities for the Hypercube with Applications to Monotonicity Testing
[Abstract]
A celebrated line of work on isoperimetric inequalities for the hypercube
[Margulis '74, Talagrand '93, Chakrabarty and Seshadhri '13, Khot Minzer Safra '15]
studies the size of the "boundary" between the points on which a Boolean function takes value 0 and the points on which it takes value 1.
This boundary is defined in terms of the (directed or undirected) edges of a high-dimensional hypercube that represents the domain of
the function. The inequality of Khot, Minzer, and Safra '15 implies all the previous inequalities in this line of work.
It relates the average of the square root of the number of decreasing edges incident on a vertex of the hypercube to the distance to
monotonicity of the function.
We generalize all the inequalities in this line of work to real-valued functions. Our main contribution is a Boolean decomposition that represents every
real-valued function f over an arbitrary partially ordered domain as a collection of Boolean functions over the same domain, roughly capturing the distance
of f to monotonicity and the structure of violations of f to monotonicity. We apply our generalized isoperimetric inequalities to improve algorithms for
testing monotonicity and approximating the distance to monotonicity for real-valued functions.
Based on joint work with Hadley Black (UCLA) and Iden Kalemaj (Boston University).
12:55 pm ET
Coffee Break (15 mins)
1:10 pm ET
Anna C. Gilbert: Private Inverse Problems and Source Localization
[Abstract]
In this talk, we exploit the ill-posedness of linear inverse problems to design algorithms to release differentially private data or
measurements of the physical system. We discuss the spectral requirements on a matrix such that only a small amount of noise is needed to
achieve privacy and contrast this with the ill-conditionedness. We then instantiate our framework with several diffusion operators and
explore recovery via l1 constrained minimization. Our work indicates that it is possible to produce locally private sensor measurements
that both keep the exact locations of the heat sources private and permit recovery of the "general geographic vicinity" of the sources.
1:35 pm ET
Richard Peng: Fully Dynamic Effective Resistance
[Abstract]
Effective resistance is a physics-motivated quantity that encapsulates both the congestion and the dilation of a flow.
It has close connections with many key concepts in combinatorial optimization, graph sparsification, and network science,
such as electrical flows, leverage scores, and network centrality.
This talk will briefly overview sublinear-time data structures for maintaining effective resistances and related quantities in dynamically changing graphs.
It will focus on random walk interpretations of Schur Complements, which are intermediate states of Gaussian elimination.
Potential connections with vertex sparsification and fine-grained complexity will also be discussed.
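As background, the effective resistance between two vertices can be computed directly from the pseudoinverse of the graph Laplacian; the dynamic data structures in the talk exist precisely to avoid recomputing this from scratch. A minimal numpy sketch, assuming unit edge weights:

```python
import numpy as np

def effective_resistance(edges, n, u, v):
    # Build the combinatorial Laplacian L = D - A (unit edge weights).
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    Lp = np.linalg.pinv(L)        # Moore-Penrose pseudoinverse of L
    chi = np.zeros(n)
    chi[u], chi[v] = 1.0, -1.0
    # R_eff(u, v) = (e_u - e_v)^T L^+ (e_u - e_v)
    return float(chi @ Lp @ chi)

# Path 0-1-2: unit resistances in series, so R_eff(0, 2) = 2.
r = effective_resistance([(0, 1), (1, 2)], n=3, u=0, v=2)
```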
2:00 pm ET
Poster Session (50 mins)
3:00 pm ET
Santosh Vempala: Robustly Learning Mixtures of Arbitrary Gaussians in Polynomial Time
[Abstract]
The Gaussian Mixture Model (Pearson 1906) is widely used for high-dimensional data. While classical results establish its
unique identifiability, it was shown in 2010 (Kalai-Moitra-Valiant, Belkin-Sinha) that for any fixed number of component Gaussians,
the underlying mixture parameters can be estimated to arbitrary accuracy in polynomial time. Robust Statistics (Huber; Tukey) asks
for estimation of underlying models robustly, i.e., in the presence of a bounded fraction of arbitrary noise (outliers). This
appeared to be computationally intractable, even for mean estimation of "nice" distributions, until relatively recently
(Diakonikolas-Kamath-Kane-Li-Moitra-Stewart '16, Lai-Rao-Vempala '16). These and other works highlight the robust estimation of GMMs
as a central open problem. In this talk, we will present the first polytime algorithm for the problem for any fixed number of components
with no assumptions on the underlying mixture. The techniques developed appear likely to be useful for other problems.
This is joint work with Ainesh Bakshi, Ilias Diakonikolas, He Jia, Daniel Kane and Pravesh Kothari.
3:25 pm ET
David P. Woodruff: Tight Bounds for Adversarially Robust Streams and Sliding Windows via Difference Estimators
[Abstract]
In the adversarially robust streaming model, a stream of elements is presented to an algorithm and is allowed to depend on the output
of the algorithm at earlier times during the stream. In the classic insertion-only model of data streams, Ben-Eliezer et al. show how
to convert a non-robust algorithm into a robust one with a roughly 1/eps factor overhead. This was subsequently improved to a
1/sqrt{eps} factor overhead by Hassidim et al., suppressing logarithmic factors. For general functions the latter is known to be
best-possible, by a result of Kaplan et al. We show how to bypass this impossibility result by developing data stream algorithms for a
large class of streaming problems, with no overhead in the approximation factor. Our class of streaming problems includes the most
well-studied problems such as the L2-heavy hitters problem, Fp-moment estimation, as well as empirical entropy estimation. We substantially
improve upon all prior work on these problems, giving the first optimal dependence on the approximation factor. As in previous work,
we obtain a general transformation that applies to any non-robust streaming algorithm and depends on the so-called flip number.
However, the key technical innovation is that we apply the transformation to what we call a difference estimator for the streaming problem,
rather than an estimator for the streaming problem itself. We then develop the first difference estimators for a wide range of problems.
Our difference estimator methodology is not only applicable to the adversarially robust model, but to other streaming models where
temporal properties of the data play a central role, and in particular, we obtain the first optimal dependence on eps for
Fp-moment estimation in the sliding window model.
Based on joint work with Samson Zhou.
3:50 pm ET
Coffee Break (15 mins)
4:05 pm ET
Petros Drineas: Randomized Linear Algebra for Interior Point Methods
[Abstract]
Linear programming is a central problem in computer science and applied mathematics with numerous applications across a wide range of domains,
including machine learning and data science. Interior point methods (IPMs) are a common approach to solving linear programs with
strong theoretical guarantees and solid empirical performance. The time complexity of these methods is dominated by the cost of solving a
linear system of equations at each iteration. In common applications of linear programming, particularly in data science and
scientific computing, the size of this linear system can become prohibitively large, requiring the use of iterative solvers which provide
an approximate solution to the linear system. Approximately solving the linear system at each iteration of an IPM invalidates common analyses
of IPMs and the theoretical guarantees they provide. In this talk we will discuss how randomized linear algebra can be used to
design and analyze theoretically and practically efficient IPMs when using approximate linear solvers.
4:30 pm ET
Clément Canonne: The Price of Tolerance in Distribution Testing
[Abstract]
Given samples from an unknown distribution p over {1,2,...,n}, how much
data is required to decide whether p is eps-far from the uniform
distribution, versus eps'-close to it? This question, tolerant
uniformity testing (and its two related variants, tolerant identity and
closeness testing), has received significant interest over the past
decade and is now reasonably well understood in both extreme cases,
eps'=0 ("standard" testing) and eps=2eps' (maximally noisy setting), for
which the sample complexity scales as sqrt(n) and n/log n, respectively.
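For intuition about the standard (eps' = 0) regime, the classical sqrt(n)-sample uniformity tester is built on the collision statistic, whose expectation is sum_i p_i^2, minimized at 1/n exactly when p is uniform. A minimal sketch (the test distributions below are illustrative assumptions):

```python
import random
from collections import Counter

def collision_rate(samples):
    # Fraction of sample pairs that collide; its expectation is
    # sum_i p_i^2, which equals 1/n iff p is uniform on n elements.
    counts = Counter(samples)
    s = len(samples)
    pairs = s * (s - 1) // 2
    return sum(c * (c - 1) // 2 for c in counts.values()) / pairs

rng = random.Random(0)
n = 100
uniform = [rng.randrange(n) for _ in range(5000)]
skewed = [0] * 2500 + [rng.randrange(n) for _ in range(2500)]
r_u, r_s = collision_rate(uniform), collision_rate(skewed)
```

A uniform sample gives a rate near 1/n = 0.01, while a distribution far from uniform gives a visibly larger rate; the tolerant regime discussed in the talk is harder precisely because eps'-close distributions also inflate this statistic.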
In this talk, I will describe recent work revisiting this question, in
which we obtain the full trade-off between those two cases for both
identity and closeness testing. Specifically, we fully characterize the
"price of tolerance" for those problems as a function of n, eps, and
eps', up to a single log n factor. Our upper and lower bounds show quite
a few interesting phenomena, even in regimes which were hitherto thought
to be well understood.
Joint work with Ayush Jain, Gautam Kamath, and Jerry Li
(arXiv: abs/2106.13414).
12:00 pm ET
Opening Remarks
12:05 pm ET
Dana Ron: Optimal Distribution-Free Sample-Based Testing of Subsequence-Freeness
[Abstract]
In this work, we study the problem of testing subsequence-freeness.
For a given subsequence (word) w = w_1 ... w_k, a sequence (text) T = t_1 ... t_n is said to contain w if there exist indices 1 <= i_1 < ... < i_k <= n
such that t_{i_j} = w_j for every 1 <= j <= k. Otherwise, T is w-free.
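The containment condition above is exactly the greedy left-to-right subsequence test; a minimal sketch:

```python
def contains(text, word):
    # Greedily match each symbol of `word` at the earliest
    # remaining position of `text`; succeeds iff indices
    # i_1 < ... < i_k with t_{i_j} = w_j exist.
    it = iter(text)
    return all(symbol in it for symbol in word)

def is_w_free(text, word):
    return not contains(text, word)
```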
While a large majority of the research in property testing deals with algorithms that perform queries, here we consider sample-based testing (with one-sided error). In the "standard" sample-based model (i.e., under the uniform distribution), the algorithm is given samples (i,t_i) where i is drawn uniformly and independently at random. The algorithm should distinguish between the case that T is w-free and the case that T is \epsilon-far from being w-free (i.e., more than an \epsilon-fraction of its symbols must be modified to make it w-free).
Freitag, Price, and Swartworth (Proceedings of RANDOM, 2017) showed that O(k^2\log k/\epsilon) samples suffice for this testing task.
We obtain the following results.
+ The number of samples sufficient for sample-based testing (under the uniform distribution) is O(k/\epsilon). This upper bound builds on a characterization that we present for the distance of a text T from w-freeness in terms of the maximum number of copies of w in T, where these copies should obey certain restrictions.
+ We prove a matching lower bound, which holds for every word w. This implies that the above upper bound is tight.
+ The same upper bound holds in the more general distribution-free sample-based model. In this model the algorithm receives samples (i,t_i) where $i$ is distributed according to an arbitrary distribution p (and the distance from w-freeness is measured with respect to p).
We highlight the fact that while we require that the testing algorithm work for every distribution and when only provided with samples, the complexity we get matches a known lower bound for a special case of the seemingly easier problem of testing subsequence-freeness under the uniform distribution and with queries by Canonne et al (Theory of Computing, 2019).
This is joint work with Asaf Rosin.
12:30 pm ET
Cameron Musco: Hutch++: Optimal Stochastic Trace Estimation
[Abstract]
We study the problem of estimating the trace of a matrix that can only be accessed through matrix-vector multiplication. We introduce a
new randomized algorithm, Hutch++, which computes a (1+epsilon) approximation to the trace of any positive semidefinite (PSD) matrix
using just O(1/epsilon) matrix-vector products. This improves quadratically on the ubiquitous Hutchinson's estimator, which requires
O(1/epsilon^2) matrix-vector products.
Our approach is based on a simple technique for reducing the variance of Hutchinson's estimator using low-rank approximation, and is easy
to implement and analyze. Moreover, we prove that up to a logarithmic factor, the complexity of Hutch++ is optimal amongst all matrix-vector
query algorithms. We show that it significantly outperforms Hutchinson's method in experiments, on both PSD and non-PSD matrices.
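A minimal numpy sketch of the variance-reduction idea (compute the trace of a low-rank sketch exactly, then run Hutchinson's estimator only on the deflated residual); splitting the matrix-vector budget into thirds follows the paper, while the test matrix below is an illustrative assumption:

```python
import numpy as np

def hutchpp(matvec, n, m, rng):
    # Spend m/3 matvecs sketching the top eigenspace, m/3 on its QR
    # factor, and m/3 on Hutchinson applied to the (small) residual.
    S = rng.choice([-1.0, 1.0], size=(n, m // 3))
    Q, _ = np.linalg.qr(matvec(S))           # captures large eigenvalues
    G = rng.choice([-1.0, 1.0], size=(n, m // 3))
    G = G - Q @ (Q.T @ G)                    # deflate the probe vectors
    return np.trace(Q.T @ matvec(Q)) + np.trace(G.T @ matvec(G)) / (m // 3)

rng = np.random.default_rng(0)
n = 200
B = rng.standard_normal((n, n))
A = B @ B.T                                  # PSD test matrix
est = hutchpp(lambda X: A @ X, n, m=60, rng=rng)
```

Because the large eigenvalues are handled exactly, the stochastic part only sees the flat tail of the spectrum, which is the source of the quadratic improvement over plain Hutchinson.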
In the talk, I will also discuss a broader research program of finding asymptotically optimal algorithms for basic linear algebra problems
in the matrix-vector query model of computation. Beyond our work on trace estimation, there is exciting recent progress on linear systems and
eigenvalue approximation. I will briefly discuss remaining open questions and interesting connections to work in theoretical computer science,
including on communication complexity and fine-grained complexity theory.
This work is joint with Raphael A Meyer, Christopher Musco, and David P Woodruff. The associated paper appeared in the SIAM Symposium on Simplicity in Algorithms (SOSA 2021) and can be found here: https://arxiv.org/abs/2010.09649.
12:55 pm ET
Coffee Break (15 mins)
1:10 pm ET
Talya Eden: Approximating the Arboricity in Sublinear Time
[Abstract]
We consider the problem of approximating the arboricity of a graph $G=(V,E)$, which we denote by $arb(G)$, in sublinear time, where the
arboricity of a graph is the minimal number of forests required to cover its edges. An algorithm for this problem may perform degree and
neighbor queries, and is allowed a small error probability. We design an algorithm that outputs an estimate $\hat{\alpha}$, such that with
probability $1-1/poly(n)$, $arb(G)/(c \log^2 n) \leq \hat{\alpha} \leq arb(G)$, where $n=|V|$ and $c$ is a constant. The expected
query complexity and running time of the algorithm are $O(n/arb(G)) \cdot poly(\log n)$, and this upper bound also holds with high probability.
This bound is optimal for such an approximation up to a $poly(\log n)$ factor.
Based on joint work with Saleet Mossel and Dana Ron.
1:35 pm ET
Jerry Li: Fast and Near-Optimal Diagonal Preconditioning
[Abstract]
In this talk, we consider the problem of preconditioning a given matrix M by a diagonal matrix. In other words, the problem is to find a
scaling of the rows or columns of M which maximally improves the condition number of M. This is a well studied problem in numerical computing,
and heuristics such as Jacobi preconditioning are widely used in practice to solve it. Despite this, the problem remains poorly understood.
We make two contributions on this front: (1) we provide new, optimal guarantees for Jacobi preconditioning, demonstrating that it achieves
a quadratic approximation to the optimal scaling and that this guarantee is tight, and (2) we provide new, nearly-linear time algorithms that
compute a constant-factor scaling for M. We obtain these algorithms by developing new solvers for structured mixed packing and covering SDPs. Finally, we demonstrate applications of these techniques to settings such as semi-random linear regression, and to improving risk in several statistical regression models.
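The Jacobi heuristic mentioned above is just symmetric diagonal scaling, D^{-1/2} M D^{-1/2} with D = diag(M); a minimal numpy sketch on a toy ill-conditioned PSD matrix (the example matrix is an illustrative assumption, not from the talk):

```python
import numpy as np

def jacobi_precondition(M):
    # Symmetric Jacobi scaling: D^{-1/2} M D^{-1/2} with D = diag(M),
    # which makes every diagonal entry equal to 1.
    d = 1.0 / np.sqrt(np.diag(M))
    return M * np.outer(d, d)

def cond(M):
    w = np.linalg.eigvalsh(M)    # eigenvalues in ascending order
    return w[-1] / w[0]

# PSD matrix whose rows live on wildly different scales.
A = np.diag([1.0, 1e6]) + 0.1
k_before, k_after = cond(A), cond(jacobi_precondition(A))
```

Here the scaling collapses the condition number from about 10^6 to nearly 1; the talk's point is that this heuristic is within a quadratic factor of the best possible diagonal scaling, and that a constant-factor scaling can be computed in nearly-linear time.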
2:00 pm ET
Open Problem Session (50 mins)
3:00 pm ET
Sepideh Mahabadi: Two-sided Kirszbraun Theorem
[Abstract]
In this talk, we prove a two-sided variant of the Kirszbraun theorem. Consider an arbitrary subset X of Euclidean space and its superset Y.
Let f be a 1-Lipschitz map from X to R^m. The Kirszbraun theorem states that the map f can be extended to a 1-Lipschitz map f' from Y to R^m.
While the extension f' does not increase distances between points, there is no guarantee that it does not decrease distances significantly.
In fact, f' may even map distinct points to the same point (that is, it can infinitely decrease some distances). However, we prove that
there exists a (1 + \epsilon)-Lipschitz outer extension f' (that maps from Y to R^{m'}) that does not decrease distances more than
"necessary"; and then show some applications of this theorem in metric embedding.
This is a joint work with Arturs Backurs, Konstantin Makarychev and Yury Makarychev.
3:25 pm ET
Christopher Musco: Linear and Sublinear Time Spectral Density Estimation
[Abstract]
I will discuss new work on the popular kernel polynomial method (KPM) for approximating the spectral density (eigenvalue distribution) of an
nxn Hermitian matrix A with real-valued eigenvalues. I will show that a practical variant of the KPM algorithm can approximate the spectral
density to epsilon accuracy in the Wasserstein-1 distance with roughly O(1/epsilon) matrix-vector multiplications with A. Moreover, the
method is robust to inaccuracy in these matrix-vector multiplications, which allows it to be combined with any approximation algorithm
for computing them. As an application, I will discuss a randomized sublinear time algorithm for approximating the spectral density of any
normalized graph adjacency or Laplacian matrix. The talk will cover the main tools used in our work, which include Jackson’s seminal work
on approximation with positive kernels, and stability results for computing orthogonal polynomials via three-term recurrence relations.
3:50 pm ET
Coffee Break (15 mins)
4:05 pm ET
Qin Zhang: Collaborative Learning with Communication Constraints
[Abstract]
Reinforcement learning has witnessed great research advancement in recent years and achieved successes in many practical applications.
However, reinforcement-learning algorithms also have the reputation for being data- and computation-hungry for large-scale applications.
Today we will talk about how to make reinforcement-learning algorithms scalable via introducing multiple learning agents and allowing
them to collect data and learn optimal strategies collaboratively. We will use a basic problem called best arm identification in
multi-armed bandits as a vehicle to introduce the collaborative learning model. We will also discuss some follow-up works and
future directions.
4:30 pm ET
Erik Waingarten: Sketching Geometric MST and EMD
[Abstract]
We give new sketching algorithms for Geometric MST and EMD. For Geometric MST, we are given n points x_1, ..., x_n \in [N]^d,
and we give a linear sketch that can approximate the minimum over all spanning trees T of the complete graph on [n] of \sum_{(i,j) \in T} ||x_i - x_j||_p, where p \in [1,2],
up to a factor of ~O(log n) using space polylog(n, d, N). For EMD, we are given two sets A, B of n points in [N]^d, and we give a linear sketch
that can approximate the Earth Mover Distance between A and B in L_p (the cost of a minimum-cost matching between A and B) up to a multiplicative factor of ~O(log n)
and an additive factor of epsilon*Nnd using space poly(1/epsilon, log(nd)) (the additive error can be removed with two rounds of sketching). These linear sketches
give corresponding streaming algorithms.
Prior work gave sketches which were tailored for low-dimensional spaces (since space complexity was exponential in dimension), or
suffered approximations which degraded as the dimension increased. I'll survey what's known about these two problems, and explain
the main idea behind our new sketches. Namely, we revisit the method of tree embeddings in the context of these sketching algorithms,
and give a new "data-dependent" tree embedding which doesn't suffer a degrading distortion as the dimension increases. As we go,
I'll make sure to highlight a lot of really exciting open problems related to MST and EMD, as well as geometric sketching/streaming
algorithms more broadly.
Joint work with Xi Chen, Rajesh Jayaram, and Amit Levi.
Posters
Tuesday, August 24th, 2-2:30 pm ET:
Tuesday, August 24th, 2:30-3 pm ET:
Support
WALD(O)'21 is generously supported by the LEarning and Algorithms for People and Systems group and the National Science Foundation.
Web design by Pedro Paredes.