[FM-India] IARCS Verification Seminar Series -- Talk by Rohan Bavishi on January 4, 1900 hrs IST

Tue Jan 4 09:43:58 IST 2022

Dear all,

The next talk in the IARCS Verification Seminar Series will be given by
Rohan Bavishi, a PhD candidate at UC Berkeley. The talk is scheduled on
Tuesday, January 4, at 1900 hrs IST. The details can be found on the
webpage (https://fmindia.cmi.ac.in/vss/), and also appended to the body of
this email.

The Verification Seminar Series, an initiative by the Indian Association
for Research in Computing Science (IARCS), is a monthly, online
talk-series, broadly in the area of Formal Methods and Programming
Languages, with applications in Verification and Synthesis. The aim of this
talk-series is to provide a platform for Formal Methods researchers to
interact regularly. In addition, we hope that it will make it easier for
researchers to explore newer problems/areas and collaborate on them, and
for younger researchers to start working in these areas.

All are welcome to join.

Best regards,
Deepak, Madhukar, Rahul, Srivathsan

=============================================================

Title: Program Synthesis for Data Science

Meeting Link:
https://us02web.zoom.us/j/89164094870?pwd=eUFNRWp0bHYxRVpwVVNoVUdHU0djQT09
(Meeting ID: 891 6409 4870, Passcode: 082194)

Abstract: The Python ecosystem has emerged as a very popular platform for
data science, as it offers a number of powerful tools, in the form of
libraries and APIs, for accessing data, data processing, modeling,
visualization and machine learning. However, these tools present a steep
learning curve for data scientists and developers. Their APIs contain
hundreds of functions, have dense documentation, and often lack sufficient
examples. This presents a huge opportunity for reducing the barrier to
entry and improving the productivity of data scientists by synthesizing
code using these APIs from high-level, easy-to-provide user intent
specifications.

In this talk, I will outline three systems - AutoPandas, Gauss, and
VizSmith that we developed for synthesizing code for two core data science
operations, namely table transformations and data visualization. All three
represent different decision points along the three main controllable
dimensions of synthesis - specification modality or user intent format,
search space, and search algorithm, informed by the domain at hand.
AutoPandas accepts I/O tables from the user, and synthesizes Pandas-based
table transformation code. It combines the idea of generators from the
program testing community with graph neural networks to speed up
enumerative search. Gauss addresses the pain-points of using input-output
examples for table transformations from a user's perspective. It uses a new
UI-based interaction mechanism that captures more information than plain
I/O tables, but requires less effort from the user. It also employs novel
graph-based reasoning algorithms to speed up search performance by 10x on
average, as compared to I/O example-based SoTA systems. Synthesizing
visualizations presents a significantly different challenge. It is
difficult to precisely capture user intent as machine-checkable specs, like
I/O examples. VizSmith accepts keyword-oriented natural language queries
along with the data columns to visualize, and synthesizes multiple
visualizations which users can easily browse and pick from. It employs a
novel code-mining and program analysis based approach to crowdsource its
search space of visualizations. The three systems have been published at
OOPSLA '19, '21 and ASE '21 respectively.

Lastly, I will discuss some of our ongoing work in this space. I will also
outline some discussion points on the future of synthesis for such domains,
especially in the context of large language models such as Codex and GPT-3,
which have demonstrated impressive synthesis capabilities.