home

about me

I am a data scientist with a background in computational quantum chemistry.

I enjoy machine learning, parallel computing, web programming, physics, chemistry, and all things data science.

development

brainsparks & calrissian

Coming soon. Code on GitHub.

Coming soon. Just working out some kinks.

parallel JavaScript math library

I once wrote a parallel JavaScript math and statistics library built around HTML5 Web Workers and the Node.js cluster library. The project is called MathWorkersJS.

The motivation behind MathWorkersJS is to bring some degree of multi-core parallelism to JavaScript mathematical computations, whether they run client-side in the browser or server-side. While other good JavaScript math libraries exist, none (as far as I know) takes advantage of HTML5 Web Workers, a fairly new feature, to help with CPU-intensive computations. These libraries leave a similar void for parallelism in Node.js. That's where MathWorkersJS comes in.

In some tests, I have been able to demonstrate 2-3x speedups with 4 workers for some common linear algebra operations, like matrix-matrix multiplication.
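As a rough illustration of how a matrix-matrix product can be spread over workers, here is a minimal sketch in Python rather than JavaScript. It is not MathWorkersJS code: the function names (`split_rows`, `multiply_block`, `parallel_matmul`) and the row-block decomposition are assumptions made for this example. Each worker receives a contiguous block of rows of A plus all of B, and the partial results are concatenated in order.

```python
from concurrent.futures import ProcessPoolExecutor


def multiply_block(args):
    """Multiply one block of rows of A by the full matrix B."""
    rows, b = args
    cols = list(zip(*b))  # columns of B, built once per block
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in rows]


def split_rows(a, workers):
    """Split the rows of A into at most `workers` contiguous blocks."""
    size = max(1, -(-len(a) // workers))  # ceiling division, at least 1
    return [a[i:i + size] for i in range(0, len(a), size)]


def parallel_matmul(a, b, workers=4, executor_cls=ProcessPoolExecutor):
    """Compute A times B by handing each worker one block of rows of A.

    `executor_cls` is a hypothetical hook for this sketch: it defaults to
    processes (real multi-core parallelism) but accepts any executor class
    from concurrent.futures.
    """
    blocks = split_rows(a, workers)
    with executor_cls(max_workers=workers) as pool:
        results = pool.map(multiply_block, [(block, b) for block in blocks])
        # map() preserves block order, so concatenation rebuilds the product.
        return [row for block in results for row in block]
```

When using the default process pool, call `parallel_matmul(a, b, workers=4)` from inside an `if __name__ == "__main__":` guard so that worker processes can import the module cleanly. Because every worker needs a full copy of B, communication cost grows with matrix size, which is one plausible reason speedups stay below the worker count.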

JavaScript, HTML5, Node.js

data network modeling with Cassandra

At Signal, I developed a back-end Java web app that interfaces with a large Cassandra database containing billions of records of clients' user data. Cassandra is usually conceptualized as one or more rings of nodes (drawn here in red), with each node responsible for managing a portion of the full data stored on disk. We used multiple rings to support multi-region availability of our data.

I played a significant role in developing some of the data models and in providing data analysis using Cassandra. For instance, data arrives in the system from various sources (e.g., web, mobile, offline), and clients would ideally like a complete view of all these channels in real time. The solution I helped create is a dynamic network of client user data (drawn here in blue and green) stored on Cassandra via the Java web app. This feature is the engine behind much of the new product development at Signal.

Java, Python, Cassandra, R

massively parallel computing

My primary focus as a postdoc was to improve the scalability of a code on Mira, the Blue Gene/Q machine at the Argonne National Laboratory Leadership Computing Facility and, at the time, the third-fastest supercomputer in the world.

I scaled up a specialized molecular dynamics code used for simulating proton transport. The original code scaled to only about 64 nodes; the version I produced scales to 24,576 nodes (393,216 cores). To accomplish this, I built an external replica exchange driver for the molecular dynamics code LAMMPS, exposed some untapped parallelism in the multi-state proton transport calculations, multithreaded several compute-intensive loops, and made plenty of smaller code optimizations. I also contributed a handful of lines of code to the open-source LAMMPS software package during this time.
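For readers unfamiliar with replica exchange, the decision a driver makes at each exchange step can be sketched with the textbook Metropolis criterion. This is a generic Python illustration, not the driver described above and not a LAMMPS interface; the function names, the kcal/mol units, and the even/odd pairing scheme are assumptions made for the example.

```python
import math
import random

BOLTZMANN_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def swap_probability(energy_i, temp_i, energy_j, temp_j, k_b=BOLTZMANN_KCAL):
    """Metropolis probability of exchanging configurations i and j.

    Uses the standard criterion min(1, exp[(beta_i - beta_j)(E_i - E_j)]).
    """
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swaps(energies, temps, rng, offset=0):
    """Attempt swaps between neighboring temperature slots.

    energies[k] is the potential energy of the configuration in slot k.
    Returns `slots`, where slots[k] is the index of the configuration that
    occupies temperature slot k after the attempts. Alternating offset
    between 0 and 1 on successive calls covers all neighboring pairs.
    """
    slots = list(range(len(temps)))
    for k in range(offset, len(temps) - 1, 2):
        p = swap_probability(energies[k], temps[k], energies[k + 1], temps[k + 1])
        if rng.random() < p:
            slots[k], slots[k + 1] = slots[k + 1], slots[k]
    return slots
```

Because each pair decision needs only two energies and two temperatures, the replicas themselves can run as independent simulations between exchange attempts, which is what makes the method attractive on very large machines.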

C++, C, MPI, OpenMP

genetic algorithm load balancing

The molecule shown here is part of an implicit solvent calculation based on a parallel fast multipole method, which runs faster when atoms belonging to a given node are spatially close to each other and also far away from atoms belonging to other nodes. The fitness function for the genetic algorithm reflects that, and it also tends to ensure that no nodes have far more/less work than other nodes.

The molecule depicted is a single-stranded DNA polyadenine, and each sphere represents an atom. Each sphere is colored according to the node responsible for its computation. (There are 4 nodes in this example.) The run starts with a randomized distribution of the atoms across the 4 nodes, which is essentially the worst possible load balance. Remarkably, the genetic algorithm finds a solution that is very close to the ideal solution for this system.
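The idea can be sketched in a few lines of Python. This is an illustrative toy, not the original C++/MPI implementation: the fitness form (within-node squared spread plus a squared imbalance penalty), the `balance_weight` parameter, and the simple elitist genetic algorithm with one-point crossover are all assumptions chosen for the example. An assignment is a list giving the node index of each atom, and lower fitness is better.

```python
import math
import random


def fitness(assignment, coords, n_nodes, balance_weight=1.0):
    """Lower is better: spatial spread within each node plus imbalance."""
    groups = [[] for _ in range(n_nodes)]
    for atom, node in enumerate(assignment):
        groups[node].append(coords[atom])
    spread = 0.0
    for members in groups:
        if members:
            centroid = [sum(axis) / len(members) for axis in zip(*members)]
            spread += sum(math.dist(p, centroid) ** 2 for p in members)
    ideal = len(assignment) / n_nodes  # atoms per node under perfect balance
    imbalance = sum((len(m) - ideal) ** 2 for m in groups)
    return spread + balance_weight * imbalance


def evolve(coords, n_nodes, pop_size=40, generations=200, mutation_rate=0.05, seed=0):
    """Search for a low-fitness assignment, starting from random ones."""
    rng = random.Random(seed)
    n = len(coords)  # assumes at least 2 atoms

    def score(assignment):
        return fitness(assignment, coords, n_nodes)

    population = [[rng.randrange(n_nodes) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)
        parents = population[: pop_size // 2]  # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = mom[:cut] + dad[cut:]
            # Mutation: occasionally move an atom to a random node.
            child = [rng.randrange(n_nodes) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return min(population, key=score)
```

Minimizing the spread within each node also pushes the nodes' atom groups apart, since the total spread of the molecule is fixed, so a single term covers both "close to its own node" and "far from other nodes".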

C++, MPI, OpenMP

science

experimental neural networks

Coming soon. Just need to work out some kinks.

excited electrons in DNA

As part of my dissertation, I studied excited electrons in DNA using quantum chemistry simulations.

The red-blue blob here depicts how the three-dimensional wave of an excited electron (or exciton) spreads out between two adenine molecules when it is excited by ultraviolet light. The red and blue indicate the phase of the electron wave. The adenine dimer is somewhat hidden by the orbital, but the dimer is in the stacking geometry found in biological DNA and surrounded by water, serving as a model to understand electronic excitation in solvated DNA.

solvent electrostatics modeling

This is an equation I derived to model continuum electrostatics of salty solvents characterized by a Debye screening length and a dielectric constant.

I called the equation the "Debye-Hückel-like Screening Model" (DESMO). It generalizes the classic Debye-Hückel model—a point charge centered in a spherical solute cavity surrounded by solvent continuum—to arbitrary charge densities and solute cavity shapes. DESMO also reduces to the more familiar "Conductor-like Screening Model" (COSMO) in the limit of zero salt concentration.

DESMO is a simple, fast approximation to much more complicated models.
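To give a feel for the two solvent parameters involved, the following Python sketch computes the Debye screening length from the ionic strength and evaluates the screened potential of a bare point charge in bulk electrolyte. It is not the DESMO equation itself, which handles arbitrary charge densities and cavity shapes; the function names and default values (water at 25 °C, dielectric constant 78.4) are assumptions for illustration.

```python
import math

EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m
K_B = 1.380649e-23            # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19    # elementary charge, C
N_A = 6.02214076e23           # Avogadro constant, 1/mol


def debye_length(ionic_strength_molar, dielectric=78.4, temperature=298.15):
    """Debye screening length in meters; infinite when there is no salt."""
    if ionic_strength_molar == 0:
        return math.inf
    ionic_strength = ionic_strength_molar * 1000.0  # mol/L to mol/m^3
    numerator = dielectric * EPSILON_0 * K_B * temperature
    denominator = 2.0 * N_A * E_CHARGE ** 2 * ionic_strength
    return math.sqrt(numerator / denominator)


def screened_potential(charge, distance, ionic_strength_molar,
                       dielectric=78.4, temperature=298.15):
    """Potential (volts) of a point charge in a salty dielectric continuum.

    Simplification: no solute cavity, so this is the plain screened
    Coulomb form q * exp(-r / lambda_D) / (4 pi eps r).
    """
    kappa = 1.0 / debye_length(ionic_strength_molar, dielectric, temperature)
    coulomb = charge / (4.0 * math.pi * dielectric * EPSILON_0 * distance)
    return coulomb * math.exp(-kappa * distance)
```

For a 0.1 M 1:1 salt solution in water this gives a screening length of a little under 1 nm. At zero ionic strength the screening length is infinite and the exponential factor becomes 1, leaving the unscreened dielectric result, which mirrors the zero-salt limit mentioned above.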

reactive molecular dynamics

This is a schematic of the Fragment Molecular Orbital Multi-state Reactive Molecular Dynamics (FMO-MS-RMD) model. It's a clever idea I had to combine Fragment Molecular Orbital (FMO) theory with Prof. Greg Voth's Multi-state Empirical Valence Bond (MS-EVB) theory. FMO divides a molecular system into some set of smaller fragments, but such division can be ambiguous in chemically reactive systems. MS-EVB theory deals exactly with that issue by assuming a linear combination of reactive states, like what is shown in the schematic here.

Combining MS-EVB and FMO results in a powerful new model, FMO-MS-RMD, which incorporates quantum electronic structure theory into the MS-EVB approach. Furthermore, because the states and the molecular fragments can be calculated independently, FMO-MS-RMD can be computed in parallel to achieve considerable speedups.
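The "linear combination of reactive states" can be made concrete with the smallest possible case: two states and a 2x2 Hamiltonian. The Python sketch below is a generic empirical valence bond toy, not the FMO-MS-RMD implementation; the function name and the example numbers are hypothetical. Here `h11` and `h22` stand for the energies of the two bonding arrangements, and `h12` for the coupling between them.

```python
import math


def two_state_evb(h11, h22, h12):
    """Ground-state energy and state weights of a 2x2 EVB Hamiltonian.

    Returns (energy, (w1, w2)), where w1 and w2 are the squared
    coefficients of the two states in the ground state and sum to 1.
    """
    mean = 0.5 * (h11 + h22)
    half_gap = 0.5 * (h11 - h22)
    # Lower eigenvalue of [[h11, h12], [h12, h22]] in closed form.
    energy = mean - math.sqrt(half_gap ** 2 + h12 ** 2)
    if h12 == 0.0:
        # No coupling: the ground state is purely the lower-energy state.
        return energy, ((1.0, 0.0) if h11 <= h22 else (0.0, 1.0))
    # Unnormalized eigenvector (h12, energy - h11) satisfies (H - E) c = 0.
    c1, c2 = h12, energy - h11
    norm = c1 * c1 + c2 * c2
    return energy, (c1 * c1 / norm, c2 * c2 / norm)
```

For example, `two_state_evb(0.0, 0.0, -1.0)` describes two equally favorable arrangements: the coupling lowers the energy to -1.0 and the ground state is an even mix of both. In a multi-state model the same diagonalization is done over many states, and since each diagonal energy can be evaluated separately, the work divides naturally across processors.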

quantum chemistry software

I am a contributing author for Q-Chem, a popular quantum chemistry software package used for carrying out molecular simulations. I was the main author of the polarizable continuum model code and also of the internal QM/MM code. I also implemented a handful of other unique features built into developer versions of Q-Chem, such as a parallel adaptive fast multipole method code and a Connolly surface generator.

résumé

employment

Senior Data Scientist June 2016 — Present
Sprout Social Chicago, IL

  • Research and development of data science and machine learning systems to understand customer usage and analyze social media documents/articles [Python, Java, Spark, Redshift]

Lead Data Scientist September 2016 — June 2016
GE Transportation Chicago, IL

  • Created descriptive/predictive analytics solutions for asset performance (e.g. fuel optimization) utilizing modern machine learning and big data technologies [Python, Spark, Hadoop, Zeppelin]
  • Analytics and data engineering pipeline for container ship manifests importing goods to the Port of Los Angeles [Java, PostgreSQL]

Big Data Engineer, iTunes Analytics May 2015 — August 2016
Apple, Inc. Cupertino, CA

  • Developed analytics infrastructure to generate insights into customer experiences on products such as the iTunes Store, App Store, Apple Music, and Apple TV [Java, Python, Splunk, Cassandra, Hadoop, JavaScript]
  • Utilized machine learning, statistics, and data mining to perform data analysis, segmentation, and hypothesis testing

Software Developer August 2013 — April 2015
Signal (formerly known as BrightTag) Chicago, IL

  • Developed data models, algorithms, and back-end services to build and analyze user profile networks for millions of users per day; stored in NoSQL database with billions of records (∼50 TB) [Java, Cassandra, Python, Spark, Kafka, R]
  • Created real-time anomaly detection and network traffic forecasting system using Fourier analysis capable of predicting regular traffic patterns for upcoming week with >90% accuracy [Java, Python, Storm]

Postdoctoral Appointee March 2012 — July 2013
Argonne National Laboratory Leadership Computing Facility Chicago, IL
University of Chicago

  • Optimized massively parallel physics/chemistry simulations on IBM Blue Gene/Q supercomputer (No. 3 on Top500); increased simulation speed over 8x, scalability to ∼0.4 million CPU cores [C++, C, MPI, OpenMP, Python]
  • Invented novel quantum mechanical proton transport model based on fragment electronic structure theory; model fitting via statistical optimization techniques (simulated annealing, regression, swarm intelligence, etc.)

Ph.D. Student Researcher June 2007 — March 2012
The Ohio State University Columbus, OH

  • Published 10 first author journal articles (see publications); presented at 20+ professional and academic events
  • Researched quantum chemistry and statistical thermodynamics; mathematical theory, computation, and algorithms
  • Implemented theoretical physics/chemistry models into efficient code [C++, C, Fortran, MPI, OpenMP]

education

Ph.D. Computational/Physical Chemistry June 2007 — March 2012
The Ohio State University Columbus, OH
Dissertation: Multi-layer Methods for Quantum Chemistry in the Condensed Phase: Combining Density Functional Theory, Molecular Mechanics, and Continuum Solvation Models

B.S. Chemistry, minor in Microbiology August 2003 — June 2007
The Ohio State University Columbus, OH

Formal Courses:
    Quantum Mechanics, Statistical Thermodynamics, Computational Chemistry, Chemical Physics, Multivariable Calculus, Linear Algebra, Differential Equations, Computer Programming, Numerical Methods
Supplementary Online Courses:
  • Udacity: Web Development, Programming Languages, Parallel Programming (GPU/CUDA), Machine Learning, Artificial Intelligence
  • Coursera: Data Science Signature Track (R Programming, Statistics, Data Wrangling), Machine Learning, Algorithms, Databases, Neural Networks

technical skills

Within each category, skills are listed in approximate descending order of proficiency from left to right.

Programming Languages: Java, Python, JavaScript, C++, C, awk, Unix/Linux shell (bash), Scala
Web Technologies: HTML, CSS/SCSS, Flask, Node.js, jQuery, Jinja, AJAX, web workers
Databases/Storage: PostgreSQL, Redshift, Cassandra, MySQL, S3, Elasticsearch, Splunk, HDFS, Kafka, Redis
Data Analysis/Modeling: pandas, numpy, scikit-learn, SciPy, Keras, Lasagne, R
Compute Tools: Spark, Hadoop, MPI, OpenMP, blas/lapack
Productivity Tools: git, Jupyter/IPython, vim, LaTeX, JIRA, svn
Software Engineering: Test driven development, scalable architecture design, code review, agile dev
Machine Learning Techniques: Linear/Logistic Regression, Neural Networks (MLP, autoencoder, convolution, recurrent, deep learning), Fourier Analysis, Clustering, k-NN, Random Forests, SVD, PCA, NLP, SVMs
Dabblings: CUDA, PyCUDA, PyOpenCL, x86 assembly

projects & additional experience

To see some code I have written, visit my GitHub account.

Experimental neural network and deep learning library; SGD and backpropagation analytic gradient implemented from scratch for multi-layer perceptron, 1-D convolution net, particle network (my own invented flavor of ANN); exploring data parallelization via Spark and GPU acceleration [Python, Spark, numpy, PyCUDA, PyOpenCL]

Convolution neural network model for distinguishing between pictures of bacon and/or Kevin Bacon; web app interface for uploading and classifying pictures (formerly hosted at www.isitbacon.net) [Python, Flask, Lasagne, Theano, HTML, CSS, JavaScript, Twitter Bootstrap]

2014 — 2016

Open-source parallel JavaScript math and statistics library built around HTML5 Web Workers and Node.js cluster library capable of speeding up computations on multi-core devices; accompanying documentation website: mathworkersjs.org, available for install on npm [JavaScript, Node.js, HTML5, CSS/SCSS, Python, Flask, Apache Server]

2013 — present

Full stack coding, back-end to front-end; dynamic blog database. [HTML, CSS/SCSS, JavaScript/jQuery/Node.js, MySQL, Skeleton]

2013 — present

Recreational mathematics and programming problems from Project Euler; currently solved more than 110 problems, 99th percentile [Python, C++]

Parallel interface to Q-Chem program for propagating chemically reactive proton transport simulations with analytic gradients; demonstrated scalability to >200 CPUs [C++, C, MPI]

open source & community contributions

Simple error handling for input server connection list [Python]

2007 — 2014

Lead author of PCM solvent modeling, QM/MM, parallel linear algebra solvers, and Fast Multipole Method code; software design committee; 7th author of 161 co-authors on software white paper [C++, C, Fortran]

Multi-copy communication interface to open-source molecular dynamics software for parallel tempering/replica exchange (LAMMPS Ensembles); optimized compute kernel for pairwise interactions [C++, C, MPI, OpenMP, Python]

honors & awards

Chair's Prime Choice in Computational Division at American Chemical Society Conference, 2013
Presidential Fellowship from The Ohio State University Graduate School ($33,150), 2012
Chemical Computing Group Research Excellence Award from American Chemical Society ($1,150), 2012
Travel Fellowship to present at American Conference on Theoretical Chemistry ($600), 2011
Selected to attend Telluride School on Theoretical Chemistry ($850), 2011
U.S. Department of Energy Merit Scholarship for top poster presentation ($300), 2010
3rd place (out of ∼30) at Ohio State University Denman Undergraduate Research Forum ($300), 2006
American Society for Microbiology Undergraduate Research Fellowship ($4,000), 2006
Ohio State Arts & Sciences Undergraduate Honors Research Scholarship ($3,500), 2006

publications

Google Scholar Statistics: 500+ total citations, h-index 8

  1. Adrian W. Lange. Whitepaper: Particle Networks: A Variation on Multilayer Perceptrons with Spatial Pairwise Kernels (in preparation).
  2. Yihan Shao, Zhengting Gan, Evgeny Epifanovsky, Andrew T.B. Gilbert, Michael Wormit, Joerg Kussmann, Adrian W. Lange et al. Advances in molecular quantum chemistry contained in the Q-Chem 4 program package Mol. Phys. 1-32 (2014).
  3. John M. Herbert and Adrian W. Lange. Book chapter: Polarizable Continuum Models for (Bio)Molecular Electrostatics: Basic Theory and Recent Developments for Macromolecules and Simulations (2014).
  4. Adrian W. Lange and Gregory A. Voth. Multi-state Approach to Chemical Reactivity in Fragment Based Quantum Chemistry Calculations J. Chem. Theory Comput. 9, 4018-4025 (2013).
  5. Adrian W. Lange, Gard Nelson, Christopher Knight, and Gregory A. Voth. Multiscale Molecular Simulations at the Petascale (Parallelization of Reactive Force Field Model for Blue Gene/Q): ALCF-2 Early Science Program Technical Report Argonne National Laboratory (2013).
  6. Adrian W. Lange and John M. Herbert. Improving generalized Born models by exploiting connections to polarizable continuum models. II. Corrections for salt effects. J. Chem. Theory Comput. 8, 4381-4392 (2012).
  7. Adrian W. Lange and John M. Herbert. Improving generalized Born models by exploiting connections to polarizable continuum models. I. An improved effective Coulomb operator. J. Chem. Theory Comput. 8, 1999-2011 (2012).
  8. Adrian W. Lange and John M. Herbert. A Simple Polarizable Continuum Solvation Model for Electrolyte Solutions. J. Chem. Phys. 134, 204110 (2011).
  9. Adrian W. Lange and John M. Herbert. Symmetric Versus Asymmetric Discretization of the Integral Equations in Polarizable Continuum Solvation Models. Chem. Phys. Lett. 509, 77 (2011).
  10. Adrian W. Lange and John M. Herbert. Response to “Comment on ‘A Smooth, Nonsingular, and Faithful Discretization Scheme for Polarizable Continuum Models: The Switching/Gaussian Approach’”. J. Chem. Phys. 134, 117102 (2011).
  11. Adrian W. Lange and John M. Herbert. A Smooth, Nonsingular, and Faithful Discretization Scheme for Polarizable Continuum Models: The Switching/Gaussian Approach. J. Chem. Phys. 133, 244111 (2010).
  12. Adrian W. Lange and John M. Herbert. Polarizable Continuum Reaction-field Solvation Models Affording Smooth Potential Energy Surfaces. J. Phys. Chem. Lett. 1, 556-561 (2010).
  13. Adrian W. Lange and John M. Herbert. Both Intra- and Interstrand Charge-Transfer Excited States in Aqueous B-DNA Are Present at Energies Comparable to or Just Above the 1ππ* Excitonic Bright States. J. Am. Chem. Soc. 131, 3913-3922 (2009).
  14. Adrian W. Lange, Mary A. Rohrdanz, and John M. Herbert. Charge-Transfer Excited States in a π-Stacked Adenine Dimer, As Predicted Using Long-Range-Corrected Time-Dependent Density Functional Theory. J. Phys. Chem. B 112, 6304 (2008).
  15. Adrian Lange and John M. Herbert. Simple Methods to Reduce Charge-Transfer Contamination in Time-Dependent Density-Functional Calculations of Clusters and Liquids. J. Chem. Theory Comput. 3, 1680 (2007).

contact