home

about me

I am a data scientist/engineer with a background in computational quantum chemistry. I enjoy machine learning, parallel computing, web programming, physics and chemistry, and all things data science.

latest blog entries

Site Redesign: The Sequel
Sun Nov 29 2015

Updated site design with Skeleton

HBFS Machine Learning IV: BrainSparks and Calrissian
Fri Oct 23 2015

Experimental neural network / deep learning code

HBFS Machine Learning III: Beuller
Fri Oct 23 2015

A foray into Natural Language Processing

development

brainsparks & calrissian

Coming soon!

parallel JavaScript math library

I am writing a parallel JavaScript math and statistics library built around HTML5 Web Workers and the Node.js cluster library. The project is called MathWorkersJS, and it's open-source. It is a personal hobby project: I'm building it for fun, but I also hope it becomes a useful tool for the web community.

The motivation behind MathWorkersJS is to introduce some degree of multi-core parallelism into JavaScript mathematical computations, whether they run client-side in the browser or server-side. While other good JavaScript math libraries exist, none (as far as I know) takes advantage of HTML5's fairly new ability to spawn Web Workers for CPU-intensive computations. These libraries leave a similar gap for parallelism in Node.js. That's where MathWorkersJS comes in.

So far in tests, I have been able to demonstrate 2-3x speedups with 4 workers for some common linear algebra operations, like matrix-matrix multiplies. More to come in the future!
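As a rough illustration of how a matrix-matrix multiply can be split across workers, here is a minimal sketch in Python rather than JavaScript. It is not MathWorkersJS code: the function names (`split_rows`, `multiply_block`, `parallel_matmul`) and the row-block partitioning are assumptions chosen for the example, with a process pool standing in for Web Workers.

```python
from concurrent.futures import ProcessPoolExecutor


def split_rows(n_rows, n_workers):
    """Return (start, stop) row ranges, one contiguous block per worker."""
    base, extra = divmod(n_rows, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        if stop > start:  # skip empty blocks when workers outnumber rows
            ranges.append((start, stop))
        start = stop
    return ranges


def multiply_block(args):
    """Multiply a block of rows of A by the full matrix B."""
    a_rows, b = args
    cols = list(zip(*b))  # columns of B
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a_rows]


def parallel_matmul(a, b, n_workers=4, executor_cls=ProcessPoolExecutor):
    """Compute A x B by giving each worker its own block of rows of A."""
    blocks = [(a[s:e], b) for s, e in split_rows(len(a), n_workers)]
    with executor_cls(max_workers=n_workers) as pool:
        results = pool.map(multiply_block, blocks)  # preserves block order
    return [row for block in results for row in block]
```

Each worker receives a copy of B plus its own rows of A, so communication cost grows with the matrix size. That overhead is one plausible reason a 4-worker run yields less than a 4x speedup. With the default process pool, call `parallel_matmul` from under an `if __name__ == "__main__":` guard.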

JavaScript, HTML5, Node.js

data network modeling with Cassandra

At Signal, I developed a back-end Java web app that interfaces with a large Cassandra database which contains billions of records related to clients' user data. Cassandra is usually conceptualized as one or more rings of nodes (drawn here in red), and each node is responsible for managing a portion of the full data stored on disk. We used multiple rings to support multi-region availability of our data.
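The ring picture can be sketched in a few lines of Python. This is a simplified illustration, not Cassandra's implementation: the `TokenRing` class is hypothetical, node tokens are derived by hashing node names (real clusters assign tokens differently), and MD5 is used only as a convenient stable hash.

```python
import bisect
import hashlib


class TokenRing:
    """Toy token ring: each node owns the keys whose tokens fall in the
    range ending at its own token."""

    def __init__(self, nodes):
        self._ring = sorted((self._token(name), name) for name in nodes)
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _token(key):
        # Assumption: a 64-bit token taken from an MD5 digest of the key.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def owner(self, partition_key):
        """Return the first node at or after the key's token, wrapping around."""
        idx = bisect.bisect_left(self._tokens, self._token(partition_key))
        return self._ring[idx % len(self._ring)][1]


ring = TokenRing(["node-a", "node-b", "node-c"])
print(ring.owner("user:42"))
```

Because ownership depends only on the key's token, any client can work out which node to contact without a central lookup.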

I played a significant role in developing some of the data models and in providing data analysis using Cassandra. For instance, data arrive in the system from various sources (e.g., web, mobile, offline), and clients would ideally like a complete view of all these channels in real time. The solution I helped to create is a dynamic network of client user data (drawn here in blue and green) stored on Cassandra via the Java web app. This feature is the engine behind much of the new product development at Signal.

Java, Python, Cassandra, R

massively parallel computing

My primary focus as a postdoc was to improve the scalability of a code on Mira, the Blue Gene/Q machine at the Argonne National Laboratory Leadership Computing Facility, which was the third-fastest supercomputer in the world at the time.

I achieved this for a specialized molecular dynamics code used to simulate proton transport. The original code scaled to only about 64 nodes, but I turned it into one that scales to 24,576 nodes (393,216 cores). To accomplish this, I built a replica exchange external driver for the molecular dynamics code LAMMPS, exposed some untapped parallelism in the multi-state proton transport calculations, multithreaded several compute-intensive loops, and made many smaller code optimizations. I also contributed a handful of lines of code to the open-source LAMMPS software package during this time.
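At the heart of any replica exchange driver is the test that decides whether two replicas swap configurations. The sketch below shows the standard Metropolis criterion, assuming temperature replica exchange. It is an illustration only, not the driver described above, and the function names and the kcal/mol units are assumptions.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for exchanging replicas i and j:
    min(1, exp[(beta_i - beta_j) * (E_i - E_j)])."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(energy_i, temp_i, energy_j, temp_j, rng=random):
    """Return True if the proposed exchange is accepted."""
    return rng.random() < swap_probability(energy_i, temp_i, energy_j, temp_j)
```

Replicas only exchange a pair of energies per attempt, so this step needs very little communication. That is part of why replica exchange maps well onto very large machines.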

C++, C, MPI, OpenMP

genetic algorithm load balancing

The molecule shown here is part of an implicit solvent calculation based on a parallel fast multipole method, which runs faster when atoms belonging to a given node are spatially close to each other and also far away from atoms belonging to other nodes. The fitness function for the genetic algorithm reflects that, and it also tends to ensure that no nodes have far more/less work than other nodes.

The molecule depicted is a single-stranded DNA polyadenine, and each sphere represents an atom. Each sphere is colored according to the node responsible for its computation. (There are 4 nodes in this example.) The run starts with a randomized distribution of the atoms across the 4 nodes, which is essentially the worst load balance. Remarkably, the genetic algorithm finds a solution that is very close to the ideal solution for this system.
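A toy version of this idea fits in a short Python sketch. It is not the original C++/MPI code: the cost function is a simplified proxy (within-node spatial spread plus a penalty for uneven atom counts), and the selection, crossover, and mutation choices, along with every name and parameter, are assumptions made for illustration.

```python
import random


def cost(assignment, coords, n_nodes, balance_weight=1.0):
    """Lower is better: compact per-node clusters plus even atom counts."""
    groups = [[] for _ in range(n_nodes)]
    for atom, node in enumerate(assignment):
        groups[node].append(coords[atom])
    spread = 0.0
    for pts in groups:
        if not pts:
            continue
        centroid = [sum(c) / len(pts) for c in zip(*pts)]
        spread += sum(sum((x - m) ** 2 for x, m in zip(p, centroid))
                      for p in pts)
    ideal = len(assignment) / n_nodes
    imbalance = sum((len(pts) - ideal) ** 2 for pts in groups)
    return spread + balance_weight * imbalance


def evolve(coords, n_nodes, pop_size=30, generations=200,
           mutation_rate=0.05, seed=0):
    """Evolve an atom-to-node assignment, starting from random ones."""
    rng = random.Random(seed)
    n = len(coords)
    population = [[rng.randrange(n_nodes) for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda a: cost(a, coords, n_nodes))
        parents = population[: pop_size // 2]  # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = mom[:cut] + dad[cut:]
            for i in range(n):                 # random-reset mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_nodes)
            children.append(child)
        population = parents + children
    return min(population, key=lambda a: cost(a, coords, n_nodes))
```

Because the best assignments always survive into the next generation, the best cost never gets worse from one generation to the next. Minimizing the spread within each node also tends to push different nodes' atoms apart, which is the spatial property the fitness function is meant to reward.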

C++, MPI, OpenMP

science

experimental neural networks

Coming soon!

excited electrons in DNA

As part of my dissertation, I studied excited electrons in DNA using quantum chemistry simulations.

The red-blue blob here is a depiction of how the three-dimensional wave of an excited electron (or exciton) spreads out between two adenine molecules when it is excited by ultraviolet light. The red and blue indicate the phase of the electron wave. The adenine dimer is somewhat hidden by the orbital, but the dimer is in the stacking geometry found in biological DNA and is surrounded by water, serving as a model for understanding electronic excitation in solvated DNA.

solvent electrostatics modeling

This is an equation I derived to model continuum electrostatics of salty solvents characterized by a Debye screening length and a dielectric constant.

I called the equation the "Debye-Hückel-like Screening Model" (DESMO). It generalizes the classic Debye-Hückel model—a point charge centered in a spherical solute cavity surrounded by solvent continuum—to arbitrary charge densities and solute cavity shapes. DESMO also reduces to the more familiar "Conductor-like Screening Model" (COSMO) in the limit of zero salt concentration.

DESMO is a simple, fast approximation to much more complicated models.
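For reference, the classic Debye-Hückel result that DESMO generalizes can be evaluated in a few lines of Python. This sketch is not the DESMO equation itself. It only covers the textbook case of a point charge centered in a spherical cavity, and the function name, argument names, and Gaussian-style units are assumptions.

```python
import math


def debye_huckel_potential(r, charge, dielectric, debye_length,
                           cavity_radius=0.0):
    """Electrostatic potential at distance r outside a spherical cavity
    holding a central point charge, in a salty continuum solvent.

    Gaussian-style units: potential = charge / (dielectric * r) times a
    screening factor. Pass math.inf as debye_length for zero salt.
    """
    if r <= 0.0 or r < cavity_radius:
        raise ValueError("r must be positive and outside the cavity")
    coulomb = charge / (dielectric * r)
    if math.isinf(debye_length):
        return coulomb  # no salt: plain dielectric screening
    kappa = 1.0 / debye_length
    screening = math.exp(-kappa * (r - cavity_radius)) / (1.0 + kappa * cavity_radius)
    return coulomb * screening


print(debye_huckel_potential(2.0, 1.0, 80.0, math.inf))  # zero-salt limit
print(debye_huckel_potential(2.0, 1.0, 80.0, 10.0, cavity_radius=1.0))
```

The two parameters are the ones named above: a dielectric constant and a Debye screening length. Letting the screening length go to infinity recovers the salt-free potential.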

reactive molecular dynamics

This is a schematic of the Fragment Molecular Orbital Multi-state Reactive Molecular Dynamics (FMO-MS-RMD) model. It's a clever idea I had to combine Fragment Molecular Orbital (FMO) theory with Prof. Greg Voth's Multi-state Empirical Valence Bond (MS-EVB) theory. FMO divides a molecular system into some set of smaller fragments, but such division can be ambiguous in chemically reactive systems. MS-EVB theory deals exactly with that issue by assuming a linear combination of reactive states, like what is shown in the schematic here.

Combining MS-EVB and FMO results in a powerful new model, FMO-MS-RMD, which incorporates quantum electronic structure theory into the MS-EVB approach. Furthermore, because the states and the molecular fragments can be calculated independently, FMO-MS-RMD can be computed in parallel to achieve considerable speedups.
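The "linear combination of reactive states" can be made concrete with the smallest possible case, two states. The Python sketch below diagonalizes a 2x2 valence-bond-style Hamiltonian in closed form. It is a generic illustration under stated assumptions, not the FMO-MS-RMD implementation: the function name and the example numbers are invented, and in the real model the state energies would come from electronic structure calculations.

```python
import math


def two_state_ground(e1, e2, coupling):
    """Ground state of H = [[e1, coupling], [coupling, e2]].

    Returns (ground_energy, (w1, w2)), where w1 and w2 are the weights
    (squared coefficients) of the two states in the ground state.
    """
    mean = 0.5 * (e1 + e2)
    root = math.hypot(0.5 * (e1 - e2), coupling)
    ground_energy = mean - root
    # Rotation angle that diagonalizes the symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * coupling, e1 - e2)
    c1, c2 = -math.sin(theta), math.cos(theta)  # lower eigenvector
    return ground_energy, (c1 * c1, c2 * c2)


# Two degenerate states (for example, a proton shared equally between two
# waters) mix 50/50 and are stabilized by the coupling.
energy, weights = two_state_ground(0.0, 0.0, -1.0)
print(energy, weights)
```

Each diagonal energy is an independent calculation, which is where the parallelism noted above comes from.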

quantum chemistry software

I am a contributing author for Q‑Chem, a popular quantum chemistry software package used for carrying out molecular simulations. I was the main author of the polarizable continuum model code and also of the internal QM/MM code. I also implemented a handful of other unique features built into developer versions of Q-Chem, such as a parallel adaptive fast multipole method code and a Connolly surface generator.

résumé

employment

Lead Data Scientist September 2016 — Present
GE Transportation Chicago, IL

  • Create descriptive/predictive analytics solutions for railroad customer performance (e.g. fuel optimization) utilizing modern machine learning and big data technologies [Python, Spark, Hadoop, Cassandra, Zeppelin]

Big Data Engineer, iTunes Analytics May 2015 — August 2016
Apple, Inc. Cupertino, CA

  • Developed analytics infrastructure to generate insights into customer experiences on products such as the iTunes Store, App Store, Apple Music, and Apple TV [Java, Python, Splunk, Cassandra, Hadoop, JavaScript]
  • Utilized machine learning, statistics, and data mining to perform data analysis, segmentation, and hypothesis testing

Software Developer August 2013 — April 2015
Signal (formerly known as BrightTag) Chicago, IL

  • Developed data models, algorithms, and back-end services to build and analyze user profile networks for millions of users per day; stored in NoSQL database with billions of records (∼50 TB) [Java, Cassandra, Python, Spark, Kafka, R]
  • Created real-time anomaly detection and network traffic forecasting system using Fourier analysis capable of predicting regular traffic patterns for upcoming week with >90% accuracy [Java, Python, Storm]

Postdoctoral Appointee March 2012 — July 2013
Argonne National Laboratory Leadership Computing Facility Chicago, IL
University of Chicago

  • Optimized massively parallel physics/chemistry simulations on IBM Blue Gene/Q supercomputer (No. 3 on the Top500 list at the time); increased simulation speed over 8x, scalability to ∼0.4 million CPU cores [C++, C, MPI, OpenMP, Python]
  • Invented novel quantum mechanical proton transport model based on fragment electronic structure theory; model fitting via statistical optimization techniques (simulated annealing, regression, swarm intelligence, etc.)

Ph.D. Student Researcher June 2007 — March 2012
The Ohio State University Columbus, OH

  • Published 10 first author journal articles (see publications); presented at 20+ professional and academic events
  • Researched quantum chemistry and statistical thermodynamics; mathematical theory, computation, and algorithms
  • Implemented theoretical physics/chemistry models into efficient code [C++, C, Fortran, MPI, OpenMP]

education

Ph.D. Computational/Physical Chemistry June 2007 — March 2012
The Ohio State University Columbus, OH
Dissertation: Multi-layer Methods for Quantum Chemistry in the Condensed Phase: Combining Density Functional Theory, Molecular Mechanics, and Continuum Solvation Models

B.S. Chemistry, minor in Microbiology August 2003 — June 2007
The Ohio State University Columbus, OH

Formal Courses:
    Quantum Mechanics, Statistical Thermodynamics, Computational Chemistry, Chemical Physics, Multivariable Calculus, Linear Algebra, Differential Equations, Computer Programming, Numerical Methods
Supplementary Online Courses:
  • Udacity: Web Development, Programming Languages, Parallel Programming (GPU/CUDA), Machine Learning
  • Coursera: Data Science Signature Track (R Programming, Statistics, Data Wrangling), Machine Learning, Algorithms, Databases, Neural Networks

technical skills

Proficiency is listed in approximate descending order from left to right.

Programming Languages: Java, Python, JavaScript, C++, C, awk, Unix/Linux shell (bash), Fortran
Web Technologies: HTML, CSS/SCSS, Flask, Node.js, jQuery, Jinja, AJAX, web workers
Databases/Storage: Cassandra, MySQL, Splunk, HDFS, Kafka, Redis
Data Analysis/Modeling: pandas, numpy, scikit-learn, SciPy, Keras, Lasagne, R
Compute Tools: Spark, MPI, OpenMP, Hadoop (MapReduce), blas/lapack
Productivity Tools: git, IPython/Jupyter, vim, LaTeX, JIRA, svn
Software Engineering: Test driven development (units/smoke/integration), architecture design, code review, agile dev
Machine Learning Techniques: Linear/Logistic Regression, Neural Networks (MLP, autoencoder, convolution, recurrent, deep learning), Fourier Analysis, Clustering, k-NN, Random Forests, SVD, PCA, NLP, SVMs
Dabblings: Julia, Theano, CUDA, PyCUDA, PyOpenCL, x86 assembly

projects & additional experience

To see some code I have written, visit my GitHub account.

Experimental neural network and deep learning library; SGD and backpropagation analytic gradient implemented from scratch for multi-layer perceptron, 1-D convolution net, particle network (my own invented flavor of ANN); exploring data parallelization via Spark and GPU acceleration [Python, Spark, numpy, PyCUDA, PyOpenCL]

Convolution neural network model for distinguishing between pictures of bacon and/or Kevin Bacon; web app interface for uploading and classifying pictures: isitbacon.net [Python, Flask, Lasagne, Theano, HTML, CSS, JavaScript, Twitter Bootstrap]

2014 — present

Open-source parallel JavaScript math and statistics library built around HTML5 Web Workers and Node.js cluster library capable of speeding up computations on multi-core devices; accompanying documentation website: mathworkersjs.org, available for install on npm [JavaScript, Node.js, HTML5, CSS/SCSS, Python, Flask, Apache Server]

2013 — present

Full stack coding, back-end to front-end; dynamic blog database. [HTML, CSS/SCSS, JavaScript/jQuery/Node.js, MySQL, Skeleton]

2013 — present

Recreational mathematics and programming problems from Project Euler; more than 110 problems solved so far [Python, C++]

Parallel interface to Q-Chem program for propagating chemically reactive proton transport simulations with analytic gradients; demonstrated scalability to >200 CPUs [C++, C, MPI]

open source & community contributions

Simple error handling for input server connection list [Python]

2007 — 2014

Lead author of PCM solvent modeling, QM/MM, parallel linear algebra solvers, and Fast Multipole Method code; software design committee; 7th author of 161 co-authors on software white paper [C++, C, Fortran]

Multi-copy communication interface to open-source molecular dynamics software for parallel tempering/replica exchange (LAMMPS Ensembles); optimized compute kernel for pairwise interactions [C++, C, MPI, OpenMP, Python]

honors & awards

Chair's Prime Choice in Computational Division at American Chemical Society Conference
2013
Presidential Fellowship from The Ohio State University Graduate School ($33,150)
2012
Chemical Computing Group Research Excellence Award from American Chemical Society ($1,150)
2012
Travel Fellowship to present at American Conference on Theoretical Chemistry ($600)
2011
Selected to attend Telluride School on Theoretical Chemistry ($850)
2011
U.S. Department of Energy Merit Scholarship for top poster presentation ($300)
2010
3rd place (out of ∼30) at Ohio State University Denman Undergraduate Research Forum ($300)
2006
American Society for Microbiology Undergraduate Research Fellowship ($4,000)
2006
Ohio State Arts & Sciences Undergraduate Honors Research Scholarship ($3,500)
2006

publications

Google Scholar Statistics: 500+ total citations, h-index 8

  1. Adrian W. Lange. Whitepaper: Particle Networks: A Variation on Multilayer Perceptrons with Spatial Pairwise Kernels (in preparation).
  2. Yihan Shao, Zhengting Gan, Evgeny Epifanovsky, Andrew T.B. Gilbert, Michael Wormit, Joerg Kussmann, Adrian W. Lange et al. Advances in molecular quantum chemistry contained in the Q-Chem 4 program package Mol. Phys. 1-32 (2014).
  3. John M. Herbert and Adrian W. Lange. Book chapter: Polarizable Continuum Models for (Bio)Molecular Electrostatics: Basic Theory and Recent Developments for Macromolecules and Simulations (2014).
  4. Adrian W. Lange and Gregory A. Voth. Multi-state Approach to Chemical Reactivity in Fragment Based Quantum Chemistry Calculations J. Chem. Theory Comput. 9, 4018-4025 (2013).
  5. Adrian W. Lange, Gard Nelson, Christopher Knight, and Gregory A. Voth. Multiscale Molecular Simulations at the Petascale (Parallelization of Reactive Force Field Model for Blue Gene/Q): ALCF-2 Early Science Program Technical Report Argonne National Laboratory (2013).
  6. Adrian W. Lange and John M. Herbert. Improving generalized Born models by exploiting connections to polarizable continuum models. II. Corrections for salt effects. J. Chem. Theory Comput. 8, 4381-4392 (2012).
  7. Adrian W. Lange and John M. Herbert. Improving generalized Born models by exploiting connections to polarizable continuum models. I. An improved effective Coulomb operator. J. Chem. Theory Comput. 8, 1999-2011 (2012).
  8. Adrian W. Lange and John M. Herbert. A Simple Polarizable Continuum Solvation Model for Electrolyte Solutions. J. Chem. Phys. 134, 204110 (2011).
  9. Adrian W. Lange and John M. Herbert. Symmetric Versus Asymmetric Discretization of the Integral Equations in Polarizable Continuum Solvation Models. Chem. Phys. Lett. 509, 77 (2011).
  10. Adrian W. Lange and John M. Herbert. Response to “Comment on ‘A Smooth, Nonsingular, and Faithful Discretization Scheme for Polarizable Continuum Models: The Switching/Gaussian Approach’”. J. Chem. Phys. 134, 117102 (2011).
  11. Adrian W. Lange and John M. Herbert. A Smooth, Nonsingular, and Faithful Discretization Scheme for Polarizable Continuum Models: The Switching/Gaussian Approach. J. Chem. Phys. 133, 244111 (2010).
  12. Adrian W. Lange and John M. Herbert. Polarizable Continuum Reaction-field Solvation Models Affording Smooth Potential Energy Surfaces. J. Phys. Chem. Lett. 1, 556-561 (2010).
  13. Adrian W. Lange and John M. Herbert. Both Intra- and Interstrand Charge-Transfer Excited States in Aqueous B-DNA Are Present at Energies Comparable to or Just Above the 1ππ* Excitonic Bright States. J. Am. Chem. Soc. 131, 3913-3922 (2009).
  14. Adrian W. Lange, Mary A. Rohrdanz, and John M. Herbert. Charge-Transfer Excited States in a π-Stacked Adenine Dimer, As Predicted Using Long-Range-Corrected Time-Dependent Density Functional Theory. J. Phys. Chem. B 112, 6304 (2008).
  15. Adrian Lange and John M. Herbert. Simple Methods to Reduce Charge-Transfer Contamination in Time-Dependent Density-Functional Calculations of Clusters and Liquids. J. Chem. Theory Comput. 3, 1680 (2007).

contact