I am a recent PhD graduate pursuing a career in Data Engineering, Big Data, and Software Engineering. I am currently a Data Engineering Fellow at Insight Data Science.
As a Data Engineering Fellow, I developed a pipeline to determine the popularity of millions of websites (at subdomain granularity) by counting the number of times they are referenced in a data set of over 2.5 billion web pages (>17 terabytes of compressed data). I processed the data using Spark deployed on an AWS EMR cluster and loaded the results into Postgres (>47 million rows).
I also created an interactive web UI to allow users to view and query website popularity data.
During my graduate studies, I used biophysical and molecular biology methods such as NMR and fluorescence to study intrinsically disordered proteins. Specifically, my research focused on understanding how phosphorylation regulates a disordered transcription factor. I also created data analysis tools and pipelines to automate, improve, and simplify various tasks.
This website contains links to scientific resources that I found useful during my studies and a few tools that I created. These tools were mostly for internal lab use, but each tool comes with example input data that can be used to demonstrate how the tool works.
If you have any questions or would like to access password-protected content on this site, feel free to contact me.
Last updated: 2019 Oct 1