This page is a collection of assignments and projects I have undertaken over the five years of my university education. First off, a little bit about me: I'm currently an Ops manager in the Fulfillment and Logistics team at Revolut in London. I completed an MSc in Statistics at Imperial and a four-year undergraduate degree in Financial Mathematics at University College Dublin. My main areas of interest lie in NLP and all things machine learning and data science. I am interested in working at the forefront of these areas, particularly their application to business. With a strong background in Python and R, I would be delighted to discuss any related opportunities. If you are interested in working at Revolut, please reach out to me through my LinkedIn.
Hawkes processes and their applications are a fast-growing area of study. A key difficulty in applying them to certain areas is imprecise observation leading to censored event times; the resulting processes have been termed aggregated Hawkes processes. This thesis addresses statistical inference and parameter estimation for aggregated Hawkes processes using deep learning methods, namely Variational Auto-Encoders (VAEs) and neural networks. Two methodologies are developed: one encodes the aggregated data using a Variational Auto-Encoder with a Poisson likelihood, and the other uses a Multilayer Perceptron to solve the problem of parameter estimation. The successful application of VAEs to aggregated Hawkes processes allowed Bayesian inference to be performed on the branching ratio and the baseline intensity, and the potential of this method is demonstrated using four test processes with a range of underlying parameters. The problem of parameter estimation was successfully approached using a blended supervised learning technique, which was tested comprehensively across a range of parameter values to establish an understanding of its performance. It was concluded that the method developed in this thesis provides performance similar to existing solutions but with a significantly reduced computational time.
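For context, a common specification of the Hawkes conditional intensity is λ(t) = μ + Σ_{t_i < t} α e^(−β(t − t_i)), where μ is the baseline intensity and α/β is the branching ratio. The sketch below is an illustrative reconstruction, not the thesis code: it simulates such a process via Ogata's thinning and bins the event times into interval counts to mimic the aggregation described above; the parameter names and values (mu, alpha, beta, bin_width) are assumptions.

```python
import numpy as np

def simulate_hawkes(mu, alpha, beta, T, rng=None):
    """Simulate a Hawkes process with intensity mu + sum(alpha * exp(-beta * (t - t_i)))
    on [0, T] using Ogata's thinning algorithm."""
    rng = np.random.default_rng(rng)
    events = []
    t = 0.0
    while True:
        # Between events the intensity only decays, so the current intensity is an upper bound.
        excitation = alpha * np.exp(-beta * (t - np.array(events))).sum() if events else 0.0
        lam_bar = mu + excitation
        t += rng.exponential(1.0 / lam_bar)
        if t > T:
            break
        lam_t = mu + (alpha * np.exp(-beta * (t - np.array(events))).sum() if events else 0.0)
        if rng.uniform() * lam_bar <= lam_t:   # accept the candidate with probability lam_t / lam_bar
            events.append(t)
    return np.array(events)

def aggregate_counts(events, T, bin_width):
    """Bin exact event times into interval counts, mimicking censored (aggregated) observation."""
    bins = np.arange(0.0, T + bin_width, bin_width)
    return np.histogram(events, bins=bins)[0]

events = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, T=100.0, rng=0)
counts = aggregate_counts(events, T=100.0, bin_width=1.0)
print(len(events), counts[:10])
```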
This project was completed as part of a coursework submitted for a data science module. It contains an investigation of racial bias in three pre-trained word embeddings, using the inner product between a 'racial' vector and a 'semantic' vector as a measure of bias. Original visualisations of the level of bias are presented using ggplot2.
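As a rough illustration of the bias measure (not the coursework code, which produced its plots with ggplot2 in R): the snippet below uses placeholder vectors in place of the pre-trained embeddings, and the word choices and the difference-of-vectors construction of the 'racial' direction are assumptions.

```python
import numpy as np

# Placeholder embeddings standing in for a pre-trained model (e.g. one loaded via gensim);
# the words and 50-dimensional random vectors here are purely illustrative.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["european", "african", "pleasant", "unpleasant"]}

# One common construction: a 'racial' direction as the difference of two group vectors,
# and a 'semantic' vector for an attribute word.
racial_vec = emb["european"] - emb["african"]
semantic_vec = emb["pleasant"]

# Bias measured as the inner product between the two vectors.
bias = float(np.dot(racial_vec, semantic_vec))
print(f"bias score: {bias:.3f}")
```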
This project was completed as part of a coursework submitted for a machine learning module. The first question of the coursework required the comparison of linear and quadratic ridge regressions against base linear and quadratic models. The second question required the investigation of three different Naive Bayes models for spam detection. The use of libraries was forbidden, so we were required to create our own Python modules for this assignment; I decided to make use of Python's object-oriented features to do so.
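To give a flavour of the from-scratch, object-oriented approach, here is an illustrative sketch rather than the submitted modules; it assumes NumPy was permitted for the linear algebra.

```python
import numpy as np

class RidgeRegression:
    """Minimal from-scratch ridge regression using the closed-form solution."""

    def __init__(self, lam=1.0):
        self.lam = lam          # L2 penalty strength
        self.coef_ = None

    def fit(self, X, y):
        # Add an intercept column, then solve (X'X + lam * I) w = X'y.
        Xb = np.hstack([np.ones((X.shape[0], 1)), X])
        penalty = self.lam * np.eye(Xb.shape[1])
        penalty[0, 0] = 0.0     # do not penalise the intercept
        self.coef_ = np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)
        return self

    def predict(self, X):
        Xb = np.hstack([np.ones((X.shape[0], 1)), X])
        return Xb @ self.coef_

# A quadratic ridge regression is then the same model fit on [X, X**2] features.
```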
This project was completed as part of a coursework submitted for a machine learning module. Three datasets were provided by the lecturer, along with prescribed methods to be implemented and tuned on the data. It required a full explanation of feed-forward neural networks, Gaussian processes and Principal Component Analysis. The neural networks were implemented in Keras through R.
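For illustration, a minimal feed-forward network of the kind described might look as follows. This is a sketch in Python's Keras API rather than Keras through R as used in the coursework, and the toy data, layer sizes and training settings are assumptions.

```python
import numpy as np
from tensorflow import keras  # assumes a TensorFlow/Keras installation

# Toy data standing in for the datasets provided in the coursework.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

# A small feed-forward (multilayer perceptron) classifier.
model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```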
This project was completed as part of two courseworks submitted for a data science module. The code written demonstrates an understanding of the fundamentals of API access and unit-test development. The second coursework tested the ability to construct a data pipeline for crime data in London. The code was also profiled using a basic timing function.
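Here is a sketch of the kind of timing wrapper and API call involved (illustrative only; the decorator, the use of the UK Police street-level crime endpoint and its parameters are assumptions, so check the API documentation before relying on them).

```python
import time
from functools import wraps

import requests

def timed(fn):
    """Basic timing wrapper used to profile a pipeline step."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        print(f"{fn.__name__} took {time.perf_counter() - start:.3f}s")
        return result
    return wrapper

@timed
def fetch_street_crimes(lat, lng, date):
    # Street-level crime data for a point and month (YYYY-MM) from data.police.uk.
    url = "https://data.police.uk/api/crimes-street/all-crime"
    resp = requests.get(url, params={"lat": lat, "lng": lng, "date": date}, timeout=30)
    resp.raise_for_status()
    return resp.json()

# Example: crimes near central London in a given month.
crimes = fetch_street_crimes(51.5074, -0.1278, "2023-01")
print(len(crimes))
```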
This project was completed as part of an 8-hour assignment for a computational statistics module. It included questions on Metropolis-Hastings sampling and the special case known as Gibbs sampling. This project demonstrates an ability to produce accurate reports on short deadlines.
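For reference, a minimal random-walk Metropolis-Hastings sampler of the kind covered (an illustrative sketch, not the assignment solution; the standard normal target and proposal scale are assumptions):

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples, proposal_sd=1.0, rng=None):
    """Random-walk Metropolis-Hastings for a one-dimensional target density."""
    rng = np.random.default_rng(rng)
    samples = np.empty(n_samples)
    x, log_p = x0, log_target(x0)
    for i in range(n_samples):
        x_prop = x + rng.normal(scale=proposal_sd)       # symmetric Gaussian proposal
        log_p_prop = log_target(x_prop)
        # Accept with probability min(1, target(x') / target(x)).
        if np.log(rng.uniform()) < log_p_prop - log_p:
            x, log_p = x_prop, log_p_prop
        samples[i] = x
    return samples

# Example: sampling from a standard normal target (log-density up to a constant).
draws = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0, n_samples=5000, rng=42)
print(round(draws.mean(), 3), round(draws.std(), 3))
```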