This page is a collection of assignments and projects I have undertaken over the five years of my university education. First off, a little bit about me: I'm currently an Ops manager in the Fulfillment and Logistics team at Revolut in London. I completed an MSc in Statistics at Imperial and a four-year undergraduate degree in Financial Mathematics at University College Dublin. My main areas of interest lie in NLP and all things machine learning and data science. I am interested in working at the forefront of these areas, particularly their application to business. With a strong background in Python and R, I would be delighted to discuss any related opportunities. If you are interested in working at Revolut, please reach out to me through my LinkedIn.
Hawkes processes and their applications are a fast-growing area of study. A key difficulty in applying them to certain areas is imprecise observation leading to censored event times; the resulting processes have been termed aggregated Hawkes processes. This thesis addresses statistical inference and parameter estimation for aggregated Hawkes processes using deep learning methods, namely Variational Auto-Encoders (VAEs) and neural networks. Two methodologies are developed: one encodes the aggregated data using a Variational Auto-Encoder with a Poisson likelihood, and the other uses a Multilayer Perceptron to solve the problem of parameter estimation. The successful application of VAEs to aggregated Hawkes processes allowed Bayesian inference to be performed on the branching ratio and the baseline intensity, and the potential of this method is demonstrated using four test processes with a range of underlying parameters. The problem of parameter estimation was successfully approached using a blended supervised learning technique, which was tested comprehensively across a range of parameter values to establish an understanding of its performance. It was concluded that the method developed in this thesis provides performance similar to existing solutions but with a significantly reduced computational time.
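For context, a common specification of the Hawkes conditional intensity is λ(t) = μ + Σ_{t_i < t} α e^(−β(t − t_i)), where μ is the baseline intensity and α/β is the branching ratio. The sketch below is an illustrative reconstruction, not the thesis code: it simulates such a process via Ogata's thinning and bins the event times into interval counts to mimic the aggregation described above; the parameter names and values (mu, alpha, beta, bin_width) are assumptions.

```python
import numpy as np

def simulate_hawkes(mu, alpha, beta, T, rng=None):
    """Simulate a Hawkes process with intensity mu + sum(alpha * exp(-beta * (t - t_i)))
    on [0, T] using Ogata's thinning algorithm."""
    rng = np.random.default_rng(rng)
    events = []
    t = 0.0
    while True:
        # Between events the intensity only decays, so the current intensity is an upper bound.
        excitation = alpha * np.exp(-beta * (t - np.array(events))).sum() if events else 0.0
        lam_bar = mu + excitation
        t += rng.exponential(1.0 / lam_bar)
        if t > T:
            break
        lam_t = mu + (alpha * np.exp(-beta * (t - np.array(events))).sum() if events else 0.0)
        if rng.uniform() * lam_bar <= lam_t:   # accept the candidate with probability lam_t / lam_bar
            events.append(t)
    return np.array(events)

def aggregate_counts(events, T, bin_width):
    """Bin exact event times into interval counts, mimicking censored (aggregated) observation."""
    bins = np.arange(0.0, T + bin_width, bin_width)
    return np.histogram(events, bins=bins)[0]

events = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, T=100.0, rng=0)
counts = aggregate_counts(events, T=100.0, bin_width=1.0)
print(len(events), counts[:10])
```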
This project was completed as part of a coursework submitted for a data science module. It contains an investigation of racial bias in three pre-trained word embeddings, using the inner product between a 'racial' vector and a 'semantic' vector as a measure of bias. Original visualisations of the level of bias are presented using ggplot2.
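As a rough illustration of the bias measure (not the coursework code, which produced its plots with ggplot2 in R): the snippet below uses placeholder vectors in place of the pre-trained embeddings, and the word choices and the difference-of-vectors construction of the 'racial' direction are assumptions.

```python
import numpy as np

# Placeholder embeddings standing in for a pre-trained model (e.g. one loaded via gensim);
# the words and 50-dimensional random vectors here are purely illustrative.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["european", "african", "pleasant", "unpleasant"]}

# One common construction: a 'racial' direction as the difference of two group vectors,
# and a 'semantic' vector for an attribute word.
racial_vec = emb["european"] - emb["african"]
semantic_vec = emb["pleasant"]

# Bias measured as the inner product between the two vectors.
bias = float(np.dot(racial_vec, semantic_vec))
print(f"bias score: {bias:.3f}")
```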
This project was completed as part of a coursework submitted for a machine learning module. The first question of the coursework required the comparison of linear and quadratic ridge regressions against base linear and quadratic models. The second question required the investigation of three different Naive Bayes models for spam detection. The use of libraries was forbidden, so we were required to create our own Python modules for this assignment; I decided to make use of Python's object-oriented features to do so.
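To give a flavour of the from-scratch, object-oriented approach, here is an illustrative sketch rather than the submitted modules; it assumes NumPy was permitted for the linear algebra.

```python
import numpy as np

class RidgeRegression:
    """Minimal from-scratch ridge regression using the closed-form solution."""

    def __init__(self, lam=1.0):
        self.lam = lam          # L2 penalty strength
        self.coef_ = None

    def fit(self, X, y):
        # Add an intercept column, then solve (X'X + lam * I) w = X'y.
        Xb = np.hstack([np.ones((X.shape[0], 1)), X])
        penalty = self.lam * np.eye(Xb.shape[1])
        penalty[0, 0] = 0.0     # do not penalise the intercept
        self.coef_ = np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)
        return self

    def predict(self, X):
        Xb = np.hstack([np.ones((X.shape[0], 1)), X])
        return Xb @ self.coef_

# A quadratic ridge regression is then the same model fit on [X, X**2] features.
```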
This project was completed as part of a coursework submitted for a machine learning module. Three datasets were provided by the lecturer, along with prescribed methods to be implemented and tuned on the data. It required a full explanation of feed-forward neural networks, Gaussian processes and Principal Component Analysis. The neural networks were implemented in Keras through R.
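For illustration, a minimal feed-forward network of the kind described might look as follows. This is a sketch in Python's Keras API rather than Keras through R as used in the coursework, and the toy data, layer sizes and training settings are assumptions.

```python
import numpy as np
from tensorflow import keras  # assumes a TensorFlow/Keras installation

# Toy data standing in for the datasets provided in the coursework.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

# A small feed-forward (multilayer perceptron) classifier.
model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```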
This project was completed as part of two courseworks submitted for a data science module. The code written demonstrates an understanding of the fundamentals of API access and unit-test development. The second coursework tested the ability to construct a data pipeline for crime data in London. The code was also profiled using a basic timing function.
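Here is a sketch of the kind of timing wrapper and API call involved (illustrative only; the decorator, the use of the UK Police street-level crime endpoint and its parameters are assumptions, so check the API documentation before relying on them).

```python
import time
from functools import wraps

import requests

def timed(fn):
    """Basic timing wrapper used to profile a pipeline step."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        print(f"{fn.__name__} took {time.perf_counter() - start:.3f}s")
        return result
    return wrapper

@timed
def fetch_street_crimes(lat, lng, date):
    # Street-level crime data for a point and month (YYYY-MM) from data.police.uk.
    url = "https://data.police.uk/api/crimes-street/all-crime"
    resp = requests.get(url, params={"lat": lat, "lng": lng, "date": date}, timeout=30)
    resp.raise_for_status()
    return resp.json()

# Example: crimes near central London in a given month.
crimes = fetch_street_crimes(51.5074, -0.1278, "2023-01")
print(len(crimes))
```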
This project was completed as part of an 8-hour assignment for a computational statistics module. It included questions on Metropolis-Hastings sampling and the special case known as Gibbs sampling. This project demonstrates an ability to produce accurate reports on short deadlines.
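For reference, a minimal random-walk Metropolis-Hastings sampler of the kind covered (an illustrative sketch, not the assignment solution; the standard normal target and proposal scale are assumptions):

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples, proposal_sd=1.0, rng=None):
    """Random-walk Metropolis-Hastings for a one-dimensional target density."""
    rng = np.random.default_rng(rng)
    samples = np.empty(n_samples)
    x, log_p = x0, log_target(x0)
    for i in range(n_samples):
        x_prop = x + rng.normal(scale=proposal_sd)       # symmetric Gaussian proposal
        log_p_prop = log_target(x_prop)
        # Accept with probability min(1, target(x') / target(x)).
        if np.log(rng.uniform()) < log_p_prop - log_p:
            x, log_p = x_prop, log_p_prop
        samples[i] = x
    return samples

# Example: sampling from a standard normal target (log-density up to a constant).
draws = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0, n_samples=5000, rng=42)
print(round(draws.mean(), 3), round(draws.std(), 3))
```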