Event schedule - plain text

Programme


Tuesday July 26th 09:00-09:30 Registration

Stream 1: Worsley 8.43X and Y

Abstract: The registration desk will be outside 8.43X and Y, Level 8, Worsley Building.


Tuesday July 26th 09:30-10:00 Social

Stream 1: Worsley 8.43X and Y

Abstract: A chance for refreshments and meeting other attendees.


Tuesday July 26th 10:00-10:50 Keynote: Applications of Machine Learning

Stream 1: Worsley 8.43X and Y

Abstract: To be confirmed.

Speaker: Professor David Hogg, Professor of Artificial Intelligence, School of Computing

Biography: David's research is on artificial intelligence and particularly in computer vision. He pioneered the use of three-dimensional geometric models for tracking deformable structures in natural scenes and contributed to establishing statistical approaches to learning of shape and motion as one of the pre-eminent paradigms in the field. He has been Pro-Vice-Chancellor for Research and Innovation at the University of Leeds, visiting professor at the MIT Media Lab, chair of the EPSRC ICT Strategic Advisory Team, and chair of the Academic Advisory Group of the Worldwide Universities Network. He is a Fellow of the European Association for Artificial Intelligence (EurAI), a Distinguished Fellow of the British Machine Vision Association, and a Fellow of the International Association for Pattern Recognition. He is Director of the UKRI Centre for Doctoral Training in Artificial Intelligence for Medical Diagnosis and Care at the University of Leeds.


Tuesday July 26th 10:50-11:00 Questions and changeover

Stream 1: Worsley 8.43X and Y


Tuesday July 26th 11:00-12:00 Day 1: Research Portfolio talks

Stream 1: Worsley 8.43X and Y

Curing Brain Cancer - with a Computer

Talk time: 1100-1120

Abstract: Glioblastoma (GBM) is the most common and deadly form of adult brain cancer. GBM cancer cells are messed up! They have been molecularly rewired in ways that mean they grow when they shouldn't, reshape the tissue surrounding them and are currently incurable. We use large scale molecular profiling to try and understand this rewiring and how we can counteract it. This means we create vast amounts of data from noisy samples across many different molecular levels at differing resolutions (from single cell to whole tissue). We believe these data hold the information we need to cure brain cancer, but to get at it we first have to code the heck out of it!

Speaker: Dr Lucy Stead, Associate Professor, School of Medicine

Biography: I am a computational cancer biologist interested in the use of high-throughput sequencing to characterise brain tumour genomes and transcriptomes, and the integrated analysis of datasets to further understand the development and progression of brain cancer. My research focuses on inspecting genomic and transcriptomic heterogeneity present within brain tumours and across stages of brain cancer development and progression. I believe that the most biologically and clinically relevant inferences come from continual iteration of computational and wet-lab approaches, and this is the remit within my group. I am interested in investigating intratumour heterogeneity in glioblastoma (GBM); specifically, I wish to test whether treatment-resistant subclones emerge in recurrent tumours, and characterise them in clinically relevant ways in multiple patients. I am a trained computational biologist with expertise in next-generation sequencing data analysis.

Using HPC to predict the spread of COVID19 misinformation

Talk time: 1120-1140

Abstract: I will present our project, which aimed to improve understanding of COVID-19 misinformation chains in social media using AI tracing tools. We used our HPC cluster (1) to collect a corpus of about 2 million COVID-related messages and corresponding user profiles from Facebook, Telegram and Twitter and (2) to develop AI classifiers to make predictions about the properties of the messages and the profiles using Deep Learning frameworks. Our main hypothesis is that the socio-demographic profile of the audience is an important indicator for the likelihood of falling prey to misinformation, as different readers differ in how much they might be willing to share misinformation of different kinds. We indeed found a strong positive association of the likelihood of sharing COVID-19 misinformation with readers' age and their right-wing political orientation. Another association we observed concerns different preferences for sharing misinformation genres, in particular, academic writing vs personal stories.

Speaker: Dr Serge Sharoff, School of Languages, Cultures and Societies

Biography: My research interests are related to three domains: linguistics (primarily computational linguistics and corpus linguistics), cognitive science and communication studies. Probably the most interesting bit in my recent research is digital curation of representative corpora automatically collected from the Web, i.e., their annotation in terms of genres, domains or morphosyntactic categories. The current set of resources includes very large corpora for Arabic, Chinese, English, French, German, Italian, Polish, Portuguese, Russian and Spanish.

CryoEM pipelines or: How I Learned to Stop Worrying and Use all the Cores

Talk time: 1140-1200

Abstract: As cryo-electron microscopy (cryoEM) has been transformed over the last decade, we face new bottlenecks in the field. One of them is real-time processing of the data. New software and scripts enable us to use more of the processing power of both CPUs and GPUs in order to deal with this bottleneck.

Speaker: Dr Yehuda Halfon, Astbury Biostructure Laboratory

Biography: I did my PhD at the Weizmann Institute of Science in Ada Yonath's lab doing single particle cryo-EM of ribosomes from pathogenic bacteria. Then I did a short postdoc at the ICR in Alessandro Vannini's lab. And now I'm here working as a cryo-EM scientist in the EM Facility.


Tuesday July 26th 11:00-12:00 Code clinic

Stream 2: Worsley 8.49N

Abstract: Bring your code problems to present and share with others. A chance to collaboratively troubleshoot code issues. Prior submission will be necessary.


Tuesday July 26th 12:00-13:00 Lunch

Stream 1: Worsley 8.43X and Y


Tuesday July 26th 13:00-13:50 Day 1: 10 minute talks session 1

Stream 1: Worsley 8.43X and Y

Using ARC HPC for virtual high-throughput ligand discovery: A journey through time 2009-2022

Talk time: 1300-1310

Abstract: With the exponential rise in the number of viable novel drug targets, computational methods are being increasingly applied to accelerate the drug discovery process. Virtual High Throughput Screening (vHTS) is one such established methodology to identify drug candidates from a large collection of compound libraries. This talk will detail my journey into vHTS at Leeds from ARC1, where we were able to screen tens of thousands of compounds, through to the present, where we are screening tens of millions and hoping to go bigger.

Speaker: Dr Katie Simmons, Research Fellow, School of Medicine

Biography: Katie completed a Master's degree in Medicinal Chemistry at Newcastle University in 2005 before moving to Leeds to do a PhD designing small molecule inhibitors of a new potential enzyme target for antibacterial drug discovery. She then spent four years in the Astbury Centre (2009-2013) working on an EU funded project identifying modulators of a number of different ion channels and membrane transporter proteins. In 2014 she worked on a project bringing a potential antimalarial agent to pre-clinical evaluation before moving to the Leeds Institute of Cardiovascular and Metabolic Medicine to focus on research identifying modulators of new protein-protein interactions involved in metabolic disorders, where she is currently a BHF Mautner Fellow.

PERPL analysis: Finding structures in cells and features of diseases from incomplete point cloud data on protein positions.

Talk time: 1310-1320

Abstract: Super-resolution light microscopy can find the positions of single proteins in a cell to within an impressive 10 nm. This should be very useful for experiments on subcellular structures in health and disease, which should in turn lead to new diagnostic methods and therapies. However, many instances of the targeted proteins are often left unlocalised, so they are missing from the point patterns obtained. There is also a lack of analysis methods for these single protein position distributions. This means that the developments in data acquisition have not led to many new biological findings. I have developed a Python framework called PERPL (Patterns Extracted from Relative Distributions) which allows pattern analysis and modelling even when more than 99% of target proteins are missing from the experimental data. I am working with 2D and 3D data, with experiments on single proteins and pairs of proteins, and on samples from DNA-origami to single cells in a dish to human tissues. I am collaborating with clinicians to explore how this approach can benefit medical areas where diagnosis is expensive and slow, and where the variability of treatment efficacy is not well understood.

Speaker: Alistair Curd, Research Fellow, Faculty of Health and Medicine

Biography: Alistair has moved from physics training, through commercial R&D on display technology and a PhD in human visual studies, to developing and supporting advanced optical microscopy technologies for nanoscale biological experiments. He is now moving into developing new analytical methods for the point cloud data generated in single-molecule localisation microscopy, to allow it to fulfil its potential for new discoveries and diagnostic methods in cell biology and medicine.

Transcriptome analysis of temporal artery biopsies to identify novel pathogenic pathways based on inflammatory patterns in Giant Cell Arteritis

Talk time: 1320-1330

Abstract: Giant cell arteritis (GCA) is the most common form of vasculitis and can lead to serious complications, such as permanent visual loss, if undiagnosed and untreated in a timely manner. The aim of my PhD project is to use transcriptomic data to gain a better understanding of the molecular and genetic mechanisms underlying GCA and to identify candidate genes and pathways amenable to therapeutic targeting. The data cohort comprises eighty patients diagnosed with GCA and includes RNA-seq data generated from temporal artery biopsies, a range of clinical variables, and histological images reflecting different phenotypes relevant to the pathology of GCA. The gene expression data was processed using an in-house developed pipeline of software implemented on the HPC. A series of downstream analyses was performed to assess the influence of clinical variables (e.g., sex, age, duration of steroid exposure) and to examine the association of transcript levels with histological and clinical phenotypes. The results showed that patients' sex is likely to have a confounding effect and needs to be accounted for in the analysis. Statistical testing revealed lists of genes (statistically significant after multiple testing correction) associated with certain histological features. These results are currently being further investigated using pathway analysis approaches.

Speaker: Michal Zulcinski, PhD Student, Faculty of Health and Medicine

Biography: Michal is a third-year PhD student at the Faculty of Health and Medicine, based in LICAMM and LIDA. He holds a BSc degree in Biotechnology and an MSc degree in Bioinformatics. Prior to starting in Leeds, he worked as a bioinformatics data analyst at the François Jacob Institute of Biology in Paris, France. His PhD research project aims at using genetic and transcriptomic data to elucidate the pathogenesis of Giant Cell Arteritis to identify new candidate genes or pathways for therapeutic targeting.

In silico screening for small molecule inhibitors of the lectin-like oxidized LDL receptor 1

Talk time: 1330-1340

Abstract: The lectin-like scavenger receptor oxidized LDL receptor 1 (LOX-1, SR-E1, OLR1) is implicated in promoting atherosclerosis. Activation of LOX-1 by oxidized LDL (oxLDL) triggers signaling, production of reactive oxygen species and apoptosis. However, there is a lack of small molecule inhibitors which target LOX-1 in disease processes. To address this issue, we carried out an in silico screen of widely available small molecule libraries, e.g. Maybridge and NPASS, for compounds that bind to human LOX-1. Our approach was based on assessing physicochemical properties, 2D dissimilarity score calculation, ligand-based pharmacophore modeling, molecular docking, and ADMET (Absorption, Distribution, Metabolism, Excretion and Toxicity) calculation steps. We used four pharmacophore models: three were selected based on the top pharmacophore scores, and one combined model was generated with merged and shared pharmacophore features. Using this approach, 1963 lead compounds were identified, then re-screened, and ADMET calculations were used to remove non-druggable candidates. We selected the top scoring compounds and carried out in silico docking with LOX-1. We identified 8 molecules that display stable docking to the surface of the LOX-1 C-type lectin-like domain using molecular dynamics simulation. These lead compounds now form the basis for further in vitro and in vivo studies to assess efficacy at targeting LOX-1 functionality.

Speaker: Dhananjay Jade, PhD Student, School of Biomedical Sciences

Biography: Dhananjay Jade completed an MTech degree in Biotechnology at SRM University, India in 2014. He then worked as a Research Fellow at Jawaharlal Nehru University, India on Mycobacterium tuberculosis, and after that as a Senior Research Fellow at the International Centre for Genetic Engineering and Biotechnology, New Delhi, India on computational and experimental characterization of stage-specific arginine methylation (post-translational modifications) in the Plasmodium falciparum proteome. In 2020, he began a PhD at the University of Leeds, working with Dr. Michael Harrison on the project "A molecular explanation for oxidized low-density lipid particles recognition by the LOX-1 scavenger receptor". Alongside this he has worked on database screening for drug design, pharmacophore modelling, molecular docking, and molecular simulation. Recently, he published SARS-CoV-2 research articles targeting the ACE2 receptor, transmembrane serine protease 2 (TMPRSS2), and the PL-PRO/3CL-PRO proteins for treating COVID-19 infection.

How do we determine the structural changes of a viral surface protein only 16 nm tall?

Talk time: 1340-1350

Abstract: Many viruses, like coronaviruses, use their surface proteins to get inside a host cell and start an infection. Here, we sought to visualize Bunyamwera virus (BUNV), a model for more pathogenic viruses like La Crosse virus, which can cause fatal encephalitis. As viruses are small (~0.1 µm), we employed cryo-electron microscopy. By imaging the same virions from multiple orientations, we could computationally reconstruct them into 3D volumes, termed tomograms. We then wanted to image the surface proteins of BUNV. To do this, we utilised the parallel-processing power of ARC to perform sub-tomogram averaging. We selected thousands of sub-volumes from multiple viruses, corresponding to the surface proteins of BUNV. These sub-volumes are iteratively aligned to one another, refining their rotations and locations at each stage until optimally aligned (an approach termed sub-tomogram averaging). This allowed us to generate a 3D structure of the BUNV surface proteins, which are only ~16 nm tall. In addition, we used Google DeepMind's AlphaFold software to predict the exact amino acid structure of the BUNV surface proteins, which allowed us to interpret our sub-tomogram averaging. Overall, understanding the BUNV surface proteins will allow the future development of vaccines or anti-virals for related pathogenic viruses.

Speaker: Dr Samantha Hover, Research Fellow, Faculty of Biological Sciences

Biography: I am a Postdoctoral research fellow at the University of Leeds, working in the Fontana lab using cryo-electron tomography to investigate the mechanisms of influenza virus infection. I previously completed my PhD at the University of Leeds studying the role of host cell K+ channels in Bunyavirus infection. I used a range of molecular and structural biology techniques to elucidate how manipulation of endosomal K+ is detrimental to virus infection, and how K+ facilitates structural changes in Bunyavirus infection required for fusion events.


Tuesday July 26th 13:00-13:50 Day 1: 10 minute talks session 2

Stream 2: Worsley 8.49N

Towards Arabic sentence simplification

Talk time: 1300-1310

Abstract: Text Simplification (TS) is a Natural Language Processing (NLP) task aiming to reduce the linguistic complexity of the text while maintaining its meaning and original information (Siddharthan, 2002; Camacho Collados, 2013; Saggion, 2017). Jin et al., 2021, suggest that TS is a type of Text Style Transfer (TST), where the target style of the generated text is “simple”. The importance of TS involves: (i) designing and simplifying the language curriculum for both second and first language learners; (ii) being a fundamental pre-process in NLP applications such as text retrieval, extraction, summarization, categorization, and translation (Saggion, 2017). We experimented using two approaches: (i) a classification approach leading to lexical simplification pipelines which use Arabic-BERT (Safaya et al., 2020), a pre-trained contextualised model, and a model of fastText word embeddings (Grave et al., 2018); (ii) a generative approach, a Seq2Seq technique applying a multilingual Text-to-Text Transfer Transformer, mT5 (Xue et al., 2021). We developed our training corpus by aligning the original and the target simplified sentences from the internationally acclaimed Arabic novel “Saaq al-Bambuu” (Al-Sanousi, 2013; Familiar and Assaf, 2016), then evaluated the effectiveness of these methods using the BERTScore evaluation metric (Zhang et al., 2020). The simple sentences produced by the mT5 model achieved P 0.72, R 0.68, and F-1 0.70 via BERTScore, while combining Arabic-BERT and fastText achieved P 0.97, R 0.97, and F-1 0.97. Overall, there are advantages and limitations in the two approaches; both could benefit from adding a post-handler language generation module.

Speaker: Nouran Khallaf, PhD Student, School of Modern Languages and Cultures

Biography: Nouran Khallaf is a PhD student at the University of Leeds, at the Center of Translation Studies, funded by a scholarship from the Newton-Mosharafa Fund | British Council. She researches Text Simplification methods using tools and theory from natural language processing and machine learning. She is particularly interested in natural language processing, data mining, corpus linguistics, and software development. She is also broadly involved in several projects related to corpus collection and annotation, as well as technologies for language learning. Nouran holds an MS in computational linguistics from the University of Alexandria, and a BA in phonetics and linguistics from the same university.

Sketch Engine & Wordsmith in Cross-Lingual Stylometry Research (withdrawn)

Talk time: 1310-1320

Abstract: Text analysis software is used to process corpus data to present various statistical information and a detailed breakdown of the data. There are different tools used to assist with this type of analysis, such as Wordsmith and Sketch Engine. One field that utilizes such tools is STYLOMETRY, which hypothesizes that individuals have uniquely distinguishing characteristics in their writing style that can be measured, even among less experienced writers (van Halteren et al., 2005). These characteristics may include preferences for vocabulary, expressions, figures of speech, syntactic patterns, and use of punctuation, which all play a part in forming one’s style. This field goes further to CROSS-LINGUAL STYLOMETRY, which examines whether one’s style in writing remains the same when a bilingual or multilingual author writes in two or more different languages. In this presentation, the use of Wordsmith and Sketch Engine in Cross-Lingual Stylometry research examining the works of the bilingual writer Gibran Kahlil Gibran is discussed alongside the challenges faced.

Speaker: Albatool Alamri

Exploring the GLIDE model for Human Action-effect Prediction

Talk time: 1320-1330

Abstract: We address the following action-effect prediction task. Given an image depicting an initial state of the world and an action expressed in text, predict an image depicting the state of the world following the action. The prediction should have the same scene context as the input image. We explore the use of the recently proposed GLIDE model for performing this task. GLIDE is a generative neural network that can synthesize (inpaint) masked areas of an image, conditioned on a short piece of text. Our idea is to mask out a region of the input image where the effect of the action is expected to occur. GLIDE is then used to inpaint the masked region conditioned on the required action. In this way, the resulting image has the same background context as the input image, updated to show the effect of the action. We give qualitative results from experiments using the EPIC dataset of ego-centric videos labelled with actions.

Speaker: Fangjun Li, PhD Student, School of Computing

Biography: I hold BSc and master's degrees from Shandong University. I'm now a PhD student in the School of Computing, University of Leeds, under Professors Anthony Cohn and David Hogg. This is the end of the second year of my PhD. My research interests centre on the combination of knowledge-based reasoning, natural language processing and computer vision.

Integrating Religious Knowledge: From a Heterogeneous Format to an RDF Model

Talk time: 1330-1340

Abstract: Linguists and religious scholars have analysed, annotated and segmented Qur'anic text using a variety of representations and formats. However, Qur'anic resources on different topics, aspects and formats can be difficult to link due to the Qur'an's complexity. The segmentation and annotation of the Holy Qur'an can be represented in a variety of heterogeneous structures (e.g. CSV, JSON and XML), but there is no standardised mapping formalisation for the data. Therefore, this study aimed to link morphological segmentation tags and syntactic analyses in Arabic and Buckwalter forms to the Hakkoum ontology to enable further clarification of the Holy Qur'an. To achieve this, we used the RDF Mapping Language (RML) tool to automatically link Hakkoum ontology concepts to each column of the QacSegment corpus. Then, we used Protégé Editor to extract and link Arabic text by expressing specific rules. The tools used in this study allowed for the automatic mapping and linking of heterogeneous data sources into an RDF data model. In addition, the integrated ontology was evaluated using a SPARQL query on an Apache Jena Fuseki server. This experiment was conducted on all chapters, verses, words and segments of the Qur'an corpus.

Speaker: Ibtisam Alshammari, PhD Student, School of Computing

Biography: Ibtisam is a second-year Ph.D. student in the School of Computing at the University of Leeds. Before that, she completed a master’s degree in computer science at the University of Leeds in 2020. Her research focuses on combining Islamic studies, including the Quran and Hadith resources, into a unified knowledge base by applying AI and NLP methods.

Deep Learning of Semantic Similarity in the Quran

Talk time: 1340-1350

Abstract: Semantic similarity detection is a crucial task in natural language comprehension. It is vital in many NLP applications such as information extraction, word sense disambiguation, text summarization, and text clustering. My work focuses on semantic similarity in the Quran and proposes a Siamese transformer-based architecture for pairwise semantic similarity detection in the Quran. It exploits a pre-trained model, AraBERT, to derive semantically meaningful sentence embeddings and achieve state-of-the-art results on the semantic similarity measures. My model benefits from the Siamese network architecture, as in SBERT, to finetune the pre-trained model with less of the computational burden that characterizes sentence-pair regression tasks. My research presents a verse embedding using twin Siamese AraBERT networks for fast and efficient semantic similarity detection in the Quran. The proposed architecture starts with pre-trained AraBERT models. Then, a Siamese setup is used to finetune the models on a semantic similarity dataset drawn from the Quran. As a result, the architecture achieved a Spearman correlation of 84.96%, representing its ability to assess whether two verses are similar. Furthermore, the model set a high-performance record of 95% F1 score on the Quranic semantic similarity dataset. Indeed, it improves the Quranic semantic similarity measures and performance over previous studies.

Speaker: Menwa Alshammeri, PhD Student, School of Computing

Biography: Menwa Alshammeri is a lecturer at Jouf University in Saudi Arabia. She is a Ph.D. student in the School of Computing at Leeds University. Her research investigates AI applied to text and exploits NLP methods to analyse and explore texts' underlying knowledge. She utilizes ML/DL models with NLP techniques to examine the semantic similarity task and extract the meanings and concepts from the Quran.
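
Note on the metric: the abstract above reports a Spearman correlation between predicted and reference similarity. As a rough illustration only, and not the speaker's evaluation code, the sketch below computes Spearman's rank correlation in plain Python. It assumes there are no tied values, and the function names and the toy scores are hypothetical.

```python
def ranks(values):
    """Return the 1-based rank of each value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1)).

    This closed form is only valid when neither list contains ties.
    """
    n = len(xs)
    rank_x, rank_y = ranks(xs), ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical toy data: model similarity scores vs. human ratings
# for four verse pairs (not taken from the talk).
model_scores = [0.9, 0.2, 0.6, 0.4]
human_ratings = [5, 1, 2, 3]
print(spearman(model_scores, human_ratings))
```

A value near 1 means the model orders the pairs almost exactly as the reference ratings do; the metric ignores the absolute scale of the scores.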


Tuesday July 26th 13:50-14:00 Questions and changeover

Stream 1: Worsley 8.43X and Y


Tuesday July 26th 14:00-14:55 Day 1: Lightning talks and poster presentations

Stream 1: Worsley 8.43X and Y

Sensitivity analysis of an agent-based model using HPC

Talk time: 1400-1402

Abstract: We develop a spatially explicit agent-based model (ABM), BESTMAP-ABM-UK, that simulates Humber farmers’ decision-making process, inclusive of farmers’ social, behavioural and economic factors, on adoption of agri-environment schemes (AES) using NetLogo. AES are government-funded voluntary programs that incentivise farmers and land managers to adopt environmentally friendly farming practices. We carry out global sensitivity analysis of the model using the Morris screening method. The Morris screening method is a computationally efficient screening technique that allows us to identify the important model input factors. Because the sensitivity analysis requires running a large number of simulations, we use an R package named nlrx and run the NetLogo model on HPC. The results reveal the rank of importance of the seventeen parameters to the model output, i.e., farm adoption rate, which is useful for prioritising our focus on the most influential parameters in the model calibration stage.

Speaker: Chunhui Li, Research Fellow, School of Geography

Biography: Dr Chunhui (Jenny) Li is a research fellow working on the BESTMAP project in the School of Geography at the University of Leeds. She is trained in computer science and engineering and experienced in multidisciplinary studies. Her work focuses on using agent-based modelling and simulations to solve real-world problems. She has worked on agent-based modelling applications to support decision-making in business emergency management in flooding events, market strategies in financial services and smart cities in emergency scenarios. Currently she is working in a team across the EU on developing a model suite to assess UK and EU agricultural policies.

Building a data pipeline to analyse symmetries in networks

Talk time: 1402-1404

Abstract: A network or graph consists of a set of vertices and a set of edges that link pairs of vertices. Networks are used in a wide range of areas, for example, to study the spread of disease through networks of social contacts. Studying such dynamics on networks accurately involves a significant amount of computation, but this can be reduced by making use of symmetries in networks, where symmetries result from vertices that can be swapped without changing the structure of the network. In my MSc project, I am investigating network symmetries using a data set that consists of all graphs up to 11 vertices, which includes more than a billion networks. In this poster, I will describe my efforts to process and analyse these data using the HPC facility at Leeds.

Speaker: Jai Gomathi Veerakumar
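
Note on network symmetries: the abstract above describes symmetries as vertex swaps that leave the structure of the network unchanged. The brute-force sketch below illustrates that definition on a tiny graph. It is not the speaker's pipeline; the function name and example graphs are hypothetical, and trying every permutation is only feasible for a handful of vertices.

```python
from itertools import permutations


def automorphisms(n, edges):
    """Return every relabelling of vertices 0..n-1 that maps the
    undirected edge set onto itself (brute force over all n! orderings)."""
    edge_set = {frozenset(edge) for edge in edges}
    found = []
    for perm in permutations(range(n)):
        # perm[u] is the vertex that u is relabelled to.
        mapped = {frozenset((perm[u], perm[v])) for u, v in edges}
        if mapped == edge_set:
            found.append(perm)
    return found


# Hypothetical example: a path on three vertices, 0 - 1 - 2.
# Swapping the two end vertices leaves the path unchanged.
path_edges = [(0, 1), (1, 2)]
print(automorphisms(3, path_edges))

# A triangle is fully symmetric: every relabelling preserves it.
triangle_edges = [(0, 1), (1, 2), (0, 2)]
print(len(automorphisms(3, triangle_edges)))
```

For large collections of graphs, dedicated graph-automorphism tools would be needed rather than enumeration of every permutation.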

Decolonizing Reading Lists for educational engagement and student success

Talk time: 1404-1406

Abstract: There is growing interest and investment in Educational Engagement and Student Success at Leeds University; e.g. see UoL Strategy 2020-2030. One aspect of this is computing research on decolonizing the curriculum, by staff in Computing, Library and LITE for decolonizing reading lists. Data Analytics of taught module reading lists has shown that UK and US authors write most Engineering and Science teaching textbooks at Leeds University. If university students are told to read only textbooks by UK and US authors, does this bias their learning? AI research has shown that Machine Learning copies bias in the training data. One challenge of this computing research has been access to data: reading lists can be incomplete or missing, and/or inaccessible. Another challenge has been validated annotation of representative training and test data-sets: marking each reading list item with “bias labels”. We will present further challenges, and initial results of our project on AI for decolonizing reading lists; see https://www.turing.ac.uk/events/turings-cabaret-dangerous-ideas-leeds

Speaker: Eric Atwell, Professor of Artificial Intelligence, School of Computing

Biography: Eric Atwell is Professor of Artificial Intelligence for Language in the School of Computing at the University of Leeds, leading a research group of 17 PhD students and postdocs. Eric teaches Data Mining and Text Analytics at BSc and MSc levels, both online and on-campus modules. Eric is also School of Computing DPGR (Director of Post Graduate Research), and Programme Leader for our fully-online MSc programme in Artificial Intelligence (AI), including the online AI student project. Eric also works 40% in Leeds Institute for Teaching Excellence, as a LITE Fellow in AI for decolonizing reading lists; and he is a Turing Fellow with the Alan Turing Institute for AI, researching AI in education.

Spin transitions in ferropericlase in the lower mantle.

Talk time: 1406-1411

Abstract: Ferropericlase (Fe,Mg)O is the second most abundant phase in the lower mantle. It has a simple rock-salt structure, but has a rich and complex chemistry due to the unpaired d electrons of iron. Over recent years it has been shown that iron in ferropericlase should undergo a pressure-induced spin transition in the mantle, going from a high-spin state with four unpaired d-electrons to a low-spin state where all d-electrons are paired. Further studies have shown that this has a significant impact on the density, bulk modulus and viscosity of the phase, with corresponding implications for interpretation of seismic observations and the dynamics of the lower mantle. To date, there are significant differences between the calculated and measured onset and breadth of the spin transition, as well as its temperature dependence. In this study we build upon previous work and go beyond the assumed ideal mixing of high- and low-spin iron. We find that the favourable enthalpy of mixing of neighbouring on-axis iron atoms leads to a broadening of the spin transition compared to previous calculations, in better accord with experimental results.

Speaker: Stephen Stackhouse, Associate Professor, School of Earth and Environment

Biography: I am an Associate Professor in Solid Earth Geophysics in the School of Earth and Environment. I mainly use first-principles calculations to predict the properties of minerals at the extreme conditions of the lower mantle, such as equation of state, elastic constants, thermal conductivity and phase transitions. To study larger numbers of atoms I also perform atomic scale simulations using pair potentials and am currently trying to develop machine learning potentials.

Simulations of tidally locked exoplanet atmospheres in 3D

Talk time: 1411-1416

Abstract: Using the Whole Atmosphere Community Climate Model (WACCM), a 3D Earth System Model, I have simulated M dwarf terrestrial exoplanets that synchronously rotate around their host stars. These exoplanets are ‘tidally locked’, meaning the star is unmoving in the sky, such that the exoplanet has a warmer dayside and a colder nightside. Tidally locked exoplanets have an unusual circulation structure. Super-rotating jets form and distribute heat to the nightside, resulting in distinctive cloud patterns that depend on total irradiation and rotation speed. Consequently, the distribution of chemical species is affected. All these properties result in climate predictions that look unique when compared to any planet in our solar system. Therefore, I use the WACCM simulations to predict future observations of these exoplanets with next generation telescopes. To help with my own research and that within the exoplanet community, I have developed two open-source tools written in Python and in Jupyter Notebook. One improves the speed and flexibility of climate data analysis. The other ingests stellar spectra and scales the total irradiance to match the irradiance received by exoplanets in the NASA Exoplanet Archive. This tool speeds up the time it takes to initiate simulations for multiple exoplanets.
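The irradiance-scaling step mentioned in this abstract can be illustrated with a minimal sketch. This is not the speaker's tool: the function names, the toy spectrum and the target value are all assumptions made for illustration. The sketch integrates a spectrum over wavelength with the trapezoidal rule and rescales the flux so that the total matches a chosen irradiance.

```python
def total_irradiance(wavelengths, flux):
    """Integrate flux over wavelength using the trapezoidal rule."""
    total = 0.0
    for i in range(1, len(wavelengths)):
        width = wavelengths[i] - wavelengths[i - 1]
        total += 0.5 * (flux[i] + flux[i - 1]) * width
    return total


def scale_spectrum(wavelengths, flux, target_irradiance):
    """Return a rescaled copy of flux whose integral equals target_irradiance."""
    current = total_irradiance(wavelengths, flux)
    if current <= 0:
        raise ValueError("spectrum must have positive total irradiance")
    factor = target_irradiance / current
    return [f * factor for f in flux]


# Toy spectrum (hypothetical values): wavelength in nm, flux in W m^-2 nm^-1.
wavelengths = [400.0, 500.0, 600.0, 700.0]
flux = [1.0, 2.0, 2.0, 1.0]

# Hypothetical target: total irradiance received by the planet, in W m^-2.
scaled = scale_spectrum(wavelengths, flux, target_irradiance=900.0)
```

Only the overall level of the spectrum changes; its shape is preserved, which is the point of scaling a stellar spectrum to a given planet's irradiance.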

Speaker: Gregory Cooke, PhD Student, School of Physics and Astronomy

Biography: I am a 3rd year PhD student in the Astrophysics department. I use the latest version of the Whole Atmosphere Community Climate Model (WACCM6) to simulate paleoclimates and exoplanets, predicting their chemistry, dynamics, and habitability. From these simulations, I then predict future observations with next generation telescopes.

High-Resolution Modelling of Future Air Quality following Shared Socio-economic Pathway Emissions Changes

Talk time: 1416-1421

Abstract: Air pollution is one of the world's leading causes of premature death. It is therefore important to understand how future emissions changes could impact air-pollution related mortality. Futures with more ambitious climate change mitigation could have air quality co-benefits compared to less sustainable futures due to common sources of greenhouse gases and air pollutants. Different futures can be represented with the Shared Socioeconomic Pathways (SSPs), which provide narratives describing different futures and projected socioeconomic and emissions data, and supersede the Representative Concentration Pathways. Current modelling of air quality following the SSPs often uses coarse-resolution global models or reduced complexity models that give regional averages of air pollution related mortality, which may less accurately represent changes in mortality on a smaller scale. Using emissions projections from three SSPs representing very different approaches to climate change mitigation (SSP1-2.6, SSP2-4.5 and SSP3-7.0) and a detailed, high-resolution atmospheric chemistry model (WRF-Chemv4.2) with chemical initial and boundary conditions from WACCM simulations of the same scenarios, we simulate 2050 PM2.5 and O3 in Western Europe. This allows estimation of the future air pollution-related health burden at a more regionally-refined scale than previous research.

Speaker: Connor Clayton, PhD Student, School of Earth and Environment

Biography: Connor is a 2nd year PhD student in the Institute for Climate and Atmospheric Science within SEE. He has an MSc in Environmental Health, a BSc in Environmental Management and is a Chartered Environmental Health Officer. His research interests focus on the interactions between climate change, air quality and health.

Leeds Institute of Fluid Dynamics Machine Learning for Earth Sciences Tutorial

Talk time: 1421-1426

Abstract: One of the biggest hurdles to getting started with applying machine learning to scientific research is getting a working set-up and an example of how to use a technique on real scientific data. CEMAC has teamed up with the Leeds Institute of Fluid Dynamics to create a series of Jupyter Notebooks that take you through a number of machine learning techniques applied to real Earth Science research. These are stripped down to run on laptops or small servers, with continuous integration ensuring the longevity of the tutorials and, where possible, quick-look Binder links for exploring the code without even having to install the given Python environment. Currently, there are 4 available tutorials covering a range of topics from using Convolutional Neural Networks to identify and classify volcanic deformation to using Random Forests to determine controls on leaf temperature. Another 3 notebooks will be added this summer and we hope to continue expanding this resource in future years.

Speaker: Helen Burns, Software Development Scientist, Centre for Environmental Modelling and Computation

Biography: Helen is a software development scientist at CEMAC (Centre for Environmental Modelling and Computation). She joined CEMAC in the School of Earth and Environment in 2018 after completing her PhD at the National Oceanography Centre. Her main areas of interest are numerical modelling and machine learning.

Electronic structure of CuWO4

Talk time: 1426-1431

Abstract: As energy production shifts from fossil fuels to carbon-free, low-cost energy, using copper tungstate (CuWO4) to split water molecules (H2O) into hydrogen (H2) and oxygen (O2) via photocatalysis has become one of the promising ways to obtain a sustainable and renewable alternative. Its suitable band gap (2.0-2.3 eV) for visible light harvesting, facile surface reaction kinetics, earth-abundance, and superior stability in electrolyte make CuWO4 a perfect candidate for water splitting via photocatalysis. In this work, density functional theory is applied to study the electronic structure of CuWO4. The computational results fit the experimental data well, within a 1.5% deviation.

Speaker: Xuan Chu

Faster estimation of choice models using HPC

Talk time: 1431-1436

Abstract: The main goals of choice models are typically to predict a set of outcomes given a discrete set of choice alternatives and to forecast choices given a change in choice context. Many different economic and psychological concepts can be incorporated within a choice model, leading to frequent implementation of very complex models. Additionally, recent technological advances have led to the availability of a wider range of data for behavioural modelling. In particular, physiological data (eye-tracking, EEG, etc.) and large-scale revealed preference data (e.g. travel card or supermarket clubcard data) have meant that we have more complex and larger choice datasets. The combination of complex models and complex datasets requires significant computational power, far beyond what can be computed in a reasonable time on standard desktops, which may take weeks or months to compute estimates for the model parameters that best explain the full set of choice observations. In this presentation, we demonstrate examples of how the Apollo package in R allows us to run models on multiple cores, which can be estimated at least 20 times faster with the Leeds HPC.

Speaker: Dr Thomas Hancock, Research Fellow, Institute for Transport Studies

Biography: Thomas Hancock is a Research Fellow at the Institute for Transport Studies (ITS) under the NEXUS project. His main areas of interest are understanding the decision-making process, the integration of (econometric) choice modelling and mathematical psychology, model specification and interpretation and moral choice behaviour.

Power up your Shiny app with custom outputs

Talk time: 1436-1441

Abstract: Everybody loves a Shiny app, and what better way to manipulate and prepare data for visualisation than to use the power of R packages such as dplyr and tidyr? That’s great if you want to visualise your data in a standard output format such as a table or a plot which can be easily integrated into a Shiny app, but what if you want to display your research outputs in a non-standard format? In this presentation I’ll demonstrate a Shiny app that uses a JavaScript generated custom output format for displaying the results of a literature review, providing an interface that can be used by policy makers, analysts and the general public to filter and display relevant information at different levels of detail.

Speaker: Tamora James, Software Development Scientist, Centre for Environmental Modelling and Computation

Biography: Tamora worked for over a decade in web and software development before returning to academia in 2015 to complete a PhD in conservation demography at the University of Sheffield. She joined the Centre for Environmental Modelling and Computation (CEMAC) as a Software Development Scientist in July 2020 in support of the Global Challenges Research Fund African Science for Weather Information and Forecasting Techniques (GCRF African SWIFT) project. Highlights from SWIFT include working on near real-time weather data (nowcasting) provision and automated synoptic plotting for operational forecasting. More recent work includes developing web applications for visualising research results ranging from water resource management models to climate mitigation co-benefits.

Proposing a new Online Application Architecture explored through creating a Mobile Online Multiplayer Chess Game for Android Devices using the Kotlin programming language

Talk time: 1441-1446

Abstract: This project explores the ever-evolving field of online multiplayer games for mobile phones, with the aim of finding an architecture that would minimise the maintenance cost for developers of mobile and desktop apps in general. For this reason, the Kotlin programming language and Google Firebase were utilised to develop a real-time multiplayer Chess Game for Android smartphone devices. Existing architectures were investigated to compare their pros and cons and to devise a new architectural design that would be able to give us the following: High Speed on all Devices, Low Bandwidth, Low Memory Requirements, and Low Maintenance cost. For this purpose, a simple online game consisting of just a colour-changing button was created, to make sure that the architecture was feasible. Then an online mobile Chess Game was developed to test the architecture under the more complex processes involved in such software. Finally, the architecture was tested using Unit Testing and a Real-life Experiment in the form of a Tournament with players from three different cities around the UK and Europe. Overall the architecture proved to be a success and achieved all of its goals. This paves the way for other applications (not only games), and for developers who need to cut down on the costs of Developing and Maintaining an app, especially those without a large capital (freelancers and startups).

Speaker: Konstantinos Biris, Undergraduate Student, School of Computing

Biography: Coming from Athens, Greece, Konstantinos studied for his Bachelor of Science in Computer Science at the University of Leeds. His interests include Cloud Computing and Complexity Theory, and he is currently on his way to obtaining an MSc in Data Science and Analytics at King's College London. In the meantime, he likes to independently research ways to reduce the cost of operating applications using Cloud Services and reduce the amount of data needed for those applications to function. In this way he aims to make software development more affordable to everyone, from hobbyists and start-ups to large companies and organizations.

NLTK and SBERT, and their use in Parallel Sentence Alignment Using Transformers

Talk time: 1446-1451

Abstract: A parallel sentences corpus is an essential resource for many applications of Artificial Intelligence (AI) and Natural Language Processing (NLP), including chatbots, translation, and question-answering systems. By improving the quality of the parallel sentences corpus, the performance of these applications is improved. In this research, we went through two main levels of preparing the English-Arabic parallel sentences: the document and sentence levels. Each English document was aligned with its corresponding Arabic document at the document level. Then, using the NLTK tokenizing function "sent_tokenize", all the source English and target Arabic texts were converted into sentences of different lengths. Furthermore, four pre-trained SBERT models were applied at the sentence level to derive comparable meaningful sentences using cosine similarity. As a result, we obtained parallel sentences, each with a cosine similarity score. The highest average cosine similarity score is 0.76, obtained from the pre-trained model "paraphrase-multilingual-mpnet-base-v2". The preliminary results are promising and encourage further research in this area.
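The sentence-level step in this abstract, pairing sentences by cosine similarity of their embeddings, can be sketched as follows. This is an illustration under stated assumptions, not the speaker's pipeline: the short vectors below are toy stand-ins for the embeddings an SBERT model would produce, and the function names are hypothetical. Each source sentence is simply paired with its highest-scoring target sentence.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_sentences(source_vecs, target_vecs):
    """Pair each source embedding with its most similar target embedding.

    Returns a list of (source_index, target_index, score) tuples.
    """
    pairs = []
    for i, s in enumerate(source_vecs):
        scores = [cosine_similarity(s, t) for t in target_vecs]
        j = max(range(len(scores)), key=scores.__getitem__)
        pairs.append((i, j, scores[j]))
    return pairs


# Toy embeddings (hypothetical); real sentence embeddings have hundreds of dimensions.
english_vecs = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1]]
arabic_vecs = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.3]]

pairs = align_sentences(english_vecs, arabic_vecs)
```

Averaging the third element of each tuple gives an average similarity score of the kind the abstract reports for each model.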

Speaker: Moneerh Aleedy, PhD Student, School of Computing

Biography: I am Moneerh Aleedy, a split-site Ph.D. student from Riyadh, Saudi Arabia. I work as a lecturer in the Information Technology Department at Princess Nourah Bint Abdulrahman University, one of the biggest universities in Saudi Arabia. Currently, I am a Ph.D. student in the School of Computing at the University of Leeds.


Tuesday July 26th 14:00-14:55 Day 1: 10 minute talks session 3

Stream 2: Worsley 8.49N

Optimal Seed Generator in Biosequence Search Algorithms

Talk time: 1400-1410

Abstract: Sequencing and pattern matching remain one of the major problems in bioinformatics. With the amount of data to process ever-increasing, the development of fast and efficient algorithms remains very important. Seeding is one of the techniques which efficiently speeds up searching algorithms. However, finding optimal seeds which maximise efficiency is still an open problem. It has been acknowledged that spaced and subset seeding (non-binary) results in a more efficient search. A few seed generators, such as Iedera, Rasbhari or SpEED, have been suggested; however, the speed-up is achieved at the cost of reducing sensitivity. We suggest a new framework (PerFSeeB) to find a set of optimal spaced (binary) seeds for a given set of parameters. It has been noticed that the optimal seeds possess a periodic structure. The spaced seeds found are guaranteed to locate all positions within a reference genome for a predefined number of allowed mismatches. The code is written in C++; it is vectorized and optimized for a multicore workstation. A special way to store the dataset has been suggested. The operations with the dataset were also optimized.
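To illustrate what a binary spaced seed is, here is a minimal sketch. It is not the PerFSeeB code (which the abstract says is written in C++), and the function name, sequences and seeds are hypothetical. It follows the common convention that a '1' in the seed marks a position that must match and a '0' marks a position that is ignored.

```python
def seed_hits(reference, read, seed):
    """Return reference offsets where `read` matches at every '1' position of `seed`."""
    if len(read) < len(seed):
        raise ValueError("read must be at least as long as the seed")
    # Indices the seed cares about; '0' positions may hold a mismatch.
    care = [i for i, c in enumerate(seed) if c == "1"]
    hits = []
    for pos in range(len(reference) - len(seed) + 1):
        if all(reference[pos + i] == read[i] for i in care):
            hits.append(pos)
    return hits


reference = "ACGTACGTTAGC"
read = "ACCT"  # differs from reference[0:4] at index 2

contiguous = seed_hits(reference, read, "1111")  # exact match required
spaced = seed_hits(reference, read, "1101")      # index 2 is a don't-care
```

Here the contiguous seed finds no candidate positions because of the single mismatch, while the spaced seed still reports offsets 0 and 4, which shows how a don't-care position tolerates a mismatch.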

Speaker: Sofya Titarenko, Lecturer, School of Mathematics

Biography: My career started in Applied Mathematics (inverse and ill-posed problems with application to Computerised Tomography). I joined the University of Leeds, School of Earth and Environment, in 2011, where I spent my first years working on computational geophysics problems. In 2017, I received a great opportunity to join the School of Mathematics, working on the Data Mining project. Since then, I have been interested in Data Science algorithms and their applications. I worked as a Senior Lecturer in Mathematics at the University of Huddersfield from 2019 to 2022. I have been back at the University of Leeds since 2022 as a Lecturer in Statistics. Throughout my research career, I have been interested in optimising the mathematical algorithms I develop to minimise calculation time and maximise efficiency. In particular, I have been interested in optimisation for GPU-based systems (using CUDA) and multicore workstations (using OpenMP and vectorisation techniques). I strongly believe in the power of combining mathematical approaches and computer science methods to solve challenging big data problems.

Brains Over Brawn: Thinking About New Paradigms for Biophysical Simulations

Talk time: 1410-1420

Abstract: Climate change is a serious problem for humanity, and the high-performance computing sector has its own part to play in its alleviation[1]. Yet in addition to this challenge, the socio-economic push for hardware growth has its own scientific issues, particularly regarding the nature of the scientific method, Ockham’s Razor and the nature of understanding. I will present three biophysical simulation techniques developed as part of my own research which showcase alternate, bespoke ways of thinking about the physics at work in novel biological systems. Fluctuating Finite Element Analysis (FFEA) models large, globular biological systems as continuum mechanical systems[2,3]. BioNet models hierarchical protein networks using experimentally relevant building blocks[4,5]. Finally, our mechano-kinetic model utilises bespoke energetic functions to modify Markov state models and couple long- and short-timescale biological processes together[6]. I will discuss why each of these techniques is justified in leaving out specific types of fine detail, thus saving computational power, and conclude with a perspective on the responsibility of computational scientists to favour the creation of smart methods over a reliance on hardware improvements.

[1] Frost, J.M. (2017). https://jarvist.github.io/post/2017-06-18-carbon-cost-of-high-performance-computing/

[2] Solernou, A., Hanson, B.S. et al. (2018). PLoS Comp. Biol. 14(3), 1-29

[3] Hanson, B.S. et al. (2021). Methods. 185, 39-48

[4] Hanson, B.S. et al. (2019). Soft Matter. 15(43), 8778-8789

[5] Hanson, B.S. et al. (2020). Macromolecules. 53(17), 7335-7345

[6] Hanson, B.S. et al. (2022). bioRxiv. https://doi.org/10.1101/2020.11.17.386524

Speaker: Benjamin Hanson, Lecturer, School of Physics and Astronomy

Biography: Ben's research career has focused on the development & application of novel computational methods to biophysical systems. He has studied the molecular motor dynein by developing continuum mechanical simulation techniques, colloidal and protein-based hydrogel systems using the bespoke simulation software “BioNet”, and protein unfolding by developing multiscale mathematical methods. He is currently moving his focus towards teaching and scholarship in the Physics Education Research Group at the University of Leeds, and is beginning his research into the pedagogical potential of virtual reality in higher education physics.

Deconvolution of bulk transcriptomic data reveals immune cell landscape of inflammatory infiltrates in giant cell arteritis (withdrawn)

Talk time: 1420-1430

Abstract: The cellular landscape of many rare diseases remains enigmatic, due to limitations of data availability. Yet knowledge of cellular composition is crucial to understanding the molecular events and cellular players driving these conditions. Recent advances in computational deconvolution methods have made it possible to infer cell type proportions from bulk RNA-seq datasets, which are more prevalent and cost-effective than single-cell RNA-seq datasets. We performed deconvolution of a bulk RNA-seq dataset (n=88) generated from temporal artery biopsies of patients with Giant Cell Arteritis (GCA), using a single-cell RNA-seq dataset (n=9,) as a reference (also generated in GCA patients). The main objective of the study was to uncover cell type proportions in biopsy samples and shed light on cell-type-specific associations with clinical and histological phenotypes in GCA. Several deconvolution software packages were used, and the obtained cell type proportions were compared to determine each method's reliability and its suitability for vascular tissue data. Overall, the findings reveal a previously unreported landscape of cell population abundance levels in GCA biopsies and provide novel insights into cell-type-specific expression profiles of both transcripts already known to be involved in GCA pathogenesis and novel molecular signatures that might have potential for therapeutic targeting.

Speaker: Michal Zulcinski, PhD Student, Faculty of Health and Medicine

Biography: Michal is a third-year PhD student at the Faculty of Health and Medicine, based in LICAMM and LIDA. He holds a BSc degree in Biotechnology and MSc degree in Bioinformatics. Prior to starting in Leeds, he worked as a bioinformatics data analyst at François Jacob Institute of Biology in Paris, France. His PhD research project aims at using genetic and transcriptomic data to elucidate the pathogenesis of Giant Cell Arteritis to identify new candidate genes or pathways for therapeutic targeting.

Video Synthesis of Talking Head

Talk time: 1430-1440

Abstract: My talk is about synthesising talking-head videos from speech audio using AI. I will give an introduction to the task and its applications. I will talk about the approach we propose and the results. I will also talk about the Python packages that I use, as well as other packages.

Speaker: Mohammed Alghamdi, PhD Student, School of Computing

Biography: Mohammed is currently a PhD student of Artificial Intelligence at the University of Leeds, working under Professor David Hogg and Professor Andy Bulpitt. He has a BSc degree in Computer Engineering and an MSc in Computer Science. Before coming to Leeds, he worked as a lecturer at Taif University. His research interests focus on video and image generation.

Democratising Billion-Scale Deep Learning Model Training

Talk time: 1440-1450

Abstract: Deep neural networks (DNNs) with billion-scale parameters have demonstrated impressive performance in solving a wide range of tasks, from image generation and natural language understanding to driverless cars and bioinformatics. Unfortunately, training a billion-scale DNN is out of the reach of many data scientists and academics because it requires high-performance GPU servers that are too expensive to purchase and maintain. I will talk about STRONGHOLD - an open-source library developed during a collaborative project between U. Leeds and Alibaba (a major industry AI player). STRONGHOLD is integrated with the widely used PyTorch framework and requires no change to the user code. It scales up the trainable model size on a single GPU by over 30x compared to the state-of-the-art approach without compromising the training efficiency. We showcase that STRONGHOLD supports the training of a model with 39.5B parameters on a single 32GB V100 GPU. By combining compute and memory efficiency with ease of use, STRONGHOLD democratises large-scale model training by making it accessible to data scientists with access to just a single GPU.

Speaker: Xiaoyang Sun, PhD Student, School of Computing

Biography: Xiaoyang Sun is a PhD student in the Distributed Systems and Services Group at the University of Leeds. He has participated in research internships at Alibaba Group Inc., working on resource management and task scheduling on large-scale clusters and on accelerating pre-trained models in resource-limited environments. His primary research focuses on system optimization for deep learning workflows on heterogeneous resources.


Tuesday July 26th 14:55-15:00 Break

Stream 1: Worsley 8.43X and Y


Tuesday July 26th 15:00-15:50 Discussion Panel: Open Research

Stream 1: Worsley 8.43X and Y

Abstract: Computational methods have a part to play in helping to enable, or even ensure, that research is open and reproducible. At the very minimum, code and any other “processing” of data or information are an important research output, especially alongside the raw and processed or generated data, and perhaps even a first-class research output in their own right. The big advantage of computational or code-based processing is that it is possible to rerun the analyses and verify the results. This is much more problematic with GUI-based tools, which rarely record the processing journey. Could you reproduce your own research, let alone that of another researcher? How does this apply in the different research disciplines?

Speaker: Kelly Lloyd, PhD Student, Leeds Institute of Health Sciences

Biography: Kelly Lloyd is a PhD student at the Leeds Institute of Health Sciences, examining decision-making in cancer preventive therapy among patients and healthcare professionals. She is a keen advocate for open research, using reproducible practices in her own research such as sharing code and data, and is the lead organiser of the ReproducibiliTea journal club at the University of Leeds.

Speaker: Viktoria Spaiser, Associate Professor, School of Politics and International Studies

Biography: Viktoria Spaiser is an Associate Professor in Sustainability Research and Computational Social Science at the School of Politics and International Studies and currently a UKRI Future Leaders Fellow researching how we can accelerate social change in response to the climate crisis. She is a keen learner when it comes to open and transparent research, uses open research practices in her own research such as sharing code, data and preprints, and has more recently been exploring pre-registration practices.

Speaker: Marlène Mengoni, Associate Professor, School of Mechanical Engineering

Biography: Marlène Mengoni is an associate professor in computational medical engineering, researching how we can identify patient-specific sources of variation in musculoskeletal biomechanics and how this affects the performance of interventions and repairs. She has been an adopter of open-science since her PG studies and is an advocate of open-data and its relation to better research culture.


Tuesday July 26th 15:00-15:50 Code clinic - Conda package manager

Stream 2: Worsley 8.49N

Abstract: Setting up your software to get started with your research can often be a big stumbling block for researchers, with problems like admin rights and how to install packages, libraries and their dependencies causing untold headaches. In this code clinic, Nick Rhodes from the Research Computing team will introduce the conda package manager tool and show how it can be used to install the packages you need and configure environments to separate dependencies within your different projects.


Tuesday July 26th 15:50-16:00 Questions and changeover

Stream 1: Worsley 8.43X and Y


Tuesday July 26th 16:00-16:50 Day 1: 10 minute talks session 4

Stream 1: Worsley 8.43X and Y

Flame instability of ammonia aerosol combustion: numerical simulations from astrophysics to industry

Talk time: 1600-1610

Abstract: The transport sector uses 20% of the world's energy. The heavy-duty and off-highway sector (e.g. mining, construction, shipping and aviation) represents 25% of all transport, and is today almost exclusively powered by oil-derived fuels and thus contributes significantly to carbon emissions. This sector has a very wide range of machinery, leading to a range of different ways to decarbonise. Among those, alternative fuels, hydrogen and batteries are feasible solutions for short-range applications due to their low energy density (per volume). Ammonia (NH3) is likely the main fuel for the future long-range maritime and aviation sectors. Being carbon free, NH3 offers the possibility of fuelling gas turbines, fuel cells and reciprocating engines without direct CO2 emissions. However, for the successful application of ammonia as a fuel, one main challenge related to its combustion needs to be overcome: its low reactivity requires a high ignition energy, and leads to a narrow flammability range and low burning velocity. This complicates the stabilisation of the combustion flame and thus inevitably causes unreliable ignition and unstable combustion. We study the properties of NH3 aerosol combustion and focus on the flame stability using a numerical tool developed for astrophysics, namely the adaptive mesh MG code. We will present an overview of the model and a range of numerical results, and will highlight the impact of people, PGR training and tools from astrophysics at Leeds in this industry-linked project and previous similar projects.

Speaker: Dr Christopher Wareing, Research Fellow, School of Physics and Astronomy

Biography: Dr Chris Wareing is a Fluid Dynamicist with 20 years' research experience of using HPC. He has worked in varied areas in academia including astrophysics, physics, mathematics, chemical engineering and mechanical engineering. He has also worked in academia in HPC support, and with industry on projects regarding carbon capture and storage and next generation manufacturing. Outside academia, he has worked with industry as a Fluid Dynamics Consultant and on and off in Science Communication. Currently, he is a Research Fellow in the School of Physics and Astronomy working on the flame stability of ammonia aerosol combustion.

Connecting Bradford with APIs for automated data linkage at a district level

Talk time: 1610-1620

Abstract: We have created Application Programming Interfaces (APIs) within Python to automatically clean, reformat, and link routinely collected public service data. The APIs populate connected tables of data for hundreds of thousands of citizens, and ensure these tables update regularly as more data become available. Complementary visualisation dashboards help researchers understand the data available within the resulting ‘Connected Bradford’ database. Connected Bradford contains linked health, education, social care, environment, and local authority data for citizens across the whole Bradford district. The dashboards allow researchers to obtain information on participant demographics, the size of a cohort of interest, etc., so that researchers can determine whether it is possible to test a hypothesis prior to a formal data request. For example, our team was able to establish the availability of health records linked with Department for Education data to address a research question requiring Early Years Foundation Stage Profile (EYFSP) scores across multiple time-points. We found that Connected Bradford had sufficient data to describe the relationship between EYFSP and Key Stage 1 attainment scores as a function of Special Educational Needs (SEN) status. The findings allow teachers to identify children requiring additional help within the classroom and thereby provide timely interventions.

Speaker: Megan Wood, Research Data Quality Analyst, Bradford Institute for Health Research

Biography: Megan is a Research Data Quality Analyst for ActEarly and BIHR. She received her PhD from the University of Leeds in 2021 which investigated influences of children’s sensorimotor control using data from the Born in Bradford cohort. Megan currently works on several projects, including the development of the Connected Bradford linked database. She is particularly interested in using creative visualisations to help people better understand data.

Model Builds Model: Mimicking Process-based Numerical Models using Machine Learning for Earth and Environmental Modelling

Talk time: 1620-1630

Abstract: Upscaling mechanistic-based Earth and environmental models from laboratory- and field-scale to global scales is a critically important yet hugely challenging task. One challenge is to maintain the accurate consideration of underlying processes at larger scales without inducing prohibitive computational expense. Another issue is the large number of unknown model parameters. To address these, Monte Carlo and machine learning techniques can be used to substitute process-based models such as reactive transport models in porous media. This procedure gives insight into the sensitivity of global models to unknown parameters, and the trained network (meta-model) can then be used for global domain simulations, e.g., elemental cycling in the Earth system. At the global scale, the unknown parameters can be dealt with by running the meta-model iteratively using a Monte Carlo approach, for every point of the global grid to conduct a forward prediction. The extremely large number of model runs (e.g., ~10^7 times) becomes possible within reasonable simulation durations (e.g., < a day) owing to the very low computational demand of the meta-model. We demonstrate the promising application of these algorithms for estimating global carbon burial efficiency in marine sediments.
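The Monte Carlo use of a meta-model described in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the speaker's model: `meta_model` is a hypothetical closed-form function standing in for a trained network, and the parameter range and grid values are invented. For each grid point, the unknown parameter is sampled repeatedly and the cheap surrogate is evaluated, giving a mean prediction and a spread.

```python
import random
import statistics


def meta_model(site_value, rate_constant):
    """Hypothetical cheap surrogate standing in for a trained meta-model."""
    return 1.0 / (1.0 + rate_constant * site_value)


def monte_carlo_grid(grid_inputs, n_samples, k_low, k_high, seed=0):
    """Run the surrogate n_samples times per grid point with a sampled parameter.

    Returns one (mean, standard deviation) pair per grid point.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    results = []
    for x in grid_inputs:
        outputs = [
            meta_model(x, rng.uniform(k_low, k_high)) for _ in range(n_samples)
        ]
        results.append((statistics.mean(outputs), statistics.stdev(outputs)))
    return results


# Hypothetical grid of three points and an assumed uniform parameter range.
grid_inputs = [0.5, 1.0, 4.0]
summary = monte_carlo_grid(grid_inputs, n_samples=1000, k_low=0.1, k_high=2.0)
```

Because each surrogate call is inexpensive, the same loop structure scales to very large numbers of runs, which is the property the abstract relies on.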

Speaker: Peyman Babakhani, Research Fellow, School of Earth and Environment

Biography: Dr Peyman Babakhani is a Postdoctoral Research Fellow in the School of Earth and Environment at the University of Leeds. He obtained a dual PhD from the School of Engineering at the University of Liverpool (UK) and the College of Nuclear Science at National Tsing Hua University (Taiwan) in 2019. Peyman’s research crosses the areas of environmental nanotechnology, environmental modelling, environmental impact assessment, sustainability, and global biogeochemical modelling. He uses a broad range of modelling approaches individually or in combination, including mechanistic mass- and population-balance models, machine learning, stochastic models, and life cycle assessment techniques. He also devises new experimental approaches that can represent environmental problems at a laboratory scale to meet the data requirements for the validation of new models. Peyman would like to combine his knowledge and skills in the two fields of environmental nanotechnology and biogeochemistry to address today's environmental problems such as climate change and pollutants.

The variational and numerical modelling of water waves generated by numerical wave-tank

Talk time: 1630-1640

Abstract: Oceans are significant for human life as they are the means of international trading, food, and energy (renewable and non-renewable) resources. The exploitation of these resources is possible because of maritime engineering, which is concerned with the design of structures that can endure extreme hydrodynamic loads. Before the advent of computational advances, the study of hydrodynamic loads was done by scaled-model testing in wave basins which is an expensive and time-consuming process. However, with the increase in computational power, various sophisticated numerical models emerged to solve the intricate fluid dynamics problems. The most prominent high-fidelity models are based on Reynolds-Averaged Navier Stokes (RANS), Large-Eddy Simulations (LES), and Smoothed Particle Hydrodynamics (SPH). Although these models can accurately predict the complex non-linear wave phenomena, they are computationally expensive because ocean wave modelling requires a disparate range of scales. This project aims to develop a novel, cost-effective numerical wavetank based on the nonlinear variational potential-flow model to generate water waves and solve fluid-structure interaction problems without being computationally intensive. In addition, a novel way of implementing the model is explained which automates the process of calculating the weak formulations, subsequently reducing the time and human error in the coding process.

Speaker: Wajiha Rehman, PhD Student, School of Mathematics

Biography: Wajiha Rehman is a PhD student from the School of Mathematics. She is working as an Early-Stage Researcher (ESR) for the H2020 Marie Curie European Union European Industrial Doctorate (EID) program; Eagre/Aegir: High-seas wave-impact modelling. As an ESR, Wajiha will be working to develop a mathematical and numerical model to perform the wave-structure interaction analysis of water waves impact on a hyperelastic offshore wind-turbine mast by using Continuous Galerkin's Finite Element Method.

Learning the groundwater operator

Talk time: 1640-1650

Abstract: Computational modelling of subsurface flow is able to analyse and forecast the response of an aquifer system to a change of its state. Numerical methods calculate the hydraulic head by iteratively solving an implicit system of equations at each time step in the discretized time and flow domains. Despite great progress in these techniques, running the groundwater model is often prohibitively expensive, especially when the scale of the system is large. Machine learning has emerged as a promising alternative and in this study, I will present my current research on machine learned solutions to groundwater problems.

Speaker: Maria Taccari, PhD Student, School of Civil Engineering

Biography: I am a 2nd year PhD student at the University of Leeds, UK, working with Dr. Xiaohui Chen from the School of Civil Engineering, Dr. He Wang and Prof. Peter Jimack from the School of Computing, and collaborating with an applied research institute in the Netherlands, Deltares. I previously obtained a BSc and MSc with honors in Civil Engineering from the University of Padua (Italy) and TU Delft (the Netherlands). I then worked for 5 years as an employee at the Dutch independent research institute Deltares, where the idea of this research was conceived. My PhD research is on “Physics informed deep learning for groundwater prediction” and I generally find this new field of scientific machine learning really exciting!


Tuesday July 26th 16:00-16:50 Code clinic - Python Challenges

Stream 2: Worsley 8.49N

Abstract: Do you ever use Python? Would you like a go at some small code challenges? Patricia Ternes from the RSE team has developed some small challenges to test your Python skills. Join this session to have a chat about Python, have a go at the challenges and hopefully learn something new!

Speaker: Dr Patricia Ternes, Research Software Engineer, Research Computing

Biography: Patricia is a Research Software Engineer at the University of Leeds. She has a PhD in Theoretical Physics, and her research focuses on the development of computational models that help to understand complex systems; initially focusing on phenomena such as the behaviour of fluids at nanoscales and lately modelling emergence in spatio-social systems.


Tuesday July 26th 16:50-17:00 Questions and changeover

Stream 1: Worsley 8.43X and Y


Tuesday July 26th 17:00-17:30 Social

Stream 1: Worsley 8.43X and Y

Abstract: A chance for refreshments and meeting other attendees.


Wednesday July 27th 09:00-10:00 Registration

Stream 1: Worsley 8.43X and Y

Abstract: The registration desk will be outside 8.43X and Y, Level 8, Worsley Building.


Wednesday July 27th 10:00-11:00 Social

Stream 1: Worsley 8.43X and Y

Abstract: A chance for refreshments and meeting other attendees.


Wednesday July 27th 11:00-12:00 Day 2: Research Portfolio talks

Stream 1: Worsley 8.43X and Y

Calculating the Infrared and Terahertz Complex Permittivity of Crystalline Materials

Talk time: 1100-1120

Abstract:

The main aim of our research is to understand and interpret the vibrational spectra of a large range of molecular crystals. As such, we use a range of solid-state density functional packages including Castep, VASP, Crystal and CP2K along with additional phonon calculation tools such as Phonopy to determine the vibrational dynamics of these materials. In the majority of cases these calculations determine both the phonon frequencies and Born charges, which can be used to determine the IR intensity of the vibrational modes. However, comparing these to the experimental spectra, particularly at low frequencies, can be non-trivial because the experimental spectra can be influenced by multiple other parameters including the nature of the sample (powder or single crystal), the particle size and shape, and the measurement geometry. As such we have developed the Python post-processing tool PDielec [1], which can be used to understand and visualise phonon calculations. As well as acting as a general Python parser for a number of density functional packages, this tool provides a Qt 5-based GUI that allows the visualisation of structures and vibrational motion, and a number of tools to understand differences and improve correlation between calculated and experimental spectra.

[1] https://doi.org/10.5281/zenodo.5888313

Speaker: Andrew Burnett, Associate Professor, School of Chemistry

Biography: I completed my undergraduate studies in chemistry with forensic science at the University of Bradford in 2004 and then gained my PhD in Electronic and Electrical Engineering from the University of Leeds in 2008. I continued my work at Leeds through several postdoctoral positions until 2011, when I moved to the Faculty of Biological Sciences in Leeds as an EPSRC Postdoctoral Fellow. After completion of the Fellowship in 2015 I moved to the School of Chemistry at Leeds as a Teaching Fellow, before being awarded an EPSRC Early Career Fellowship and becoming a University Academic Fellow in 2016. I was promoted to Associate Professor in February 2022. My research focusses on the development of Terahertz (THz) spectral instrumentation and the understanding of THz spectral measurements for a range of materials relevant to the physical and life sciences, with a particular focus on crystalline materials.

Nutrition and Lifestyle Analytics @ Leeds

Talk time: 1120-1140

Abstract:

Speaker: Michelle Morris, Associate Professor Nutrition and Lifestyle Analytics, Leeds Institute for Data Analytics

Biography: Michelle is an Associate Professor and Turing Fellow based in the Leeds Institute for Data Analytics and Leeds Institute of Medical Research at the University of Leeds. She leads the Nutrition and Lifestyle Analytics team. Her interdisciplinary research uses novel forms of data, including supermarket transaction records, for research into lifestyle behaviours and health. She is a co-investigator at the Consumer Data Research Centre and leads the research on the IGD’s Healthy and Sustainable Diets programme.

Use of large scale micro-data in understanding travel, activity and health exposure

Talk time: 1140-1200

Abstract: A better understanding of the interfaces between travel choices and the chain of consequential impacts is needed to improve liveability in urban spaces and communities globally. Interfaces between transport choices and population health consequences have risen to the fore recently, in particular the disease burdens arising from exposure to pollutants, obesity consequences from inactivity and, more recently, virus transmission and well-being aspects. A number of models go some way to capturing the links between travel choices and other sectoral impacts, including the ability to explore transport-health interrelations. These models have, however, been largely developed based on traditional (and mainly aggregate) travel data sources such as travel diaries, fixed-base traffic counts etc. Many of the models take a ‘snapshot’ of the transport system at fixed-base locations. In this presentation, recent research will be presented that harnesses large scale pervasive technologies and high resolution location based micro-data in order to address some of the societal challenges.

Speaker: Professor Susan Grant-Muller, Chair in Technologies & Informatics, Institute for Transport Studies

Biography: Prof. Grant-Muller's research is concerned with the use of new forms of data and technologies to inform socially and environmentally sustainable transport policy. A statistician by training, her interests lie in understanding the wider impacts of mobility infrastructure and policy, such as health, energy and carbon impacts. She works with micro-level user-generated data, involving multidisciplinary and cross-sectoral teams. Her research is concerned with the distributional effects of transport and transport related policies – how are different parts of the population and different geographies impacted by policies and strategies? Susan is an Alan Turing Fellow, Co-Director of the Leeds Institute for Data Analytics (LIDA): Societies Community and Co-I to the CDRC.


Wednesday July 27th 11:00-12:00 Code clinic - Docker

Stream 2: Worsley 8.49N

Abstract: Have you used Docker? Do you want to learn more about Docker? Join this code clinic session to share your experiences using Docker or learn more.


Wednesday July 27th 12:00-13:00 Lunch

Stream 1: Worsley 8.43X and Y


Wednesday July 27th 13:00-13:50 Day 2: 10 minute and poster talks session 1

Stream 1: Worsley 8.43X and Y

OpenInfra - open access data for transport research: tools, modelling, and simulation

Talk time: 1300-1310

Abstract: The OpenInfra project is exploring the utility of OpenStreetMap (OSM) to support data-driven decision-making for local infrastructure planning and produce “transport infrastructure data packs” for local transport authorities in England. Data on transport infrastructure is essential for planning inclusive streets, especially in the context of cycling, walking, and wheeling. OSM provides an extensive dataset on the existing transport infrastructure, even if it often lacks detailed information, such as the type of sidewalk surface. Besides the questions regarding data completeness, there are additional challenges concerning OSM data analysis. For example, inconsistent infrastructure tagging schemes sometimes result in queries returning infrastructure networks not reflective of the actual query. Moreover, bugs in the packages used to get OSM data might hinder the analysis and/or its reliability. Lack of in-depth documentation also makes it hard to understand how, for instance, a cycling network is defined. Whilst many tutorials exist demonstrating how to add data to OSM, there is a lack of material on working with OSM data. One of the OpenInfra objectives is to address this gap. Thus, this talk will discuss the OpenInfra project, the problems we have faced and our attempts to solve them in order to produce “transport infrastructure data packs”.

Speaker: James Hulse, Data Scientist, Leeds Institute of Data Analytics

Biography: James Hulse is a data scientist with the LIDA data scientist development program. He studied for an integrated master's in aerospace engineering at the University of Sheffield, completing a dissertation on event and topic detection algorithms within social media (Twitter) streams, concerning data mining, textual analysis, and natural language processing. It was during this dissertation that James became fascinated by data and how it could be used to better understand and prioritise development of the world we live in.

Computer simulations of CO2 reduction over iron sulfide surfaces

Talk time: 1310-1320

Abstract: Iron sulfides have attracted wide research interest due to their catalytic properties towards the conversion of the greenhouse gas CO2 into value added chemicals and to mitigate global warming. Several iron sulfide phases have also been associated with the catalytic centres in hydrothermal vents, which are thought to have converted CO2 into the first small organic molecules according to several origin of life theories, collectively known as the iron-sulfur world hypothesis. In this talk, we will discuss the development of realistic computational models to describe the surfaces of thiospinel structured materials using thermodynamic and kinetic arguments. We will show the impact of various partial oxidation degrees on the stability of the surfaces of the sulfides and rationalise the core-shell iron sulfide-iron (hydr)oxide structure of the nanoparticles of the catalyst. We will also illustrate the effect of partial surface oxidation on the adsorption of H2O and CO2 and their catalytic conversion into a number of small organic molecules. Finally, we will present a comparison of the catalytic properties of partially oxidised iron sulfides and partially sulfurised iron oxides to explain the enhanced activity of the former material with respect to the latter.

Speaker: David Santos-Carballal, Senior Research Fellow, School of Chemistry

Biography: David Santos-Carballal received his BSc in Chemistry from University of Havana, Cuba, in 2007 and completed his MRes and PhD at University College London, UK. He was a Postdoctoral Research Associate at Cardiff University and is currently a Senior Research Fellow at the School of Chemistry of the University of Leeds, where he uses density functional theory-based calculations to understand the solid state and surface chemistry of materials for catalysis and energy applications. Dr Santos-Carballal was awarded a prestigious Postdoctoral Fellowship by the DST and NRF of South Africa in 2016, to carry out research at University of Limpopo.

Exploring the effect of Nb doping and ethylene carbonate adsorptions on the LiMn2O4 major surfaces: DFT+U study

Talk time: 1320-1330

Abstract: Cationic doping plays a crucial role in stabilizing and improving the electrochemical performance of cathode materials in Li-ion batteries. In LiMn2O4 cathode material, the incorporation of dopants reduces the number of trivalent manganese (Mn3+) ions that undergo a disproportionation reaction and limits Mn2+ ion dissolution into the electrolyte upon cycling. In this work, we discuss the effect of surface Nb doping and the adsorption of the electrolyte solvent, ethylene carbonate (EC), using density functional theory (DFT+U-D3). Upon Nb doping, the surface energies of all the Nb-doped configurations increase as compared to the pristine surfaces, indicating a destabilizing effect. Interestingly, the surface stability of the (111) surface improves, resulting in an enhancement of the (111) plane on the morphologies when Nb is doped on the second surface layers. Furthermore, the EC adsorption greatly prefers to bind with the surface when placed parallel to the facets, with the highest binding energy for Nb-doped on the (011) surface second layers. On the morphologies, the (111) surface plane upon adsorption further improves. However, we observed a minimal charge transfer between the doped surfaces and the molecule, which was dominated by the electronic rearrangement within the EC molecule. These findings are interesting since exposing the (111) facet promotes the formation of a stable solid electrolyte interface (SEI), which significantly limits the Mn-dissolution.

Speaker: Brian Ramogayana, PhD Student, Faculty of Engineering and Physical Sciences

Biography: Brian Ramogayana is a Ph.D. exchange student from the University of Limpopo - material modelling centre (MMC), South Africa visiting the University of Leeds (Prof N.H. de Leeuw’s research group) for 12 months. He started his Ph.D. in 2021 at the University of Limpopo, under the supervision of Prof P.E. Ngoepe and Dr K.P. Maenetja. His research is based on the modification of cathode materials for lithium-ion batteries and their electrolyte interactions upon cycling.

The bulk and surface properties of the monoclinic and orthorhombic FeNbO4

Talk time: 1330-1340

Abstract: FeNbO4 materials have shown the potential to be used as catalytic electrodes for the hydrogen evolution/oxidation reaction. In this study, we have employed Density Functional Theory (DFT) to study the pristine surfaces and the related dissociation reactions of H2 and H2O respectively. The simulation results show that there are three pristine surfaces, namely (010), (110), and (111), and that these three surfaces show a similar configuration in both the monoclinic and orthorhombic FeNbO4, even though the distribution of the cations is totally different. We have also found that the oxygen within the water molecule prefers to coordinate with the surface cations, forming chemisorption, while the interaction between H2 and the surface atoms is so weak that only physisorption occurs. In addition, out of those three surfaces, the (110) proves to be the most reactive one, meaning that the lowest energy barriers for the dissociation reactions of H2 and H2O occur on the (110) surface.

Speaker: Xingyu Wang

Challenging Deep Learning Models with Classical Arabic

Talk time: 1340-1342

Abstract: The use of artificial intelligence to understand human language has come a long way, especially in English where deep learning models showed near-perfect results on several downstream tasks. However, their performance in classical Arabic texts is largely unexplored. To fill this gap, I evaluate state-of-the-art Transformer-based models on a binary classification task of identifying relatedness between the Quran and the Hadith, which are classical-Arabic religious texts with embedded meanings. Before conducting the experiment, the need for classical Arabic training and testing datasets is identified. Hence, I developed a methodology to automatically collect the training dataset of Quran-verse pairs by relying on existing Quran ontologies. Then to collect the testing dataset, I introduced an approach of automatically identifying related Quran-Hadith pairs from reputable religious websites by following rigorous heuristics. Once the datasets are created, the experiments begin by fine-tuning the models using the PyTorch library, Tesla T80 GPU processors, and various hyperparameters (learning rate, batch size and number of epochs), with early stopping to avoid overfitting. The results show a ~20 points drop in F1 score across all the models, which calls for the imminent need to explore avenues for improving the quality of these models to capture the semantics in such complex texts.

Speaker: Shatha Altammami, PhD Student, School of Computing

Biography: Shatha Altammami is a lecturer in the Information Systems department at King Saud University, Saudi Arabia. She started her Ph.D. in October 2018 at the University of Leeds under the supervision of Professor Eric Atwell and Dr. Ammar Alsalka. Her current research interest is in Arabic Natural Language Processing (NLP) and Artificial Intelligence (AI). Shatha received her MSc in Information Security and Privacy with distinction from Cardiff University and her BSc. in Information Technology with honors from King Saud University.

The NETWORKx and PANDAS tools, and their use in fake news detection (withdrawn)

Talk time: 1342-1344

Abstract: Since the advent of social media and the exponential growth of online information, it is becoming increasingly complicated to decipher the true from the false, which leads to the problem of fake news. In this case study, we utilised various tools to automate the process of detecting misinformation. The most distinguished and productive tools we employed are Pandas and NetworkX. Pandas is a Python library for data manipulation and analysis typical to statistics, finance, social sciences, and many other fields. NetworkX, in turn, is a Python package for creating, manipulating and studying the structure, dynamics and functions of complex networks. We would like to showcase how we utilised these tools in combination in our research, which could be of interest to other fields and case studies.

Speaker: Saud Althabiti

Knowledge Representation for Islamic Research

Talk time: 1344-1346

Abstract: Recently ontologies have been used to improve the efficiency of different computational models such as search systems. The main aim of our research project is to utilize ontology as a source of Islamic knowledge representation available for Quran and Hadith to improve the Islamic search model. The first phase of our research is based on analyzing the coverage of Islamic topics in the Quranic ontologies focusing on the Hajj domain as a case study. For the evaluation process, we used one of the available ontologies called QuranOntology. The ontology consists of roughly 110,000 concepts with more than one million RDF triples. The Protégé tool was used to view the ontology, but the ontology's gigantic size has imposed a challenge. Therefore, the alternative solution was to use the Apache Jena Fuseki platform to explore the ontology. After reconfiguring the tools, we noticed that the Protégé tool provided an excellent User Interface to browse the ontology, while Apache Jena Fuseki was more efficient for querying the ontology.

Speaker: Sanaa Alowaidi, PhD Student, School of Computing

Biography: Ph.D. student, School of Computing, University of Leeds


Wednesday July 27th 13:00-13:50 Day 2: 10 minute talks session 2

Stream 2: Worsley 8.49N

Using Resultative Connectors in Arab Students' Academic Writing (A corpus Based Study)

Talk time: 1300-1310

Abstract: Using discourse connectors is one of the problems learners face when writing an academic text. With the development of corpus tools, researchers are now able to investigate the actual usage of this linguistic feature. In this study, I will answer two research questions. First, how frequently advanced Arab learners of English use resultative connectors. For this question, I will follow the taxonomy adopted from Quirk et al. (1985), which consists of six connectors (so, because, as a result, hence, therefore and accordingly). Second, whether Arab learners’ use of resultative connectors differs from that of native speakers of American and British English. The data for this study are taken from three corpora: BALK, consisting of texts written by either Arab first-year university students or third-year secondary school students, and the American and British English sub-corpora of the Pearson International Corpus of Academic English (PICAE). Both quantitative and qualitative approaches were used for the analyses and a Log-Likelihood (LL) statistical test was conducted to compare the frequencies of words between corpora. The results show that there is a significant difference between the groups, with Arab students generally overusing resultative connectors.
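
For readers unfamiliar with the Log-Likelihood test mentioned in this abstract, the following is a minimal sketch of the standard two-corpus log-likelihood statistic widely used in corpus linguistics for comparing word frequencies. The function name and the counts in the usage example are hypothetical illustrations, not figures from the study, and the abstract does not say which tool the author used to compute the statistic.

```python
import math

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood for one word or connector.

    freq_a, freq_b: observed occurrences in corpus A and corpus B.
    size_a, size_b: total number of words in each corpus.
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected counts if the item were equally frequent in both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2.0 * ll

# Hypothetical counts: a connector occurring 100 times in a 1,000-word
# learner sample and 50 times in a 1,000-word reference sample.
example_ll = log_likelihood(100, 1000, 50, 1000)
# Values above 3.84 indicate a difference significant at p < 0.05 (1 d.f.).
is_significant = example_ll > 3.84
```

A higher value means the observed frequencies depart further from what equal usage in both corpora would predict; the sign of the difference (overuse or underuse) has to be read from the relative frequencies themselves.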

Speaker: Amani Al Onayzan, PhD Student, School of Languages, Cultures and Societies

Biography: PhD candidate in Corpus Linguistics at the University of Leeds.

Optimisation models for sustainable fashion: An approach towards zero waste fashion design

Talk time: 1310-1320

Abstract: The fashion industry's impact on the environment is a critical global problem. Part of that impact comes from the amount of waste generated in the cutting stage of fabric. The distinction between the roles of fashion designers and marker makers caused the “design” and “make” processes to be linear, which allowed for more waste on the cutting floor. Some designers started exploring the Zero Waste Design concept, which means designers consider the allocation of pattern pieces while designing garments. However, this approach has been criticised for not allowing designers to have aesthetic control over designs. This research aims to transform the “design” and “make” processes from linear to circular and allow designers to have aesthetic control when designing a zero-waste or a minimal waste design. We do this by developing an optimisation algorithm based on the overlap minimisation problem that aims to minimise the waste generated from the cutting stage of the fabric, where irregular shapes need to be cut from a fabric roll, a problem also known as the “Two Dimensional Irregular Strip Packing Problem”. Through a collaboration with a designer, we aim at integrating the designer’s design-making process with the algorithm design to inspire the designer to make adjustments to the initial design through an iterative process until a satisfactory marker with minimal to zero waste is achieved.

Speaker: Nesma ElShishtawy, PhD Student, Faculty of Business

Biography: I am part of a team working on a multidisciplinary project between the Business School and the School of Design that focuses on finding optimisation models for sustainable fashion, which works on different solutions to reduce the amount of waste generated during the cutting stage of fabric. Additionally, we are working on the post-consumer textile waste problem and upcycling. The fashion industry's impact on the environment is a significant global problem. One avenue for reducing the industry’s global impact is by reducing the amount of waste generated in the cutting stage of fabric. This research aims to transform the “design” and “make” processes from linear to circular and allow designers to have aesthetic control when designing a zero-waste or a minimal waste design. We do this by developing an optimisation algorithm that aims to minimise the waste generated from the cutting stage of the fabric, where irregular shapes need to be cut from a fabric roll, a problem also known as the “Two Dimensional Irregular Strip Packing Problem”. Through a collaboration with a designer, we aim at integrating the designer’s design making process with the algorithm design to inspire the designer to make adjustments to the initial design through an iterative process until a satisfactory marker with minimal to zero waste is achieved.

Make Academic Findings Shiny: Web Application Development and Best Practices for Using R Shiny to Communicate Research Results

Talk time: 1320-1330

Abstract: Shiny and related packages provide a framework for creating web applications using the R programming language. Through the use of interactive widgets, outputs such as plots, tables or maps change based on the input from the user. Therefore, R Shiny provides a compelling way to share research findings. Despite the advantages, the extensive functionality of Shiny may serve as a hindrance when starting to use the tool. This talk is a walk-through of the dashboard that I developed for the Urban Transport Modelling for Sustainable Well-Being in Hanoi project (https://urban-analytics.github.io/UTM-Hanoi/intro.html) as a tool for communicating the household survey analysis. I discuss both the code and the functionality including related packages such as `golem`, `shinydashboard` and `DT`. Finally, useful practices (modules, deployment and app structure) and starting-point resources are shared.

Speaker: Kristina Bratkova, Data Scientist, Leeds Institute of Data Analytics

Biography: Kristina Bratkova studied BSc Mathematics and Music (2018-2021), but she recently transitioned to the world of data science and is currently working as a Data Scientist at the Leeds Institute for Data Analytics. Kristina’s research interests are urban analytics, GIS and everything R-related. She created R Shiny dashboards to communicate the findings of the transport survey analysis conducted for the Urban Transport Modelling for Sustainable Well-Being in Hanoi project—supervised by Prof Nick Malleson and Prof Lex Comber from the School of Geography.

Save your tears for the data: A touch of Docker in a Data Scientist's workflow

Talk time: 1330-1340

Abstract: Many data science teams have become multilingual, leveraging R, Python, Julia and friends in their work. Into the bargain, different data scientists have different preferences in their coding environments and operating systems. While this diversity allows data scientists to work with the tools they are most comfortable with, it can become a pain to share the same projects on different machines with different configurations. This talk illustrates how data scientists can leverage Docker containers to create portable, reproducible and tailored development environments, which can be instantiated reliably in different environments, operating systems and hardware. Data scientists can therefore focus on what they love and do best (i.e. data science) without having to worry about the hassle required to reproduce their work, deploy their analysis dashboards or even deploy their models.

Speaker: Eric Muriithi, Data Scientist, Leeds Institute for Data Analytics

Biography: Eric Wanjau is an Early Career Researcher at the Leeds Institute for Data Analytics. He holds a BSc in Electrical and Electronic Engineering (2021) and his research interests span domains such as robotics, computer vision, signal processing and more recently urban analytics. Under Prof Nick Malleson, he created spatial and machine learning models to address the attitudes towards a proposed motorbike ban in Hanoi and the corresponding impact on infrastructure and travel behaviours. He is currently working with Prof Geoff Hall and Prof Serge Sharoff in creating clinical NLP models for extracting oncology entities in electronic health records. Eric is keen on carrying out real-world relevant research for the public good.

Leeds Analytics Secure Environment for Research (LASER) - an overview

Talk time: 1340-1350

Abstract:

The Leeds Analytics Secure Environment for Research (LASER) is a University of Leeds (UoL) service hosted in the Leeds Institute of Data Analytics (LIDA) on Microsoft Azure.

LASER combines the highest standards of security for data analytics, ensuring ISO27001 and NHS Data Security and Protection Toolkit compliance, with the flexibility to stay agile in design and function and the scalability to match researcher need.

LASER is the platform upon which we can build and host Virtual Research Environments (VREs). In its simplest form, a VRE is a virtualised environment consisting of virtual machines and shared storage where data flow is strictly controlled. Taking a 'walled garden' approach, there is no access to the internet or other networks from inside a VRE.

The LASER Platform has been designed with and for researchers and includes the following capabilities:

Speaker: Adam Keeley, Data Analytics Team Manager, Leeds Institute of Data Analytics

Biography: Adam is the Data Analytics Team (DAT) Manager at the Leeds Institute of Data Analytics (LIDA). As well as supporting the use of the Leeds Analytics Secure Environment for Research (LASER), the DAT provide expertise in data handling, data wrangling, data curation and quality standards, data linkage, database set up, software development and data visualisation. Before joining the University of Leeds, Adam worked in data-based Management Information roles in both the public and private sectors. Used to manipulating personally and commercially sensitive data in bulk, he is familiar with issues surrounding data management, quality and governance.


Wednesday July 27th 13:50-14:00 Questions and changeover

Stream 1: Worsley 8.43X and Y


Wednesday July 27th 14:00-14:55 Day 2: Lightning talks and poster presentations

Stream 1: Worsley 8.43X and Y

Pinnipeds Comparative Genomics (so far..)

Talk time: 1400-1402

Abstract: The evolutionary history of Baikal and Caspian seals, which inhabit large landlocked waterbodies in Central Asia, is still ambiguous. Having a fully resolved phylogenomic tree for pinnipeds is key to understanding patterns of speciation and identifying genes underpinning adaptations in this group. In this study, we seek to reassess the taxonomic position of the enigmatic Caspian/Baikal seal and their related sister taxa (grey and ringed seals). We will incorporate a Baikal seal (Pusa sibirica) genome assembly that has recently become available online into a phylogenomic reconstruction with our existing assemblies for Caspian seal (Pusa caspica) and hooded seal (Cystophora cristata), along with other publicly available pinniped genomes. The P. sibirica genome will first be annotated using suitable pipelines (e.g. BRAKER, which employs the tools GeneMark-ES/ET and AUGUSTUS); gene candidates will then be inferred computationally, based on sequence similarity to entries in public repositories, for functional annotation. To investigate their phylogenetic relationships, a set of one-to-one orthologous genes will be identified using OrthoMCL, and these datasets will be used to generate Maximum Likelihood phylogenomic trees using RAxML based on a suitable sequence evolution model. Subsequently, the trees will be used as the basis for selective pressure analyses. Possible pitfalls that might be encountered during these analyses will be discussed.

Speaker: Shairah Abdul Razak, Visiting Research Fellow, Faculty of Biological Sciences

Biography: Shairah is a Senior Lecturer in the Genetics Program under the Faculty of Science & Technology, Universiti Kebangsaan Malaysia (UKM, The National University of Malaysia). She is also affiliated with the Department of Applied Physics under the same faculty. Her fields of expertise include molecular ecology and quantitative biology, population genetics, genomics, and metagenomics.

Medical image reconstruction under Bayesian modelling

Talk time: 1402-1404

Abstract: Due to loss of information during the scanning process, the observed image is often blurred and contains noise. As a result, the observed image is generally degraded and is not helpful for clinical diagnostics. Bayesian methods have been identified as particularly useful when there is access to limited data but with a high number of unknown parameters. Hence, our research, under the Bayesian hierarchical modelling structure employing multiple data sources, aims to develop new reconstruction methods capable of providing more robust results with increased image quality.

Speaker: Muyang Zhang, PhD Student, School of Mathematics

Biography: Before I came to the UK, I finished my undergraduate and master's degrees in China. My primary background was economics, but I became 'more of a statistician' after completing a further master's programme in advanced statistics at the University of Glasgow. Currently, as a third-year PhD student at the University of Leeds, I have been looking into applying Bayesian modelling to enhance medical imaging by combining various sources such as PET and MRI.

Coarse-grained mesoscale rod simulations of fibrinogen under extensional flow

Talk time: 1404-1406

Abstract: The Fluctuating Finite Element Analysis (FFEA) software uses continuum mechanics to model proteins at the biological mesoscale as 3D tetrahedral meshes and 1D elastic rods that deform viscoelastically in response to thermal noise. Tetrahedra additionally experience repulsive excluded volume and attractive surface-surface interactions, neither of which is included in the rod model, but both of which are necessary to describe protein function. Viscous drag is represented as an isotropic force acting on mesh elements due to a stationary background fluid. Fibrinogen (MW ~ 340 kDa) is a fibrous protein that polymerises into a fibrin network to form a crucial supportive component of blood clots. The effects of shearing flow on fibrin(ogen) are well-documented, but less is known about extensional flow, which is predicted to elongate von Willebrand Factor, another fibrous clotting factor, significantly more than shear. Extensional flow-induced aggregation of antibodies has been demonstrated at typical industrial strain rates. My PhD aims to improve the simulation capability of the FFEA rod model by implementing protein-protein interactions and viscous effects such as extensional flow and complex hydrodynamic interactions. FFEA simulations of fibrinogen under flow will be experimentally validated by studying its aggregation propensity in vitro at physiological strain rates consistent with healthy and stenotic arteries.

Speaker: Ryan Cocking, PhD Student, School of Molecular and Cellular Biology

Biography: Ryan has a background in computational physics and is currently a Wellcome Trust PhD student at the Astbury Centre for Structural Molecular Biology. His research is focussed on developing the rod model of the Fluctuating Finite Element Analysis (FFEA) protein simulation software, with the main codebase written in C++ and an accompanying tools suite written in Python. When he is not pondering variable names or chasing down bugs, he is dabbling in wet lab experiments that will hopefully validate the simulations he runs in the future.

Big data to useful data: making use of large scale GPS data.

Talk time: 1406-1411

Abstract: The increasing use of secondary and commercial data sources, paired with the decreasing cost of GPS-enabled devices, has seen a huge increase in the amount of GPS data available to researchers. These data are valuable in understanding human behaviour and how the environment around us influences our health. This presentation takes you through the method developed to clean individual-level GPS data to the OpenStreetMap Network and calculate environmental exposure(s). Scalable to large-scale GPS data with global coverage, this code allows reproducible comparisons of environmental exposures across a wide range of spatial contexts. Moreover, as the code is open source and sits alongside existing open source packages, it provides a valuable tool for policy and decision makers.

Speaker: Dr Francesca Pontin, CDRC Research Data Scientist, Consumer Data Research Centre

Biography: Fran Pontin is a CDRC Research Data Scientist whose research interests lie in the use of consumer data to capture and better understand health behaviours and drive targeted policy change.

Shared Journeys: Aggregating GPS Data into a Shareable Product

Talk time: 1411-1416

Abstract: The Consumer Data Research Centre provides researchers with access to a variety of open, safeguarded, and secure datasets. This talk will explain a sample process of using a securely held mobility dataset, hosted by an external company, to create a derived aggregated dataset that the CDRC is permitted to share with other researchers on our data store. We will discuss important considerations when working with individual-level GPS data such as time zone handling, detection of unreliable data, and preservation of anonymity. We will give an overview of the aggregation process itself and explore how different spatial and temporal resolutions can impact the derived dataset.

Speaker: Dustin Foley, Research Data Scientist, Consumer Data Research Centre

Biography: Dustin Foley is a Research Data Scientist for the Consumer Data Research Centre (CDRC) at the University of Leeds. His research interests include transport modelling, diet optimisation, and natural language processing.

Edubots: Chatbots for university education and student success

Talk time: 1416-1421

Abstract:

There is a growing interest in the use of chatbots in universities, as they can provide efficient and timely services to students and educators; e.g. see the UoL Strategy 2020-2030.

The EDUBOTS project, funded by Erasmus+, explored best practices and innovative uses of university chatbots. We implemented case study chatbots using platforms HUBERT for student feedback, and DIFFER for student interaction and fostering a sense of belonging. Our case studies and surveys provided feedback from students and educators regarding the possible uses of chatbots in higher education (HE). We present our key findings:

Educators and students agreed on the importance of chatbots for:

Responding to FAQs that relate to administration, e.g. admissions, IT helpdesk, Student Success service;

Conducting tests and quizzes with students to assess their conceptual understanding of a topic;

Feedback from students to the instructor for course evaluation.

Students were much keener than educators on "offering personalised feedback to students on their conceptual understanding of a topic" and "offering tutorials to students related to courses". Social aspects were not very popular with the educators group. On the other hand, more students wanted a chatbot that can facilitate communication with their mentors and establish study groups within their courses.

Speaker: Noorhan Abbas, Research Fellow, School of Computing

Biography: Noorhan Abbas is a Research Fellow in the School of Computing at the University of Leeds. She conducts research in artificial intelligence (AI), data mining, text analytics and data science with a special focus on the use of chatbots in Higher Education institutions. She also works as a Teaching Fellow in our online distance learning MSc programme in Artificial Intelligence.

Simple Transformers Question Answering Model for Finding Answers to Questions from Qur'an

Talk time: 1421-1426

Abstract: Recent research has tended to develop question answering systems for Arabic Islamic texts, which can pose challenges due to Classical Arabic. The Qur'an Question Answering shared task competition (Malhas, 2022) provides the Qur'anic Reading Comprehension Dataset, which has 1093 tuples of question-passage pairs. The passages are verses derived from the Holy Qur'an, while the questions are set in Modern Standard Arabic. Simple Transformers is a library, based on the Transformers architecture, that provides models for specific tasks such as Classification, NER and QA. We used the Question Answering (QA) model along with three Arabic pre-trained language models: AraBERT (Antoun, 2020), CAMeL-BERT (Inoue, 2021) and ArabicBERT (Safaya, 2020). In line with the task details, the QA model is set to return five answers ranked from best to worst by their probability scores. The official measurement is partial Reciprocal Rank (pRR) (Malhas, 2020). Our experiments with the development set show that the AraBERT V0.2 model outperformed the other Arabic pre-trained models. Therefore, AraBERT V0.2 was chosen for the test set, where it achieved fair results with a 0.445 pRR score, 0.160 EM score and 0.418 F1 score. The team ranked 7th on the test set, while the first-placed team achieved a 0.586 pRR score, 0.261 EM score and 0.537 F1 score (Malhas, 2022).

Speaker: Abdullah Alsaleh, PhD Student, School of Computing

Biography: Abdullah Alsaleh is an academic member of the Department of Information Systems at King Abdulaziz University in Jeddah, Saudi Arabia. Currently, he is a second-year PhD student at the School of Computing, supervised by Prof Eric Atwell and Dr Abdulrahman Altahhan. His research interest is in Natural Language Processing, particularly in semantic similarity between religious Arabic texts using Transformers-based models.

Modelling Ambient Populations under Different Restriction Schemes

Talk time: 1426-1431

Abstract: How have cities changed during the pandemic? Which changes will remain as the pandemic subsides? This LIDA project addresses the above questions by building upon previous CDRC-funded work and creating an open-source spatial-temporal machine-learning model to predict overall change in footfall at specific city-centre locations. It will also consider the local urban configuration, external factors (like weather conditions) and, importantly, the heterogeneous impact of various mobility restriction measures. The model is currently being trained using pre-pandemic footfall data provided by the project's external partner, Leeds City Council, and different lockdown restriction conditions will be incorporated thereafter. A functional dashboard is additionally being developed to present related visual outputs, i.e. graphs and maps, to help policymakers easily explore different scenarios. Although based in Leeds, it is expected that the work will be generalizable to other cities and, ultimately, we aspire to attract further funding to construct a nationwide footfall model, which would represent a great methodological advance and an attractive contribution to public health and urban development. This talk will mainly be an introduction to the project, then a presentation of the work carried out so far, followed by a discussion of the next steps and plans.

Speaker: Indumini Ranatunga, Data Scientist, Leeds Institute of Data Analytics

Biography: Hello! I'm Indumini! Having spent two decades in the UAE and Sri Lanka, and after experiencing first-hand the positive impact that technological development can have on a country, I began to slowly develop a budding fascination with tech. Later, with the realisation that my curiosity and wanderlust could no longer be kept bounded, I decided in 2018 to set sail for the UK in search of better prospects. By the summer of 2021, I successfully graduated from the University of Leeds with a bachelor’s in Electronics and Computer Engineering. Having had my interest piqued by a glimpse of the value and variety of data science applications discovered through my final year project research, and seeing that my wish to work for a meaningful cause can be fulfilled by joining LIDA, I decided to apply for their internship programme. At the moment, I am thoroughly enjoying the programme, the proof being my excitement at the chance to meet all of you, learn about other amazing works and discuss a project that I am currently involved in (and quite passionate about) through the Research Computing Leeds Conference! When I am not reading about machine learning algorithms or sharing my data science encounters with others however, you can often find me spending my downtime listening to music and journaling away in a corner. :)

Computer Simulations of Post-Translationally Modified Microtubules

Talk time: 1431-1436

Abstract: Microtubules are hollow, cylindrical macromolecules made of a dimeric protein called tubulin. They play a vital role in many cellular functions ranging from motility, to separating DNA strands during mitosis, and acting as tracks for molecular motors. These structures are highly decorated with chemical modifications inside living cells. It is believed that these modifications form a code that both directly and indirectly regulates the structure and dynamics of microtubules, affecting many downstream factors. The difficulty in resolving structures of modified microtubules and our incomplete understanding of microtubule-associated proteins have hindered our ability to decipher this code with lab-based studies. My project aims to develop a methodology by which modified microtubules can be studied using molecular dynamics simulations in Amber. This would provide models of modified microtubules in motion and in atomistic detail. Using these, we can begin to link patterns in structural or dynamic changes to specific modifications, predict modified microtubule behaviour inside a cell and the mechanisms that underpin these behaviours. For example, I have observed that poly-gamma-glutamate chains added to beta-tubulin tails may be able to destabilise microtubules by sequestering inter-protofilament interactions, causing tears in the structure.

Speaker: Christopher Field, MRes Student, Faculty of Engineering and Physical Sciences

Biography: Chris is a biologically-minded MRes student at the Faculty of Engineering and Physical Sciences, interested in applying molecular dynamics simulations and other in silico techniques to provide new insight into tricky biological problems.

First Principles and Molecular Dynamics Modelling of a Mucoid Pseudomonas aeruginosa Biofilm Extracellular Matrix

Talk time: 1436-1441

Abstract: Mucoid Pseudomonas aeruginosa is a prevalent Cystic Fibrosis (CF) lung coloniser and the chronicity of such infections is definitively associated with this bacterium's ability to form an anionic, linear, acetylated alginate-rich exopolysaccharide (EPS) biofilm matrix. This talk will detail how atomistic modelling, based on quantum chemical Density-Functional Theory (DFT) and classical Molecular Dynamics (MD) - performed using the CASTEP and DL_POLY4 codes respectively - has been deployed on ARC4 HPC architecture to construct the first atomistic (computer) models of the mucoid P. aeruginosa EPS to be structurally representative of the in vivo scaffold. These models have served to draw biophysical relationships between the mucoid P. aeruginosa EPS structure and bacterial virulence in the lungs of CF patients. Explicitly, these models have shed light on the critical influence that CF sputum ions possess over biofilm matrix chronicity as well as rationalising the atomistic origins of the discontinuous, dendritic, bulk EPS architecture observed in vivo. The motion of bacterial messengers through the EPS has been accurately described using these models and, as such, these models are serving to identify structural chemistry critical to development of novel “EPS-penetrating” antimicrobials.

Speaker: Oliver Hills, PhD Student, Faculty of Environment

Biography: I am a final year PhD student in the Computational Chemistry, Biomaterials and Interfaces Modelling group within the School of Food Science and Nutrition at the University of Leeds. I am interested in using quantum chemical and molecular dynamics simulation to study the conformational energetics, structural chemistry and physico-chemical interactions that drive virulence, pathogenicity and chronicity within mucoid P. aeruginosa biofilm infections in the lungs of Cystic Fibrosis (CF) patients. During my PhD research I have used simulation to elucidate the key geometrical principles behind cation-induced biofilm matrix stabilisation, in addition to elucidating the molecular functionality and site-directed interactions that permit the motions of cell-to-cell signals through the biofilm matrix. More recently, I have been using simulation to predict the effects that novel therapeutics have on biofilm matrix morphology.

SketchEngine for large-scale text data collection and analysis

Talk time: 1441-1446

Abstract: SketchEngine is a tool for collection and analysis of large text data-sets, known as corpuses. You can upload your own text corpus if you already have one, and/or use one of the pre-existing corpuses in many languages, such as EnTenTen (10,000,000,000 words of English texts) or ArTenTen (10,000,000,000 words of Arabic) etc; and/or use the SketchEngine web-crawler to harvest a corpus to your specifications. You can analyse your corpus in various ways, e.g. against a Gold Standard to highlight key words and phrases with specialist meanings and uses. I have used SketchEngine in Computing research and teaching, for example to collect a world-wide Corpus of National Dialects. SketchEngine comes with an easy-to-use graphical interface, tutorials and guides, a world-wide user community and excellent technical support. I will demonstrate a use of SketchEngine to collect a corpus of Leeds University management and administration language, and use this to uncover some of the new English developed by Leeds University management, such as "curriculum redefined", "student success", "educational engagement", "decolonizing the curriculum".

Speaker: Eric Atwell, Professor of Artificial Intelligence, School of Computing

Biography: Eric Atwell is Professor of Artificial Intelligence for Language in the School of Computing at the University of Leeds, leading a research group of 17 PhD students and postdocs. Eric teaches Data Mining and Text Analytics at BSc and MSc levels, in both online and on-campus modules. Eric is also the School of Computing Director of Postgraduate Research (DPGR), and Programme Leader for our fully-online MSc programme in Artificial Intelligence (AI), including the online AI student project. Eric also works 40% in Leeds Institute for Teaching Excellence, as a LITE Fellow in AI for decolonizing reading lists; and he is a Turing Fellow with the Alan Turing Institute for AI, researching AI in education.

Developing an Azure Platform for Vehicle Emissions Measurements - The CARES Project

Talk time: 1446-1451

Abstract: The CARES project aims to improve the instrumentation and data science methodologies for analysing on-road vehicle emissions. As part of this project we have developed an Azure Cosmos DB platform and web apps to interact with the data. These apps have been shared with various stakeholders including academic, government and SME stakeholders. We are still very much at the beginning of our journey and are far from fully realising the potential of this approach but we would like to show what we have done so far and get feedback and advice on how to move it forward.

Speaker: Christopher Rushton, Research Fellow, Institute for Transport Studies

Biography: Chris Rushton is a post-doctoral research fellow in the Institute for Transport Studies. Chris' areas of research include air pollution and its links to transport, and the effectiveness of clean air zone policy. Prior to working at ITS, Chris was a senior technologist at Connected Places Catapult where he developed and delivered projects related to the development of commercial solutions to air quality problems in the UK, EU and farther afield. Chris is interested in creating research and tools to help air pollution legislators and policy makers develop interventions that deliver cleaner air for the taxpayer in a fair and cost-effective manner. Chris is just beginning his journey into new technology-based research methods and is here both to inform people of the work being done in his field and also to learn how to improve his work from more experienced practitioners.


Wednesday July 27th 14:00-14:55 Day 2: 10 minute talks session 3

Stream 2: Worsley 8.49N

Comparative analysis of WEKA and RapidMiner (withdrawn)

Talk time: 1400-1410

Abstract: Data Mining mainly aims to acquire meaningful information from given data. The term covers applying suitable functional methods (such as classification, clustering and association) to obtain the desired output and knowledge. One area of data mining is text mining, which concentrates on text analysis to extract meaningful information as knowledge. This work presents a comparative analysis of WEKA and RapidMiner as useful open-source Machine Learning toolkits. Both tools can handle text mining tasks in an easy, automatic, and professional way. The illustration covers all the text mining experiments that were conducted, starting from converting the dataset format, going through building classification models, and ending with presenting results. The comparison also explains the advantages and disadvantages of both toolkits as ML platforms and their constructions in terms of user interface, dataset format, classification methods and techniques, related extensions, the process building stage, visualization, and the results. This case presentation seeks to show how each toolkit could be used for data mining and text analytics, which could help AI learners and researchers to select a suitable toolkit for their work.

Speaker: Alaa Alsaqer, PhD Student, School of Computing

Biography: I am a PhD student at School of Computing, University of Leeds, and a Lecturer at College of Computer Science & Information Technology, King Faisal University.

MessagePack: Data Compression Tool for Your Big Dataset to Train Neural Networks

Talk time: 1410-1420

Abstract: One common challenge in training a neural network model (especially in supervised learning) is the necessity of a large dataset. In my research case, training a neural network for fluid dynamics related to vorticity propagation in a 3D domain requires hundreds of thousands of simulation frames and hundreds of scenes of ground truth data, which leads to hundreds of gigabytes of data. Since RAM and GPU memory are constrained, how could we deal with this large amount of data? Could we compress it to reduce the memory footprint while maintaining portability? This is where MessagePack comes in. MessagePack is an efficient binary serialization format that lets you exchange data among multiple languages, like JSON, but it is faster and smaller. In my research, I implemented a pipeline to transform, read and write the data into chunks of MessagePack. The data become portable and ready for training in batches.

Speaker: Dody Dharma, PhD Student, School of Computing

Biography: Dody is a Postgraduate Researcher in the Visualization and Computer Graphics Group, School of Computing, University of Leeds. Prior to that, he was part of the Informatics Research Group at ITB, Institut Teknologi Bandung – Indonesia. Currently Dody is doing research on realistic and interactive 3D simulation of fluids and deformable solids with deep learning. The research focuses on building neural network models to accelerate fully coupled fluid-deformable solid simulation for content in realtime applications (e.g., AR, VR, or games). His research also relates to the analysis of vorticity conservation in neural network based fluid simulation.

Generative Modeling for Shapes as Graphs without Correspondences using Graph Convolution Networks and Attention

Talk time: 1420-1430

Abstract: Graphs are powerful data structures to represent surface meshes via nodes and edges, modelling vertices and their spatial connections, respectively. In recent years, generative modelling for shapes as graphs has emerged as an active research area. However, developing statistical generative models from multiple instances of graphs imposes significant challenges as the structure of the graphs may vary across the training data. To overcome this limitation, we propose an unsupervised framework to learn a probabilistic deep generative model, applicable to datasets including shape surface mesh data with no correspondences. First, a synergy of graph convolutional and attention networks (GCN-ATT) establishes a vertex-to-vertex correspondence between the graphs in the latent space while learning an atlas mesh. Subsequently, a variational autoencoder (VAE) learns a probability density function from a set of structurally normalised graphs in the 3D space. As a result, this framework enables us to generate realistic graphs from shapes having an inconsistent number of vertices and connections. To demonstrate the versatility, we apply the method to real mesh (as grid graph) datasets obtained from cardiac magnetic resonance and liver CT images. To the best of our knowledge, this is the first work that uses such an approach for synthetic 3D organ shape generation from a population of inconsistent shape structures. We show that the proposed GCN-ATT-VAE network can generate realistic synthetic graphs, as evidenced by both quantitative and qualitative results.

Speaker: Soodeh Kalaie, PhD Student, School of Computing

Biography: Soodeh is a Ph.D. student in the School of Computing at the University of Leeds. In September 2019 she joined the CISTIB group as a fully funded Ph.D. student co-supervised by Prof. Alejandro Frangi and Dr. Ali Gooya. Her research mainly focuses on deep generative models for anatomical shapes. Her main research interests are computational medicine, machine learning, deep learning, and their applications to irregular data, focusing mainly on graph neural networks.

Unravelling the chemical mechanisms of kidney stone growth

Talk time: 1430-1440

Abstract: The key chemical interactions relating to kidney stone crystallisation and aggregation are not fully understood. Kidney stones are solid clusters of small stones, composed of crystals that have precipitated from urine and built up on the luminal surface of the epithelial cell surfaces of the kidney microtubule. Mineral accounts for 97% of a kidney stone with the remaining material being organic matrix, such as proteins and amino acids. This research uses the first principles modelling code, CASTEP on ARC3 and ARC4, to help elucidate the crystallisation phenomena and unravel the chemistry behind stone composition. To begin to understand the nucleation process, we have constructed surface models of calcium oxalate monohydrate and calcium oxalate dihydrate and modelled stone growth, simulating further calcium oxalate adsorption onto these surfaces. Next, as the interactions between urinary macromolecules and crystal surfaces at an atomic level are unexplained, we performed ab initio molecular dynamics of phosphocholine adsorption on the surfaces and have shown that the phosphocholine head groups become entrapped within the growing crystal. To investigate the interactions between growing crystal surfaces and a known inhibitor, citrate, we have further performed geometry optimisations of citrate adsorption on calcium oxalate surfaces.

Speaker: Rhiannon Morris

A modular, open-source pipeline for grading of Follicular Lymphoma with rapid transfer to other tasks

Talk time: 1440-1450

Abstract: Cytological grading of Follicular Lymphoma (FL) involves identification and quantification of cancerous cells within lymph nodes. When conducted manually, it is only feasible for clinicians to sample ten sites from hundreds available. The process therefore suffers high sampling error and high inter-observer variability. Given these limitations, the utility of clinical cytological grading has been low. Advances in image-based deep learning have led to creation of models with expert-like performance in cancer classification. We report the development of a deep learning pipeline for automated grading of entire tissue samples, containing millions of cells. The methods are robust to technical variability, such as differences in staining; can be executed on consumer-level computing resources; and can be adapted to other disease cases within a working week. Using the novel pipeline, we have conducted cell-level analysis of one of the largest FL image sets in the world. This has uncovered relationships between cytological grade and patient outcome. Through continued refinement, we aim to set the gold standard for cytological grading of large images, with clinical application.

Speaker: Volodymyr Chapman, PhD Student (AI in Medical Imaging), Faculty of Biological Sciences

Biography: Having trained first as a biochemist at Imperial College London and the John Innes Centre, Norwich, Volodymyr pivoted to image-based deep learning in 2020. He completed a 6-month predoctoral internship in image-based encodings of sequence data with Professor Dan MacLean at The Sainsbury Laboratory, Norwich. Following this, he started a PhD with Professors David Westhead and Reuben Tooze in Leeds, using deep learning to improve patient care in follicular lymphoma. In his free time, Volodymyr enjoys brewing, growing food and eating.


Wednesday July 27th 15:00-15:50 Discussion Panel: Sustainability and research computing

Stream 1: Worsley 8.43X and Y

Abstract: Computational communities are becoming increasingly aware of the energy cost associated with high performance computing, and the associated environmental footprint. In this panel we will discuss how to ensure that our computational research is performed responsibly. We will consider factors such as: (1) making supercomputing “green”; (2) optimising code efficiency to minimise energy consumption; (3) simplifying our scientific questions to make our calculations less costly.

Speaker: Benjamin Hanson, Lecturer, School of Physics and Astronomy

Biography: Ben's research career has focused on the development & application of novel computational methods to biophysical systems. He has studied the molecular motor dynein by developing continuum mechanical simulation techniques, colloidal and protein-based hydrogel systems using the bespoke simulation software “BioNet”, and protein unfolding by developing multiscale mathematical methods. He is currently moving his focus towards teaching and scholarship in the Physics Education Research Group at the University of Leeds, and is beginning his research into the pedagogical potential of virtual reality in higher education physics.

Speaker: David Head, Lecturer, School of Computing

Biography: David has a long/old research career in using simplified models to probe complex systems, where the very process of deciding what simplifications can and cannot be made adds to the insight and understanding of the problem, whilst also reducing cluster load and generally being a good cluster citizen. Current areas of interest include fibre networks and multi-component modelling of bacterial colonies - for the latter, the greatest challenge often lies in convincing an application domain unused to modelling that the results of your simplified model still tell you something about the real world.

Speaker: Sarah Harris, Associate Professor, School of Physics and Astronomy

Biography: Sarah Anne Harris (SAH) is a Principal Investigator in the School of Physics and Astronomy and the Astbury Centre for Structural and Molecular Biology at the University of Leeds. Her research aims to understand the mechanisms underlying biological processes such as molecular recognition, DNA packing and the operation of molecular motors. She uses simulations of biomolecules on both the atomistic and mesoscales, and is part of the core team developing the Fluctuating Finite Element Analysis (FFEA) software for mesoscale simulation at Leeds. She has a strong interest in High Performance Computing (HPC), and promotes its use in the biosciences through the Computational Collaborative Project for Biomolecular Simulation (CCPBioSim).

Speaker: John Hodrien, Research Software Engineer, Research Computing

Biography: John is a Research Software Engineer working within the Research IT team at the University of Leeds. Starting life as a Computer Scientist at Leeds, he taught parallel programming (C/C++, MPI/pthreads), before becoming a researcher within the Visualisation and Virtual Reality research group, all within the School of Computing. In these research roles, he has developed systems for working with pathology data with high resolution display clusters, and worked on a number of research projects on e-Science/Grid Computing. More recent work within IT involved providing and supporting a range of predominantly Linux solutions to research and teaching within Engineering, and more recently John has been a leader of this provision for the whole University. John has worked with a number of languages such as C/C++, Java, and Fortran, along with associated toolchains like Git, and has a soft spot for configuration management tools like Puppet, which has been used to provide standard configuration and deployment of systems and solutions at Leeds.


Wednesday July 27th 15:00-15:50 Day 2: 10 minute talks session 4

Stream 2: Worsley 8.49N

Digital twins of breast tumours for predicting neoadjuvant chemotherapy outcome

Talk time: 1500-1510

Abstract: Breast cancer is the most common cancer in women across the globe and a major cause of death. Neoadjuvant chemotherapy (NACT) is the standard of care for patients with locally advanced breast cancer, delivered to shrink the tumour before proceeding to surgery. However, only 39% of patients achieve pathological complete response and up to 12% experience no response at all. As such, there is a clear need to accurately identify non-responsive tumours as early as possible, enabling clinicians to discontinue the unsuccessful NACT and proceed with alternative treatment. For this purpose, we propose using digital twins: virtual replicates of a patient's tumour cellularity, which can be evolved in time to predict evolution under a NACT regimen. Initial tumour cellularity is estimated from diffusion-weighted magnetic resonance imaging (MRI), and used to calibrate biophysically relevant mathematical models of tumour growth. This calibration comes with an extremely large computational expense, in particular for the coupling of the surrounding tissue stiffness to tumour cell diffusion. As a first step to remedying this, we have rewritten this mechanical coupling using a finite element solver, and compared code performance on synthetic datasets and MRI data from patients.

Speaker: Rose Collet, PhD Student, School of Computing

Biography: I'm a PhD student in the Fluid Dynamics CDT and Centre for Computational Imaging and Simulation Technologies in Biomedicine (CISTIB). I am broadly interested in biomedical applications of fluid dynamics, and tumour growth modelling in particular. My PhD aims to develop models of fluid transport in tumours to characterise response to neoadjuvant chemotherapy in breast cancer patients, using a combination of medical imaging and mathematical modelling.

Outcome Prediction in Pelvic Radiotherapy Using Deep Multiple Instance Learning and Cascaded Attentions

Talk time: 1510-1520

Abstract: Radiotherapy is currently used for more than 50% of patients with various cancers. Radiotherapy relies on ionising radiation to eliminate cancerous tissue, but this radiation can also damage normal tissues around the tumour, leading to malfunction in those organs, known as toxicity. The occurrence and severity of this toxicity vary from patient to patient, and it is still not well understood which factors increase the risk of toxicity. Although deep neural networks can perform challenging tasks with high accuracy, their complexity and the difficulty of explaining their outputs hinder their application in real-world radiotherapy tasks. In our work, we propose a novel convolutional model to predict toxicity for patients with pelvic cancers. Our novelty is twofold: firstly, we propose to employ multiple instance learning to investigate large data, including 3D CT scans and 3D dose treatment plans, with lower complexity. Secondly, we apply the attention mechanism to provide a visual explanation of the network's behaviour. Quantitative and qualitative performance analyses demonstrate that our framework can offer clinically convincing tools for radiotherapy toxicity prediction. Investigating both image and numerical patient data with a deep network will be a very important research direction for our future work.

Speaker: Behnaz Elhaminia, PhD Student, School of Computing

Biography: Behnaz Elhaminia received her B.S. degree in computer engineering in 2014 and her M.S. degree in 2018, both from Ferdowsi University of Mashhad (FUM), Mashhad, Iran. She joined the Center for Computational Imaging & Simulation Technologies in Biomedicine research group in the School of Computing at the University of Leeds as a PhD student in September 2019, supervised by Dr Ali Gooya and Prof. Alejandro Frangi. She has been awarded a Faculty of Engineering Doctoral Academy Scholarship for her research. She is also an honorary PhD research student at Leeds teaching hospital and a member of Cancer Research UK RADNET Leeds.

Integrating next generation sequencing datasets to model gene regulation during cellular differentiation

Talk time: 1520-1530

Abstract: All cells in the human body have the same genetic blueprint yet have diverged to produce different proteins and perform distinct functions. This cellular diversity is achieved through regulation of gene expression, ensuring each cell switches on the right genes at the right time. As cells differentiate, non-coding regions of the genome – cis-regulatory elements – exercise tight control over gene expression through the activity of DNA-bound transcription factors. Using statistics and machine learning, we can identify transcription factor-bound cis-regulatory elements and their target genes from large, noisy next-generation sequencing datasets. Here we present an integrative ‘omics approach to identify and prioritise gene-specific cis-Regulatory Elements Across Differentiation (cisREAD). We show how sequencing datasets can be processed using high-performance computing to generate inputs to our software and demonstrate usage of the cisREAD R package.

Speaker: Amber Emmett

I learned a programming language... now what?

Talk time: 1530-1540

Abstract: Coding is no longer just a software developer activity. Code is now used for a wide range of activities across many different areas. As a result, people from the most diverse backgrounds have sought to learn a programming language but, as with any human language, there is a huge gap between learning the syntax and actually holding a conversation about a random topic with a native speaker.
This is the scenario people are facing with coding. We learn syntax and we practise with very simplified problems, but when we are faced with real problems in our field, we cannot even imagine how to solve them. In this talk I will present some ideas to help reduce the gap between learning a programming language and being able to use it to solve problems.

Speaker: Dr Patricia Ternes, Research Software Engineer, Research Computing

Biography: Patricia is a Research Software Engineer at the University of Leeds. She has a PhD in Theoretical Physics, and her research focuses on the development of computational models that help to understand complex systems; initially focusing on phenomena such as the behaviour of fluids at nanoscales and lately modelling emergence in spatio-social systems.

Understanding Colorectal Cancer patients' microbiome using shotgun metagenomics: a big cohort study: COLO-COHORT

Talk time: 1540-1550

Abstract: 16,000 people die of colorectal cancer (CRC) in the UK each year. More than half of CRC cases could be prevented by addressing modifiable lifestyle factors, identification of earlier stage neoplasia and targeted interventions, including chemoprevention and, crucially, polypectomy. COLO-COHORT will perform microbiome analyses on 4000 individuals undergoing colonoscopy because of symptoms or for surveillance. We will collect phenotypic information, undertake faecal immunochemical testing for occult bleeding, and obtain relevant blood investigations. We are considering shotgun metagenomics, which poses a big computational challenge that includes sequence assembly, alignment and annotation. After obtaining taxonomic and functional data, this information will be correlated with the phenotypic data and the neoplasia profile at colonoscopy to identify the factors which best predict disease risk. We will compare the diversity and structure of the faecal microbiome in patients with and without neoplasia and correlate these with dietary and/or lifestyle patterns. This project will give valuable information on the role of the microbiome in patients with adenomas or cancer. My talk will present this research story, the pipeline planning and the problems I have faced so far. I will also present a few results from the healthy volunteer pilot study of this project.

Speaker: Suparna Mitra, University Academic Fellow, School of Medicine

Biography: Dr Mitra has a Mathematics and Statistics background (BSc & MSc) and pursued her Ph.D. in Bioinformatics at Tuebingen University, Germany. After that she was a Marie Curie research fellow in the UK. She has over 10 years of experience in planning, designing and conducting meta-’omics projects that comprise bacterial community (microbiota) analysis via 16S rRNA and/or whole genome shotgun sequencing of microbiota strains/species strains to enable further characterisation, including phylogeny, RNASeq etc., which has led to >30 publications. Most of these are directly related to medical research and therapeutics. She is the editor of the upcoming “Metagenomics Data Analyses” book in the ‘Methods in Molecular Biology’ series. Applying meta-'omics concepts to different disease samples allows a more complete understanding of the aetiologies of complex diseases and can help in developing new, targeted and personalised therapeutic strategies. Dr Mitra has collaborated with multiple groups within and outside the University of Leeds (nationally and internationally). She is involved in medical projects such as COVID-patients’ microbiome, drug trials, drug testing using gut-models, CDI, gut microbiome, drug-induced dysbiosis, colorectal cancer, atherosclerotic plaque, early life microbiome from premature babies, weight loss surgeries, nutrition, food additives etc., and environmental projects such as soil microbiome, mangrove community, microbial perturbation with rain-water and landslide, wastewater treatment plants etc. Dr Mitra is keen to contribute her knowledge and experience in metagenomics, bioinformatics and biostatistics to enhancing human life and health, targeting global health challenges.


Wednesday July 27th 15:50-16:00 Questions and changeover

Stream 1: Worsley 8.43X and Y


Wednesday July 27th 16:00-16:40 Discussion Panel: The future of ResCompLeedsCon

Stream 1: Worsley 8.43X and Y

Abstract: This panel session will look to wrap up the conference, inviting attendees to provide feedback and help shape the future of this conference.