ALgorithms for PAngenome Computational Analysis

Acronym

ALPACA

Description of the funded project

Genomes are strings over the letters A, C, G and T, which represent nucleotides, the building blocks of DNA. Ever more numerous and rapidly advancing genome sequencing devices are producing ultra-large amounts of sequence data, with accrued volumes now reaching the exabyte scale. The urgent, driving question is: how can we arrange and analyze these data masses in a formally rigorous, computationally efficient and biomedically rewarding manner? Graph-based data structures have been identified as offering disruptive benefits over traditional sequence-based structures when representing pan-genomes, that is, sufficiently large, evolutionarily coherent collections of genomes. This idea has its immediate justification in the laws of genetics: evolutionarily closely related genomes differ in relatively few letters while sharing the majority of their sequence content. Graph-based pan-genome representations, which remove redundancies without discarding individual differences, therefore make eminent sense. In this project, we will put this paradigm shift from sequence-based to graph-based representations of genomes into full effect. As a result, we expect a wealth of practically relevant advantages, the most fundamental of which concern the arrangement, analysis, compression, integration and exploitation of genome data. We will also open up a significant source of inspiration for computer science itself. To realize our goals, our network will (i) decisively strengthen existing ties and form new ones in the emerging community of computational pan-genomics, (ii) perform research on all relevant frontiers, aiming at significant computational advances at the level of important breakthroughs, and (iii) boost relevant knowledge exchange between academia and industry. Last but not least, in doing so, we will train a new, “paradigm-shift-aware” generation of computational genomics researchers.
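
As a purely illustrative aside, the redundancy argument in the description can be made concrete with a toy graph. The sketch below is not the project's method: it assumes the genomes are already aligned, have equal length and differ only by substitutions, whereas real pan-genome graphs must also handle insertions, deletions and rearrangements. The function names `build_variation_graph` and `spell` are hypothetical.

```python
def build_variation_graph(genomes):
    """Build a toy variation graph from pre-aligned, equal-length genomes.

    Assumption: substitutions only. Each distinct (column, letter) pair
    becomes one node, so letters shared by all genomes are stored once.
    """
    length = len(genomes[0])
    if any(len(g) != length for g in genomes):
        raise ValueError("this sketch requires equal-length, aligned genomes")

    node_ids = {}  # (column, letter) -> node id
    paths = []     # one list of node ids per genome
    for genome in genomes:
        path = []
        for column, letter in enumerate(genome):
            key = (column, letter)
            if key not in node_ids:
                node_ids[key] = len(node_ids)
            path.append(node_ids[key])
        paths.append(path)

    # An edge joins two nodes that are consecutive in at least one genome.
    edges = {(p[i], p[i + 1]) for p in paths for i in range(len(p) - 1)}
    labels = {node: letter for (_, letter), node in node_ids.items()}
    return labels, edges, paths


def spell(labels, path):
    """Recover a genome by reading node labels along its path."""
    return "".join(labels[node] for node in path)


genomes = ["ACGTACGT", "ACGTTCGT", "ACCTACGT"]
labels, edges, paths = build_variation_graph(genomes)

letters_in_sequences = sum(len(g) for g in genomes)  # letters stored separately
letters_in_graph = len(labels)                       # letters stored in the graph
print(letters_in_sequences, letters_in_graph)
```

In this toy case the graph stores far fewer letters than the three separate strings, while every individual genome remains recoverable as a path through the graph, so no individual difference is discarded.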

Start year

2021

End year

2024

Funding granted

Helsingin yliopisto

280 805.76 €

Participant

THE CHANCELLOR, MASTERS AND SCHOLARS OF THE UNIVERSITY OF CAMBRIDGE (UK)

303 172.56 €

Participant

GENETON S.R.O. (SK)

233 246.88 €

Participant

INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET AUTOMATIQUE (FR)

274 802.04 €

Participant

STICHTING NEDERLANDSE WETENSCHAPPELIJK ONDERZOEK INSTITUTEN (NL)

265 619 €

Participant

STICHTING NEDERLANDSE WETENSCHAPPELIJK ONDERZOEK INSTITUTEN (NL)

265 619.88 €

Participant

UNIVERZITA KOMENSKEHO V BRATISLAVE (SK)

233 246.88 €

Participant

UNIVERSITAET BIELEFELD (DE)

505 576.8 €

Coordinator

HEINRICH-HEINE-UNIVERSITAET DUESSELDORF (DE)

252 788.4 €

Participant

UNIVERSITA DI PISA (IT)

261 499.68 €

Participant

UNIVERSITA' DEGLI STUDI DI MILANO-BICOCCA (IT)

261 499.68 €

Participant

EUROPEAN MOLECULAR BIOLOGY LABORATORY (DE)

303 172.56 €

Participant

INSTITUT PASTEUR (FR)

274 802.04 €

Participant

CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE CNRS (FR)

274 802.04 €

Participant

Amount granted

3 725 035 €

Funder

European Union

Funding instrument

Marie Skłodowska-Curie Innovative Training Networks (ITN)

Framework programme

Horizon 2020 Framework Programme

Call

Programme part

EXCELLENT SCIENCE - Marie Skłodowska-Curie Actions (5220)

Fostering new skills by means of excellent initial training of researchers (5221)

Topic

Innovative Training Networks (MSCA-ITN-2020)

Call identifier

H2020-MSCA-ITN-2020

Other information

Funding decision number

956229

Identified topics

bioinformatics