Matthieu Futeral-Peter

@FuteralMatthieu

117 Followers · 342 Following · 3 Media · 24 Statuses

PhD student @Inria in Willow and ALMAnaCH teams. Prev. intern @GoogleDeepMind. MVA & Ensae Paris Alumni

Paris, France
Joined May 2021
@FuteralMatthieu
Matthieu Futeral-Peter
1 year
Announcing mOSCAR, a multilingual interleaved text-image corpus, part of the @oscarnlp project. Paper: https://t.co/1hhnyYCyI3 Dataset: https://t.co/hiEUJ1Q3iJ Doc: https://t.co/KsnT5wVee2 1/6
3 replies · 27 reposts · 54 likes
@FuteralMatthieu
Matthieu Futeral-Peter
4 months
@oscarnlp I'll be presenting mOSCAR tomorrow at ACL in Vienna, feel free to stop by if you're around!
0 replies · 0 reposts · 5 likes
@FuteralMatthieu
Matthieu Futeral-Peter
1 year
[media-only post; text not captured]
1 reply · 0 reposts · 7 likes
@FuteralMatthieu
Matthieu Futeral-Peter
1 year
@oscarnlp @thorn Huge thanks to the amazing team involved in the project: @RandyZebaze, @pjox13, @uinelj, Rémi Lacroix, @CordeliaSchmid, @RABawden & @bensagot. Work mainly done @inria_paris & @InriaParisNLP 5/6
1 reply · 0 reposts · 9 likes
@FuteralMatthieu
Matthieu Futeral-Peter
1 year
@oscarnlp @thorn We next train a multilingual OpenFlamingo on mOSCAR and multilingual image-text caption data, and find that mOSCAR boosts few-shot learning results on various multilingual image-text benchmarks, confirming previous findings for English-only VLMs. 4/6
1 reply · 0 reposts · 8 likes
@FuteralMatthieu
Matthieu Futeral-Peter
1 year
@oscarnlp For safety, we especially focus on NSFW content and filter images using a mixture of open-sourced and private NSFW detectors. We additionally tackle the CSAM removal problem with Safer from @thorn, as large-scale image datasets have been shown to contain CSAM. 3/6
1 reply · 0 reposts · 8 likes
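The mixture-of-detectors filtering described above can be sketched as a simple ensemble vote. The detector names, scores, and thresholds below are hypothetical placeholders, not the actual mOSCAR pipeline.

```python
def keep_image(scores, threshold=0.5, max_votes=0):
    """Keep an image only if at most `max_votes` detectors flag it as NSFW.

    `scores` maps a (hypothetical) detector name to its NSFW probability.
    A detector "votes" against the image when its score reaches `threshold`.
    """
    votes = sum(1 for s in scores.values() if s >= threshold)
    return votes <= max_votes

# A conservative policy: any single detector flagging the image drops it.
print(keep_image({"detector_a": 0.1, "detector_b": 0.2}))  # True
print(keep_image({"detector_a": 0.9, "detector_b": 0.3}))  # False
```

Requiring agreement between open-source and private detectors (rather than trusting a single model) is one way to reduce false negatives in large-scale filtering.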
@FuteralMatthieu
Matthieu Futeral-Peter
1 year
@oscarnlp mOSCAR covers 163 languages, 315M documents, 214B tokens and 1.2B images. We applied a large set of filtering and evaluation steps to make sure mOSCAR is as safe as possible, diverse, and of high quality. 2/6
1 reply · 0 reposts · 6 likes
@FuteralMatthieu
Matthieu Futeral-Peter
2 years
@GoogleDeepMind @_andrea_agos @MTagliasacchi @neilzegh @n0mad_0 For more details please have a look at the paper: https://t.co/CsU5zMC1q9 Special thanks to my amazing host @n0mad_0, I had a blast working on this!
0 replies · 0 reposts · 4 likes
@FuteralMatthieu
Matthieu Futeral-Peter
2 years
@GoogleDeepMind @_andrea_agos @MTagliasacchi @neilzegh @n0mad_0 We finally use MAD Speech to evaluate how various improvements from the speech literature affect diversity, and we show that acoustic diversity changes non-trivially across different scenarios; it should therefore be taken into account when evaluating models! 5/5
1 reply · 0 reposts · 3 likes
@FuteralMatthieu
Matthieu Futeral-Peter
2 years
@GoogleDeepMind @_andrea_agos @MTagliasacchi @neilzegh @n0mad_0 Next we construct a series of datasets with controlled levels of diversity to test different candidate metrics, and we show that MAD Speech achieves the strongest agreement with ground-truth diversity. 4/5
1 reply · 0 reposts · 3 likes
@FuteralMatthieu
Matthieu Futeral-Peter
2 years
@GoogleDeepMind @_andrea_agos @MTagliasacchi @neilzegh @n0mad_0 We focus on acoustic diversity and more specifically on five “facets” of it: voice, gender, emotion, accent, and background noise. We then build MAD Speech as a composition of (1) per-facet specialized embeddings and (2) aggregation functions to get diversity scores. 3/5
1 reply · 0 reposts · 4 likes
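The "per-facet embeddings plus aggregation" recipe above can be sketched with one common aggregation choice: mean pairwise cosine dissimilarity over a set of embeddings. This aggregation is an illustrative assumption, not necessarily the one MAD Speech actually uses.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def diversity(embeddings):
    """Mean pairwise cosine dissimilarity: 0 when all samples are identical,
    higher when the per-facet embeddings spread apart."""
    n = len(embeddings)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(1 - cosine(embeddings[i], embeddings[j]) for i, j in pairs) / len(pairs)

# Identical embeddings give zero diversity; orthogonal ones score higher.
print(diversity([[1.0, 0.0], [1.0, 0.0]]))  # 0.0
print(diversity([[1.0, 0.0], [0.0, 1.0]]))  # 1.0
```

Running such a score once per facet (voice, gender, emotion, accent, noise) with the matching specialized embedder would yield the per-facet diversity profile the thread describes.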
@FuteralMatthieu
Matthieu Futeral-Peter
2 years
@GoogleDeepMind @_andrea_agos @MTagliasacchi @neilzegh @n0mad_0 Generative spoken language models produce natural-sounding speech; however, little is known about how diverse their outputs are, as we lack metrics to measure it. We fill this gap by proposing a set of lightweight metrics we call MAD Speech. 2/5
1 reply · 0 reposts · 4 likes
@FuteralMatthieu
Matthieu Futeral-Peter
2 years
Excited to introduce MAD Speech: a new set of metrics to measure acoustic diversity in speech. Work done @GoogleDeepMind w/ @_andrea_agos, @MTagliasacchi, @neilzegh and @n0mad_0 Paper link: https://t.co/CsU5zMC1q9 1/5
3 replies · 13 reposts · 53 likes
@wissam_antoun
Wissam Antoun
2 years
Have you ever wished you had a Copilot-like helper on Overleaf? Well, I did, and @ylecun 😁 Introducing: GalacTex, a Chrome extension that adds the power of LLMs to your Overleaf text editor. GalacTex is an open-source project powered by Galactica 🪐 an LLM for science.
1 reply · 5 reposts · 33 likes
@RABawden
Rachel Bawden
2 years
Log into Virtual Poster Session 2 at #ACL2023 to chat with @FuteralMatthieu about his work on tackling ambiguity in multimodal machine translation. Today at 11am Toronto time (EDT), i.e. 2 hours from now!
0 replies · 1 repost · 1 like
@FuteralMatthieu
Matthieu Futeral-Peter
2 years
This is a joint work with @CordeliaSchmid, I. Laptev, @bensagot and @RABawden. @InriaParisNLP @inria_paris 6/6
0 replies · 0 reposts · 0 likes
@FuteralMatthieu
Matthieu Futeral-Peter
2 years
Our approach VGAMT obtains competitive results compared to strong text-only models on standard English-to-{French, German, Czech} benchmarks and outperforms these baselines and state-of-the-art MMT systems by a large margin on our contrastive test set. 5/6
1 reply · 0 reposts · 0 likes
@FuteralMatthieu
Matthieu Futeral-Peter
2 years
(ii) a test set, CoMMuTE, containing contrastive evaluation pairs, where images provide the necessary context to disambiguate between multiple meanings of the same source sentence. 4/6
1 reply · 0 reposts · 0 likes
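A contrastive test set like CoMMuTE is typically scored by checking that, given the disambiguating image, the model prefers the correct translation over the wrong one. In the sketch below, `model_score` is a hypothetical stand-in for the model's log-probability of a translation, not CoMMuTE's official scorer.

```python
def commute_accuracy(examples, model_score):
    """Fraction of contrastive pairs where the correct translation wins.

    Each example is (source, image, correct_translation, wrong_translation);
    `model_score(source, image, translation)` is a hypothetical scoring
    function, e.g. the model's log-probability of the translation.
    """
    correct = sum(
        model_score(src, img, good) > model_score(src, img, bad)
        for src, img, good, bad in examples
    )
    return correct / len(examples)

# Toy check with a stand-in scorer that always prefers "correct":
examples = [("src", "img", "correct", "wrong")] * 4
acc = commute_accuracy(examples, lambda s, i, t: 1.0 if t == "correct" else 0.0)
print(acc)  # 1.0
```

A text-only model that ignores the image cannot systematically separate the two hypotheses on such pairs, which is what makes this evaluation discriminative.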
@FuteralMatthieu
Matthieu Futeral-Peter
2 years
We propose (i) a new MMT approach that exploits text-only data and captioning data by adapting a strong MT model with lightweight modules. We also introduce a novel guided self-attention that masks irrelevant connections between the image and parts of the text. 3/6
1 reply · 0 reposts · 0 likes
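A guided self-attention of this sort can be sketched as a boolean mask over the concatenated [text tokens + image tokens] sequence, where image and text interact only through a given set of relevant text positions. The token layout and masking convention below are illustrative, not the paper's exact formulation.

```python
def guided_attention_mask(n_text, n_img, relevant_text):
    """Boolean mask of shape (n_text + n_img, n_text + n_img); True = allowed.

    Text attends freely to text, image tokens attend freely to image tokens,
    but cross-modal attention is permitted only for text positions listed in
    `relevant_text` (an illustrative convention).
    """
    n = n_text + n_img
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            i_txt, j_txt = i < n_text, j < n_text
            if i_txt and j_txt:
                mask[i][j] = True                  # text -> text: always
            elif not i_txt and not j_txt:
                mask[i][j] = True                  # image -> image: always
            elif i_txt:
                mask[i][j] = i in relevant_text    # text -> image: if relevant
            else:
                mask[i][j] = j in relevant_text    # image -> text: if relevant
    return mask
```

Such a mask would be passed to the attention layer so that, e.g., function words never exchange information with image patches while content words can.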
@FuteralMatthieu
Matthieu Futeral-Peter
2 years
(i) Most MMT models do not exploit the large amount of text-only data and therefore perform poorly compared to SOTA text-only MT. (ii) Current MMT benchmarks are mainly composed of unambiguous examples and are thus not adapted to evaluate how well MMT models exploit images. 2/6
1 reply · 0 reposts · 0 likes