Alexandra Kapp

@lxndrkp

Followers
761
Following
3K
Media
122
Statuses
770

Berlin, Germany
Joined February 2015
@lxndrkp
Alexandra Kapp
4 years
I finally officially registered as a Ph.D. student researching 'privacy-preserving analytics of human mobility data applications'. 🎉 Using this occasion, I started a blog where I want to share insights on my work along the way: https://t.co/DdihnKNpUO
3
1
63
@lxndrkp
Alexandra Kapp
2 years
It is worth noting that trip data encompasses various relevant characteristics beyond spatial distribution (e.g. temporal info) all of which are discarded by these models. > Our results imply that current models fall short in their promise of high utility and flexibility. 13/13
0
0
0
@lxndrkp
Alexandra Kapp
2 years
The remaining 3 models somewhat maintain spatial distribution, one even with differential privacy guarantees. However, all models struggle to produce meaningful sequences of geo-locations with reasonable trip lengths and to model traffic flow at intersections accurately. /12
1
0
0
@lxndrkp
Alexandra Kapp
2 years
Out of the five evaluated models, one fails to produce data within reasonable computation time and another generates too many jumps to meet the requirements for map matching. /11
1
0
0
@lxndrkp
Alexandra Kapp
2 years
Then, we introduced routing-engine-generated trips (like Google Maps) as a baseline, as they provide a privacy-friendly way to generate fine-granular routes connecting a start point and an endpoint. /10
1
0
0
@lxndrkp
Alexandra Kapp
2 years
Firstly, none of the 5 evaluated models provide synthetic data on a level that is fine-granular enough to match the road network. Thus, we included a step of map matching. /9
1
0
0
@lxndrkp
Alexandra Kapp
2 years
We evaluated the utility of five state-of-the-art models, AdaTrace, PrivTrace, DP-Loc, a BiLSTM-based model, and TrajGAIL, using the designated utility metrics on a dataset comprising approximately 30,000 bicycle trips in Berlin. /8
1
0
0
@lxndrkp
Alexandra Kapp
2 years
Thus, we selected 4 tasks that closely reflect real-life tasks that trip data is used for to obtain a more realistic utility evaluation: trip lengths, traffic volume, road preference, and traffic flow at intersections. /7
1
0
0
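As a rough illustration of the simplest of these four tasks, trip length, here is a minimal sketch. It assumes a trip is a list of (latitude, longitude) pairs in degrees and sums great-circle distances between consecutive points. The function names and the sample coordinates are hypothetical, and this is not necessarily how the paper computes the metric.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def trip_length_m(trip):
    """Sum of segment lengths along a trip given as a list of (lat, lon) points."""
    return sum(haversine_m(p, q) for p, q in zip(trip, trip[1:]))


# Made-up three-point trip in Berlin, for illustration only.
example_trip = [(52.5200, 13.4050), (52.5215, 13.4100), (52.5230, 13.4150)]
print(f"trip length: {trip_length_m(example_trip):.0f} m")
```

Comparing the distribution of such lengths between raw and synthetic trips is one way to spot the unreasonable trip lengths mentioned later in the thread.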
@lxndrkp
Alexandra Kapp
2 years
Also: high similarity based on one distribution does not indicate a general high utility. For example, high similarity of spatial distributions does not allow conclusions about temporal distributions. Single distributions also do not reflect actual real-life use cases. /6
1
0
0
@lxndrkp
Alexandra Kapp
2 years
Distributions are typically discretized, e.g., a spatial distribution based on a grid. The resolution of such grids strongly influences the conclusion about the maintained utility: a high similarity at a 100 m resolution has different implications than at a 1 km resolution. /5
1
0
0
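A minimal sketch of the grid-resolution effect described above, under stated assumptions: points are (x, y) coordinates in metres, similarity is measured with histogram intersection (my choice for illustration, not necessarily the measure used in the paper), and the data is made up. All function names are hypothetical.

```python
import math
from collections import Counter


def grid_histogram(points, cell_size):
    """Share of points per square grid cell; points are (x, y) in metres."""
    counts = Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def histogram_intersection(p, q):
    """Overlap of two normalized histograms: 1.0 = identical, 0.0 = disjoint."""
    return sum(min(p.get(c, 0.0), q.get(c, 0.0)) for c in set(p) | set(q))


def similarity_at(cell_size, raw, synthetic):
    """Similarity of two point sets after discretizing both on the same grid."""
    return histogram_intersection(
        grid_histogram(raw, cell_size), grid_histogram(synthetic, cell_size)
    )


# Toy data (made up): the "synthetic" points are the raw points shifted 150 m east.
raw = [(50.0, 50.0), (350.0, 50.0), (450.0, 650.0), (750.0, 850.0)]
synthetic = [(x + 150.0, y) for x, y in raw]

for cell_size in (100, 1000):
    print(f"{cell_size} m grid: similarity = {similarity_at(cell_size, raw, synthetic):.2f}")
# prints:
# 100 m grid: similarity = 0.00
# 1000 m grid: similarity = 1.00
```

The same pair of datasets looks perfectly similar on the 1 km grid and completely dissimilar on the 100 m grid, which is why a reported similarity score is hard to interpret without the grid resolution.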
@lxndrkp
Alexandra Kapp
2 years
How is utility measured? Typically, such synthetic data models are evaluated by comparing distributions, e.g., the spatial distribution, between raw and synthetic data. The higher the similarity the higher the utility. However, this approach has shortcomings: /4
1
0
0
@lxndrkp
Alexandra Kapp
2 years
Synthetic data, in this context, is created through models that learn respective distributions from raw data and maintain these. The goal is to create high-utility privacy-friendly synthetic datasets. /3
1
0
0
@lxndrkp
Alexandra Kapp
2 years
Why synthetic data? Human movement data is highly sensitive; however, data sharing is desirable for many use cases, including city planning or demand /2
1
0
0
@lxndrkp
Alexandra Kapp
2 years
📢New paper📢 We investigated the utility of five models that create synthetic urban mobility data from raw privacy-sensitive data. >synthetic trips do not provide the expected high flexibility and utility and should be used carefully. @h_mihaljevic https://t.co/c7yEYTkX4u 🧵/1
dl.acm.org
1
1
1
@lxndrkp
Alexandra Kapp
2 years
What exactly is explainable AI and how does it work? I summarized that in an article for https://t.co/Nb6BkAiIgX 🤓
te.ma
te.ma presents key contributions from academic discourse and makes burning questions open to well-founded discussion.
0
0
3
@robinlovelace
Robin Lovelace
2 years
New #geocompx blog post on Geographic Data Analysis in #RStats and #Python. For the first time, equivalent code for reading, plotting, and analysing geographic vector data in these two popular #DataScience languages is provided side-by-side 🚀 #OpenSource: https://t.co/6xnavxT1YW
3
58
194
@h_mihaljevic
Helena Mihaljevic
2 years
Starting in October, we are looking for a research associate in the areas of #DeepLearning, #ComputerVision, image classification, and geodata. Goal: linking open data sources to predict the type and quality of road surfaces as precisely as possible.
1
7
4
@lxndrkp
Alexandra Kapp
2 years
For better usability for practitioners, models should provide clearer information on applicable use cases, input and output format, maintained (and discarded) empirical distributions, and required dataset size. Source code and example datasets should be made openly available.
0
0
0
@lxndrkp
Alexandra Kapp
2 years
Summary: such models for mobility data are not yet ready for use in practice, and more research with real-world test cases would be desirable.
1
0
0