Alexandra Kapp
@lxndrkp
Berlin, Germany
Joined February 2015
I finally officially registered as a Ph.D. student researching 'privacy-preserving analytics of human mobility data applications'. 🎉 Using this occasion, I started a blog where I want to share insights on my work along the way: https://t.co/DdihnKNpUO
3
1
63
Great website from the @BMBF_Bund-funded project #freemove! Just dive into the world of data! #freemove helps with the balancing act between using data and protecting privacy, for more sustainability (#Nachhaltigkeit)! @TSBBerlin @TUBerlin @FU_Berlin @DLR_Verkehr
https://t.co/ZWCkJCdp6j
fona.de
The mobility transition is a decisive lever when it comes to the sustainable city of tomorrow. Mobility data is touted as the solution to many of the challenges of precisely this transition, whether...
0
4
5
It is worth noting that trip data encompasses various relevant characteristics beyond spatial distribution (e.g. temporal info), all of which are discarded by these models. > Our results imply that current models fall short in their promise of high utility and flexibility. 13/13
0
0
0
The remaining 3 models somewhat maintain spatial distribution, one even with differential privacy guarantees. However, all models struggle to produce meaningful sequences of geo-locations with reasonable trip lengths and to model traffic flow at intersections accurately. /12
1
0
0
Out of the five evaluated models, one fails to produce data within reasonable computation time and another generates too many jumps to meet the requirements for map matching. /11
1
0
0
Then, we introduced routing-engine-generated trips (like Google Maps) as a baseline, as they provide a privacy-friendly way to generate fine-granular routes connecting a start and an endpoint. /10
1
0
0
Firstly, none of the 5 evaluated models provide synthetic data on a level that is fine-granular enough to match the road network. Thus, we included a step of map matching. /9
1
0
0
We evaluated the utility of five state-of-the-art models, AdaTrace, PrivTrace, DP-Loc, a BiLSTM-based model, and TrajGAIL, using the designated utility metrics on a dataset comprising approximately 30,000 bicycle trips in Berlin. /8
1
0
0
Thus, we selected 4 tasks that closely reflect real-life tasks that trip data is used for to obtain a more realistic utility evaluation: trip lengths, traffic volume, road preference, and traffic flow at intersections. /7
1
0
0
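To make the first of these tasks concrete, here is a minimal sketch of how trip lengths could be compared between raw and synthetic data. It is not the paper's implementation: the helper names (`haversine_m`, `trip_length_m`, `mean_length_m`) and the example coordinates are hypothetical, and it assumes each trip is a list of (latitude, longitude) pairs in degrees.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, a common approximation


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def trip_length_m(trip):
    """Sum of the distances between consecutive points of one trip."""
    return sum(haversine_m(p, q) for p, q in zip(trip, trip[1:]))


def mean_length_m(trips):
    """Average trip length over a non-empty list of trips."""
    return sum(trip_length_m(t) for t in trips) / len(trips)


# Hypothetical toy data: one raw trip and one synthetic trip with a large jump.
raw_trips = [[(52.50, 13.40), (52.51, 13.40), (52.52, 13.41)]]
synthetic_trips = [[(52.50, 13.40), (52.60, 13.60), (52.52, 13.41)]]

# A large gap between the two means hints at unrealistic synthetic trips.
length_gap_m = abs(mean_length_m(raw_trips) - mean_length_m(synthetic_trips))
```

Comparing only the means is a simplification; a fuller evaluation would compare the whole trip-length distribution.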
Also: high similarity based on one distribution does not indicate a general high utility. For example, high similarity of spatial distributions does not allow conclusions about temporal distributions. Single distributions also do not reflect actual real-life use cases. /6
1
0
0
Distributions are typically discretized, e.g., a spatial distribution based on a grid. The resolution of such grids thereby highly influences the conclusion about the maintained utility: a high similarity at 100 m res. has different implications than at 1 km res. /5
1
0
0
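The grid-resolution effect can be illustrated with a small sketch. It is an illustration only, not the evaluation code from the paper: the function names, the toy points, and the choice of histogram intersection as the similarity measure are all assumptions. Points are taken to be (x, y) coordinates in metres.

```python
import math
from collections import Counter


def grid_histogram(points, cell_size_m):
    """Share of points per square grid cell; points are (x, y) in metres."""
    counts = Counter(
        (math.floor(x / cell_size_m), math.floor(y / cell_size_m))
        for x, y in points
    )
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def overlap(p, q):
    """Histogram intersection: 1.0 for identical, 0.0 for disjoint distributions."""
    return sum(min(p.get(c, 0.0), q.get(c, 0.0)) for c in set(p) | set(q))


# Hypothetical toy data: the "synthetic" points are the raw ones shifted 120 m east.
raw = [(50, 50), (150, 50), (250, 950), (850, 850)]
synthetic = [(x + 120, y) for x, y in raw]

# Same two datasets, two resolutions: 100 m cells vs. 1 km cells.
fine = overlap(grid_histogram(raw, 100), grid_histogram(synthetic, 100))
coarse = overlap(grid_histogram(raw, 1000), grid_histogram(synthetic, 1000))

print(f"100 m grid: {fine:.2f}, 1 km grid: {coarse:.2f}")
# prints "100 m grid: 0.25, 1 km grid: 1.00"
```

On the 1 km grid the two datasets look identical, while the 100 m grid exposes the 120 m shift, so a reported similarity is only meaningful together with the resolution it was computed at.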
How is utility measured? Typically, such synthetic data models are evaluated by comparing distributions, e.g., the spatial distribution, between raw and synthetic data. The higher the similarity the higher the utility. However, this approach has shortcomings: /4
1
0
0
Synthetic data, in this context, is created through models that learn respective distributions from raw data and maintain these. The goal is to create high-utility privacy-friendly synthetic datasets. /3
1
0
0
Why synthetic data? Human movement data is highly sensitive; however, data sharing is desirable for many use cases, including city planning or demand /2
1
0
0
📢New paper📢 We investigated the utility of five models that create synthetic urban mobility data from raw privacy-sensitive data. >synthetic trips do not provide the expected high flexibility and utility and should be used carefully. @h_mihaljevic
https://t.co/c7yEYTkX4u 🧵/1
dl.acm.org
1
1
1
What exactly is explainable AI and how does it work? I summarized this in an article for https://t.co/Nb6BkAiIgX 🤓
te.ma
te.ma presents key contributions from academic discourse and makes pressing questions open to well-founded discussion.
0
0
3
New #geocompx blog post on Geographic Data Analysis in #RStats and #Python. For the first time, equivalent code for reading, plotting, and analysing geographic vector data in these two popular #DataScience languages is provided side by side 🚀 #OpenSource: https://t.co/6xnavxT1YW
3
58
194
Starting in October, we are looking for a research associate in the area of #DeepLearning, #ComputerVision, image classification, and geodata. Goal: linking open data sources to predict road surface type and quality as precisely as possible.
1
7
4
For better usability for practitioners, models should provide clearer information on applicable use cases, input and output format, maintained (and discarded) empirical distributions, and required dataset size. Source code and example datasets should be made openly available.
0
0
0
Summary: such models for mobility data are not yet ready for use in practice, and more research with real-world test cases would be desirable.
1
0
0