Problem Description

Motivation

Understanding, modeling, and predicting human mobility trajectories in urban areas is an essential task for various domains and applications, including transportation modeling, disaster risk management, and urban planning. The recent availability of large-scale human movement and behavior data collected from (often millions of) mobile devices and social media platforms has enabled the development and testing of complex human mobility models, resulting in a plethora of methods published in computer science venues such as ACM SIGSPATIAL.

However, because privacy concerns have limited the availability of open-source, large-scale human mobility datasets, prediction methods are trained and tested on different datasets, which makes it difficult to compare their performance fairly. The lack of large-scale open-source datasets has been one of the key barriers hindering the progress of human mobility model development. Please see our Comment piece in Nature Computational Science for more details.



Following the success of HuMob Challenge 2023 @ ACM SIGSPATIAL 2023 and HuMob Challenge 2024 @ ACM SIGSPATIAL 2024, we will host this year’s HuMob Challenge as a GISCUP, using a synthetic but realistic human mobility dataset that covers the trajectories of 20K to 150K individuals per city over 75 days in 4 metropolitan areas, provided by LY Corporation (previously Yahoo Japan). Participants will develop and test methods to predict human mobility trajectories using the provided open-source dataset.

The Challenge: Multi-City Prediction

The challenge takes place in 4 metropolitan areas (cities A, B, C, D), somewhere in Japan. Each area is divided into a 200 x 200 grid of 500-meter x 500-meter cells. The human mobility datasets contain the movement of individuals across a 75-day period, discretized into 30-minute intervals and 500-meter grid cells (see Figure below).
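The discretization described above can be illustrated with a minimal sketch. The function names, the 0-based indexing, and the choice of grid origin are assumptions made for illustration only; the released dataset is already discretized and may use a different indexing convention.

```python
from datetime import datetime

CELL_SIZE_M = 500                        # cell edge length from the description
GRID_SIZE = 200                          # 200 x 200 cells per metropolitan area
SLOT_MINUTES = 30                        # one time slot covers 30 minutes
SLOTS_PER_DAY = 24 * 60 // SLOT_MINUTES  # 48 slots per day


def to_cell(x_m: float, y_m: float) -> tuple[int, int]:
    """Map an offset in meters from the grid origin to a 0-based (col, row) cell.

    The origin and the 0-based indexing are assumptions of this sketch.
    """
    col = int(x_m // CELL_SIZE_M)
    row = int(y_m // CELL_SIZE_M)
    if not (0 <= col < GRID_SIZE and 0 <= row < GRID_SIZE):
        raise ValueError("point lies outside the 200 x 200 grid")
    return col, row


def to_slot(ts: datetime, start: datetime) -> tuple[int, int]:
    """Map a timestamp to a 0-based (day, slot) pair relative to `start`.

    `start` is assumed to be midnight at the beginning of the first day.
    """
    minutes = int((ts - start).total_seconds() // 60)
    if minutes < 0:
        raise ValueError("timestamp precedes the start of the period")
    day, minute_of_day = divmod(minutes, 24 * 60)
    return day, minute_of_day // SLOT_MINUTES
```

With these constants, each area covers 100 km x 100 km, and a 75-day period contains 75 x 48 = 3,600 time slots per individual.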


The task is to predict the movement of a subset of individuals in cities A, B, C, and D during days 61 to 75 (orange parts), using the movement data of individuals in cities A, B, C, and D from days 1 to 60 (blue parts), as shown in the following Figure.

You are not required to use every city’s data for prediction. For instance, to predict city B’s movement from days 61 to 75, one can use only the movement patterns in city B from days 1 to 60 (bold arrow). Using data from other cities (e.g., city A) may or may not improve the prediction accuracy!
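The split between observed and target days can be sketched as follows. The record layout `(uid, day, slot, x, y)`, the function name, and the 1-based day numbering (taken from the prose above) are assumptions for illustration; the actual files may be organized and indexed differently.

```python
from collections import defaultdict

LAST_DAY = 75  # the datasets span a 75-day period


def split_by_day(records, last_train_day=60):
    """Split records into observed days (1..last_train_day) and target days.

    `records` is an iterable of hypothetical (uid, day, slot, x, y) tuples.
    Returns two dicts mapping uid -> list of (day, slot, x, y).
    """
    train, target = defaultdict(list), defaultdict(list)
    for uid, day, slot, x, y in records:
        if not 1 <= day <= LAST_DAY:
            raise ValueError(f"day {day} is outside 1..{LAST_DAY}")
        bucket = train if day <= last_train_day else target
        bucket[uid].append((day, slot, x, y))
    return dict(train), dict(target)
```

One possible use of the `last_train_day` parameter is local validation: calling the function with a smaller value on the observed days holds out the tail of the observed period as a stand-in for the target days.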

Download the data here: https://zenodo.org/records/15313913. Please apply for data download through the Zenodo link. Please provide the following information in the request form: 1) Your name, 2) email address, 3) institution, and 4) Team name.

Evaluation Metrics

The accuracy of the predicted human movement trajectories will be evaluated against the actual trajectories using the GEO-BLEU metric (Shimizu et al., 2022). Python implementations of the evaluation metrics will be provided on LY Corporation’s GitHub page (https://github.com/yahoojapan/geobleu).

Shimizu, T., Tsubouchi, K., & Yabe, T. (2022). GEO-BLEU: similarity measure for geospatial sequences. In Proceedings of the 30th International Conference on Advances in Geographic Information Systems (pp. 1-4).

The hyperparameters used for the GEO-BLEU scores are: Beta=0.5, n=5.

Submissions will be ranked for each metric, and the top 5 teams will be decided based on the two rankings. We recommend the teams try to optimize for both metrics.

Baseline results by the Organizers

The organizers have implemented the following baseline methods and tested their prediction accuracy. Evaluation results of baseline methods applied to the test split containing 3,000 user IDs:
  • Global Mean: Calculate the "global mean" as the average of staypoints in days 1 to 60 of all users' trajectories, and for days 61 to 75, predict the trajectory as if the user continues to stay at that point.
  • Global Mode: Determine the "global mode" as the most frequent staypoint in days 1 to 60 of all users' trajectories, and for days 61 to 75, predict the trajectory as if the user continues to stay at that point.
  • Per-User Mean: Calculate the "per-user mean" as the average of staypoints in days 1 to 60 of a given user's trajectory, and for days 61 to 75, predict the trajectory as if the user continues to stay at that point.
  • Per-User Mode: Determine the "per-user mode" as the most frequent staypoint in days 1 to 60 of a given user's trajectory, and for days 61 to 75, predict the trajectory as if the user continues to stay at that point.
  • Unigram Model: For a given user, create a unigram model of staypoints from days 1 to 60 of their trajectory, and use it to predict the trajectory in days 61 to 75.
  • Bigram Model: For a given user, create a bigram model of staypoints from days 1 to 60 of their trajectory, and use it to predict the trajectory in days 61 to 75.
  • Bigram Model (top_p=0.7): For a given user, create a bigram model of staypoints from days 1 to 60 of their trajectory, and use it to predict the trajectory in days 61 to 75, applying a sampling parameter of top_p=0.7.
Method                      City A    City B    City C    City D    Average
Global Mean                 0.00052   0.00020   0.00239   0.00001   0.00078
Global Mode                 0.00179   0.00499   0.00724   0.00334   0.00434
Per-User Mean               0.01492   0.01646   0.02559   0.02637   0.02084
Per-User Mode               0.07984   0.08116   0.10789   0.10296   0.09296
Unigram Model               0.03156   0.03754   0.04649   0.04656   0.04054
Bigram Model                0.04687   0.05492   0.06249   0.06212   0.05660
Bigram Model (top_p=0.7)    0.09384   0.09232   0.07039   0.07997   0.08413
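To make the baseline descriptions concrete, here is a minimal sketch of the Per-User Mode and Bigram Model ideas. It is not the organizers' implementation. All function names are hypothetical, the sketch ignores time of day, and it assumes that top_p denotes nucleus sampling (keep the most probable next cells until their cumulative probability reaches top_p, then sample among them), which the description above does not define.

```python
import random
from collections import Counter, defaultdict


def per_user_mode(cells):
    """Return the most frequent cell in one user's observed sequence."""
    return Counter(cells).most_common(1)[0][0]


def fit_bigram(cells):
    """Count transitions from each cell to the cell that follows it."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(cells, cells[1:]):
        counts[prev][nxt] += 1
    return counts


def top_p_filter(counter, top_p):
    """Keep the most frequent outcomes until their cumulative share reaches top_p."""
    total = sum(counter.values())
    kept, cumulative = [], 0.0
    for cell, count in counter.most_common():
        kept.append((cell, count))
        cumulative += count / total
        if cumulative >= top_p:
            break
    return kept


def generate(cells, length, top_p=1.0, seed=0):
    """Sample `length` future cells from a per-user bigram model.

    Starts from the last observed cell. Falling back to the per-user mode
    when a cell has no observed successor is an assumption of this sketch.
    """
    rng = random.Random(seed)
    bigram = fit_bigram(cells)
    fallback = per_user_mode(cells)
    current, out = cells[-1], []
    for _ in range(length):
        options = bigram.get(current)
        if options:
            kept = top_p_filter(options, top_p)
            population = [cell for cell, _ in kept]
            weights = [count for _, count in kept]
            current = rng.choices(population, weights=weights, k=1)[0]
        else:
            current = fallback
        out.append(current)
    return out
```

For example, `generate(observed_cells, length=15 * 48, top_p=0.7)` would produce one cell per 30-minute slot for a 15-day horizon, where `observed_cells` is a user's chronologically ordered list of `(x, y)` cells from days 1 to 60.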