ST-Dataminer Lab - Research

Urban planning and reasoning tasks

Urban Foundation Models, inspired by advancements in language and vision foundation models, adapt these powerful concepts to the urban and geospatial domain. By harnessing large-scale data from sources like OpenStreetMap, these models capture the complex spatial, visual, and textual characteristics of urban environments, providing general-purpose representations for various urban tasks. Our research builds on this foundation, employing techniques like context leverage and self-supervised learning to address the unique challenges of urban data. This approach enhances the models' effectiveness in tasks such as urban function inference, geospatial entity resolution, and address matching, offering accurate and scalable solutions for urban planning, socio-economic analysis, and smart city initiatives.

Pre-trained models for spatiotemporal data encoding

FSTLLM: Spatio-Temporal LLM for Few Shot Time Series Forecasting, Yue Jiang, Yile Chen, Xiucheng Li, Qin Chao, Shuai Liu, Gao Cong (ICML 2025)
City Foundation Models for Learning General Purpose Representations from OpenStreetMap. Pasquale Balsebre, Weiming Huang, Gao Cong, Yi Li. International Conference on Information and Knowledge Management 2024 (CIKM '24)
Weiming Huang, Jing Wang, Gao Cong. International Journal of Geographical Information Science 2024 (IJGIS 2024)

LLM-based spatiotemporal models

Large Language Model (LLM)-based spatiotemporal models integrate the natural language processing prowess of LLMs with the nuanced understanding of spatial and temporal data. Built primarily on transformer architectures, these models are adept at handling sequences where geographic proximity and temporal dynamics are interlinked. They are particularly effective for predicting locations, analyzing movement patterns, or recommending next points of interest. By employing attention mechanisms, these models focus on pertinent spatiotemporal features, making them valuable for applications in route prediction, traffic management, and urban planning, where understanding the complex interplay between space and time is crucial.
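
As a rough illustration of the attention mechanism mentioned above, the following sketch computes scaled dot-product attention for a single query over a short sequence. It is a toy example in plain Python, not the architecture of any model listed on this page; the function name `attend` and the example vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Returns the attention weights and the weighted sum of the values.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, output

# Hypothetical toy inputs: three visited locations, each with a key and a value vector.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attend(query, keys, values)
```

Positions whose keys align with the query receive larger weights, which is the sense in which such models focus on the most relevant parts of a sequence.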

Trajectory data mining

Trajectory data mining aims to extract meaningful patterns and insights from movement data consisting of sequences of timestamped locations, such as GPS data from vehicles or mobile devices. Our group explores several areas within this domain. We address challenges in trajectory recovery and simplification, ensuring that data remains both accurate and manageable. We explore learning techniques to derive trajectory representations that can be used efficiently and effectively in tasks like similarity computation and clustering. We work on enhancing intelligent transportation systems by developing models for estimating time of arrival (ETA) and inferring routes between origin-destination pairs. Additionally, our research extends to anomalous trajectory detection, where we identify irregularities in trajectory data that could indicate unusual patterns, contributing to safer and more reliable transportation systems.
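
To make the idea of error-bounded trajectory simplification concrete, here is a minimal sketch of the classical Douglas-Peucker algorithm. It is a textbook baseline, not the reinforcement-learning methods in the papers listed below. The sketch assumes each point is a `(t, x, y)` tuple in planar coordinates; the function names and sample data are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (planar x, y)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to the endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def simplify(trajectory, tolerance):
    """Douglas-Peucker simplification of a list of (t, x, y) points.

    Interior points are dropped when they lie within `tolerance` of the
    segment joining the endpoints of the current span.
    """
    if len(trajectory) < 3:
        return list(trajectory)
    start, end = trajectory[0], trajectory[-1]
    worst_index, worst_dist = 0, -1.0
    for i in range(1, len(trajectory) - 1):
        d = point_segment_distance(trajectory[i][1:], start[1:], end[1:])
        if d > worst_dist:
            worst_index, worst_dist = i, d
    if worst_dist <= tolerance:
        return [start, end]
    # Keep the farthest point and simplify both halves recursively.
    left = simplify(trajectory[:worst_index + 1], tolerance)
    right = simplify(trajectory[worst_index:], tolerance)
    return left[:-1] + right

# Hypothetical sample: a mostly straight path with one sharp turn.
trajectory = [(0, 0.0, 0.0), (1, 1.0, 0.1), (2, 2.0, -0.1),
              (3, 3.0, 5.0), (4, 4.0, 6.0), (5, 5.0, 7.0)]
simplified = simplify(trajectory, tolerance=0.5)
# keeps the points at t = 0, 2, 3, 5
```

The tolerance trades storage for fidelity: a larger value keeps fewer points but lets the simplified path deviate further from the original.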

Trajectory similarity computation

Yanchuan Chang, Egemen Tanin, Gao Cong, Christian S. Jensen, Jianzhong Qi. Trajectory Similarity Measurement: An Efficiency Perspective. Proceedings of the VLDB Endowment 2024 (PVLDB 2024) (Paper) (Code)
Zheng Wang, Cheng Long, Gao Cong: Similar Sports Play Retrieval With Deep Reinforcement Learning. IEEE Trans. Knowl. Data Eng. 35(4): 4253-4266 (2023) (Paper) (Code)
Zheng Wang, Cheng Long, Gao Cong and Yiding Liu: Efficient and Effective Similar Subtrajectory Search with Deep Reinforcement Learning, Proceedings of the VLDB Endowment (PVLDB) 2020

Trajectory recovery & simplification

Yile Chen, Gao Cong, Cuauhtemoc Anda. TERI: An Effective Framework for Trajectory Recovery with Irregular Time Intervals. Proceedings of the VLDB Endowment 2024 (PVLDB 2024) (Paper) (Code)
Zheng Wang, Cheng Long, Gao Cong. Error-Bounded Online Trajectory Simplification with Multi-Agent Reinforcement Learning, KDD 2021 (Paper) (Code)
Zheng Wang, Cheng Long, Gao Cong. Trajectory Simplification with Reinforcement Learning, ICDE 2021 (Paper) (Code)

Intelligent transportation

Qianru Zhang, Zheng Wang, Cheng Long, Chao Huang, Siu-Ming Yiu, Yiding Liu, Gao Cong, Jieming Shi. Online Anomalous Subtrajectory Detection on Road Networks with Deep Reinforcement Learning. IEEE ICDE 2023 (Paper) (Code)
Xiucheng Li, Gao Cong, Yun Cheng, Spatial Transition Learning on Road Networks with Deep Probabilistic Models, ICDE 2020 (Paper)

Spatiotemporal forecasting

Spatiotemporal forecasting is a powerful analytical technique that involves predicting future events or conditions based on spatial and temporal data. This approach is crucial in various fields, including meteorology, urban planning, and environmental science, as it allows for the anticipation of changes over time and across different locations. By analyzing patterns in data that vary both in space and time, spatiotemporal forecasting models can provide insights into trends, such as weather patterns, traffic flow, and disease spread, enabling better decision-making and proactive management in complex, dynamic environments.

Yue Jiang, Xiucheng Li, Yile Chen, Shuai Liu, Weilong Kong, Antonis F. Lentzakis, Gao Cong. SAGDFN: A Scalable Adaptive Graph Diffusion Forecasting Network for Multivariate Time Series Forecasting. IEEE International Conference on Data Engineering (ICDE 2024) (Paper) (Code)

Kethmi Hirushini Hettige, Jiahao Ji, Shili Xiang, Cheng Long, Gao Cong, Jingyuan Wang. AirPhyNet: Harnessing Physics-Guided Neural Networks for Air Quality Prediction. The Twelfth International Conference on Learning Representations (ICLR 2024) (Paper)

Time series analytics

Time series analytics is the statistical analysis and modeling of data indexed in time order. It is critical in domains where monitoring over time is essential, such as finance, healthcare, and environmental science. Techniques range from simple moving averages and classical statistical models such as ARIMA to machine learning models such as recurrent neural networks, including LSTM networks. These methods help to identify trends, seasonal variations, and cycles within the data. Time series analytics also plays a pivotal role in forecasting future values from historical patterns, detecting anomalies, and optimizing operational strategies, enabling decision-makers to predict future scenarios and respond effectively to dynamic conditions.
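
The simplest techniques named above can be sketched in a few lines. The example below computes a trailing moving average and flags values that deviate strongly from their recent history. It is an illustrative baseline under assumed parameters (window size, a threshold of three standard deviations), not a method from the papers on this page; the function names and readings are hypothetical.

```python
from statistics import mean, pstdev

def moving_average(values, window):
    """Trailing moving average; each output averages the last `window` values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [mean(values[i - window + 1:i + 1])
            for i in range(window - 1, len(values))]

def flag_anomalies(values, window, threshold=3.0):
    """Indices whose value differs from the mean of the preceding `window`
    values by more than `threshold` standard deviations of that history."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma > 0 and abs(values[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies

# Hypothetical sensor readings with one spike at index 6.
readings = [10, 11, 10, 12, 11, 10, 30, 11, 10, 12]
smoothed = moving_average(readings, window=3)
spikes = flag_anomalies(readings, window=5)
# spikes == [6]
```

Note that the spike inflates the standard deviation of the windows that follow it, so this simple rule becomes less sensitive immediately after an anomaly.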

Di Yao, Gao Cong, Chao Zhang, Xuying Meng, Rongchang Duan, Jingping Bi. A Linear Time Approach to Computing Time Series Similarity based on Deep Metric Learning. IEEE Transactions on Knowledge and Data Engineering (TKDE) 34(10): 4554-4571 (2022) (Paper)

Points of Interest (POI)

Effective management of Point-of-Interest (POI) data is essential for location-based services, urban planning, and geospatial analytics. We explore advanced techniques to optimize POI search and representation by leveraging spatial indexing, semantic enrichment, and machine learning algorithms. Our research aims to improve the accuracy and efficiency of POI search while enabling more meaningful representations of locations, ultimately supporting decision-making in applications such as navigation, recommendation systems, and smart city initiatives.
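
As a small illustration of spatial indexing for POI search, the sketch below buckets POIs into a uniform grid so that a radius query only inspects nearby cells. It assumes planar coordinates (for example metres in a projected coordinate system) and is a generic textbook structure, not the indexing used in the papers below; the class name `GridIndex` and the sample POIs are hypothetical.

```python
import math
from collections import defaultdict

class GridIndex:
    """Uniform grid index over planar (x, y) coordinates."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, name, x, y):
        self.cells[self._cell(x, y)].append((name, x, y))

    def within(self, x, y, radius):
        """Names of POIs within `radius` of (x, y), nearest first."""
        cx_min, cy_min = self._cell(x - radius, y - radius)
        cx_max, cy_max = self._cell(x + radius, y + radius)
        hits = []
        # Only cells overlapping the query's bounding box are scanned.
        for cx in range(cx_min, cx_max + 1):
            for cy in range(cy_min, cy_max + 1):
                for name, px, py in self.cells.get((cx, cy), ()):
                    d = math.hypot(px - x, py - y)
                    if d <= radius:
                        hits.append((d, name))
        return [name for _, name in sorted(hits)]

# Hypothetical POIs.
index = GridIndex(cell_size=100.0)
index.insert("cafe", 120.0, 80.0)
index.insert("library", 180.0, 150.0)
index.insert("station", 900.0, 900.0)
nearby = index.within(150.0, 100.0, radius=100.0)
# nearby == ["cafe", "library"]
```

The cell size is a tuning choice: cells much larger than typical query radii scan many irrelevant POIs, while very small cells mean many cells to visit per query.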

Yang Xu, Gao Cong, Lei Zhu, Lizhen Cui. MMPOI: A Multi-Modal Content-Aware Framework for POI Recommendations. The Web Conference 2024 (Paper)
Lubin Bai, Weiming Huang, Xiuyuan Zhang, Shihong Du, Gao Cong, Haoyu Wang, Bo Liu. Geographic mapping with unsupervised multi-modal representation learning from VHR images and POIs. ISPRS Journal of Photogrammetry and Remote Sensing 201, 193-208, 2023 (Paper)
Yile Chen, Xiucheng Li, Gao Cong, Cheng Long, Zhifeng Bao, Shang Liu, Wanli Gu, Fuzheng Zhang: Points-of-Interest Relationship Inference with Spatial-enriched Graph Neural Networks. Proc. VLDB Endow. 15(3): 504-512 (2021) (Paper)
Shanshan Feng, Lucas Vinh Tran, Gao Cong, Lisi Chen, Jing Li, Fan Li: HME: A Hyperbolic Metric Embedding Approach for Next-POI Recommendation, SIGIR 2020 (Paper)

Road network

Road networks are a typical geospatial data type, represented as polylines. We develop techniques to model and represent the complex structure and semantic patterns of road networks in a way that enhances various downstream applications, such as traffic inference, travel time estimation, and road classification. We also investigate specific applications in urban planning and infrastructure development, aiming to improve the effectiveness and management of transportation networks.

Haitao Yuan, Gao Cong, Guoliang Li. Nuhuo: An Effective Estimation Model for Traffic Speed Histogram Imputation on A Road Network. Proceedings of the VLDB Endowment 2024 (PVLDB 2024) (Paper) (Code)
Yile Chen, Xiucheng Li, Gao Cong, Cheng Long, Zhifeng Bao. Robust Road Network Representation Learning: When Traffic Patterns Meet Traveling Semantics, CIKM 2021. (Paper)

Regions

Urban regions form the foundation of essential urban utilities and services, encompassing diverse areas that play distinct roles in the city's functionality. Understanding and profiling these regions allow us to discover urban functions, assist in urban planning, inform policy-making, and conduct socio-economic analyses. This is significant for improving the efficiency, sustainability, and quality of life in cities. To accomplish this, we rely on accessible geospatial data sources such as OpenStreetMap, which provides detailed information like building footprints and points of interest (POIs). Our approach involves sophisticated techniques like metric learning and contrastive learning, which enable us to capture both statistical and geographic similarities between regions. Through the application of these techniques, our methodology has facilitated effective region comparisons, enabling successful implementation of similar region search. We have also achieved significant advances in tasks such as urban land use prediction and population density estimation, outperforming traditional models.
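
Once regions are represented as vectors, similar region search reduces to nearest-neighbour lookup in the embedding space. The sketch below ranks regions by cosine similarity to a query region. The embeddings are made-up three-dimensional vectors for illustration only; in practice they would come from a learned model, and the function names and region identifiers here are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(query_id, embeddings, k=2):
    """Identifiers of the k regions most similar to `query_id`."""
    query = embeddings[query_id]
    scored = [(cosine_similarity(query, vec), region_id)
              for region_id, vec in embeddings.items()
              if region_id != query_id]
    scored.sort(reverse=True)
    return [region_id for _, region_id in scored[:k]]

# Hypothetical region embeddings (not produced by any model on this page).
embeddings = {
    "downtown": [0.9, 0.1, 0.0],
    "business_park": [0.8, 0.2, 0.1],
    "suburb": [0.1, 0.9, 0.2],
    "harbour": [0.0, 0.2, 0.9],
}
matches = most_similar("downtown", embeddings, k=2)
# matches == ["business_park", "suburb"]
```

A linear scan like this is adequate for a few thousand regions; larger collections would typically use an approximate nearest-neighbour index instead.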

Liang Zhang, Cheng Long, Gao Cong. Region Embedding with Intra and Inter-View Contrastive Learning. In IEEE Transactions on Knowledge and Data Engineering, 35(9): 9031-9036 (2023) (Paper) (Code)
Yi Li, Weiming Huang, Gao Cong, Hao Wang, Zheng Wang. Urban Region Representation Learning with OpenStreetMap Building Footprints. KDD 2023 (Paper) (Code)

Recommendation and user modeling (POI)

Effective management of user behavior data is crucial for personalized recommendations, targeted marketing, and user experience optimization. We investigate advanced methods to enhance the modeling and recommendation processes by integrating behavioral analytics, machine learning, and data-driven user profiling. Our research focuses on developing robust algorithms that capture user preferences and behaviors, enabling more precise and dynamic recommendations across various platforms, thereby driving higher satisfaction and retention rates.

Jiayi Xie, Shang Liu, Gao Cong, Zhenzhong Chen. UnifiedSSR: A Unified Framework of Sequential Search and Recommendation. The Web Conference 2024 (Paper) (Code)
Shang Liu, Wanli Gu, Gao Cong, Fuzheng Zhang: Structural Relationship Representation Learning with Graph Embedding for Personalized Product Search. CIKM 2020: 915-924 (Paper)

