Uncertainty-Guided Never-Ending Learning to Drive

Boston University, Red Hat Inc.
CVPR 2024

Abstract

We present a highly scalable self-training framework for incrementally adapting vision-based end-to-end autonomous driving policies in a semi-supervised manner, i.e., over a continual stream of incoming video data. To facilitate large-scale model training (e.g., on open web or otherwise unlabeled data), we do not assume access to ground-truth labels and instead estimate pseudo-label policy targets for each video. Our framework comprises three key components: knowledge distillation, a sample purification module, and an exploration and knowledge retention mechanism. First, given sequential image frames, we pseudo-label the data and estimate uncertainty using an ensemble of inverse dynamics models. The uncertainty is used to select the most informative samples to add to an experience replay buffer. We specifically select high-uncertainty pseudo-labels to facilitate the exploration and learning of new and diverse driving skills. However, in contrast to prior work in continual learning that assumes ground-truth labeled samples, the uncertain pseudo-labels can introduce significant noise. Thus, we pair the exploration with a label refinement module, which uses consistency constraints to re-label the noisy exploratory samples and effectively learn from diverse data. Trained as a complete never-ending learning system, our model achieves state-of-the-art performance when learning from domain-changing data as well as from millions of images from the open web.

Method

We present ∞-Driver, an agent that continually learns from unlabeled incoming video data. For each video in the stream, the agent employs an ensemble of inverse dynamics models to infer waypoint pseudo-labels together with their uncertainty. Noisy pseudo-labels are automatically refined through consistency-based re-labeling and confidence-based filtering. A student driving policy is then trained on both the incoming data and episodic memory replay data. The memory buffer is updated to incorporate high-uncertainty samples, maintaining a diverse set of samples that retains knowledge and prevents forgetting, despite each image being viewed only once. Toward learning a generalized driving policy, our efficient framework enables highly scalable training, i.e., over millions of video frames from the web.
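Below is a minimal, self-contained sketch of this pipeline. The network architecture, ensemble size, uncertainty threshold, and the temporal-smoothing form of the consistency constraint are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of uncertainty-guided pseudo-labeling and replay-buffer update.
# All hyperparameters and module designs below are assumptions for illustration.
import heapq
import torch
import torch.nn as nn

N_WAYPOINTS = 4          # assumed number of predicted 2-D waypoints
ENSEMBLE_SIZE = 5        # assumed ensemble size
UNC_THRESHOLD = 0.5      # hypothetical confidence-filter threshold
BUFFER_CAPACITY = 3000   # matches the smallest buffer size in the experiments

class InverseDynamicsModel(nn.Module):
    """Maps a pair of consecutive frames to future waypoints."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 16, 5, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, N_WAYPOINTS * 2),
        )
    def forward(self, frame_t, frame_t1):
        x = torch.cat([frame_t, frame_t1], dim=1)  # stack frames along channels
        return self.net(x).view(-1, N_WAYPOINTS, 2)

ensemble = [InverseDynamicsModel() for _ in range(ENSEMBLE_SIZE)]

@torch.no_grad()
def pseudo_label(frame_t, frame_t1):
    """Ensemble mean as the waypoint pseudo-label; ensemble spread as uncertainty."""
    preds = torch.stack([m(frame_t, frame_t1) for m in ensemble])  # (K, B, W, 2)
    label = preds.mean(dim=0)
    uncertainty = preds.std(dim=0).mean(dim=(1, 2))  # one scalar per sample
    return label, uncertainty

def refine_labels(labels):
    """Consistency-based re-labeling, sketched here as temporal smoothing:
    each waypoint label is averaged with its temporal neighbors so noisy
    exploratory samples agree with their surrounding context."""
    smoothed = labels.clone()
    smoothed[1:-1] = (labels[:-2] + labels[1:-1] + labels[2:]) / 3.0
    return smoothed

buffer = []   # min-heap of (uncertainty, counter, frames, label); counter breaks ties
counter = 0

def update_buffer(frame_t, frame_t1, label, uncertainty):
    """Keep the highest-uncertainty (most informative) samples."""
    global counter
    item = (uncertainty.item(), counter, (frame_t, frame_t1), label)
    counter += 1
    if len(buffer) < BUFFER_CAPACITY:
        heapq.heappush(buffer, item)
    elif item[0] > buffer[0][0]:
        heapq.heapreplace(buffer, item)  # evict the lowest-uncertainty sample

# One incoming video clip (random stand-in for real frames).
clip = torch.randn(8, 3, 128, 128)
labels, unc = pseudo_label(clip[:-1], clip[1:])
labels = refine_labels(labels)
keep = unc < UNC_THRESHOLD  # confidence filter drops the noisiest samples
for i in torch.nonzero(keep).flatten().tolist():
    update_buffer(clip[i:i+1], clip[i+1:i+2], labels[i], unc[i])
```

Keying the buffer as a min-heap on uncertainty lets the lowest-uncertainty sample be evicted in O(log N), so after filtering, the buffer retains the most informative (highest-uncertainty) samples for replay.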

Result

Evaluation of Incremental Learning Over Cities. We present the results of incremental learning on image collections sorted by city, with buffer sizes of 3,000, 6,000, and 9,000. L−1, F−1, and I−1 denote evaluations after the model trains on the last image collection. ∞-Driver outperforms all baselines across all metrics and shows a significant advantage on the Average Loss and Forgetting measures, especially at buffer sizes 3,000 and 6,000. This indicates that our filtering mechanism and temporal-consistency re-labeling effectively remove and refine noisy samples, and that our uncertainty-based buffer sampling selects more informative samples for the buffer. The buffer is thus enriched with more informative samples and fewer misleading labels, which enhances the model's ability to retain previously acquired knowledge without hindering its learning of new incoming knowledge.
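For concreteness, the sketch below computes Average Loss and Forgetting following the standard continual-learning definitions, adapted from accuracy to loss; the paper's exact formulation may differ. Here loss[i, j] is assumed to be the hold-out loss on the j-th city's evaluation set after training on the i-th image collection.

```python
# Hedged sketch of loss-based continual-learning measures; the paper's exact
# definitions are not reproduced here and may differ.
import numpy as np

def average_loss(loss: np.ndarray) -> float:
    """Mean hold-out loss over all cities after the final training phase."""
    return float(loss[-1].mean())

def forgetting(loss: np.ndarray) -> float:
    """Mean increase of each earlier city's final loss over its best earlier value."""
    T = loss.shape[0]
    gaps = [loss[-1, j] - loss[:-1, j].min() for j in range(T - 1)]
    return float(np.mean(gaps))

# Toy 3-collection example: row i holds losses measured after collection i.
loss = np.array([[0.20, 0.90, 0.95],
                 [0.35, 0.25, 0.80],
                 [0.40, 0.30, 0.22]])
print(average_loss(loss))  # ~0.307
print(forgetting(loss))    # (0.40-0.20 + 0.30-0.25)/2 = 0.125
```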

Evaluation of Incremental Learning on Open Web Data. To further demonstrate the generalization ability of ∞-Driver, we conduct open web experiments that continuously train on the YouTube dataset. Importantly, all samples in the evaluation set are entirely unseen by ∞-Driver before evaluation. We report the revised Average Loss, Forgetting, and ADE measured on the evaluation set after the model completes the final training phase. ∞-Driver obtains the best Forgetting result, indicating that the model's final performance on the evaluation set essentially matches its best performance. The lowest Average Loss and ADE show that the model exhibits the best generalization capability among the analyzed models.
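ADE here is presumably the standard average displacement error for waypoint prediction, i.e., the mean Euclidean distance between predicted and reference waypoints; a minimal sketch, with the paper's exact variant left as an assumption:

```python
# Conventional ADE for waypoint prediction; the paper's exact variant is assumed.
import torch

def ade(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """pred, target: (batch, n_waypoints, 2) waypoint coordinates."""
    return (pred - target).norm(dim=-1).mean()

pred = torch.tensor([[[0.0, 0.0], [1.0, 1.0]]])
target = torch.tensor([[[0.0, 1.0], [1.0, 1.0]]])
print(ade(pred, target))  # tensor(0.5000)
```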

Qualitative Analysis of Incremental Learning Over Cities. We show the ADE scores on each city's hold-out evaluation set, measured after the model has completed training on the last city's image collection. After incrementally training over ten cities, ∞-Driver consistently outperforms the baseline models in each city under all buffer settings.

Qualitative Analysis on Open Web Data. We plot the trend of the ADE score as ∞-Driver is incrementally trained on YouTube data, selecting the ADE scores measured after the model has seen 2, 4, 6, 8, and 10 million images.

Qualitative Examples

Acknowledgments

We thank the Red Hat Collaboratory (awards #2024-01-RH02, #2024-01-RH07) and NSF (IIS-2152077) for supporting this research.