MetaWild

Introduction

The MetaWild dataset is a multimodal benchmark designed to improve animal re-identification (Animal ReID) by integrating environmental metadata with visual data. Unlike existing datasets that rely solely on images, MetaWild includes 20,890 images spanning six representative species, and each image is paired with metadata such as temperature and circadian rhythms, providing valuable context for distinguishing individual animals. To facilitate the use of metadata in existing ReID methods, we also propose the Meta-Feature Adapter (MFA), a lightweight module that can be incorporated into existing vision-language model (VLM)-based Animal ReID methods so that they can leverage both environmental metadata and visual information. Our experiments on MetaWild show that incorporating metadata with MFA consistently improves ReID performance over using visual data alone. We hope that our benchmark can inspire further exploration of multimodal approaches for Animal ReID.


Data Availability

Our dataset is accessible through:


Supplementary Material

This section provides supplementary material supporting the main findings of the MAAR project. It is divided into two parts: the Meta-Feature Adapter (MFA) and the metadata distribution visualizations.


Meta-Feature Adapter (MFA)

To leverage environmental metadata in Animal Re-Identification (ReID), we propose the Meta-Feature Adapter (MFA), a lightweight, plug-and-play module that integrates textual metadata with visual representations. MFA is compatible with existing Vision-Language Models (VLMs) such as CLIP and enables multimodal feature fusion by aligning textual metadata with visual features.
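One way to picture "textual metadata" is as a short sentence built from each image's metadata record, which a VLM text encoder such as CLIP could then embed. The sketch below is only an illustration under assumptions: the field names (`temperature_c`, `circadian`, `face_orientation`), the Celsius unit, the prompt wording, and the helper `metadata_to_text` are all hypothetical and are not taken from the MetaWild release or the MFA implementation.

```python
from typing import Mapping, Union

MetaValue = Union[str, int, float]


def metadata_to_text(meta: Mapping[str, MetaValue]) -> str:
    """Render one image's metadata record as a plain-text description.

    Field names, units, and phrasing are illustrative assumptions, not the
    MetaWild schema. Fields missing from the record are skipped.
    """
    parts = []
    if "temperature_c" in meta:
        temperature = float(meta["temperature_c"])
        parts.append(f"the temperature is {temperature:.0f} degrees Celsius")
    if "circadian" in meta:
        parts.append(f"it is {meta['circadian']}")
    if "face_orientation" in meta:
        parts.append(f"the animal is facing {meta['face_orientation']}")
    if not parts:
        return "No metadata available."
    return "A camera-trap photo where " + ", ".join(parts) + "."


# Hypothetical record, not real MetaWild data.
example = {"temperature_c": 12.4, "circadian": "night", "face_orientation": "left"}
print(metadata_to_text(example))
```

The resulting string would be passed to the text branch of the VLM, while the image goes through the visual branch as usual.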

MFA Architecture

Figure 1: Overview of the MFA architecture integrating visual and metadata branches using feature experts and gated cross-attention.
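To make the idea of gated cross-attention concrete, here is a minimal, dependency-free sketch in which a single visual feature attends over a few metadata token embeddings and the result is added back through a gate. It is not the MFA implementation: the tanh gate, the absence of learned projections, the omission of the feature experts, and the names `gated_cross_attention`, `visual_feature`, and `metadata_tokens` are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gated_cross_attention(visual: Vector, meta_tokens: List[Vector], gate: float) -> Vector:
    """Fuse one visual feature with metadata token embeddings.

    The visual feature acts as the query; the metadata tokens act as both
    keys and values (no learned projections, to keep the sketch short).
    The attended metadata vector is added back through a tanh gate, so
    gate=0.0 leaves the visual feature unchanged.
    """
    scale = math.sqrt(len(visual))
    weights = softmax([dot(visual, tok) / scale for tok in meta_tokens])
    attended = [
        sum(w * tok[i] for w, tok in zip(weights, meta_tokens))
        for i in range(len(visual))
    ]
    g = math.tanh(gate)
    return [v + g * a for v, a in zip(visual, attended)]


# Toy 4-dimensional embeddings, made up for illustration.
visual_feature = [0.5, -0.2, 0.1, 0.9]
metadata_tokens = [
    [0.1, 0.0, 0.3, 0.2],   # e.g. an embedded temperature description
    [0.4, -0.1, 0.0, 0.6],  # e.g. an embedded day/night description
]

unchanged = gated_cross_attention(visual_feature, metadata_tokens, gate=0.0)
fused = gated_cross_attention(visual_feature, metadata_tokens, gate=1.0)
print(unchanged)
print(fused)
```

With a gate of zero the module passes the visual feature through untouched, which is one reason gated designs are convenient for adding a new branch to an already trained model. In practice the gate would be a learned parameter.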

Multimodal Integration Results

We evaluate the influence of environmental metadata on Animal ReID performance using the MetaWild dataset by incorporating MFA into existing ReID methods. Experimental results show that incorporating environmental metadata consistently improves ReID accuracy across multiple baseline methods and all six species.

CLIP-FT Results
CLIP-FT vs CLIP-FT+MFA
CLIP-ReID Results
CLIP-ReID vs CLIP-ReID+MFA
ReID-AW Results
ReID-AW vs ReID-AW+MFA

Metadata Distribution Visualizations

To better understand the role of environmental context in Animal ReID, we analyze the distribution of three metadata types included in the MetaWild dataset: Temperature, Circadian Rhythms, and Face Orientation. These features are selected based on ecological relevance and their ability to provide identity-discriminative cues, especially when visual signals are ambiguous or incomplete.

Temperature Distribution

As shown in the raincloud plot, the temperature range under which animals are captured varies significantly across species.

These inter-species temperature patterns allow metadata to serve as a latent domain cue, providing additional signal beyond visual appearance.

Figure 2: Visualization of temperature distribution across species in the MetaWild dataset.

Circadian Rhythms Distribution

The distribution of day and night appearances reveals circadian preferences.

This metadata is valuable in cases where lighting affects visibility, helping models reason about behavior-related appearance variations.

Figure 3: Circadian rhythms distribution across species in the MetaWild dataset, showing day/night activity patterns.
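A distribution like this can be summarized as the fraction of day and night images per species. The sketch below shows one way to compute it, assuming each record is a `(species, circadian_label)` pair. The record layout, the helper `day_night_share`, and the placeholder species names are assumptions, and the toy records are not MetaWild data.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def day_night_share(records: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Return, per species, the fraction of images carrying each circadian label.

    Each record is a (species, circadian_label) pair.
    """
    counts = defaultdict(Counter)
    for species, label in records:
        counts[species][label] += 1

    shares = {}
    for species, counter in counts.items():
        total = sum(counter.values())
        shares[species] = {label: n / total for label, n in counter.items()}
    return shares


# Toy records with placeholder species names, not MetaWild data.
toy_records = [
    ("species_a", "day"), ("species_a", "day"),
    ("species_a", "night"), ("species_a", "day"),
    ("species_b", "night"), ("species_b", "night"),
]
shares = day_night_share(toy_records)
print(shares)
```

The same grouping pattern applies to the other categorical metadata, such as face orientation.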

Face Orientation Distribution

Face orientation statistics highlight capture angle biases.

Pose diversity introduces intra-class variability. By encoding orientation explicitly, models can better align visual representations across individuals.

Figure 4: Face orientation distribution across species in the MetaWild dataset.


Licensing & Access

The MetaWild dataset inherits its licensing terms from the NZ-TrailCams dataset, from which it was constructed.
Specifically, MetaWild complies with the:

Community Data License Agreement – Permissive – Version 1.0
Full License Text (CDLA-Permissive-1.0)

For detailed information on the licensing terms, please refer to the MetaWild dataset card on Hugging Face.