champpy.MobProfilesCleaner

The MobProfilesCleaner class cleans and validates mobility profile data. Its limits for speed, duration, and distance are configurable, and values above a maximum can be either capped or deleted.

The cleaner performs the following operations:

  • Applies min/max limits to speed, duration, and distance values

  • Deletes out-of-range values or caps them at the limit, depending on the configured method

  • Cleans first/last journey locations to ensure plausible data

  • Resamples data to a specified temporal resolution

  • Provides detailed logging of all cleaning actions

Basic workflow:

  1. Initialize MobProfilesCleaner with a UserParamsCleaning instance (or with no arguments to use the default limits)

  2. Call clean() with the MobProfiles instance to be cleaned

  3. Use the cleaned MobProfiles instance returned by clean()

class champpy.MobProfilesCleaner(user_params=None)[source]

Cleaner for MobProfiles with configurable limits.

This class provides configurable data cleaning for mobility profiles, including removal or capping of outliers in speed, duration, and distance. It also validates and corrects first/last journey locations.

Parameters:

user_params (UserParamsCleaning) – Cleaning limits. If None, default limits from UserParamsCleaning are used.

modified_id_journeys

Dictionary tracking modified journeys by type of modification (distance, speed, duration, location).

Type:

dict

deleted_id_journeys

Dictionary tracking deleted journeys by type of limit violated (distance, speed, duration).

Type:

dict

params

User parameters for cleaning.

Type:

UserParamsCleaning
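The two tracking dictionaries can be summarized per category once cleaning has run. The sketch below uses plain stand-in dictionaries instead of a real cleaner. The keys follow the categories named above, but the value layout (lists of journey IDs) and the helper count_by_type are assumptions made for illustration and are not part of champpy.

```python
from typing import Dict, List

# Stand-in data: keys follow the documented categories; the value layout
# (lists of journey IDs) is an assumption, not confirmed by the library.
modified_id_journeys: Dict[str, List[int]] = {
    "distance": [12, 40],
    "speed": [7],
    "duration": [],
    "location": [3, 5, 9],
}
deleted_id_journeys: Dict[str, List[int]] = {
    "distance": [18],
    "speed": [],
    "duration": [22, 23],
}


def count_by_type(tracking: Dict[str, List[int]]) -> Dict[str, int]:
    """Hypothetical helper: number of tracked journeys per category."""
    return {kind: len(ids) for kind, ids in tracking.items()}


modified_counts = count_by_type(modified_id_journeys)
deleted_counts = count_by_type(deleted_id_journeys)
print(modified_counts)  # {'distance': 2, 'speed': 1, 'duration': 0, 'location': 3}
print(deleted_counts)   # {'distance': 1, 'speed': 0, 'duration': 2}
```

With a real cleaner, the same helper would presumably be applied to cleaner.modified_id_journeys and cleaner.deleted_id_journeys after calling clean(), provided the values are sized collections.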

Examples

Create a cleaner with default limits:

>>> from champpy import MobProfilesCleaner
>>> # mob_profiles is an existing MobProfiles instance
>>> cleaner = MobProfilesCleaner()
>>> cleaned_profiles = cleaner.clean(mob_profiles)

Create a cleaner with custom limits:

>>> from champpy import LimitConfig, MobProfilesCleaner, UserParamsCleaning
>>> custom_params = UserParamsCleaning(
...     speed=LimitConfig(min_value=0.5, max_value=100.0, max_method="cap"),
...     duration=LimitConfig(min_value=0.1, max_value=12.0, max_method="cap"),
...     distance=LimitConfig(min_value=0.1, max_value=600.0, max_method="cap"),
...     temp_res=0.5,
...     print_summary=True
... )
>>> cleaner = MobProfilesCleaner(custom_params)
>>> cleaned_profiles = cleaner.clean(mob_profiles)

clean(mob_profiles)[source]

Clean the input MobProfiles based on configured limits.

This method applies the following cleaning steps:

  • Resample to specified temporal resolution

  • Reindex IDs for consistency

  • Clean first/last journey locations

  • Apply limits to duration, speed, and distance

  • Log cleaning summary

Parameters:

mob_profiles (MobProfiles) – MobProfiles instance to clean.

Returns:

Cleaned MobProfiles instance with _is_cleaned flag set to True.

Return type:

MobProfiles

class champpy.UserParamsCleaning(speed=LimitConfig(min_value=0.01, min_method='delete', max_value=120.0, max_method='cap'), duration=LimitConfig(min_value=0.25, min_method='delete', max_value=8.0, max_method='cap'), distance=LimitConfig(min_value=0.5, min_method='delete', max_value=500.0, max_method='cap'), temp_res=0.25, print_summary=True)[source]

User parameters for cleaning MobProfiles.

Configuration for data quality checks including speed, duration, distance limits and temporal resolution settings.

distance: LimitConfig = LimitConfig(min_value=0.5, min_method='delete', max_value=500.0, max_method='cap')

Distance limits configuration in kilometers.

Default: LimitConfig(min_value=0.5, max_value=500.0)

duration: LimitConfig = LimitConfig(min_value=0.25, min_method='delete', max_value=8.0, max_method='cap')

Duration limits configuration in hours.

Default: LimitConfig(min_value=0.25, max_value=8.0)

print_summary: bool = True

Whether to write the cleaning summary to the logger.

Default: True

speed: LimitConfig = LimitConfig(min_value=0.01, min_method='delete', max_value=120.0, max_method='cap')

Speed limits configuration in km/h.

Default: LimitConfig(min_value=0.01, max_value=120.0)

temp_res: float = 0.25

Temporal resolution in hours for resampling during cleaning.

Default: 0.25 (15-minute resolution)
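To make the unit of temp_res concrete, the sketch below converts a resolution in hours into the number of time steps per day and snaps a time of day to the nearest grid point. Both helpers are hypothetical illustrations; the resampling that champpy actually performs is not described here and may work differently.

```python
def steps_per_day(temp_res: float) -> int:
    """Number of time steps in 24 hours for a resolution given in hours."""
    return round(24.0 / temp_res)


def snap_to_grid(time_h: float, temp_res: float) -> float:
    """Hypothetical helper: move a time (in hours) to the nearest grid point."""
    return round(time_h / temp_res) * temp_res


default_steps = steps_per_day(0.25)      # 15-minute resolution
half_hour_steps = steps_per_day(0.5)     # 30-minute resolution
snapped = snap_to_grid(7.4, 0.25)        # 7.4 h lies closest to 7.5 h

print(default_steps)    # 96
print(half_hour_steps)  # 48
print(snapped)          # 7.5
```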

class champpy.LimitConfig(min_value=0, min_method='delete', max_value=inf, max_method='cap')[source]

Configuration for a single limit parameter in data cleaning.

max_method: Literal['delete', 'cap'] = 'cap'

Method to handle values above maximum: "delete" or "cap" (default: "cap").

max_value: float = inf

Maximum value threshold (default: inf).

min_method: Literal['delete'] = 'delete'

Method to handle values below minimum (default: "delete").

min_value: float = 0

Minimum value threshold (default: 0).
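The difference between the "cap" and "delete" methods can be illustrated on a plain list of values. The function apply_limit below is a hypothetical stand-in written for this illustration, not champpy code. It assumes that the thresholds themselves count as valid and that deletion removes the individual value, whereas the library works on journeys and may treat boundaries differently.

```python
import math
from typing import List


def apply_limit(
    values: List[float],
    min_value: float = 0.0,
    max_value: float = math.inf,
    max_method: str = "cap",
) -> List[float]:
    """Hypothetical helper mimicking the documented limit semantics."""
    if max_method not in ("delete", "cap"):
        raise ValueError("max_method must be 'delete' or 'cap'")
    cleaned = []
    for value in values:
        if value < min_value:
            continue  # below minimum: "delete" is the only documented method
        if value > max_value:
            if max_method == "delete":
                continue  # above maximum: drop the value
            value = max_value  # above maximum: cap at the threshold
        cleaned.append(value)
    return cleaned


speeds = [0.0, 30.0, 95.0, 150.0]  # km/h, made-up sample values
capped = apply_limit(speeds, min_value=0.01, max_value=120.0, max_method="cap")
deleted = apply_limit(speeds, min_value=0.01, max_value=120.0, max_method="delete")

print(capped)   # [30.0, 95.0, 120.0]
print(deleted)  # [30.0, 95.0]
```

Capping keeps the journey count stable at the cost of altering values, while deleting keeps only observed values but shrinks the data set.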