champpy.MobProfilesCleaner
The MobProfilesCleaner class cleans and validates mobility profile data.
It provides configurable limits for speed, duration, and distance, allowing for flexible data cleaning strategies.
The cleaner performs the following operations:
Applies min/max limits to speed, duration, and distance values
Removes outliers or caps values based on configured methods
Cleans first/last journey locations to ensure plausible data
Resamples data to a specified temporal resolution
Provides detailed logging of all cleaning actions
Basic workflow:
Initialize the cleaner class MobProfilesCleaner with user parameters UserParamsCleaning
Call clean() with the mobility profiles MobProfiles to be cleaned
Access the cleaned MobProfiles instance
- class champpy.MobProfilesCleaner(user_params=None)
Cleaner for MobProfiles with configurable limits.
This class provides configurable data cleaning for mobility profiles, including removal or capping of outliers in speed, duration, and distance. It also validates and corrects first/last journey locations.
- Parameters:
user_params (UserParamsCleaning) – Cleaning limits. If None, default limits from UserParamsCleaning are used.
- modified_id_journeys
Dictionary tracking modified journeys by type of modification (distance, speed, duration, location).
- Type:
- deleted_id_journeys
Dictionary tracking deleted journeys by type of modification (distance, speed, duration).
- Type:
- params
User parameters for cleaning.
- Type:
Examples
Create a cleaner with default limits:
>>> from champpy import MobProfilesCleaner, UserParamsCleaning
>>> cleaner = MobProfilesCleaner()
>>> cleaned_profiles = cleaner.clean(mob_profiles)
Create a cleaner with custom limits:
>>> from champpy import LimitConfig, MobProfilesCleaner, UserParamsCleaning
>>> custom_params = UserParamsCleaning(
...     speed=LimitConfig(min_value=0.5, max_value=100.0, max_method="cap"),
...     duration=LimitConfig(min_value=0.1, max_value=12.0, max_method="cap"),
...     distance=LimitConfig(min_value=0.1, max_value=600.0, max_method="cap"),
...     temp_res=0.5,
...     print_summary=True
... )
>>> cleaner = MobProfilesCleaner(custom_params)
>>> cleaned_profiles = cleaner.clean(mob_profiles)
- clean(mob_profiles)
Clean the input MobProfiles based on configured limits.
This method applies the following cleaning steps:
Resample to specified temporal resolution
Reindex IDs for consistency
Clean first/last journey locations
Apply limits to duration, speed, and distance
Log cleaning summary
- Parameters:
mob_profiles (MobProfiles) – MobProfiles instance to clean.
- Returns:
Cleaned MobProfiles instance with _is_cleaned flag set to True.
- Return type:
MobProfiles
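The limit step and the bookkeeping behind modified_id_journeys and deleted_id_journeys can be pictured with a small standalone sketch. It does not use champpy and is not its implementation: the helper apply_limit, the plain-dict journey layout, and the toy values are hypothetical, and it only assumes that "delete" drops a journey while "cap" clamps the value to the violated bound.

```python
import math


def apply_limit(journeys, key, min_value=0.0, max_value=math.inf,
                min_method="delete", max_method="cap"):
    """Hypothetical helper: apply one min/max limit to journeys[id][key].

    Returns (kept_journeys, modified_ids, deleted_ids).
    """
    kept, modified, deleted = {}, [], []
    for jid, journey in journeys.items():
        value = journey[key]
        if value < min_value:
            method, bound = min_method, min_value
        elif value > max_value:
            method, bound = max_method, max_value
        else:
            kept[jid] = dict(journey)  # within limits, keep unchanged
            continue
        if method == "delete":
            deleted.append(jid)  # journey is dropped entirely
        else:
            # "cap": clamp the value to the bound that was violated
            kept[jid] = {**journey, key: bound}
            modified.append(jid)
    return kept, modified, deleted


# Toy data: journey id -> distance in km (values invented for illustration).
journeys = {1: {"distance": 0.2}, 2: {"distance": 35.0}, 3: {"distance": 900.0}}

# Track affected journey ids per type of modification (assumed layout).
modified_id_journeys = {}
deleted_id_journeys = {}

cleaned, modified, deleted = apply_limit(
    journeys, "distance", min_value=0.5, max_value=500.0
)
modified_id_journeys["distance"] = modified
deleted_id_journeys["distance"] = deleted
```

With the default distance limits, journey 1 falls below the minimum and is deleted, journey 3 exceeds the maximum and is capped, and journey 2 passes through unchanged.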
- class champpy.UserParamsCleaning(speed=LimitConfig(min_value=0.01, min_method='delete', max_value=120.0, max_method='cap'), duration=LimitConfig(min_value=0.25, min_method='delete', max_value=8.0, max_method='cap'), distance=LimitConfig(min_value=0.5, min_method='delete', max_value=500.0, max_method='cap'), temp_res=0.25, print_summary=True)
User parameters for cleaning MobProfiles.
Configuration for data quality checks including speed, duration, distance limits and temporal resolution settings.
- distance: LimitConfig = LimitConfig(min_value=0.5, min_method='delete', max_value=500.0, max_method='cap')
Distance limits configuration in kilometers.
Default:
LimitConfig(min_value=0.5, max_value=500.0)
- duration: LimitConfig = LimitConfig(min_value=0.25, min_method='delete', max_value=8.0, max_method='cap')
Duration limits configuration in hours.
Default:
LimitConfig(min_value=0.25, max_value=8.0)
- speed: LimitConfig = LimitConfig(min_value=0.01, min_method='delete', max_value=120.0, max_method='cap')
Speed limits configuration in km/h.
Default:
LimitConfig(min_value=0.01, max_value=120.0)
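One consequence of these nested defaults is easy to miss: passing a new LimitConfig for one quantity replaces the whole object, so any field left out falls back to LimitConfig's own defaults rather than to the UserParamsCleaning defaults listed above. The sketch below illustrates this with stand-in dataclasses (LimitConfigSketch and UserParamsSketch are hypothetical names that copy the documented signatures); it assumes the real classes behave like plain dataclasses and does not import champpy.

```python
import math
from dataclasses import dataclass, field


@dataclass
class LimitConfigSketch:
    """Stand-in mirroring the documented LimitConfig defaults."""
    min_value: float = 0
    min_method: str = "delete"
    max_value: float = math.inf
    max_method: str = "cap"


@dataclass
class UserParamsSketch:
    """Stand-in mirroring the documented UserParamsCleaning defaults."""
    speed: LimitConfigSketch = field(
        default_factory=lambda: LimitConfigSketch(min_value=0.01, max_value=120.0)
    )
    duration: LimitConfigSketch = field(
        default_factory=lambda: LimitConfigSketch(min_value=0.25, max_value=8.0)
    )
    distance: LimitConfigSketch = field(
        default_factory=lambda: LimitConfigSketch(min_value=0.5, max_value=500.0)
    )
    temp_res: float = 0.25
    print_summary: bool = True


# Override only the speed limit; duration and distance keep their defaults.
params = UserParamsSketch(speed=LimitConfigSketch(max_value=90.0))

# min_value was not passed, so it is 0 (the LimitConfig default), not 0.01.
speed_min = params.speed.min_value
```

To keep the 0.01 km/h lower bound while changing the maximum, min_value would have to be passed explicitly in the new limit object.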
- class champpy.LimitConfig(min_value=0, min_method='delete', max_value=inf, max_method='cap')
Configuration for a single limit parameter in data cleaning.
- max_method: Literal['delete', 'cap'] = 'cap'
Method to handle values above maximum: "delete" or "cap" (default: "cap").