champpy.MobProfiles
The MobProfiles class is a wrapper class that integrates four interconnected data components for managing mobility profiles.
These four components are instances of Logbooks, Vehicles, Clusters, and Locations.
Each component contains a DataFrame that holds the data for that component.
The components are linked via IDs, ensuring data consistency and enabling seamless workflows for analyzing and modifying the data.
The structure is as follows:
MobProfiles
├── logbooks      # Journeys of the vehicles
│   └── df        # DataFrame with one row for each journey
├── vehicles      # Information on the vehicles
│   └── df        # DataFrame with one row for each vehicle
├── clusters      # Groups of vehicles with similar behaviour
│   └── df        # DataFrame with one row for each cluster
└── locations     # Information on the distinct locations
    └── df        # DataFrame with one row for each unique location
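The ID links between the four components can be illustrated with a small consistency check. The sketch below uses plain Python lists of dicts as stand-ins for the four DataFrames; the function name find_broken_links and the sample data are hypothetical and not part of champpy, which performs its own validation internally.

```python
# Plain-Python stand-ins for the four linked tables (hypothetical sample data).
logbooks = [
    {"id_journey": 1, "id_vehicle": 1, "dep_loc": 1, "arr_loc": 2},
    {"id_journey": 2, "id_vehicle": 1, "dep_loc": 2, "arr_loc": 1},
    {"id_journey": 3, "id_vehicle": 2, "dep_loc": 1, "arr_loc": 1},
]
vehicles = [
    {"id_vehicle": 1, "id_cluster": 1, "first_loc": 1},
    {"id_vehicle": 2, "id_cluster": 1, "first_loc": 1},
]
clusters = [{"id_cluster": 1, "label": "Cluster 1"}]
locations = [
    {"location": 0, "label": "Driving"},
    {"location": 1, "label": "Home"},
    {"location": 2, "label": "Work"},
]


def find_broken_links(logbooks, vehicles, clusters, locations):
    """Return messages for every ID that does not resolve to a row in its target table."""
    vehicle_ids = {v["id_vehicle"] for v in vehicles}
    cluster_ids = {c["id_cluster"] for c in clusters}
    location_ids = {loc["location"] for loc in locations}

    problems = []
    for j in logbooks:
        # logbooks -> vehicles via id_vehicle
        if j["id_vehicle"] not in vehicle_ids:
            problems.append(f"journey {j['id_journey']}: unknown id_vehicle {j['id_vehicle']}")
        # logbooks -> locations via dep_loc and arr_loc
        for col in ("dep_loc", "arr_loc"):
            if j[col] not in location_ids:
                problems.append(f"journey {j['id_journey']}: unknown {col} {j[col]}")
    for v in vehicles:
        # vehicles -> clusters via id_cluster
        if v["id_cluster"] not in cluster_ids:
            problems.append(f"vehicle {v['id_vehicle']}: unknown id_cluster {v['id_cluster']}")
        # vehicles -> locations via first_loc (optional)
        if v["first_loc"] is not None and v["first_loc"] not in location_ids:
            problems.append(f"vehicle {v['id_vehicle']}: unknown first_loc {v['first_loc']}")
    return problems


print(find_broken_links(logbooks, vehicles, clusters, locations))
```

An empty result means every id_vehicle, id_cluster, and location value used in one table resolves to a row in the table it refers to.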
- class champpy.MobProfiles(input_logbooks_df, input_vehicles_df=None, frozen=False)
Wrapper class for mobility profiles in the champpy framework.
It contains the logbooks, vehicles, clusters and locations as separate classes.
- Parameters:
input_logbooks_df (DataFrame) – Input DataFrame for the logbooks. Expected columns:
  - id_vehicle: One-based index for vehicles, connected to id_vehicle in input_vehicles_df.
  - dep_dt: Departure datetime of each journey.
  - arr_dt: Arrival datetime of each journey.
  - dep_loc: Departure location of each journey as an integer above 0. For example, you can define 1 for home, 2 for work, etc. Location 0 is reserved for driving and is not allowed in this DataFrame.
  - arr_loc: Arrival location of each journey as an integer above 0. For example, you can define 1 for home, 2 for work, etc. Location 0 is reserved for driving and is not allowed in this DataFrame.
  - distance: Distance of each journey in km.
input_vehicles_df (DataFrame | None) – Input DataFrame for the vehicles. If not provided, the vehicles are generated from the logbooks. Expected columns:
  - id_vehicle: Vehicle identifier.
  - first_day: First recorded day of the vehicle.
  - last_day: Last recorded day of the vehicle.
  - cluster: Optional. Splits the vehicles into clusters by assigning a one-based cluster ID to each vehicle, for example to distinguish between different user groups. If you do not want to use clusters, set this column to 1 for all vehicles.
  - first_loc: First location (optional). Use the same location encoding as for dep_loc and arr_loc in input_logbooks_df. It is especially relevant for non-driving vehicles, which do not have any journeys in the logbooks.
frozen (bool) – If True, the MobProfiles instance is immutable after creation. Default is False.
- logbooks
Contains the journey data of the mobility profile with departure and arrival information.
- Type: Logbooks
- vehicles
Contains vehicle-specific data about each vehicle, such as its first and last day of activity, cluster assignment, and first location. It is connected to logbooks via id_vehicle.
- Type: Vehicles
- clusters
Describes the clusters defined in vehicles and provides a label for each cluster. It is connected to vehicles via id_cluster.
- Type: Clusters
- locations
Describes the locations defined in logbooks and vehicles and provides a label for each location. It is connected to logbooks via dep_loc and arr_loc and to vehicles via first_loc. Location 0 is reserved for driving and gets the label “Driving”.
- Type: Locations
Examples
Create a MobProfiles instance with minimal example data:
    import pandas as pd
    import champpy

    # Create example logbook data with synthetic journeys
    logbook_df = pd.DataFrame({
        'id_vehicle': [1, 1, 2],
        'dep_dt': pd.to_datetime(['2024-01-01 08:00', '2024-01-01 18:00', '2024-01-01 09:30']),
        'arr_dt': pd.to_datetime(['2024-01-01 12:00', '2024-01-01 22:00', '2024-01-01 17:30']),
        'dep_loc': [1, 2, 1],
        'arr_loc': [2, 1, 1],
        'distance': [25.5, 30.2, 18.0]
    })

    # Create example vehicle data
    vehicle_df = pd.DataFrame({
        'id_vehicle': [1, 2],
        'first_day': pd.to_datetime(['2024-01-01', '2024-01-01']),
        'last_day': pd.to_datetime(['2024-01-02', '2024-01-02']),
        'id_cluster': [1, 1],
        'first_loc': [1, 1]
    })

    # Create mobility profiles
    mob_profiles = champpy.MobProfiles(input_logbooks_df=logbook_df, input_vehicles_df=vehicle_df)
- add_mob_profiles(input_mob_profiles, old_cluster_label='Old', new_cluster_label='New')
Add mobility data from another MobProfiles instance. The vehicles of the existing MobProfiles instance get id_cluster = 1, and the vehicles of the added MobProfiles instance get id_cluster = 2. You can label the existing data with old_cluster_label and the added data with new_cluster_label.
- Parameters:
input_mob_profiles (MobProfiles) – Another MobProfiles instance to add data from.
old_cluster_label (str) – Label for the existing data.
new_cluster_label (str) – Label for the added data.
- Return type:
Examples
Assuming mob_profiles exists (see the MobProfiles examples):
    # Create second dataset
    other_logbook_df = pd.DataFrame({...})
    other_mob_profiles = champpy.MobProfiles(other_logbook_df)

    # Add to existing mob_profiles
    mob_profiles.add_mob_profiles(input_mob_profiles=other_mob_profiles, old_cluster_label="Existing", new_cluster_label="Added")
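The cluster assignment described for add_mob_profiles can be sketched in plain Python. This is an illustration only and not champpy's implementation: the function combine_vehicles is hypothetical, and shifting the added vehicle IDs to avoid collisions is an assumption that the documentation does not confirm.

```python
def combine_vehicles(existing, added, old_cluster_label="Old", new_cluster_label="New"):
    """Merge two vehicle lists: existing vehicles get cluster 1, added vehicles cluster 2."""
    # Assumption: added vehicle IDs are shifted past the highest existing ID
    # so that the combined IDs stay unique.
    offset = max((v["id_vehicle"] for v in existing), default=0)

    combined = [{**v, "id_cluster": 1} for v in existing]
    combined += [
        {**v, "id_vehicle": v["id_vehicle"] + offset, "id_cluster": 2} for v in added
    ]
    clusters = [
        {"id_cluster": 1, "label": old_cluster_label},
        {"id_cluster": 2, "label": new_cluster_label},
    ]
    return combined, clusters


existing = [{"id_vehicle": 1, "id_cluster": 1}, {"id_vehicle": 2, "id_cluster": 3}]
added = [{"id_vehicle": 1, "id_cluster": 1}]
vehicles, clusters = combine_vehicles(existing, added, "Existing", "Added")
for row in vehicles + clusters:
    print(row)
```

Note that any previous cluster assignment of the existing vehicles is overwritten, which matches the description above: after the call, the cluster only distinguishes existing from added data.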
- reindexing(type='all')
Reindex the IDs in the MobProfiles instance (id_journey, id_vehicle, id_cluster).
- Parameters:
type (Literal['all', 'id_journey', 'id_vehicle', 'id_cluster']) – Specifies which IDs to reindex. Default is “all”.
  - “all”: Reindex all IDs (id_journey, id_vehicle, id_cluster)
  - “id_journey”: Reindex only journey IDs
  - “id_vehicle”: Reindex only vehicle IDs
  - “id_cluster”: Reindex only cluster IDs
- Return type:
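As a rough illustration of what reindexing an ID involves, the sketch below renumbers id_vehicle to consecutive one-based values and applies the same mapping to the journeys that reference it. It assumes that reindexing means "renumber to 1..n in ascending order of the old IDs"; the helper reindex_vehicles is hypothetical and works on plain lists of dicts rather than on champpy objects.

```python
def reindex_vehicles(vehicles, logbooks):
    """Renumber id_vehicle to 1..n and propagate the mapping to the journeys."""
    old_ids = sorted({v["id_vehicle"] for v in vehicles})
    mapping = {old: new for new, old in enumerate(old_ids, start=1)}

    new_vehicles = [{**v, "id_vehicle": mapping[v["id_vehicle"]]} for v in vehicles]
    # Every journey must follow its vehicle, otherwise the link between tables breaks.
    new_logbooks = [{**j, "id_vehicle": mapping[j["id_vehicle"]]} for j in logbooks]
    return new_vehicles, new_logbooks, mapping


vehicles = [{"id_vehicle": 5}, {"id_vehicle": 9}]
logbooks = [
    {"id_journey": 1, "id_vehicle": 9},
    {"id_journey": 2, "id_vehicle": 5},
]
new_vehicles, new_logbooks, mapping = reindex_vehicles(vehicles, logbooks)
print(mapping)
print(new_logbooks)
```

The key point is that a linking ID is never renumbered in one table alone; the same mapping has to be applied wherever the ID is referenced.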
- class champpy.Logbooks(input_df=None, frozen=False)
Component class included in MobProfiles representing the logbooks with all journeys.
The Logbooks class represents the logbook data of journeys, including departure and arrival times, locations, and distances. The class holds a DataFrame df that contains the data. It is included as a component in the MobProfiles class and can be accessed via its instances. It provides methods to add, update, and delete journeys, as well as to restore location continuity and convert the temporal resolution. The Logbooks class ensures data integrity through validation with a Pandera schema.
The DataFrame (accessible via df) contains the following columns:
  - id_journey: One-based index for journeys. This column is optional and will be generated if not provided in the input DataFrame.
  - id_vehicle: One-based index for vehicles, connected to id_vehicle in input_vehicles_df.
  - dep_dt: Departure datetime of each journey.
  - arr_dt: Arrival datetime of each journey.
  - dep_loc: Departure location of each journey as an integer above 0. For example, you can define 1 for home, 2 for work, etc. Location 0 is reserved for driving and is not allowed in this DataFrame.
  - arr_loc: Arrival location of each journey as an integer above 0. For example, you can define 1 for home, 2 for work, etc. Location 0 is reserved for driving and is not allowed in this DataFrame.
  - distance: Distance of each journey in km.
  - duration: Duration of each journey in hours.
  - speed: Speed of each journey in km/h.
- Parameters:
input_df (DataFrame) – Input DataFrame for the logbooks. See the column description in Logbooks for the required columns. The column id_journey is optional and will be generated if not provided in the input DataFrame. The columns duration and speed are not required because they are calculated; they are ignored if provided in the input DataFrame.
frozen (bool) – If True, the Logbooks instance is immutable after creation. Default is False.
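The derived columns duration (hours) and speed (km/h) follow directly from dep_dt, arr_dt, and distance. The sketch below shows the arithmetic in plain Python; the function name duration_and_speed and the handling of zero-length journeys are assumptions and may differ from what champpy does internally.

```python
from datetime import datetime


def duration_and_speed(dep_dt, arr_dt, distance_km):
    """Return (duration in hours, average speed in km/h) for one journey."""
    duration_h = (arr_dt - dep_dt).total_seconds() / 3600.0
    # Assumption: a journey without duration has no defined speed.
    speed_kmh = distance_km / duration_h if duration_h > 0 else float("nan")
    return duration_h, speed_kmh


# First journey of the MobProfiles example: 08:00 to 12:00, 25.5 km
duration_h, speed_kmh = duration_and_speed(
    datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 12, 0), 25.5
)
print(duration_h, speed_kmh)  # 4.0 6.375
```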
- add_journeys(input_df)
Add journeys from a DataFrame to the logbook.
- Parameters:
input_df (DataFrame) – DataFrame with journey data. See the column description in Logbooks for the required columns. The columns duration and speed are not required because they are calculated; they are ignored if provided in the input DataFrame.
- Return type:
Examples
This example uses the instance mob_profiles defined in the MobProfiles examples:
    # Create new journeys DataFrame
    new_journeys_df = pd.DataFrame({
        "id_vehicle": [1, 1],
        "dep_dt": [pd.Timestamp("2024-01-01 08:00"), pd.Timestamp("2024-01-01 10:00")],
        "arr_dt": [pd.Timestamp("2024-01-01 09:00"), pd.Timestamp("2024-01-01 11:00")],
        "dep_loc": [1, 2],
        "arr_loc": [2, 3],
        "distance": [10.0, 15.0]
    })

    # Add journeys to logbooks
    mob_profiles.logbooks.add_journeys(new_journeys_df)
- delete_journeys(id_journey)
Delete journeys by journey ID.
Examples
This example uses the instance mob_profiles defined in the MobProfiles examples:
    # Delete the first two journeys of the logbook
    mob_profiles.logbooks.delete_journeys(id_journey=[1, 2])
- restore_location_continuity(target='dep')
Restore location continuity by overwriting either dep_loc or arr_loc.
Location continuity means that, for each vehicle, the departure location (dep_loc) of every journey equals the arrival location (arr_loc) of the previous journey.
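A minimal plain-Python sketch of this repair is shown below. It assumes that target='dep' overwrites each dep_loc with the previous arr_loc and that target='arr' overwrites each arr_loc with the next dep_loc; the accepted values of target other than 'dep' are not listed in this documentation, so 'arr' is an assumption. The function here is a stand-in working on lists of dicts, not the champpy method itself.

```python
from datetime import datetime


def restore_location_continuity(journeys, target="dep"):
    """Return copies of the journeys with continuous locations per vehicle."""
    if target not in ("dep", "arr"):
        raise ValueError("target must be 'dep' or 'arr'")

    result = [dict(j) for j in journeys]  # leave the input untouched
    by_vehicle = {}
    for j in result:
        by_vehicle.setdefault(j["id_vehicle"], []).append(j)

    for trips in by_vehicle.values():
        trips.sort(key=lambda j: j["dep_dt"])  # chronological order per vehicle
        for prev, nxt in zip(trips, trips[1:]):
            if target == "dep":
                nxt["dep_loc"] = prev["arr_loc"]  # trust arrivals, fix departures
            else:
                prev["arr_loc"] = nxt["dep_loc"]  # trust departures, fix arrivals
    return result


journeys = [
    {"id_vehicle": 1, "dep_dt": datetime(2024, 1, 1, 8), "dep_loc": 1, "arr_loc": 2},
    # dep_loc 3 breaks continuity: the previous journey arrived at location 2
    {"id_vehicle": 1, "dep_dt": datetime(2024, 1, 1, 18), "dep_loc": 3, "arr_loc": 1},
]
fixed_dep = restore_location_continuity(journeys, target="dep")
fixed_arr = restore_location_continuity(journeys, target="arr")
print(fixed_dep[1]["dep_loc"], fixed_arr[0]["arr_loc"])  # 2 3
```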
- update_journeys(input_df)
Update existing journeys in the logbook based on id_journey.
- Parameters:
input_df (DataFrame) – DataFrame with journey data. See the column description in Logbooks for the required columns. Must include the id_journey column. The columns duration and speed are not required because they are calculated; they are ignored if provided in the input DataFrame.
- Return type:
Examples
This example uses the instance mob_profiles defined in the MobProfiles examples:
    # Get the first two journeys and modify their arrival times and distances
    updated_journeys_df = mob_profiles.logbooks.df.head(2).copy()
    updated_journeys_df.loc[:, "arr_dt"] = updated_journeys_df.loc[:, "arr_dt"] + pd.Timedelta(minutes=30)
    updated_journeys_df.loc[:, "distance"] = updated_journeys_df.loc[:, "distance"] + 5.0

    # Update journeys in logbooks
    mob_profiles.logbooks.update_journeys(updated_journeys_df)
- property df: DataFrame
Get a copy of the DataFrame of the data component. If the DataFrame is None, return an empty DataFrame with the correct schema.
- property temp_res: float
Temporal resolution of the logbook in hours.
- Getter:
Returns the current temporal resolution of the logbook in hours. If no temporal resolution has been set, returns None.
- Setter:
Sets the temporal resolution of the logbook in hours. This converts the logbook to the specified temporal resolution by merging overlapping or adjacent journeys per vehicle.
Examples
This example uses the instance mob_profiles defined in the MobProfiles examples:
    # Get current temporal resolution (initially None)
    current_res = mob_profiles.logbooks.temp_res

    # Set temporal resolution to 1 hour
    # This will merge journeys that overlap or are adjacent within 1-hour intervals
    mob_profiles.logbooks.temp_res = 1.0

    # Check the new temporal resolution
    print(mob_profiles.logbooks.temp_res)  # Output: 1.0
- class champpy.Vehicles(input_df=None, frozen=False)
Component class included in MobProfiles representing vehicles.
The Vehicles class manages vehicle-level metadata. It is included as a component in the MobProfiles class and can be accessed via its instances.
The DataFrame (accessible via df) contains the following columns:
  - id_vehicle: Vehicle identifier. One-based index for vehicles.
  - first_day: First recorded day of the vehicle.
  - last_day: Last recorded day of the vehicle.
  - id_cluster: Cluster assignment (optional, default: 1). Used to group vehicles into different clusters.
  - first_loc: First location of the vehicle (optional, default: None). Use the same location encoding as in the logbooks.
- Parameters:
- add_vehicles(input_df)
Add vehicles from a DataFrame.
- Parameters:
input_df (DataFrame) – DataFrame with vehicle data to add. See the column description in Vehicles for the required columns.
- Return type:
Examples
This example uses the instance mob_profiles defined in the MobProfiles examples:
    # Create new vehicles DataFrame
    new_vehicles_df = pd.DataFrame({
        "id_vehicle": [3, 4],
        "first_day": pd.to_datetime(["2020-01-01", "2020-01-02"]),
        "last_day": pd.to_datetime(["2020-01-03", "2020-01-04"]),
        "id_cluster": [1, 1],
        "first_loc": [1, 2]
    })

    # Add vehicles from a DataFrame
    mob_profiles.vehicles.add_vehicles(input_df=new_vehicles_df)
- delete_vehicles(id_vehicle)
Delete vehicles by vehicle ID.
Examples
This example uses the instance mob_profiles defined in the MobProfiles examples:
    # Delete the second vehicle and all its journeys
    mob_profiles.vehicles.delete_vehicles(id_vehicle=[2])
- generate_vehicles_from_logbooks(logbooks)
Generate vehicle DataFrame from a Logbooks instance.
- set_first_loc_from_logbooks(logbooks)
Set first_loc for each vehicle based on the first dep_loc in the logbooks.
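How vehicle rows can be derived from journeys is sketched below in plain Python. The sketch assumes that first_day is the date of the earliest departure and last_day the date of the latest arrival; the documentation does not spell out these rules, so treat them as an illustration. The default id_cluster of 1 and the use of the first dep_loc as first_loc follow the descriptions above. The function vehicles_from_journeys is hypothetical.

```python
from datetime import datetime


def vehicles_from_journeys(journeys):
    """Build one vehicle row per id_vehicle from a list of journey dicts."""
    vehicles = {}
    for j in sorted(journeys, key=lambda j: j["dep_dt"]):
        if j["id_vehicle"] not in vehicles:
            # The chronologically first journey defines first_day and first_loc.
            vehicles[j["id_vehicle"]] = {
                "id_vehicle": j["id_vehicle"],
                "first_day": j["dep_dt"].date(),
                "last_day": j["arr_dt"].date(),
                "id_cluster": 1,
                "first_loc": j["dep_loc"],
            }
        row = vehicles[j["id_vehicle"]]
        row["last_day"] = max(row["last_day"], j["arr_dt"].date())
    return [vehicles[key] for key in sorted(vehicles)]


# Journeys from the MobProfiles example
journeys = [
    {"id_vehicle": 1, "dep_dt": datetime(2024, 1, 1, 8, 0), "arr_dt": datetime(2024, 1, 1, 12, 0), "dep_loc": 1},
    {"id_vehicle": 1, "dep_dt": datetime(2024, 1, 1, 18, 0), "arr_dt": datetime(2024, 1, 1, 22, 0), "dep_loc": 2},
    {"id_vehicle": 2, "dep_dt": datetime(2024, 1, 1, 9, 30), "arr_dt": datetime(2024, 1, 1, 17, 30), "dep_loc": 1},
]
generated = vehicles_from_journeys(journeys)
for row in generated:
    print(row)
```

Vehicles without any journeys cannot be recovered this way, which is why first_loc in input_vehicles_df matters for non-driving vehicles.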
- update_vehicles(input_df)
Update existing vehicles based on id_vehicle. Replaces all columns for matching vehicles with values from input_df.
- Parameters:
input_df (DataFrame) – DataFrame with vehicle data to update. See the column description in Vehicles for the required columns.
- Return type:
Examples
This example uses the instance mob_profiles defined in the MobProfiles examples:
    # Get the row of the second vehicle and set its cluster to 2
    vehicles_df = mob_profiles.vehicles.df
    updated_vehicles_df = vehicles_df[vehicles_df["id_vehicle"] == 2].copy()
    updated_vehicles_df.loc[:, "id_cluster"] = 2

    # Update vehicles from a DataFrame
    mob_profiles.vehicles.update_vehicles(input_df=updated_vehicles_df)
- class champpy.Clusters(vehicles=None, frozen=False)
Component class included in MobProfiles representing vehicle clusters.
The Clusters class manages cluster assignments for vehicles in the mobility data. It is included as a component in the MobProfiles class and can be accessed via its instances. The clusters DataFrame is automatically generated from the vehicles DataFrame and cannot be set directly, but can be updated via the update methods.
The DataFrame (accessible via df) contains the following columns:
  - id_cluster: Cluster identifier.
  - label: Human-readable label for the cluster.
- Parameters:
- update_clusters(input_df)
Update existing clusters based on id_cluster. Replaces all columns for matching clusters with values from input_df.
- Parameters:
input_df (DataFrame) – DataFrame with cluster data to update. See the column description in Clusters for the required columns.
- Return type:
Examples
This example uses the instance mob_profiles defined in the MobProfiles examples:
    # Get current clusters DataFrame
    clusters_df = mob_profiles.clusters.df

    # Update cluster labels
    clusters_df.loc[clusters_df["id_cluster"] == 1, "label"] = "Private Vehicles"

    # Apply updated labels
    mob_profiles.clusters.update_clusters(clusters_df)
- update_clusters_from_vehicles(vehicles)
Update clusters DataFrame based on current vehicle DataFrame.
- class champpy.Locations(vehicles=None, logbooks=None, frozen=False)
Component class included in MobProfiles representing locations used in journeys.
The Locations class manages location definitions for the mobility data. It is included as a component in the MobProfiles class and can be accessed via its instances. The locations DataFrame is automatically generated from the logbooks and vehicles DataFrames and cannot be set directly, but can be updated via the update methods. Location 0 is reserved for “Driving” and location 1 is typically “Home”.
The DataFrame (accessible via df) contains the following columns:
  - location: Location identifier (0 = Driving, 1+ = stationary locations).
  - label: Human-readable label for the location (e.g., “Home”, “Work”, “Location 3”).
- Parameters:
- update_locations(input_df)
Update existing locations based on location ID. Replaces all columns for matching locations with values from input_df.
- Parameters:
input_df (DataFrame) – DataFrame with location data to update. See the column description in Locations for the required columns.
- Return type:
Examples
This example uses the instance mob_profiles defined in the MobProfiles examples:
    # Get current locations DataFrame
    locations_df = mob_profiles.locations.df

    # Update location labels with meaningful names
    locations_df.loc[locations_df["location"] == 2, "label"] = "Work"

    # Apply updated labels
    mob_profiles.locations.update_locations(locations_df)
- update_locations_from_logbooks_vehicles(logbooks=None, vehicles=None)
Update locations DataFrame based on unique dep_loc and arr_loc in logbooks.
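The derivation of the locations table can be sketched in plain Python as follows. The sketch collects the unique dep_loc and arr_loc values, also includes first_loc from the vehicles (an assumption based on the class description), and always adds location 0 with the label “Driving”. The default label pattern "Location N" and the helper build_locations are assumptions for illustration; champpy's actual defaults may differ.

```python
def build_locations(journeys, vehicles, known_labels=None):
    """Return one row per distinct location, with location 0 reserved for driving."""
    known_labels = known_labels or {}

    ids = {j["dep_loc"] for j in journeys} | {j["arr_loc"] for j in journeys}
    ids |= {v["first_loc"] for v in vehicles if v.get("first_loc") is not None}
    ids.discard(0)  # 0 is reserved and added explicitly below

    rows = [{"location": 0, "label": "Driving"}]
    for loc in sorted(ids):
        # Assumption: unlabeled locations get a generic "Location N" label.
        rows.append({"location": loc, "label": known_labels.get(loc, f"Location {loc}")})
    return rows


journeys = [
    {"dep_loc": 1, "arr_loc": 2},
    {"dep_loc": 2, "arr_loc": 1},
]
vehicles = [{"id_vehicle": 1, "first_loc": 1}, {"id_vehicle": 2, "first_loc": 3}]
locations = build_locations(journeys, vehicles, known_labels={1: "Home", 2: "Work"})
for row in locations:
    print(row)
```

Location 3 appears only as the first_loc of a vehicle without journeys, which shows why the vehicles table has to be considered alongside the logbooks.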