champpy.MobProfiles

The MobProfiles class is a wrapper class that integrates four interconnected data components for managing mobility profiles. These four components are instances of Logbooks, Vehicles, Clusters, and Locations. Each component contains a DataFrame that holds the data for that component. The components are linked via IDs, ensuring data consistency and enabling seamless workflows for analyzing and modifying the data. The structure is as follows:

MobProfiles
├── logbooks  # Journeys of the vehicles
│   └── df    # DataFrame with one row for each journey
├── vehicles  # Information of the vehicles
│   └── df    # DataFrame with one row for each vehicle
├── clusters  # Groups of vehicles with similar behaviour
│   └── df    # DataFrame with one row for each cluster
└── locations # Information of the locations distinguished
    └── df    # DataFrame with one row for each unique location
class champpy.MobProfiles(input_logbooks_df, input_vehicles_df=None, frozen=False)[source]

Wrapper class for mobility profiles in the champpy framework.

It contains the logbooks, vehicles, clusters and locations as separate classes.

Parameters:
  • input_logbooks_df (DataFrame) –

    Input DataFrame for the logbooks. Expected columns and dtypes:

    Column      Type              Description
    id_vehicle  int               One-based index for vehicles, connected to id_vehicle in input_vehicles_df.
    dep_dt      pandas.Timestamp  Departure datetime of each journey.
    arr_dt      pandas.Timestamp  Arrival datetime of each journey.
    dep_loc     int               Departure location of each journey as an integer above 0. You can, for example, define 1 for home, 2 for work, etc. Location 0 is reserved for driving and is not allowed in this DataFrame.
    arr_loc     int               Arrival location of each journey as an integer above 0. You can, for example, define 1 for home, 2 for work, etc. Location 0 is reserved for driving and is not allowed in this DataFrame.
    distance    float             Distance of each journey in km.

  • input_vehicles_df (DataFrame | None) –

    Input DataFrame for the vehicles. If not provided, the vehicles will be generated from the logbooks. Expected columns and dtypes:

    Column      Type              Description
    id_vehicle  int               Vehicle identifier.
    first_day   pandas.Timestamp  First recorded day of the vehicle.
    last_day    pandas.Timestamp  Last recorded day of the vehicle.
    id_cluster  int               Split the vehicles into clusters by assigning a cluster ID (one-based) to each vehicle. This is optional and can be used, for example, to distinguish between different user groups. If you don’t want to use clusters, you can simply set this column to 1 for all vehicles.
    first_loc   int               First location (optional). Use the same location encoding as in dep_loc and arr_loc in input_logbooks_df. It is especially relevant for non-driving vehicles, which do not have any journeys in the logbooks.

  • frozen (bool) – If True, the MobProfiles instance is immutable after creation. Default is False.
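The column requirements above can be checked with plain pandas before a DataFrame is handed to MobProfiles. The following is a minimal sketch only: the helper name check_logbook_input is hypothetical and not part of champpy, and the rule that arr_dt must not precede dep_dt is an assumption rather than something stated in this reference.

```python
import pandas as pd

# Required columns taken from the input_logbooks_df description above.
REQUIRED_COLUMNS = ["id_vehicle", "dep_dt", "arr_dt", "dep_loc", "arr_loc", "distance"]


def check_logbook_input(df):
    """Hypothetical helper: return a list of problems (empty if none were found)."""
    problems = []
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems
    # Location 0 is reserved for driving, so both location columns must be above 0.
    if ((df["dep_loc"] <= 0) | (df["arr_loc"] <= 0)).any():
        problems.append("dep_loc and arr_loc must be integers above 0")
    # Assumption: a journey should not arrive before it departs.
    if (df["arr_dt"] < df["dep_dt"]).any():
        problems.append("arr_dt is earlier than dep_dt")
    return problems


logbook_df = pd.DataFrame({
    "id_vehicle": [1, 1, 2],
    "dep_dt": pd.to_datetime(["2024-01-01 08:00", "2024-01-01 18:00", "2024-01-01 09:30"]),
    "arr_dt": pd.to_datetime(["2024-01-01 12:00", "2024-01-01 22:00", "2024-01-01 17:30"]),
    "dep_loc": [1, 2, 1],
    "arr_loc": [2, 1, 1],
    "distance": [25.5, 30.2, 18.0],
})

problems = check_logbook_input(logbook_df)

# The same data with a reserved location (0) in dep_loc is reported as a problem.
bad_problems = check_logbook_input(logbook_df.assign(dep_loc=[0, 2, 1]))
```

A check like this only mirrors the documented table; the validation champpy performs internally may be stricter.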

logbooks

Contains the journey data of the mobility profile with departure and arrival information.

Type:

Logbooks

vehicles

Contains vehicle-specific data about each vehicle, such as its first and last day of activity, cluster assignment, and first location. It is connected to logbooks via id_vehicle.

Type:

Vehicles

clusters

Describes the clusters defined in vehicles. It is connected to vehicles via id_cluster. It provides a label for each cluster.

Type:

Clusters

locations

Describes the locations defined in logbooks and vehicles. The location is connected to logbooks via dep_loc and arr_loc and to vehicles via first_loc. It provides a label for each location. The location = 0 is reserved for driving and gets the label “Driving”.

Type:

Locations

Examples

Create a MobProfiles instance with minimal example data:

import pandas as pd
import champpy

# Create example logbook data with synthetic journeys
logbook_df = pd.DataFrame({
    'id_vehicle': [1, 1, 2],
    'dep_dt': pd.to_datetime(['2024-01-01 08:00', '2024-01-01 18:00', '2024-01-01 09:30']),
    'arr_dt': pd.to_datetime(['2024-01-01 12:00', '2024-01-01 22:00', '2024-01-01 17:30']),
    'dep_loc': [1, 2, 1],
    'arr_loc': [2, 1, 1],
    'distance': [25.5, 30.2, 18.0]
})

# Create example vehicle data
vehicle_df = pd.DataFrame({
    'id_vehicle': [1, 2],
    'first_day': pd.to_datetime(['2024-01-01', '2024-01-01']),
    'last_day': pd.to_datetime(['2024-01-02', '2024-01-02']),
    'id_cluster': [1, 1],
    'first_loc': [1, 1]
})

# Create mobility profiles
mob_profiles = champpy.MobProfiles(input_logbooks_df=logbook_df,
                           input_vehicles_df=vehicle_df)
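Because the components are linked via IDs, their DataFrames can be combined with ordinary pandas merges. The sketch below uses small stand-in DataFrames shaped like logbooks.df, vehicles.df and clusters.df instead of a real MobProfiles instance; the cluster labels "Private" and "Commercial" are made up for illustration.

```python
import pandas as pd

# Stand-in DataFrames with the documented ID columns.
logbooks_df = pd.DataFrame({
    "id_journey": [1, 2, 3],
    "id_vehicle": [1, 1, 2],
    "distance": [25.5, 30.2, 18.0],
})
vehicles_df = pd.DataFrame({"id_vehicle": [1, 2], "id_cluster": [1, 2]})
clusters_df = pd.DataFrame({"id_cluster": [1, 2], "label": ["Private", "Commercial"]})

# Follow the links: journey -> vehicle (id_vehicle) -> cluster (id_cluster).
journeys_with_cluster = (
    logbooks_df
    .merge(vehicles_df, on="id_vehicle", how="left")
    .merge(clusters_df, on="id_cluster", how="left")
)

# Total distance driven per cluster label.
distance_per_cluster = journeys_with_cluster.groupby("label")["distance"].sum()
print(distance_per_cluster)
```

With a real instance, the same merges would presumably be applied to mob_profiles.logbooks.df, mob_profiles.vehicles.df and mob_profiles.clusters.df.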
add_mob_profiles(input_mob_profiles, old_cluster_label='Old', new_cluster_label='New')[source]

Add mobility data from another MobProfiles instance. The vehicles of the existing MobProfiles instance get id_cluster = 1, and the vehicles of the added MobProfiles instance get id_cluster = 2. Use old_cluster_label to label the existing data and new_cluster_label to label the added data.

Parameters:
  • input_mob_profiles (MobProfiles) – Another MobProfiles instance to add data from.

  • old_cluster_label (str) – Label for the existing data. Default is 'Old'.

  • new_cluster_label (str) – Label for the added data. Default is 'New'.

Return type:

None

Examples

Assuming mob_profiles exists (see MobProfiles examples):

# Create second dataset
other_logbook_df = pd.DataFrame({...})
other_mob_profiles = champpy.MobProfiles(other_logbook_df)

# Add to existing mob_profiles
mob_profiles.add_mob_profiles(input_mob_profiles=other_mob_profiles,
                      old_cluster_label="Existing",
                      new_cluster_label="Added")
copy()[source]

Create a copy of the instance.

reindexing(type='all')[source]

Reindex the IDs in the MobProfiles instance (id_journey, id_vehicle, id_cluster).

Parameters:

type (Literal['all', 'id_journey', 'id_vehicle', 'id_cluster']) – Specifies which IDs to reindex. Default is “all”.

  • “all”: Reindex all IDs (id_journey, id_vehicle, id_cluster).

  • “id_journey”: Reindex only journey IDs.

  • “id_vehicle”: Reindex only vehicle IDs.

  • “id_cluster”: Reindex only cluster IDs.

Return type:

None
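The idea of reindexing can be pictured with plain pandas. This sketch assumes that reindexing means mapping the existing IDs to consecutive one-based integers while keeping the links between tables intact; the reference does not spell out the exact behavior, so treat it as an illustration rather than the champpy implementation.

```python
import pandas as pd

# Stand-in tables with gaps in the vehicle IDs, e.g. after deleting vehicles.
vehicles_df = pd.DataFrame({"id_vehicle": [3, 7, 12], "id_cluster": [1, 1, 1]})
logbooks_df = pd.DataFrame({"id_journey": [5, 9, 10], "id_vehicle": [7, 3, 12]})

# Map every old vehicle ID to a new consecutive one-based ID.
old_ids = sorted(int(v) for v in vehicles_df["id_vehicle"].unique())
id_map = {old: new for new, old in enumerate(old_ids, start=1)}

# Apply the same mapping in both tables so the id_vehicle link stays valid.
vehicles_df["id_vehicle"] = vehicles_df["id_vehicle"].map(id_map)
logbooks_df["id_vehicle"] = logbooks_df["id_vehicle"].map(id_map)

# Journey IDs can be renumbered independently because no other table refers to them.
logbooks_df["id_journey"] = range(1, len(logbooks_df) + 1)
```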

class champpy.Logbooks(input_df=None, frozen=False)[source]

Component class included in MobProfiles representing the logbooks with all journeys.

The Logbooks class represents the logbook data of journeys, including departure and arrival times, locations, and distances. The class holds a DataFrame df that contains the data. It is included as a component in the MobProfiles class and can be accessed via its instances. It provides methods to add, update, and delete journeys, as well as to restore location continuity and convert temporal resolution. The Logbooks class ensures data integrity through validation with a Pandera schema.

The DataFrame (accessible via df) contains the following columns:

Column      Type              Description
id_journey  int               One-based index for journeys. This column is optional and will be generated if not provided in the input DataFrame.
id_vehicle  int               One-based index for vehicles, connected to id_vehicle in Vehicles.
dep_dt      pandas.Timestamp  Departure datetime of each journey.
arr_dt      pandas.Timestamp  Arrival datetime of each journey.
dep_loc     int               Departure location of each journey as an integer above 0. You can, for example, define 1 for home, 2 for work, etc. Location 0 is reserved for driving and is not allowed in this DataFrame.
arr_loc     int               Arrival location of each journey as an integer above 0. You can, for example, define 1 for home, 2 for work, etc. Location 0 is reserved for driving and is not allowed in this DataFrame.
distance    float             Distance of each journey in km.
duration    float             Duration of each journey in hours.
speed       float             Speed of each journey in km/h.

Parameters:
  • input_df (DataFrame) – Input DataFrame for the logbooks. Please see column description in Logbooks for required columns and types. The column id_journey is optional and will be generated if not provided in the input DataFrame. The columns duration and speed are not required as they are calculated. They will be ignored if provided in the input DataFrame.

  • frozen (bool) – If True, the Logbooks instance is immutable after creation. Default is False.
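The duration and speed columns are described as calculated values. A plausible way to derive them from the documented units (hours and km/h) is sketched below with plain pandas; this is an assumption about the calculation, not code taken from champpy.

```python
import pandas as pd

journeys = pd.DataFrame({
    "dep_dt": pd.to_datetime(["2024-01-01 08:00", "2024-01-01 09:30"]),
    "arr_dt": pd.to_datetime(["2024-01-01 12:00", "2024-01-01 17:30"]),
    "distance": [25.5, 18.0],  # km
})

# Duration in hours: difference between arrival and departure.
journeys["duration"] = (journeys["arr_dt"] - journeys["dep_dt"]).dt.total_seconds() / 3600.0

# Average speed in km/h: distance divided by duration.
journeys["speed"] = journeys["distance"] / journeys["duration"]

print(journeys[["distance", "duration", "speed"]])
```

Since both columns follow from dep_dt, arr_dt and distance, supplying them in the input would be redundant, which fits the note that they are ignored when provided.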

add_journeys(input_df)[source]

Add journeys from a DataFrame to the logbook.

Parameters:

input_df (DataFrame) – DataFrame with journey data. Please see column description in Logbooks for required columns and types. The columns duration and speed are not required as they are calculated. They will be ignored if provided in the input DataFrame.

Return type:

None

Examples

This example uses the instance mob_profiles defined in the MobProfiles examples:

# Create new journeys DataFrame
new_journeys_df = pd.DataFrame({
    "id_vehicle": [1, 1],
    "dep_dt": [pd.Timestamp("2024-01-01 08:00"), pd.Timestamp("2024-01-01 10:00")],
    "arr_dt": [pd.Timestamp("2024-01-01 09:00"), pd.Timestamp("2024-01-01 11:00")],
    "dep_loc": [1, 2],
    "arr_loc": [2, 3],
    "distance": [10.0, 15.0]
})

# Add journeys to logbooks
mob_profiles.logbooks.add_journeys(new_journeys_df)
delete_journeys(id_journey)[source]

Delete journeys by journey ID.

Parameters:

id_journey (list) – List of journey IDs to delete.

Return type:

None

Examples

This example uses the instance mob_profiles defined in the MobProfiles examples:

# Delete the first two journeys of the logbook
mob_profiles.logbooks.delete_journeys(id_journey=[1, 2])
restore_location_continuity(target='dep')[source]

Restore location continuity by overwriting either dep_loc or arr_loc.

Location continuity means that, for each vehicle, the departure location (dep_loc) of every journey has the same value as the arrival location (arr_loc) of the previous journey.

Parameters:

target (Literal['dep', 'arr']) – “dep” (default): set dep_loc to previous arr_loc. “arr”: set arr_loc to next dep_loc.

Return type:

None
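The “dep” variant can be sketched with a grouped shift in pandas. The example below is an illustration on a stand-in DataFrame, not the champpy implementation; in particular, leaving the first journey of each vehicle unchanged is an assumption, because it has no previous journey to copy from.

```python
import pandas as pd

journeys = pd.DataFrame({
    "id_vehicle": [1, 1, 1, 2, 2],
    "dep_dt": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-01 12:00", "2024-01-01 18:00",
        "2024-01-01 09:00", "2024-01-01 15:00",
    ]),
    "dep_loc": [1, 3, 2, 1, 1],  # second journey of vehicle 1 breaks continuity
    "arr_loc": [2, 2, 1, 2, 1],
})

# Order journeys chronologically within each vehicle.
journeys = journeys.sort_values(["id_vehicle", "dep_dt"]).reset_index(drop=True)

# target="dep": take the arr_loc of the previous journey of the same vehicle.
prev_arr = journeys.groupby("id_vehicle")["arr_loc"].shift(1)

# Assumption: the first journey of a vehicle keeps its original dep_loc.
journeys["dep_loc"] = prev_arr.fillna(journeys["dep_loc"]).astype(int)

print(journeys[["id_vehicle", "dep_loc", "arr_loc"]])
```

The “arr” variant would mirror this with shift(-1) on dep_loc, overwriting arr_loc instead.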

update_journeys(input_df)[source]

Update existing journeys in the logbook based on id_journey.

Parameters:

input_df (DataFrame) – DataFrame with journey data. Please see column description in Logbooks for required columns and types. Must include id_journey column. The columns duration and speed are not required as they are calculated. They will be ignored if provided in the input DataFrame.

Return type:

None

Examples

This example uses the instance mob_profiles defined in the MobProfiles examples:

# Get the data of the first two journeys and modify their arrival times and distances
updated_journeys_df = mob_profiles.logbooks.df.head(2).copy()

updated_journeys_df.loc[:, "arr_dt"] = updated_journeys_df.loc[:, "arr_dt"] + pd.Timedelta(minutes=30)
updated_journeys_df.loc[:, "distance"] = updated_journeys_df.loc[:, "distance"] + 5.0

# Update journeys in logbooks
mob_profiles.logbooks.update_journeys(updated_journeys_df)
property df: DataFrame

Get a copy of the DataFrame of the data component. If the DataFrame is None, return an empty DataFrame with the correct schema.
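Because df hands out a copy, editing the returned DataFrame should not change the component itself, which is presumably why the examples pass edited data back through an update method. The sketch below shows the pattern with a hypothetical stand-in class named Component; it is not the champpy implementation.

```python
import pandas as pd


class Component:
    """Hypothetical stand-in for a data component that exposes df as a copy."""

    def __init__(self, data):
        self._df = data.copy()

    @property
    def df(self):
        # Return a copy so callers cannot change the internal state by accident.
        return self._df.copy()


component = Component(pd.DataFrame({"id_journey": [1, 2], "distance": [10.0, 15.0]}))

snapshot = component.df
snapshot.loc[0, "distance"] = 99.0  # changes only the copy

# The value stored in the component is untouched.
unchanged_value = component.df.loc[0, "distance"]
print(unchanged_value)
```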

property number: int

Return the number of entries in the DataFrame df.

property temp_res: float

Temporal resolution of the logbook in hours.

Getter:

Returns the current temporal resolution of the logbook in hours. If no temporal resolution has been set, returns None.

Setter:

Set the temporal resolution of the logbook in hours. This will convert the logbook to the specified temporal resolution by merging overlapping/adjacent journeys per vehicle.

Examples

This example uses the instance mob_profiles defined in the MobProfiles examples:

# Get current temporal resolution (initially None)
current_res = mob_profiles.logbooks.temp_res

# Set temporal resolution to 1 hour
# This will merge journeys that overlap or are adjacent within 1-hour intervals
mob_profiles.logbooks.temp_res = 1.0

# Check the new temporal resolution
print(mob_profiles.logbooks.temp_res)  # Output: 1.0
class champpy.Vehicles(input_df=None, frozen=False)[source]

Component class included in MobProfiles representing vehicles.

The Vehicles class manages vehicle-level metadata. It is included as a component in the MobProfiles class and can be accessed via its instances.

The DataFrame (accessible via df) contains the following columns:

Column      Type              Description
id_vehicle  int               Vehicle identifier. One-based index for vehicles.
first_day   pandas.Timestamp  First recorded day of the vehicle.
last_day    pandas.Timestamp  Last recorded day of the vehicle.
id_cluster  int               Cluster assignment (optional, default: 1). Used to group vehicles into different clusters.
first_loc   int               First location of the vehicle (optional, default: None). Use the same location encoding as in the logbooks.

Parameters:
  • input_df (DataFrame) – Input DataFrame for the vehicles. Please see column description above for required columns and types.

  • frozen (bool) – If True, the Vehicles instance is immutable after creation. Default is False.

add_vehicles(input_df)[source]

Add vehicles from a DataFrame.

Parameters:

input_df (DataFrame) – DataFrame with vehicle data to add. See column description table in Vehicles for required columns.

Return type:

None

Examples

This example uses the instance mob_profiles defined in the MobProfiles examples:

# Create new vehicles DataFrame
new_vehicles_df = pd.DataFrame({
    "id_vehicle": [3, 4],
    "first_day": pd.to_datetime(["2020-01-01", "2020-01-02"]),
    "last_day": pd.to_datetime(["2020-01-03", "2020-01-04"]),
    "id_cluster": [1, 1],
    "first_loc": [1, 2]
})
# Add vehicles from a DataFrame
mob_profiles.vehicles.add_vehicles(input_df=new_vehicles_df)
delete_vehicles(id_vehicle)[source]

Delete vehicles by vehicle ID.

Parameters:

id_vehicle (list) – List of vehicle IDs to delete.

Return type:

None

Examples

This example uses the instance mob_profiles defined in the MobProfiles examples:

# Delete the second vehicle and all its journeys
mob_profiles.vehicles.delete_vehicles(id_vehicle=[2])
generate_vehicles_from_logbooks(logbooks)[source]

Generate vehicle DataFrame from a Logbooks instance.

Parameters:

logbooks (Logbooks) – Logbooks instance with journey data to generate vehicles from.

Return type:

None

set_first_loc_from_logbooks(logbooks)[source]

Set first_loc for each vehicle based on the first dep_loc in the logbooks.

Parameters:

logbooks (Logbooks) – Logbooks instance with journey data to extract first locations from.

Return type:

None
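Deriving vehicle data from journeys can be pictured as a per-vehicle aggregation. In this sketch, first_loc follows the documented rule (the first dep_loc of each vehicle) and id_cluster uses the documented default of 1. Taking first_day and last_day from the earliest departure and latest arrival, truncated to midnight, is an assumption.

```python
import pandas as pd

logbooks_df = pd.DataFrame({
    "id_vehicle": [1, 1, 2],
    "dep_dt": pd.to_datetime(["2024-01-01 08:00", "2024-01-02 18:00", "2024-01-01 09:30"]),
    "arr_dt": pd.to_datetime(["2024-01-01 12:00", "2024-01-02 22:00", "2024-01-01 17:30"]),
    "dep_loc": [2, 1, 1],
    "arr_loc": [1, 2, 1],
})

# Sort so that "first" refers to the chronologically first journey of each vehicle.
ordered = logbooks_df.sort_values(["id_vehicle", "dep_dt"])

vehicles_df = (
    ordered.groupby("id_vehicle")
    .agg(
        first_day=("dep_dt", "min"),
        last_day=("arr_dt", "max"),
        first_loc=("dep_loc", "first"),
    )
    .reset_index()
)

# Assumption: days are stored as timestamps truncated to midnight.
vehicles_df["first_day"] = vehicles_df["first_day"].dt.normalize()
vehicles_df["last_day"] = vehicles_df["last_day"].dt.normalize()

# Documented default: every vehicle belongs to cluster 1.
vehicles_df["id_cluster"] = 1

print(vehicles_df)
```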

update_vehicles(input_df)[source]

Update existing vehicles based on id_vehicle. Replaces all columns for matching vehicles with values from input_df.

Parameters:

input_df (DataFrame) – DataFrame with vehicle data to update. Must include the id_vehicle column. See column description table in Vehicles for required columns.

Return type:

None

Examples

This example uses the instance mob_profiles defined in the MobProfiles examples:

# Get dataframe of the second vehicle and set its cluster to 2
updated_vehicles_df = mob_profiles.vehicles.df[mob_profiles.vehicles.df["id_vehicle"] == 2].copy()
updated_vehicles_df.loc[:, "id_cluster"] = 2

# Update vehicles from a DataFrame
mob_profiles.vehicles.update_vehicles(input_df=updated_vehicles_df)
property df: DataFrame

Get a copy of the DataFrame of the data component. If the DataFrame is None, return an empty DataFrame with the correct schema.

property number: int

Return the number of entries in the DataFrame df.

class champpy.Clusters(vehicles=None, frozen=False)[source]

Component class included in MobProfiles representing vehicle clusters.

The Clusters class manages cluster assignments for vehicles in the mobility data. It is included as a component in the MobProfiles class and can be accessed via its instances. The clusters DataFrame is automatically generated from the vehicles DataFrame and cannot be set directly, but can be updated via the update methods.

The DataFrame (accessible via df) contains the following columns:

Column      Type  Description
id_cluster  int   Cluster identifier.
label       str   Human-readable label for the cluster.

Parameters:
  • vehicles (Vehicles | None) – Vehicles instance with vehicle data including ‘id_cluster’ column. If provided, clusters will be automatically generated from the unique cluster IDs.

  • frozen (bool) – If True, the Clusters instance is immutable after creation. Default is False.

update_clusters(input_df)[source]

Update existing clusters based on id_cluster. Replaces all columns for matching clusters with values from input_df.

Parameters:

input_df (DataFrame) – DataFrame with cluster data to update. See column description table in Clusters for required columns.

Return type:

None

Examples

This example uses the instance mob_profiles defined in the MobProfiles examples:

# Get current clusters DataFrame
clusters_df = mob_profiles.clusters.df

# Update cluster labels
clusters_df.loc[clusters_df["id_cluster"] == 1, "label"] = "Private Vehicles"

# Apply updated labels
mob_profiles.clusters.update_clusters(clusters_df)
update_clusters_from_vehicles(vehicles)[source]

Update clusters DataFrame based on current vehicle DataFrame.

Parameters:

vehicles (Vehicles) – Vehicles instance with vehicle data including ‘id_cluster’ column.

Return type:

None
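Generating the clusters table from the vehicles table amounts to collecting the unique id_cluster values. The sketch below shows this with plain pandas; the placeholder labels of the form "Cluster N" are an assumption, since the default labels champpy assigns are not stated in this reference.

```python
import pandas as pd

vehicles_df = pd.DataFrame({"id_vehicle": [1, 2, 3, 4], "id_cluster": [2, 1, 2, 1]})

# One row per unique cluster ID found in the vehicles table.
cluster_ids = sorted(int(c) for c in vehicles_df["id_cluster"].unique())

clusters_df = pd.DataFrame({
    "id_cluster": cluster_ids,
    # Placeholder labels (assumption); real labels can be set via update_clusters.
    "label": [f"Cluster {c}" for c in cluster_ids],
})

print(clusters_df)
```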

property df: DataFrame

Get a copy of the DataFrame of the data component. If the DataFrame is None, return an empty DataFrame with the correct schema.

property number: int

Return the number of entries in the DataFrame df.

class champpy.Locations(vehicles=None, logbooks=None, frozen=False)[source]

Component class included in MobProfiles representing locations used in journeys.

The Locations class manages location definitions for the mobility data. It is included as a component in the MobProfiles class and can be accessed via its instances. The locations DataFrame is automatically generated from the logbooks and vehicles DataFrames and cannot be set directly, but can be updated via the update methods. Location 0 is reserved for “Driving” and location 1 is typically “Home”.

The DataFrame (accessible via df) contains the following columns:

Column    Type  Description
location  int   Location identifier (0 = Driving, 1+ = stationary locations).
label     str   Human-readable label for the location (e.g., “Home”, “Work”, “Location 3”).

Parameters:
  • vehicles (Vehicles | None) – Vehicles instance to extract first_loc values from.

  • logbooks (Logbooks | None) – Logbooks instance to extract dep_loc and arr_loc values from.

  • frozen (bool) – If True, the Locations instance is immutable after creation. Default is False.

update_locations(input_df)[source]

Update existing locations based on location ID. Replaces all columns for matching locations with values from input_df.

Parameters:

input_df (DataFrame) – DataFrame with location data to update. See column description table in Locations for required columns.

Return type:

None

Examples

This example uses the instance mob_profiles defined in the MobProfiles examples:

# Get current locations DataFrame
locations_df = mob_profiles.locations.df

# Update location labels with meaningful names
locations_df.loc[locations_df["location"] == 2, "label"] = "Work"

# Apply updated labels
mob_profiles.locations.update_locations(locations_df)
update_locations_from_logbooks_vehicles(logbooks=None, vehicles=None)[source]

Update the locations DataFrame based on the unique dep_loc and arr_loc values in logbooks and the first_loc values in vehicles.

Parameters:
  • logbooks (Optional[Logbooks]) – Logbooks instance with journey data to extract locations from.

  • vehicles (Optional[Vehicles]) – Vehicles instance with vehicle data to extract locations from.

Return type:

None
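The locations table can be pictured as the set of location IDs used anywhere in the logbooks or vehicles, plus the reserved location 0. The sketch below builds such a table with plain pandas. The label mapping (1 = "Home", 2 = "Work", otherwise "Location N") is inferred from the label examples in the column table and is an assumption about the defaults.

```python
import pandas as pd

logbooks_df = pd.DataFrame({"dep_loc": [1, 2, 1], "arr_loc": [2, 1, 3]})
vehicles_df = pd.DataFrame({"id_vehicle": [1, 2], "first_loc": [1, 4]})

# Collect every location referenced by journeys or by a vehicle's first location.
used = set(logbooks_df["dep_loc"]) | set(logbooks_df["arr_loc"]) | set(vehicles_df["first_loc"])

# Location 0 is reserved for driving and always comes first.
location_ids = [0] + sorted(int(loc) for loc in used if loc != 0)

# Assumed default labels; anything unknown falls back to "Location N".
known_labels = {0: "Driving", 1: "Home", 2: "Work"}

locations_df = pd.DataFrame({
    "location": location_ids,
    "label": [known_labels.get(loc, f"Location {loc}") for loc in location_ids],
})

print(locations_df)
```

Location 4 appears only as a first_loc in the vehicles table, which shows why both inputs matter for vehicles without journeys.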

property df: DataFrame

Get a copy of the DataFrame of the data component. If the DataFrame is None, return an empty DataFrame with the correct schema.

property number: int

Return the number of entries in the DataFrame df.