PSW, Pedagogisch Sociaal Netwerk, is a Dutch care institution that focuses on providing day care, activities and assisted living for people with disabilities. In this specific use case, we will focus on the part of the organization that provides day care and activities.
For PSW, wages are the biggest expense, so the more efficiently it can schedule its employees, the better. Currently, the organization schedules its staff based on the number of clients registered for day care. However, some scheduled clients do not show up because of external factors such as the weather or travel distance.
The goal was to use machine learning to predict the number of clients who will actually show up for day care, enabling the organization to schedule the right number of staff and reduce its costs.
As a first step, we started with the in-house reporting model, which consists of two fact tables. These fact tables contained the number of clients planned for day care per day and the actual presence of each client over the last two years. The fact tables also include the client travel distance.
Three tools were chosen for this project:
- Python for data preparation and cleaning.
- Azure ML for prediction.
- Power BI for visualizing the results.
The project followed these steps of the data science lifecycle:
- Data analysis & problem definition
- Data cleaning and preparation
- Minimum viable product
- Deployment
Steps 5 and 6 (“ML Ops” and “Repeat”) were excluded due to the small scale of the use case.
Step 1: Data analysis & problem definition
DataSense analyzed the available data: how the data sets relate to each other, what information they contain, and which additional data was needed.
We defined the problem as follows: “predicting attendance on a person level based on the planning”.
For this, we needed extra data such as weather data and the distance between each client’s location and the day care location.
Step 2: Data cleaning & preparation
The data model provided by the organization was designed for reporting and was not sufficient for data science. Because data quality is essential, additional data cleaning proved necessary.
Accordingly, the data was cleaned: filling empty values, correcting incorrect values, removing duplicates, etc.
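The cleaning steps above can be illustrated with a minimal sketch in plain Python. The records, the field names (client_id, date, distance_km, present) and the choice to fill gaps with the median are assumptions made for this example, not the rules used in the project.

```python
from statistics import median

# Hypothetical attendance records; the field names are illustrative only.
records = [
    {"client_id": 1, "date": "2023-01-09", "distance_km": 4.2, "present": "yes"},
    {"client_id": 1, "date": "2023-01-09", "distance_km": 4.2, "present": "yes"},  # duplicate
    {"client_id": 2, "date": "2023-01-09", "distance_km": None, "present": "no"},  # empty value
    {"client_id": 3, "date": "2023-01-09", "distance_km": -7.5, "present": "yes"},  # impossible value
    {"client_id": 4, "date": "2023-01-09", "distance_km": 12.0, "present": "no"},
]


def clean(rows):
    """Return cleaned copies of the rows; the input list is left unchanged."""
    # 1. Remove duplicates: keep the first row per (client, date).
    seen = set()
    unique = []
    for row in rows:
        key = (row["client_id"], row["date"])
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))

    # 2. Correct incorrect values: a negative distance is treated as missing.
    for row in unique:
        distance = row["distance_km"]
        if distance is not None and distance < 0:
            row["distance_km"] = None

    # 3. Fill empty values with the median of the known distances (assumed rule).
    known = [r["distance_km"] for r in unique if r["distance_km"] is not None]
    fallback = median(known)
    for row in unique:
        if row["distance_km"] is None:
            row["distance_km"] = fallback
    return unique


cleaned = clean(records)
```

Which fill value is appropriate depends on the attribute; the median is only one common choice for skewed numeric columns.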
Next, the data was prepared: encoding textual attributes, calculating the target attribute (attended), adding weather data, adding coordinates for the locations, and calculating distances.
This preparation resulted in one table containing all historical data, with one target column, to be used as the foundation for training the model.
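As a rough illustration of these preparation steps, the sketch below encodes a textual attribute, derives the attended target, and calculates a distance from coordinates. The coordinates, field names and weather labels are hypothetical. The haversine formula gives the straight-line distance, which is an assumption here: the project may have used road distance instead.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def encode_labels(values):
    """Map each distinct text value to an integer code, in sorted order."""
    codes = {value: i for i, value in enumerate(sorted(set(values)))}
    return [codes[value] for value in values], codes


# Hypothetical day care location and planned visits (all values made up).
DAY_CARE = (51.20, 6.00)
planned = [
    {"client_id": 1, "lat": 51.25, "lon": 6.05, "weather": "dry", "present": True},
    {"client_id": 2, "lat": 51.10, "lon": 5.90, "weather": "rain", "present": False},
    {"client_id": 3, "lat": 51.20, "lon": 6.00, "weather": "snow", "present": True},
]


def prepare(rows, day_care):
    """Build one flat table with numeric features and a single target column."""
    weather_codes, _ = encode_labels([r["weather"] for r in rows])
    table = []
    for row, weather_code in zip(rows, weather_codes):
        table.append({
            "client_id": row["client_id"],
            "distance_km": haversine_km(row["lat"], row["lon"], *day_care),
            "weather_code": weather_code,
            # Target: 1 if the planned client actually attended, else 0.
            "attended": 1 if row["present"] else 0,
        })
    return table


training_table = prepare(planned, DAY_CARE)
```

Integer codes are the simplest encoding; for attributes without a natural order, one-hot encoding is a common alternative.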
Step 3: Minimum viable product
During the minimum viable product phase, several algorithms were trained, tuned, tested, and compared to select the best possible foundation. Azure AutoML was used to automate the experimentation, while in-house experts handled the analysis and preparation stages. To verify the model’s accuracy, the metrics were validated on a separate dataset that was not used for training.
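The idea of validating on data that was not used for training can be sketched as follows. The split by date, the field names and the majority-class baseline are assumptions for illustration; in the project, Azure AutoML handled the experimentation and the actual models.

```python
def split_by_date(rows, cutoff):
    """Rows before the cutoff are used for training; later rows are held out."""
    train = [r for r in rows if r["date"] < cutoff]  # ISO dates sort as text
    holdout = [r for r in rows if r["date"] >= cutoff]
    return train, holdout


def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall for a binary target (1 = attended)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical history; a real table would also contain the feature columns.
history = [
    {"date": "2023-01-02", "attended": 1},
    {"date": "2023-01-03", "attended": 1},
    {"date": "2023-01-04", "attended": 0},
    {"date": "2023-01-05", "attended": 1},
    {"date": "2023-02-01", "attended": 1},
    {"date": "2023-02-02", "attended": 0},
    {"date": "2023-02-03", "attended": 1},
    {"date": "2023-02-06", "attended": 1},
]

train, holdout = split_by_date(history, cutoff="2023-02-01")

# Stand-in "model": always predict the majority class seen during training.
majority = 1 if 2 * sum(r["attended"] for r in train) >= len(train) else 0
predictions = [majority for _ in holdout]

metrics = classification_metrics([r["attended"] for r in holdout], predictions)
```

A trivial baseline like this is useful as a reference point: a trained model should beat it on the held-out rows before it is considered for deployment.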
XGBoost, an ensemble algorithm that combines multiple decision trees, was chosen as a key building block of the model. Each subsequent tree in XGBoost aims to correct the errors of the previous ones, improving the model’s predictions. This approach, combined with Azure AutoML and thorough validation, demonstrated the feasibility and effectiveness of the machine learning solution in this use case. The minimum viable product validated the product idea early in the development cycle, paving the way for further enhancements and improvements.
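To make the boosting idea concrete, here is a simplified sketch in which each new tree is fitted to the errors left by the previous ones. It is plain gradient boosting with squared error and one-split trees (stumps), not XGBoost itself, which adds regularization and many optimizations. The data, which relates travel distance to attendance rate, is made up.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals (least squares)."""
    best = None
    # Skip the largest x so the right-hand side of a split is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_trees=20, learning_rate=0.5):
    """Fit stumps one after another, each on the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean of the target
    preds = [base] * len(ys)
    stumps, mse_per_round = [], []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for x, p in zip(xs, preds)]
        mse_per_round.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    model = {"base": base, "learning_rate": learning_rate, "stumps": stumps}
    return model, mse_per_round


def boosted_predict(model, x):
    """Sum the base value and the scaled contribution of every stump."""
    return model["base"] + sum(
        model["learning_rate"] * stump_predict(s, x) for s in model["stumps"]
    )


# Made-up data: travel distance in km versus observed attendance rate.
distances = [1, 2, 3, 5, 8, 13, 21]
rates = [0.95, 0.90, 0.92, 0.80, 0.70, 0.60, 0.50]

model, mse_per_round = fit_boosted(distances, rates)
```

The training error in mse_per_round shrinks round by round, which is the "each tree improves on the previous one" behaviour described above.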
Step 4: Deployment
In the deployment phase, the model was deployed as a service and exposed as an API endpoint, so that other applications and systems within the organization could use it. Additionally, integration with Power BI was set up for data preparation and aggregation, enabling visualization and analysis of the model’s results. This made the model accessible within the existing infrastructure and supported the client’s decision-making.
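The aggregation step turns person-level predictions into a number that can be used for scheduling. In the project this was done in Power BI; the Python sketch below only illustrates the logic. It assumes the endpoint returns a show-up probability per planned client, and the ratio of three clients per staff member is a hypothetical figure.

```python
import math
from collections import defaultdict


def expected_attendance(predictions):
    """Sum per-client show-up probabilities into an expected head count per day."""
    totals = defaultdict(float)
    for prediction in predictions:
        totals[prediction["date"]] += prediction["probability"]
    return dict(totals)


def staff_needed(client_count, clients_per_staff):
    """Round up, because part of a staff member cannot be scheduled."""
    return math.ceil(client_count / clients_per_staff)


# Hypothetical endpoint output: eight clients planned on the same day.
predictions = [
    {"date": "2023-03-06", "client_id": i + 1, "probability": p}
    for i, p in enumerate([0.90, 0.80, 0.95, 0.60, 0.70, 0.50, 0.85, 0.40])
]

CLIENTS_PER_STAFF = 3  # assumed ratio, for illustration only

expected = expected_attendance(predictions)
staff_by_planning = staff_needed(len(predictions), CLIENTS_PER_STAFF)
staff_by_prediction = staff_needed(expected["2023-03-06"], CLIENTS_PER_STAFF)
```

In this made-up example, scheduling on the planned head count would require three staff members, while the expected attendance of about 5.7 clients would require two. In practice a safety margin on top of the expected value may be appropriate.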
In conclusion, this use case successfully implemented machine learning in Azure to optimize employee scheduling and reduce costs in a day care department. By predicting client attendance, staffing levels could be planned more efficiently.
Key takeaways:
- Following the data science lifecycle
- Performing data cleaning and preparation
- Testing multiple algorithms with Azure AutoML
- Validating model accuracy
- Deploying the model as an API endpoint
- Integrating with Power BI for data visualization
- Achieving cost savings through data-driven decisions
This implementation showcased the power of machine learning in enhancing resource allocation and operational efficiency for care institutions.