Using machine learning to predict drownings in surf beaches of southwest France

Bruno Castelle
Eric Tellier
Bruno Simonnet
Jeoffrey Dehez

December 4, 2023

Some context


Should France be your next vacation destination 🏖️ ?

Southwest France surf beaches

The place to be ! 🌟

Some (😨) numbers

Castelle et al. (2019)
  • One of the most dangerous coasts in the world (Castelle et al. 2019) : strong rip currents (baïnes), shore break
  • Surveillance mainly during summer : Thousands of rescues each year
  • 20 to 30 fatal drownings each year
    (💪 0 between the flags)

Can we prevent drownings ? 🤔


Of course, there’s even a world class conference about it !

Can we predict drownings ? 🤔

Spoiler : It was in the title

Previous work in France

  • Drowning risk prediction using logistic regression (Tellier et al. 2021)
  • Based on an emergency call database
  • Probability of a drowning occurring each day, based on weather, oceanic and crowd data

Goal of this work

  • Same philosophy as Tellier et al. (2021) : daily prediction based on weather and beach crowd

  • Better (and cleaner) data

  • New statistical methods 🌟

New statistical methods 🌟

Machine learning 🤖 in a nutshell

AI == ML == statistics & IFELSES

Risk modelling strategy and methods


Let’s play with some data

What’s a risk ?

Our definition :

  • Hazard : Is it dangerous ?

  • Exposure : Is the beach crowded ?

The predictors

Hazard (daily maximum)

  • Wave incidence factor : \(cos_4H = \cos^4\left((278 - D_{SWELL}) \times \frac{\pi}{180}\right)\)

  • Wave factor : \(HsTp = H_{SWELL} \times P_{SWELL}\)
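As a minimal sketch, the two hazard predictors can be computed directly from the formulas on this slide. The function names, the units (metres, seconds, degrees) and the hourly sample values are assumptions made for illustration; only the formulas and the "daily maximum" rule come from the slide.

```python
import math

REFERENCE_DIRECTION_DEG = 278.0  # constant taken from the slide's formula


def wave_incidence_factor(swell_direction_deg: float) -> float:
    """cos4H: cos((278 - D_SWELL) * pi / 180) raised to the 4th power."""
    angle_rad = math.radians(REFERENCE_DIRECTION_DEG - swell_direction_deg)
    return math.cos(angle_rad) ** 4


def wave_factor(swell_height: float, swell_period: float) -> float:
    """HsTp: H_SWELL times P_SWELL."""
    return swell_height * swell_period


# Hypothetical hourly (direction, height, period) values for one day.
hourly = [(290.0, 1.2, 10.0), (280.0, 1.8, 12.0), (275.0, 1.5, 11.0)]

# The slide uses the daily maximum of each hazard predictor.
daily_cos4h = max(wave_incidence_factor(d) for d, _, _ in hourly)
daily_hstp = max(wave_factor(h, p) for _, h, p in hourly)
print(daily_cos4h, daily_hstp)
```

The incidence factor equals 1 when the swell direction is exactly 278° and drops quickly as the direction moves away from it.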

Exposure

  • \(T_{air}\) : air temperature in °C (3-day pred. & daily max value)

  • \(day\) : day (ex : \(6\) for \(6^{th}\) of July)

  • \(month\) : month (ex : \(7\) for July)

  • \(wday\) : weekday (ex : \(1\) for Monday)
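The three calendar predictors can be derived from a date alone. The sketch below uses Python's standard library; the function name `calendar_predictors` is hypothetical, and ISO weekday numbering is assumed because it matches the slide's "1 for Monday" convention.

```python
from datetime import date


def calendar_predictors(d: date) -> dict:
    """Return day, month and weekday (ISO numbering: Monday = 1, Sunday = 7)."""
    return {"day": d.day, "month": d.month, "wday": d.isoweekday()}


# The slide's example: the 6th of July (year chosen arbitrarily here).
print(calendar_predictors(date(2023, 7, 6)))
```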

The outcome

  • Water inhalation & respiratory impairments (leads to emergency calls)
  • Emergency calls database from 2011 to 2022 | N = 522
  • Binary daily data (a drowning occurred / no drowning occurred)
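One possible way to turn a list of emergency-call dates into the binary daily outcome is sketched below. The function name and the sample dates are hypothetical; the actual database layout is not described in these slides.

```python
from datetime import date, timedelta


def daily_outcome(call_dates, start: date, end: date) -> dict:
    """Map every day in [start, end] to 1 (at least one drowning call) or 0."""
    days_with_call = set(call_dates)
    outcome = {}
    current = start
    while current <= end:
        outcome[current] = 1 if current in days_with_call else 0
        current += timedelta(days=1)
    return outcome


# Hypothetical calls: two on the same day still count as a single positive day.
calls = [date(2022, 7, 14), date(2022, 7, 14), date(2022, 7, 16)]
outcome = daily_outcome(calls, date(2022, 7, 13), date(2022, 7, 17))
print(list(outcome.values()))
```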

Technical stuff 🛠️

Our challengers

Logistic regression : No tuning parameters

Random Forests : 3 tuning parameters

  • # of trees
  • # of random variables
  • minimum tree depth

XGBoost : 4 tuning parameters

  • same as RF
  • learning rate
  • spoiler: 🏆

Modelling strategy

About these steps

Pre-processing

  • Centering, scaling, dummy-coding
  • Synthetic Minority over-Sampling Technique (SMOTE) (Chawla et al. 2002) for the outcome
  • removing correlations (\(r > 0.9\))
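To give an idea of what SMOTE does, here is a simplified standard-library sketch: each synthetic point is drawn on the segment between a minority-class observation (a drowning day) and one of its nearest minority neighbours. This is an illustrative variant, not the implementation used in this work; the function name, the default `k` and the toy points are assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (itself excluded).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two already-scaled predictors.
drowning_days = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(drowning_days, n_synthetic=10, k=2)
print(len(new_points))
```

Because the interpolation relies on distances, it makes sense to apply it after centering and scaling, and only to the training data so that the testing set keeps its real class balance.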

Results


360 models later…

Tuning workflow results

I have bad news…

Daily drowning events can’t be predicted with 100% precision.

What would be the best model ?

Improving accuracy : Discretization

  • The probability output by the model is discretized using 5 classes :
Testing set (N = 600)

Risk class   Drownings   No drownings
1            31          502
2            20          44
3            4           20
4            16          11
5            7           8
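A quick way to read the table above is to compute, for each risk class, the share of days on which a drowning occurred. The sketch below only reuses the counts from the table; the variable and function names are hypothetical.

```python
# (drownings, no drownings) per risk class, copied from the table above.
counts = {1: (31, 502), 2: (20, 44), 3: (4, 20), 4: (16, 11), 5: (7, 8)}


def drowning_share(drownings: int, no_drownings: int) -> float:
    """Fraction of days in a class on which a drowning occurred."""
    return drownings / (drownings + no_drownings)


shares = {risk_class: drowning_share(*c) for risk_class, c in counts.items()}
for risk_class, share in shares.items():
    print(f"class {risk_class}: {share:.1%} of days with a drowning")
```

The share is much lower in class 1 than in classes 4 and 5, although it does not increase strictly from one class to the next.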

Discussion

  • What is the best drowning prediction model ? Should we prioritize lowering false negatives or false positives ? \(\rightarrow\) Risk management and political decisions
  • Low improvements over previous models \(\rightarrow\) dataset limitation ?
  • Emergency call database only covers a minority (😱) of all rescues
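The false negative / false positive trade-off can be made concrete with the risk-class counts from the Discretization slide. In this sketch a day counts as an "alert" when its risk class reaches a chosen cut-off; the cut-off rule and the function name are assumptions for illustration, not a recommendation.

```python
# (drownings, no drownings) per risk class, from the Discretization slide.
COUNTS = {1: (31, 502), 2: (20, 44), 3: (4, 20), 4: (16, 11), 5: (7, 8)}


def alert_metrics(min_alert_class: int) -> dict:
    """Confusion counts when every class >= min_alert_class triggers an alert."""
    tp = sum(d for c, (d, _) in COUNTS.items() if c >= min_alert_class)
    fn = sum(d for c, (d, _) in COUNTS.items() if c < min_alert_class)
    fp = sum(n for c, (_, n) in COUNTS.items() if c >= min_alert_class)
    tn = sum(n for c, (_, n) in COUNTS.items() if c < min_alert_class)
    return {
        "false_negatives": fn,
        "false_positives": fp,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


for cutoff in (2, 3, 4, 5):
    print(cutoff, alert_metrics(cutoff))
```

Raising the cut-off reduces false alerts but misses more drowning days, which is why the choice is a risk management decision and not a purely statistical one.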

Perspectives and future work

Want to try this at home ?

Try our 📦 {DrowningPrediction} and get in touch with us !

Thank you!


References

Castelle, Bruno, Tim Scott, Rob Brander, Jak McCarroll, Arthur Robinet, Eric Tellier, Elias de Korte, Bruno Simonnet, and Louis-Rachid Salmi. 2019. “Environmental Controls on Surf Zone Injuries on High-Energy Beaches.” Natural Hazards and Earth System Sciences 19 (10): 2183–2205.
Chawla, N. V., K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. 2002. “SMOTE: Synthetic Minority over-Sampling Technique.” Journal of Artificial Intelligence Research 16 (June): 321–57. https://doi.org/10.1613/jair.953.
Dehez, Jeoffrey, and Sandrine Lyser. 2021. “Fréquentation des plages océanes et risques de baignade en Aquitaine en 2020. Une étude exploratoire.” Research Report. INRAE. https://hal.science/hal-03549020.
Sacks, Jerome, William J. Welch, Toby J. Mitchell, and Henry P. Wynn. 1989. “Design and Analysis of Computer Experiments.” Statistical Science 4 (4). https://doi.org/10.1214/ss/1177012413.
Tellier, Éric, Bruno Simonnet, Cédric Gil-Jardiné, Marion Lerouge-Bailhache, Bruno Castelle, and Rachid Salmi. 2021. “Predicting Drowning from Sea and Weather Forecasts: Development and Validation of a Model on Surf Beaches of Southwestern France.” Injury Prevention 28 (1): 16–22. https://doi.org/10.1136/injuryprev-2020-044092.