Understanding spatial concentrations of road accidents using frequent item sets

doi:10.1016/j.aap.2005.03.023

Accident Analysis & Prevention

Volume 37, Issue 4, July 2005, Pages 787-799

Abstract

This paper aims to understand why road accidents tend to cluster in specific road segments. More particularly, it analyzes the characteristics of accidents occurring in “black” zones compared with those scattered all over the road. A frequent item set technique (data mining) is applied to automatically identify accident circumstances that frequently occur together, for accidents located inside and outside “black” zones. A Belgian periurban region is used as a case study. Results show that accidents occurring in “black” zones are characterized by left turns at signalized intersections, collisions with pedestrians, loss of control of the vehicle (run-off-roadway) and rainy weather conditions. Accidents occurring outside “black” zones (scattered in space) are characterized by left turns at intersections with traffic signs, head-on collisions and drunk road user(s). Furthermore, parallel collisions and accidents on highways or roads with separated lanes, occurring at night or during the weekend, are frequent accident patterns for all accident locations. These exploratory results show the potential of the frequent item set method as a complement to more classical statistical techniques, but also suggest that there is no unique countermeasure for reducing the number of accidents.

Introduction

Traffic collisions remain one of the leading causes of premature death and morbidity in most countries. In Belgium as in many European countries, traffic safety is currently one of the government's priorities. Identifying dangerous accident locations and profiling them in terms of accident-related data and location/environmental characteristics provide new insights into the complexity and causes of road accidents.

The spatial structure of road accidents was demonstrated long ago, but no official and universal agreement exists for defining significant spatial concentrations of road accidents. In general, methods developed for identifying accident concentrations apply to hot spots (also called “black” spots, hazardous locations, sites with promise, etc.), which are pinpoint concentrations of road accidents that often migrate over time (see e.g. Silcock and Smyth, 1985; Maher, 1990; Nguyen, 1991; Joly et al., 1992; Hauer, 1996; Thomas, 1996 or Vandersmissen et al., 1996). More recently, the identification of “black” zones or hazardous road segments has been reconsidered in the literature (see Flahaut et al., 2003 for a review); this interest arises from the awareness of the spatial interaction existing between contiguous accident pinpoint locations. The existence of road sections on which the number of accidents is high reveals spatial concentrations and hence suggests spatial dependence between individual accident occurrences. In fact, these studies focus on a well-known exploratory spatial data analysis problem: the definition and the explanation of hot spots (see e.g. Levine, 2002 or Vistisen, 2002).

In this paper, the location and the length of the “black” zones are defined by means of local spatial autocorrelation indices (see Section 3.2), and they are considered as given in our problem. Therefore, the problem tackled in this paper is not the definition of the “black” zone, but its exploration. We argue that it is not possible to develop effective countermeasures to reduce the number of accidents at these locations without being able to properly and systematically relate accident frequency and severity to a number of variables such as roadway geometries, traffic control devices, roadside features, roadway conditions, driver behavior or vehicle type (Kononov and Janson, 2002). Hence, several attempts are found in the literature to explain the spatial variation of road unsafety at several levels of spatial aggregation (see Flahaut, 2004a, Flahaut, 2004b for a review). Our approach, however, is purely exploratory: it seeks to understand how road accidents cluster in hazardous road segments. More specifically, we are interested in finding out which factors are associated with the accidents in “black” zones by generating frequent item sets. This data mining technique automatically identifies accident circumstances that frequently occur together. In this way, we put forward a number of hypotheses, which we then try to explain using other research studies and domain knowledge. Statistical models have been widely used on such accident data to analyze road crashes in order to explain the relationship between crash involvement and traffic on the one hand and geometric and environmental factors on the other hand (Lee et al., 2002). However, Chen and Jovanis (2002) indicate that not only the main effects of driver, vehicle, roadway and environmental factors should be analyzed; interactions between factors are also very likely to be significant. The authors demonstrate that the large number of potentially important factors, combined with the complex nature of crash etiology and injury outcome, presents certain challenges when using classic statistical analysis on datasets with large dimensions, such as an exponential increase in the number of parameters as the number of variables increases and the invalidity of statistical tests as a consequence of sparse data in large contingency tables. Furthermore, a large number of factors needs to be selected and a comprehensive but feasible set of main factors and interactions needs to be specified for testing in statistical models.

This is where data mining comes into play. Data mining can be defined as the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in large amounts of data (Fayyad et al., 1996). From a statistical perspective it can be viewed as a computer automated exploratory data analysis of (usually) large complex data sets (Friedman, 1997). However, in contrast with statistical techniques, the problems and methods of data mining have some distinct features of their own. Not only can data sets be much larger than in statistics, with data analyses on a correspondingly larger scale, but there are also differences of emphasis in the approach to modeling: compared with statistics, data mining pays less attention to the large-scale asymptotic properties of its inferences and more to the general philosophy of “learning”, including consideration of the complexity of models and the computations they require (Hosking et al., 1997). Furthermore, data mining has tackled problems such as what to do in situations where the number of variables is so large that looking at all pairs of variables is computationally infeasible (Mannila, 2000). Additionally, in contrast with statistics, data mining is typically a form of secondary data analysis: the data have been collected for some purpose other than answering a specific data analytical question. For the purposes of this paper it is sufficient to point out that statistical models are particularly likely to be preferable when fairly simple models are adequate and the important variables can be identified before modeling. However, when dealing with a large and complex data set of road accidents, the use of data mining methods seems particularly useful.

The literature contains some examples of the use of data mining in road accident analysis. For example, clustering techniques are used to discover frequent patterns in accident data (see e.g. Ljubic et al., 2002). Additionally, the data mining technique of rule induction can be used to identify rule sets representing interesting subgroups in accident data (see e.g. Kavsek et al., 2002). Furthermore, decision trees (see e.g. Strnad et al., 1998, Clarke et al., 1998) and neural networks (see e.g. Mussone et al., 1999) are used to model and analyze road accidents. Finally, spatial data mining (see e.g. Zeitouni and Chelghoum, 2001) can be applied.

In this research, data mining is applied to understand the characteristics of the accidents associated with “black” zones or hazardous road segments. In particular, an existing frequent item set technique is used as an exploratory technique to generate accident patterns, which may reveal new and surprising patterns not yet found in other research. More specifically, accident circumstances that frequently occur together inside “black” zones will be identified. Furthermore, these patterns are compared with accident characteristics occurring outside those “black” zones. This makes it possible to investigate the differences between accident patterns inside and outside “black” zones, and hence to understand why spatial concentrations are observed.

The remainder of this paper is organized as follows. First a formal introduction to the association algorithm and the concept of frequent item sets is provided (Section 2). This will be followed by a description of the dataset and the studied area (Section 3). In Section 4, the empirical study is explained and in Section 5 the results of this study are presented. The paper will be completed with a summary of the conclusions and directions for future research.

Section snippets

KDD process

As explained in the introduction, data mining is used to discover patterns and relationships in data, with an emphasis on large, observational databases (Friedman, 1997). According to Fayyad et al. (1996) data mining can be considered as a separate step of the “knowledge discovery in databases” (KDD) process (see Fig. 1). This KDD process refers to the overall process of discovering useful knowledge from data. The additional steps in the KDD process, such as data preparation, data selection,

The studied area

In Belgium, each road accident occurring on a public road and involving casualties is reported officially (National Institute of Statistics). Its location is known accurately on numbered roads because there is a stone marker at every hectometer; numbered roads are motorways, national and provincial roads linking towns together. Hence, this analysis is limited to accidents with casualties on numbered roads. The period under study is 1997–1999: it is long enough to limit random fluctuations in

Empirical study

As explained in Section 2.1 of this paper, we can distinguish different steps in the mining process: a pre-processing step and a transformation step in which the available data are prepared for the use of the mining technique, a mining step for generating the frequent item sets and a post-processing step for evaluating and interpreting the most interesting patterns.
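The mining step named above can be illustrated with a small Python sketch. It is not the authors' implementation: it is a generic level-wise (Apriori-style) search, and the function name `frequent_item_sets`, the toy accident records, the item labels and the 0.4 support threshold are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_item_sets(transactions, min_support, max_size=4):
    """Return {item set: support} for all item sets with support >= min_support.

    Level-wise search: item sets of size k are only built from frequent
    item sets of size k - 1 (every subset of a frequent set is frequent).
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    # Level 1: count how often each single item occurs.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}

    result = dict(frequent)
    size = 1
    while frequent and size < max_size:
        size += 1
        # Candidates: unions of two frequent sets that have exactly `size`
        # items and whose (size - 1)-item subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in frequent
                for sub in combinations(union, size - 1)
            ):
                candidates.add(union)
        # Keep the candidates that occur often enough in the data.
        frequent = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                frequent[cand] = support
        result.update(frequent)
    return result


# Hypothetical accident records; each record is a set of circumstances.
accidents = [
    {"left_turn", "signalized_intersection", "rain"},
    {"left_turn", "signalized_intersection"},
    {"pedestrian", "rain"},
    {"left_turn", "signalized_intersection", "rain"},
    {"head_on", "night"},
]

for item_set, support in sorted(
    frequent_item_sets(accidents, min_support=0.4).items(),
    key=lambda kv: (len(kv[0]), sorted(kv[0])),
):
    print(sorted(item_set), round(support, 2))
```

In this sketch the pre-processing and transformation steps are assumed to have already turned each accident into a set of categorical items; the post-processing step would then rank or filter the returned item sets.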

Accident patterns in “black” zones

Selecting the frequent item sets that are unique for accidents occurring inside a “black” zone and with very strong lift values results in 50 item sets of size 2 (lift < 0.5 or lift > 5), 108 item sets of size 3 (lift < 0.5 or lift > 5) and 240 item sets of size 4 (lift < 0.5 or lift > 15). Table 2 gives an overview of the most interesting of these frequent item sets. In the remainder of this paper, we will refer to the number of these item sets [N] when discussing the results.
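To make the lift thresholds above more concrete, the following Python sketch computes a lift value and applies a cut-off of the same form. The snippet does not give the exact lift formula used in the paper, so the sketch assumes one common generalization: the observed support of an item set divided by the product of the supports of its individual items (values near 1 indicate independence). The function names, the toy records and the default thresholds are illustrative assumptions.

```python
from math import prod


def support(item_set, transactions):
    """Fraction of transactions that contain every item in item_set."""
    item_set = frozenset(item_set)
    return sum(1 for t in transactions if item_set <= set(t)) / len(transactions)


def lift(item_set, transactions):
    """Observed support divided by the support expected under independence.

    Assumption: independence baseline = product of single-item supports.
    """
    observed = support(item_set, transactions)
    expected = prod(support([item], transactions) for item in item_set)
    return observed / expected if expected > 0 else float("nan")


def has_strong_lift(value, low=0.5, high=5.0):
    """True when lift is far from 1 in either direction (lift < low or lift > high)."""
    return value < low or value > high


# Hypothetical records: one accident combines "pedestrian" and "rain".
records = [{"pedestrian", "rain"}] + [{"dry"} for _ in range(9)]

value = lift({"pedestrian", "rain"}, records)
print(value, has_strong_lift(value))
```

For item sets of size 4, the stricter upper bound quoted above could be expressed by passing `high=15.0`.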

A first result shows that

Frequent item sets and accident analysis

In this paper, the association algorithm was used on a data set of road accidents to profile “black” zones in terms of accident-related data and location characteristics. More specifically, frequent item sets are generated to identify accident circumstances that frequently occur together in order to find out which factors explain the occurrence of the accidents in “black” zones. As explained in the introduction, the use of this technique coincides with the explorative character of this research

Acknowledgements

This research was supported by the OSTC and the Flemish Research Centre for Traffic Safety. The authors would also like to thank Dr. Tom Brijs for his encouragement and helpful suggestions.

References (57)

S. Baker et al. Motor vehicle deaths in children: geographic variations. Accid. Anal. Prev. (1991)
H. Brodsky et al. Risk of a road accident in rainy weather. Accid. Anal. Prev. (1988)
S.T. Doherty et al. The situational risks of young drivers: the influence of passengers, time of day, and day of week on accident rates. Accid. Anal. Prev. (1998)
J. Edwards. Weather-related road accidents in England and Wales: a spatial analysis. Accid. Anal. Prev. (1996)
B. Flahaut et al. The local spatial autocorrelation and the kernel method for identifying ‘black’ zones. A comparative approach. Accid. Anal. Prev. (2003)
B. Flahaut. Impact of infrastructure and local environment on road insecurity. Logistic modeling with spatial autocorrelation. Accid. Anal. Prev. (2004)
P. Greibe. Accident prediction models for urban roads. Accid. Anal. Prev. (2003)
J. Hosking et al. A statistical perspective on data mining. Future Gen. Comput. Syst. (1997)
A. Julien et al. Cheminements piétonniers et exposition au risque. Recherche Transports Sécurité (2002)
L. Larsen et al. Multidisciplinary in-depth investigations of head-on and left-turn road collisions. Accid. Anal. Prev. (2002)
E. LaScala et al. Demographic and environmental correlates of pedestrian injury collisions: a spatial analysis. Accid. Anal. Prev. (2000)
J. Lee et al. Impact of roadside features on the frequency and severity of run-off-roadway accidents: an empirical analysis. Accid. Anal. Prev. (2002)
M. Maher. A bivariate negative binomial model to explain traffic accident migration. Accid. Anal. Prev. (1990)
J.-L. Martin. Relationship between crash rate and hourly traffic flow on interurban motorways. Accid. Anal. Prev. (2002)
L. Mussone et al. An analysis of urban collisions using an artificial intelligence model. Accid. Anal. Prev. (1999)
S. Rajalin. The connection between risky driving and involvement in fatal accidents. Accid. Anal. Prev. (1994)
M. Strnad et al. Young children injury analysis by the classification entropy method. Accid. Anal. Prev. (1998)
I. Thomas. Spatial data aggregation: exploratory analysis of road accidents. Accid. Anal. Prev. (1996)
Y. Wong et al. Driver behaviour at horizontal curves: risk compensation and the margin of safety. Accid. Anal. Prev. (1992)
Agent, K.R., Deen, R.C., 1975. Relationship between roadway geometrics and accidents. Transportation Research Record...
R. Agrawal et al. Mining association rules between sets of items in large databases
R. Agrawal et al. Fast Discovery of Association Rules. Advances in Knowledge Discovery and Data Mining (1996)
S.S. Anand et al. Tackling the cross sales problem using data mining
L. Anselin. Local indicators of spatial association-LISA. Geographical Anal. (1995)
M. Berry et al. Data Mining Techniques for Marketing, Sales and Customer Support (1997)
M. Braddock et al. Using a geographic information system to understand child pedestrian injury. Am. J. Public Health (1994)
S. Brin et al. Beyond market baskets: generalizing association rules to correlations
Casaer, F., Eckhardt, N., Steenberghen T., Thomas, I., Wets, G., Quality assessment of the Belgian traffic accident...
