Crime Data Visualization and Spatial Database Management System
Michael Wyland
CS 6604 – Virginia Tech – NVC
24 April 2008
Agenda
Overview
Project Context & Objectives
Design
Database design
Data Loading
Optimization
Application Interface
Extensions
Hotspot Detection & DBSCAN
Future Work
Demonstration
Overview
Implemented Crime Spatial Database Management System components
Crime Data Engine
Mapping and Visualization Capabilities
http://crime.dnsalias.com:9090/crime/
Project Context
Design and implement a Crime Spatial Database Management System and Application to support
General Public (awareness, safety)
Police officer (optimize resources)
Crime analyst (trending, patterns)
Project Objectives
Configure project infrastructure including Spatial DBMS
Implement physical model for crime data
Load Fairfax County Police Department data into the database
Develop required queries (Window and K-Nearest Neighbor)
Optimize query performance
Project Objectives (cont’d)
Implement Hotspot Identification (Cluster)
Integrate mapping and visualization interfaces
Create a publishable quality site
Data Structure
Crime records need to have some consistent attributes
Date/time of crime
Type of crime
Narrative or description of the crime
Address of the crime
Data Structure (cont’d)
Spatial database and application requires that we store some additional info
Crime ID
Latitude/longitude location of crime
Spatial object
Geocode accuracy
A single relation was defined:
Loading the Data
Wanted to pursue the Fairfax County Police Department data
Seemed like everyone else was doing City of Falls Church Police Department data…
FFX County is also a larger dataset both in number of records and space to play with
Loading the Data (cont’d)
Several Challenges
All crime reports were in .pdf or .doc format which are difficult to access without an API
No standard format/structure exists, so each crime record has a slightly different format
To find date/time attribute requires literally reading the report text
Reports actually contain some errors, like incorrect crime types in some cases
Non-specific locations and times
Data Loading Strategy
Did not design a parser
Given the challenges above, short timeline, and purpose of this class (Not ‘Parsers 101’)
Designed web-based data input interface
Required human assistance to manually read the reports and enter the crime info
My wife entered about 1251 records into my input interface
8 of 95 .doc files were reviewed and input
Data Loading Strategy
Enter the “block” address
Address is GeoCode’d using Google Maps API via Geocoding object and JSON response object
Enter date/time, crime type, narrative
Record saved in crimes table along with GeoCode results
No need to GeoCode on-the-fly later
Performance Planning
Indexes
Normal primary key indexes implemented
Implemented R-tree index for spatial objects
Hints for spatial index are used in my queries, based on Oracle Spatial best practice documentation
Examined Oracle Query Plans
Confirmed that the required queries are utilizing the normal and spatial indices and working efficiently
System responds in a timely fashion to multiple concurrent, large queries
Query Processing
Web app builds Dynamic SQL based on user input
Web app interfaces with database (CDE) and passes SQL
CDE retrieves data and passes data set back
Web app binds data to visualizations
System Platform
Software
Windows XP Home
IIS 5
Availability
Oracle 10g Express Edition with Oracle Locator (reduced, free version of Oracle Spatial)
Familiarity
Licensing
Just enough features
Google Maps API
Dundas Chart for ASP.NET Professional Evaluation
Hardware
Server: Dell Dimension 8400
Client: Dell Inspiron 2650 Notebook
Development
Visual Studio 2005
ASP.NET in Visual Basic
Oracle Data Provider for .NET (ODP.NET)
Demo Architecture
Crime Hotspot Identification
Chose to implement Crime Hotspot Identification using the DBSCAN algorithm
DBSCAN is a density-based spatial clustering algorithm driven by two parameters
“Eps” – epsilon radius from each point
“MinPts” – minimum neighbors to categorize points
Straightforward approach makes for timely integration into the project
Density-based approach is good for crime hotspots
Resistance to noise makes DBSCAN a good choice for this application
DBSCAN Review
Density = # of points within a specified radius
Density is a “score” for each point based on Eps
Each point in the database is categorized
A core point has more than a specified number of points (MinPts) within Eps
A border point has fewer than MinPts within Eps but is in the neighborhood of a core point
A noise point is any point that is not a core or border
DBSCAN Implementation
To support incremental application of the DBSCAN algorithm, an additional relation was defined:
For each point this stores the density score for a given Eps, a categorization based on MinPts, and a cluster label
DBSCAN Implementation
First step is to calculate a density score for all points, based on Eps
This is implemented using the sdo_within_distance function for each point, and counting results (grouped)
Results are inserted into the clusters table
For points which have no neighbors within Eps, they are directly added to the clusters table with a zero Density score
DBSCAN Implementation
Now that we have Density scores for all points, we need to categorize them
Update the cluster table to mark each point as core, border, or noise
If Density score >= MinPts
Mark as a core point
If Density score < MinPts but the point is within Eps distance of a core point
Mark as a border point
If not marked as core or border
Mark as a noise point
DBSCAN Implementation
Now that we have categorized each point as core, border, or noise we can get our first visual look at the results…
DBSCAN Point Categorization
Core, Border, Noise
This example uses Eps=0.5, MinPts=20
An Early Insight
This example visually confirms that there are many noise points, so DBSCAN looks like a smart choice of clustering algorithm
DBSCAN is fairly resistant to noise
Note: level of noise is similar with other Eps and MinPts values
DBSCAN Implementation
Next we need to actually identify and label the clusters
But first we eliminate noise points
Delete these from the clusters table
Let’s review the labeling algorithm…
DBSCAN Cluster Labeling
DBSCAN Cluster Labeling
So for each core point…
If the core point is not in a cluster already, label it as part of a new cluster
Find all of its neighbors within distance Eps, and if they are not in a cluster already, label them as part of the same cluster too
Then move on to the next unlabeled core point (start of next cluster)
These labels are updated in the clusters table
DBSCAN Implementation
Now that we have labeled the clusters, we can get a visual look at the final results…
This example uses Eps=0.25, MinPts=20
Notice Tysons, Rt 1 corridor, Springfield Mall…
DBSCAN Implementation
Implemented DBSCAN as a stored procedure to be easily called with varying Eps and MinPts (and time range, crime types, etc.)
Recalculates all clusters relatively quickly
Let’s review how we optimize Eps and MinPts…
DBSCAN Optimization
Uses concept that kth nearest neighbors are at some similar distance in a cluster
So we pick a k-value and sort all the points’ distance to that neighbor
Example from class:
k=4 only
DBSCAN Optimization
DBSCAN Optimization
Not so easy to choose a good point on that curve…
Original DBSCAN algorithm authors discussed in their paper that we should eyeball the plot for the first/last “valley”
DBSCAN Observations
DBSCAN does not work well with varying densities… which is also a characteristic of this crime data…
This is observable when we change the order of the core point cluster labeling
Ex: Start w/ high vs. low density score cores?
DBSCAN Observations (cont’d)
The algorithm labels neighbors of the current core point regardless of whether we started a new cluster or not…
Initialize the cluster label to zero, not one
Many different “presentations” of the DBSCAN algorithm…
DBSCAN Algorithm Presentations
Publishable Quality (?)
Future Work
To be useful for a real law enforcement organization much more work needs done
System must handle more than one crime at
Free register and download UseNet downloader, then you can free download from UseNet.
Download "Crime Data Visualization and Spatial Database Management System" from Usenet!Next: Achieving 10 Gb/s Using Xen Para-virtualized Network Drivers
ip:198.82.183.28
refer:-
time:13 month ago
tag:
