Density-based algorithm for clustering data
Since R2021a
Description
clusterDBSCAN clusters data points belonging to a P-dimensional feature space using the density-based spatial clustering of applications with noise (DBSCAN) algorithm. The clustering algorithm assigns points that are close to each other in feature space to a single cluster. For example, a radar system can return multiple detections of an extended target that are closely spaced in range, angle, and Doppler. clusterDBSCAN groups these detections into a single cluster so that they can be treated as one detection.
The DBSCAN algorithm assumes that clusters are dense regions in data space separated by regions of lower density and that all dense regions have similar densities.
To measure density at a point, the algorithm counts the number of data points in a neighborhood of the point. A neighborhood is a P-dimensional ellipse (hyperellipse) in the feature space. The radii of the ellipse are defined by the P-vector ε. ε can be a scalar, in which case the hyperellipse becomes a hypersphere. Distances between points in feature space are calculated using the Euclidean distance metric. The neighborhood is called an ε-neighborhood. The value of ε is defined by the Epsilon property. Epsilon can be either a scalar or a P-vector:
A vector is used when different dimensions in feature space have different units.
A scalar applies the same value to all dimensions.
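The ε-neighborhood test described above can be sketched in base MATLAB. This is a minimal illustration, not the object's internal code: the data, the variable names, the inclusive boundary, and the choice to count the query point itself are all assumptions. With a vector ε, each coordinate difference is divided by the radius for that dimension, so a point lies inside the hyperellipse when the normalized squared distance is at most 1.

```matlab
% Hypothetical data: each row is one point, [range (m), Doppler (m/s)].
X = [10 1; 11 1.2; 14 1.1; 10.5 4];
epsilon = [2 0.5];      % assumed per-dimension radii of the hyperellipse
q = X(1,:);             % query point whose density is measured

% Normalized squared distance from q to every point. A point is inside the
% hyperellipse when this value is <= 1 (inclusive boundary is an assumption).
d2 = sum(((X - q) ./ epsilon).^2, 2);
inNeighborhood = d2 <= 1;

% Density estimate at q: number of points in its neighborhood
% (this count includes q itself, which is an assumption).
numNeighbors = nnz(inNeighborhood);
```

With a scalar ε the same expression reduces to an ordinary Euclidean distance test, because every dimension is divided by the same radius.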
Clustering starts by finding all core points. If a point has a sufficient number of points in its ε-neighborhood, the point is called a core point. The minimum number of points required for a point to become a core point is set by the MinNumPoints property.
The remaining points in the ε-neighborhood of a core point can be core points themselves. If not, they are border points. All points in the ε-neighborhood are called directly density reachable from the core point.
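As a rough sketch of the core and border definitions above, the following base MATLAB code classifies a handful of hypothetical points. It assumes a scalar ε, an inclusive neighborhood boundary, and a neighbor count that includes the point itself; the object may use different conventions.

```matlab
% Hypothetical data: four points along a line plus one isolated point.
X = [0 0; 0.5 0; 1 0; 1.8 0; 5 5];
epsilon = 1;         % scalar radius, so each neighborhood is a circle
minNumPoints = 3;    % assumed threshold, counting the point itself

% Pairwise Euclidean distances (implicit expansion, base MATLAB only).
D = sqrt((X(:,1) - X(:,1)').^2 + (X(:,2) - X(:,2)').^2);
isNeighbor = D <= epsilon;     % logical matrix; the diagonal is true

% Core points have at least minNumPoints points in their neighborhood.
isCore = sum(isNeighbor, 2) >= minNumPoints;

% Border points are not core but lie in the neighborhood of a core point,
% that is, they are directly density reachable from it.
isBorder = ~isCore & any(isNeighbor(:, isCore), 2);
```

Here the first three points are core points, the fourth is a border point reached from the third, and the fifth is neither.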
If the ε-neighborhood of a core point contains other core points, the points in the ε-neighborhoods of all the core points merge together to form a union of ε-neighborhoods. This process continues until no more core points can be added.
All points in the union of ε-neighborhoods are density reachable from the first core point. In fact, all points in the union are density reachable from all core points in the union.
All points in the union of ε-neighborhoods are also termed density connected even though border points are not necessarily reachable from each other. A cluster is a maximal set of density-connected points and can have an arbitrary shape.
Points that are not core or border points are noise points. They do not belong to any cluster.
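The merging of ε-neighborhoods described above can be sketched as a textbook DBSCAN expansion loop. This is an illustrative base MATLAB version under stated assumptions (scalar ε, inclusive boundary, self-counting, and -1 as the noise label); it is not the implementation used by clusterDBSCAN.

```matlab
% Hypothetical data: two dense groups and one isolated point.
X = [0 0; 0.5 0; 1 0; 1.8 0; 5 5; 10 0; 10.4 0; 10.8 0];
epsilon = 1;
minNumPoints = 3;
n = size(X, 1);

D = sqrt((X(:,1) - X(:,1)').^2 + (X(:,2) - X(:,2)').^2);
isNeighbor = D <= epsilon;
isCore = sum(isNeighbor, 2) >= minNumPoints;

idx = zeros(n, 1);   % 0 means "not yet assigned"
c = 0;               % current cluster label
for i = 1:n
    if ~isCore(i) || idx(i) ~= 0
        continue     % only unassigned core points start a new cluster
    end
    c = c + 1;
    idx(i) = c;
    queue = i;
    while ~isempty(queue)
        p = queue(1);
        queue(1) = [];
        for q = find(isNeighbor(p, :))
            if idx(q) == 0
                idx(q) = c;              % q is density reachable from point i
                if isCore(q)
                    queue(end+1) = q;    % only core points extend the union
                end
            end
        end
    end
end
idx(idx == 0) = -1;  % leftover points are noise (assumed label)
```

Border points receive a label but are never queued, which is why two border points in the same cluster need not be reachable from each other.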
The clusterDBSCAN object can estimate ε using a k-nearest neighbor search, or you can specify values. To let the object estimate ε, set the EpsilonSource property to 'Auto'.
The clusterDBSCAN object can disambiguate data containing ambiguities. Range and Doppler are examples of possibly ambiguous data. Set the EnableDisambiguation property to true to disambiguate data.
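To see why ambiguous dimensions need special handling, consider two Doppler measurements that sit on opposite edges of the unambiguous interval. The sketch below uses hypothetical values and a simple wrap-around distance. It only illustrates the problem and does not describe how clusterDBSCAN performs disambiguation internally.

```matlab
% Hypothetical Doppler values (m/s) and assumed ambiguity limits.
amblims = [-20 20];
v1 = 19.5;
v2 = -19.8;

span = diff(amblims);                   % width of the unambiguous interval
dPlain = abs(v1 - v2);                  % distance if aliasing is ignored
dWrapped = min(dPlain, span - dPlain);  % distance across the wrap-around edge
```

Without wrapping, the two values are about 39.3 m/s apart and would fall in different clusters for any reasonable ε. With wrapping, they are about 0.7 m/s apart.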
To cluster detections:
1. Create the clusterDBSCAN object and set its properties.
2. Call the object with arguments, as if it were a function.
To learn more about how System objects work, see What Are System Objects?
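The two steps above might look like the following in practice. This is a minimal sketch that assumes a toolbox providing clusterDBSCAN is installed, that each row of X is one point, and that the single-output call returns one cluster index per row; the data values are hypothetical.

```matlab
% Hypothetical data: two tight groups of four points each.
X = [0 0; 0.5 0; 1 0; 0.5 0.5; 10 10; 10.5 10; 10 10.5; 10.5 10.5];

% Step 1: create the object and set its properties.
clusterer = clusterDBSCAN('MinNumPoints',3,'Epsilon',2);

% Step 2: call the object as if it were a function.
idx = clusterer(X);

disp(idx')   % one cluster index per input row
```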
Creation
Description
clusterer = clusterDBSCAN creates a clusterDBSCAN object, clusterer, with default property values.
clusterer = clusterDBSCAN(Name,Value) creates a clusterDBSCAN object, clusterer, with each specified property Name set to the specified Value. You can specify additional name-value pair arguments in any order as (Name1,Value1,...,NameN,ValueN). Any unspecified properties take default values. For example,
clusterer = clusterDBSCAN('MinNumPoints',3,'Epsilon',2, ...
    'EnableDisambiguation',true,'AmbiguousDimension',[1 2]);
creates a clusterer with the EnableDisambiguation property set to true and the AmbiguousDimension property set to [1,2].
Properties
Usage
Syntax
Description
[idx,clusterids] = clusterer(X) also returns an alternate set of cluster IDs, clusterids, in addition to idx, for use in other objects. clusterids assigns a unique ID to each noise point.
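The two-output syntax can be tried as follows. This sketch assumes a toolbox providing clusterDBSCAN is installed and that noise points carry the label -1 in idx, which the text above does not state. The data is hypothetical, with two isolated points that should come out as noise.

```matlab
% Hypothetical data: one tight group and two isolated points.
X = [0 0; 0.5 0; 1 0; 0.5 0.5; 20 20; -20 15];

clusterer = clusterDBSCAN('MinNumPoints',3,'Epsilon',2);
[idx,clusterids] = clusterer(X);

% Pick out the IDs given to noise points (assumes noise is -1 in idx).
ids = clusterids(:);
noiseMask = (idx(:) == -1);
noiseIds = ids(noiseMask);

% Per the description, every noise point should have its own ID.
allUnique = numel(unique(noiseIds)) == numel(noiseIds);
```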
[___] = clusterer(X,update) automatically estimates epsilon from the input data matrix, X, when update is set to true. The estimation uses a k-NN search to create a set of search curves. For more information, see Estimate Epsilon. The estimate is an average of the L most recent epsilon values, where L is specified in EpsilonHistoryLength.
To enable this syntax, set the EpsilonSource property to 'Auto', optionally set the MaxNumPoints property, and also optionally set the EpsilonHistoryLength property.
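The ingredients of this estimate can be illustrated in base MATLAB. The sketch below builds one k-NN distance curve of the kind commonly used to choose ε for DBSCAN, then averages the L most recent estimates. The data, the value of k, and the history values are hypothetical, and the rule that picks ε from the curve is deliberately not shown because the text above does not describe it.

```matlab
rng(0);                                % reproducible hypothetical data
X = [randn(30,2); randn(30,2) + 8];    % two well-separated groups
k = 3;                                 % assumed neighbor count

% Sorted k-th nearest-neighbor distance of every point: one "search curve".
D = sqrt((X(:,1) - X(:,1)').^2 + (X(:,2) - X(:,2)').^2);
Ds = sort(D, 2);                       % column 1 is the zero self-distance
kDist = sort(Ds(:, k+1));              % ascending curve; its knee suggests epsilon

% Averaging the L most recent estimates (hypothetical history values).
L = 3;
epsHistory = [1.9 2.1 2.0 2.3];        % oldest to newest
epsSmoothed = mean(epsHistory(end-L+1:end));

% With the object itself, using the syntax described above:
% clusterer = clusterDBSCAN('MinNumPoints',3,'EpsilonSource','Auto');
% idx = clusterer(X,true);
```

Averaging over recent estimates keeps ε from jumping when one batch of detections is unusual, at the cost of reacting more slowly to real changes in density.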
Input Arguments
Output Arguments
Object Functions
To use an object function, specify the System object™ as the first input argument. For example, to release system resources of a System object named obj, use this syntax:
release(obj)
Examples
Algorithms
Extended Capabilities
Version History
Introduced in R2021a
See Also