Analyze Large Data in Database Using Tall Arrays
This example determines the minimum arrival delay of a large set of flight data that is stored in a database. You can access large data sets and create a tall array using a databaseDatastore object with Database Toolbox™. Once a tall array exists, you can visualize data in the tall array. Alternatively, you can write a MapReduce algorithm that defines the chunking and reduction of the data.
The databaseDatastore object does not support using a parallel pool with Parallel Computing Toolbox™ installed. To analyze data using tall arrays or run MapReduce algorithms, set the global execution environment to be the local MATLAB® session.
This example uses a preconfigured JDBC data source to create the database connection. For more information about configuring a JDBC data source, see the Database Toolbox documentation.
Create databaseDatastore Object
Set the global execution environment to be the local MATLAB® session.
mapreducer(0);
The file airlinesmall.csv contains the large set of flight data. Load this file into the Microsoft® SQL Server® database table airlinesmall. This table contains 123,523 records.
Create a database connection to the JDBC data source MSSQLServerJDBCAuth. This data source configures a JDBC driver to a Microsoft® SQL Server® database with Windows® authentication. Specify a blank user name and password.
datasource = "MSSQLServerJDBCAuth";
username = "";
password = "";
conn = database(datasource,username,password);
Create a databaseDatastore object using the database connection and an SQL query. This SQL query retrieves arrival-delay data from the airlinesmall table. databaseDatastore executes the SQL query.
sqlquery = 'select ArrDelay from airlinesmall';
dbds = databaseDatastore(conn,sqlquery,'ReadSize',50000);
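Before moving to tall arrays, it can help to see what the datastore does with the read size: each read returns one table of at most that many rows. The following is a minimal sketch, not part of the original example. The helper name chunkedMin is hypothetical, and it assumes the datastore returns tables that contain the named numeric variable.

```matlab
function m = chunkedMin(ds, varName)
% chunkedMin (hypothetical helper): running minimum of one table variable,
% computed one chunk at a time so the full data set is never held in memory.
% Assumed usage with the datastore created above:
%   m = chunkedMin(dbds, 'ArrDelay');
    reset(ds);                      % start reading from the first record
    m = Inf;
    while hasdata(ds)
        chunk = read(ds);           % table with up to ReadSize rows
        % min ignores NaN values unless every input is NaN
        m = min(m, min(chunk.(varName)));
    end
    reset(ds);                      % leave the datastore ready for reuse
end
```

Tall arrays perform this kind of chunked iteration for you, which is why the rest of the example does not read the datastore manually.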
Find Minimum Arrival Delay Using Tall Array
Because the databaseDatastore object returns a table, create a tall table.
tt = tall(dbds);
Find the minimum arrival delay.
minArrDelay = min(tt.ArrDelay);
minArrDelay contains the unevaluated minimum arrival delay. To return the output value, use gather.
minArrDelayValue = gather(minArrDelay)
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 1.6 sec
Evaluation completed in 1.9 sec

minArrDelayValue =

   -64
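Because tall expressions stay unevaluated until you call gather, you can request several results in one gather call, which typically lets MATLAB combine the passes over the data. The following is a minimal sketch under stated assumptions: the small in-memory table is a made-up stand-in for tall(dbds), and its values are illustrative, not taken from the airlinesmall table.

```matlab
% Use the local MATLAB session, as in the example above.
mapreducer(0);

% Stand-in for tall(dbds): illustrative delays only, including one missing value.
tt = tall(table([8; -64; NaN; 21], 'VariableNames', {'ArrDelay'}));

% Build three deferred expressions. Nothing is computed yet.
minDelay  = min(tt.ArrDelay);               % min ignores NaN by default
maxDelay  = max(tt.ArrDelay);               % max ignores NaN by default
meanDelay = mean(tt.ArrDelay, 'omitnan');   % mean needs 'omitnan' to skip NaN

% Evaluate all three expressions together.
[minDelay, maxDelay, meanDelay] = gather(minDelay, maxDelay, meanDelay);
```

With the database-backed tall table, replace the stand-in line with tt = tall(dbds).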
In addition to determining a minimum, tall arrays support many other functions.
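The introduction mentions a MapReduce algorithm as an alternative to tall arrays. The following is a minimal sketch of how the same minimum could be computed that way. The function names minDelayMapper and minDelayReducer and the key 'MinArrDelay' are hypothetical, and the sketch assumes each chunk is a table with a numeric ArrDelay variable.

```matlab
function minDelayMapper(data, ~, intermKVStore)
% Map step (hypothetical): emit the minimum arrival delay of one chunk.
% Assumed usage with the datastore from this example:
%   outds = mapreduce(dbds, @minDelayMapper, @minDelayReducer);
%   result = readall(outds);
    if ~isempty(data)
        add(intermKVStore, 'MinArrDelay', min(data.ArrDelay));
    end
end

function minDelayReducer(~, intermValIter, outKVStore)
% Reduce step (hypothetical): combine the per-chunk minimums into one value.
    minVal = Inf;
    while hasnext(intermValIter)
        minVal = min(minVal, getnext(intermValIter));
    end
    add(outKVStore, 'MinArrDelay', minVal);
end
```

The mapper defines the per-chunk work and the reducer defines how chunk results are combined, which is the chunking and reduction that tall arrays otherwise handle for you.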
Close databaseDatastore Object and Database Connection
close(dbds)