Analyze Large Data in Database Using Tall Arrays

This example determines the minimum arrival delay in a large set of flight data that is stored in a database. You can access large data sets and create a tall array using a databaseDatastore object with Database Toolbox™. Once a tall array exists, you can visualize data in the tall array. Alternatively, you can write a MapReduce algorithm that defines the chunking and reduction of the data.

The databaseDatastore object does not support using a parallel pool when Parallel Computing Toolbox™ is installed. To analyze data using tall arrays or run MapReduce algorithms, set the global execution environment to be the local MATLAB® session.

This example uses a preconfigured JDBC data source to create the database connection. For more information on configuring a JDBC data source, see the Database Toolbox™ documentation.

Create databaseDatastore Object

Set the global execution environment to be the local MATLAB® session.

mapreducer(0);

The file airlinesmall.csv contains the large set of flight data. Load this file into the Microsoft® SQL Server® database table airlinesmall. This table contains 123,523 records.

Create a database connection to the JDBC data source mssqlserverjdbcauth. This data source configures a JDBC driver to a Microsoft® SQL Server® database with Windows® authentication. Specify a blank user name and password.

datasource = "mssqlserverjdbcauth";
username = "";
password = "";
conn = database(datasource,username,password);

Create a databaseDatastore object using the database connection and an SQL query. This SQL query retrieves arrival-delay data from the airlinesmall table. The databaseDatastore object executes the SQL query and returns the results in chunks of at most 50,000 rows, as set by the ReadSize option.

sqlquery = 'select ArrDelay from airlinesmall';
dbds = databaseDatastore(conn,sqlquery,'ReadSize',50000);
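
To make the role of the read size concrete, here is a minimal sketch of reading a datastore one chunk at a time while keeping a running minimum. The helper name chunkedMin and its varname argument are hypothetical and not part of Database Toolbox™. The sketch assumes only the standard datastore functions reset, hasdata, and read, and that each chunk comes back as a table with a numeric variable.

```matlab
function runningmin = chunkedMin(ds, varname)
% chunkedMin  Hypothetical helper: minimum of one table variable,
% computed by reading the datastore one chunk at a time.
reset(ds);                          % start from the first record
runningmin = inf;
while hasdata(ds)
    chunk = read(ds);               % table with at most ReadSize rows
    if height(chunk) > 0
        % min ignores NaN (missing) values by default
        runningmin = min(runningmin, min(chunk.(varname)));
    end
end
reset(ds);                          % leave the datastore ready for reuse
end
```

With the datastore from this example, a call such as chunkedMin(dbds,'ArrDelay') should agree with the tall-array result computed in the next section, assuming the query returns a numeric column named ArrDelay. The tall-array approach does the same chunking for you.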

Find Minimum Arrival Delay Using Tall Array

Because the databaseDatastore object returns a table, create a tall table.

tt = tall(dbds);

Find the minimum arrival delay.

minarrdelay = min(tt.ArrDelay);

minarrdelay contains the unevaluated minimum arrival delay, because MATLAB® defers tall array calculations until you request the result. To return the output value, use gather.

minarrdelayvalue = gather(minarrdelay)
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 1.6 sec
Evaluation completed in 1.9 sec
minarrdelayvalue =
   -64
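
The introduction mentions a MapReduce algorithm as an alternative to tall arrays. The following is a minimal sketch of that route, not the method used in this example. The names minDelayMapReduce, minMapper, and minReducer are hypothetical, and the mapper assumes each chunk is a table with a numeric variable named ArrDelay. The calls mapreduce, mapreducer, add, hasnext, getnext, and readall are standard MATLAB® functions.

```matlab
function minval = minDelayMapReduce(ds)
% minDelayMapReduce  Hypothetical helper: minimum of the ArrDelay variable,
% computed with mapreduce in the local MATLAB session.
outds = mapreduce(ds, @minMapper, @minReducer, mapreducer(0), 'Display', 'off');
result = readall(outds);            % table with variables Key and Value
minval = result.Value{1};
end

function minMapper(data, ~, intermkvstore)
% Map step: emit the minimum of each chunk under a single shared key.
add(intermkvstore, 'MinArrDelay', min(data.ArrDelay));
end

function minReducer(key, intermvaliter, outkvstore)
% Reduce step: combine the per-chunk minima into one overall minimum.
minval = inf;
while hasnext(intermvaliter)
    minval = min(minval, getnext(intermvaliter));
end
add(outkvstore, key, minval);
end
```

Calling minDelayMapReduce(dbds) while the datastore is still open would be expected to return the same value as the tall-array calculation. Note that mapreduce writes its result files to the current folder unless you specify an output folder.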

In addition to determining a minimum, tall arrays support many other functions. For details, see the MATLAB® documentation on functions that support tall arrays.
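
Because tall array calculations are deferred, you can queue several of them and evaluate all of them with one gather call. The sketch below illustrates this on a small in-memory tall array, so it runs without a database. The data values and variable names are made up for illustration and do not come from the airlinesmall table.

```matlab
% Keep execution in the local MATLAB session, as in the rest of this example.
mapreducer(0);

% Small in-memory stand-in for the arrival-delay column (illustrative values).
delays = tall([-64; 10; NaN; 3]);

% These calls only queue work; nothing is computed yet.
mindelay  = min(delays);                % min and max skip NaN by default
maxdelay  = max(delays);
meandelay = mean(delays, 'omitnan');    % mean needs 'omitnan' to skip NaN

% A single gather call evaluates all three results together.
[minvalue, maxvalue, meanvalue] = gather(mindelay, maxdelay, meandelay);
```

Gathering the results together lets MATLAB® share passes over the data between the three calculations, which matters when every pass means reading from the database again.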

Close databaseDatastore Object and Database Connection

close(dbds)
