File storage shared by MATLAB clients and workers

Since R2022a

Description

FileStore is an object that stores files owned by a specific job. Each entry of the object consists of a file and its corresponding key. When the owning job is deleted, the FileStore object is deleted as well. Use FileStore to store files from MATLAB® workers that can be retrieved by MATLAB clients during the execution of a job (even while the job is still running).

  • Any MATLAB process, client or worker, can write an entry to the FileStore at any time. Any MATLAB process, client or worker, can then read this entry from the FileStore at any time. However, the ordering of operations executed by different processes is not guaranteed.

  • FileStore can be used to return files when a cluster has no shared file system, or to run code that does not depend on the location of any shared file system.

  • FileStore is not held in system memory, so it can be used to store large results.

Creation

The FileStore object is automatically created when you create:

  • A job on a cluster, which is a parallel.Job object. To create a job, use the batch, createJob, or createCommunicatingJob function.

  • A parallel pool of process workers on the local machine, which is a ProcessPool object. To create a process pool, use the parpool function.

  • A parallel pool of workers on a cluster of machines, which is a ClusterPool object. To create a cluster pool, use the parpool function.

You can access the FileStore object on a worker by using the getCurrentFileStore function. You can then retrieve the FileStore object on a client by using the FileStore property that is associated with the job or the parallel pool. For an example that runs a batch job and retrieves files from workers, see Examples.

Properties

KeyUpdatedFcn: Callback executed when an entry is added or replaced, specified as a function handle. The function handle must accept two input arguments that represent the FileStore object and its key when an entry is added or replaced.

KeyRemovedFcn: Callback executed when an entry is removed, specified as a function handle. The function handle must accept two input arguments that represent the FileStore object and its key when an entry is removed.
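
The examples on this page only show a callback for added entries. As a minimal sketch of the removal case, the following callback prints the key of each removed entry. The function name fileremovedentry is hypothetical and chosen for illustration; the only requirement taken from the description above is that the function accepts two inputs, the store and the key.

```matlab
% Hypothetical callback for the KeyRemovedFcn property (the name is illustrative).
% It accepts the two required inputs: the store object and the removed key.
% The store input is not needed here, so it is ignored with ~.
function fileremovedentry(~,key)
    fprintf("Entry %s removed from the store.\n",key);
end
```

To use it on the client, assign store.KeyRemovedFcn = @fileremovedentry; after you retrieve the store from a job or a parallel pool.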

Object Functions

isKey                Determine if ValueStore or FileStore object contains keys
keys                 Return all keys of ValueStore or FileStore object
copyFileToStore      Copy files from local file system to FileStore object
copyFileFromStore    Copy files from FileStore object to local file system
remove               Remove entries from ValueStore or FileStore object
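
The functions listed above can be combined on the client to empty a store once a job has produced its files. The following is a sketch under stated assumptions, not a toolbox function: the helper names retrieveallfiles and builddestinations are hypothetical, and the code assumes every stored file is a MAT-file, as in the examples on this page.

```matlab
function files = retrieveallfiles(store,destfolder)
% Hypothetical helper (not part of the toolbox): copy every file in a
% FileStore to destfolder, then remove the copied entries from the store.
% Assumes every stored file is a MAT-file.
allkeys = keys(store);                       % all keys currently in the store
files = builddestinations(allkeys,destfolder);
if ~isempty(allkeys)
    copyFileFromStore(store,allkeys,files);  % one destination file per key
    remove(store,allkeys);                   % delete the copied entries
end
end

function files = builddestinations(allkeys,destfolder)
% Build one destination path per key: <destfolder><filesep><key>.mat
files = string(destfolder) + filesep + string(allkeys) + ".mat";
end
```

For example, after wait(job), calling retrieveallfiles(job.FileStore,pwd) would copy each stored file into the current folder. Removing entries is optional; drop the remove call if the files should stay in the store until the job is deleted.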

Examples

Run a simulation on workers and retrieve the file storage of the job on a client. The file storage is a FileStore object with key-file entries.

The following simulation finds the average and standard deviation of random matrices and stores the results in the FileStore object.

type workerstatscode
function workerstatscode(models)
% Get the FileStore of the current job
store = getCurrentFileStore;
for i = 1:numel(models)
    % Compute the average and standard deviation of random matrices
    a = rand(models(i));
    m = mean(a);
    s = std(a);
    % Save simulation results in temporary files
    sourcetempfile = strcat(tempname("c:\mytempfolder"),".mat");
    save(sourcetempfile,"m","s");
    % Copy files to FileStore object as key-file pairs
    key = strcat("result_",num2str(i));
    copyFileToStore(store,sourcetempfile,key);
end
end

The following callback function is executed when a file is copied to the FileStore object.

type filenewentry
function filenewentry(store,key)
   destination = strcat(key,".mat");
   fprintf("Result %s added. Copying to local file system: %s\n",key,destination);
   copyFileFromStore(store,key,destination);
end

Run a batch job on workers using the default cluster profile.

models = [4,8,32,20];
c = parcluster;
job = batch(c,@workerstatscode,0,{models});

Retrieve the FileStore object on the client while the job is still running. Show the progress of the job.

store = job.FileStore;
store.KeyUpdatedFcn = @filenewentry;
wait(job);
Result result_1 added. Copying to local file system: result_1.mat
Result result_2 added. Copying to local file system: result_2.mat
Result result_3 added. Copying to local file system: result_3.mat
Result result_4 added. Copying to local file system: result_4.mat

Display all the information on the variables stored in the file "result_3.mat".

whos -file 'result_3.mat'
  Name      Size            Bytes  Class     Attributes
  m         1x32              256  double              
  s         1x32              256  double              

Run a simulation on a parallel pool of process workers and retrieve the file storage on a client.

The following simulation finds the average and standard deviation of random matrices and stores the results in the FileStore object.

type workerstatscode
function workerstatscode(models)
% Get the FileStore of the current job
store = getCurrentFileStore;
for i = 1:numel(models)
    % Compute the average and standard deviation of random matrices
    a = rand(models(i));
    m = mean(a);
    s = std(a);
    % Save simulation results in temporary files
    sourcetempfile = strcat(tempname("c:\mytempfolder"),".mat");
    save(sourcetempfile,"m","s");
    % Copy files to FileStore object as key-file pairs
    key = strcat("result_",num2str(i));
    copyFileToStore(store,sourcetempfile,key);
end
end

The following callback function is executed when a file is copied to the FileStore object.

type filenewentry
function filenewentry(store,key)
   destination = strcat(key,".mat");
   fprintf("Result %s added. Copying to local file system: %s\n",key,destination);
   copyFileFromStore(store,key,destination);
end

Start a parallel pool of process workers.

pool = parpool('Processes');
Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 6 workers.

Get the FileStore for this pool and assign the callback function to be executed when an entry is added.

store = pool.FileStore;
store.KeyUpdatedFcn = @filenewentry;

Run the simulation on the pool.

models = [4,8,32,20];
future = parfeval(@workerstatscode,0,models);
wait(future);
Result result_1 added. Copying to local file system: result_1.mat
Result result_2 added. Copying to local file system: result_2.mat
Result result_3 added. Copying to local file system: result_3.mat
Result result_4 added. Copying to local file system: result_4.mat

Display the variables stored in the local file result_3.mat.

whos -file 'result_3.mat'
  Name      Size            Bytes  Class     Attributes
  m         1x32              256  double              
  s         1x32              256  double              

Run a job of independent tasks. Then, retrieve the data and file storage of the job on a client.

The following simulation finds the permutations and combinations of a vector, and stores the results in the ValueStore and FileStore objects.

type taskfunction
function taskfunction(dataset,keyname)
% Get the ValueStore and FileStore of the current job
valuestore = getCurrentValueStore;
filestore = getCurrentFileStore;
% Run the simulation to find permutations and combinations
[result,logfile] = runsimulation(dataset);
% Store results in ValueStore to release system memory
valuestore(keyname) = result;
% Copy file to FileStore to retrieve the file from non-shared file system
copyFileToStore(filestore,logfile,keyname);
end
function [result,logfile] = runsimulation(dataset)
    permutations = perms(dataset{1});
    combinations = nchoosek(dataset{1},dataset{2});
    result.n_perm = length(permutations);
    result.n_comb = length(combinations);
    logfile = strcat(tempname("c:\mylogfolder"),".mat");
    save(logfile,"permutations","combinations")
end

Create a job using the default cluster profile.

c = parcluster;
job = createJob(c);

Create independent tasks for the job. Each task runs the simulation with the given input.

set_1 = {[12,34,54],2};
set_2 = {[45,33],1};
set_3 = {[12,12,12,13,14],3};
tasks = createTask(job,@taskfunction,0,{{set_1,"sim_1"},{set_2,"sim_2"},{set_3,"sim_3"}});

Run the job and wait for it to finish.

submit(job);
wait(job);

Retrieve the data and file storage of the job.

valuestore = job.ValueStore;
filestore = job.FileStore;

Show the result of the third task that is stored in the ValueStore object.

result_3 = valuestore("sim_3")
result_3 = struct with fields:
    n_perm: 120
    n_comb: 10

Copy files from the file storage as specified by the corresponding keys "sim_1" and "sim_2" to the local files "analysis_1.mat" and "analysis_2.mat".

copyFileFromStore(filestore,["sim_1" "sim_2"],["analysis_1.mat" "analysis_2.mat"]);

Display all the information on the variables stored in the local files.

whos -file 'analysis_1.mat'
  Name              Size            Bytes  Class     Attributes
  combinations      3x2                48  double              
  permutations      6x3               144  double              
whos -file 'analysis_2.mat'
  Name              Size            Bytes  Class     Attributes
  combinations      2x1                16  double              
  permutations      2x2                32  double              

Limitations

  • When using parallel.cluster.Generic clusters with 'HasSharedFileSystem' set to false, the visibility of modifications made to FileStore while a job is running depends on your specific implementation. Without additional synchronization between the MATLAB client and worker JobStorageLocation, changes might only be visible once the job has completed.

Version History

Introduced in R2022a
