generates a multi-threaded mex file from a matlab function

description

dspunfold file generates a multi-threaded mex file from the entry-point matlab® function specified by file, using the unfolding technology. unfolding is a technique to improve throughput through parallelization. the multi-threaded mex file leverages the multicore cpu architecture of the host computer and can improve speed significantly. in addition to the multi-threaded mex file, the function generates a single-threaded mex file, a self-diagnostic analyzer function, and the corresponding help files.

dspunfold options file generates a multi-threaded mex file from the entry-point matlab function specified by file, using the function arguments specified by options.

note

this function requires a matlab coder™ license.

input arguments

option

values

description

examples

-args arguments

cell array

argument types for the entry-point matlab function, specified as a cell array.

the cell array accepts numeric elements, the coder.typeof function, and the coder.Constant function.

the generated multi-threaded mex file is specialized to the size, class, and complexity of arguments.

the number of elements in the cell array must be the same as the number of arguments that the entry-point matlab function expects.

  • dspunfold fcn -args {ones(10,1), 5}

    dspunfold extracts the type (size, class, and complexity) information from the elements in the arguments cell array.

    fcn is the entry-point matlab function.

  • dspunfold fcn -args {coder.typeof(ones(10,1)), coder.typeof(5)}

    coder.typeof is used to specify the types of the fcn arguments.

  • dspunfold fcn -args {coder.Constant(ones(10,1)), coder.Constant(5)}

  • dspunfold fcn -args {}

    by default, arguments is {}. an empty cell array {} indicates that fcn accepts no input arguments.

-o output

character vector

name of the output multi-threaded mex file, specified as a character vector. if no output name is specified, the name of the generated multi-threaded mex file is inherited from the input matlab function with an '_mt' suffix. dspunfold also adds a platform-specific extension to this name. in addition, dspunfold generates a single-threaded mex file with an '_st' suffix, and a test bench file with an '_analyzer' suffix.

  • no output name specified

    dspunfold fcn

    files generated: fcn_mt.mexw64, fcn_st.mexw64, fcn_analyzer.p

  • output name specified

    dspunfold fcn -o foo

    files generated: foo.mexw64, foo_st.mexw64, foo_analyzer.p

-s statelength

scalar integer greater than or equal to zero

auto

state length of the algorithm in the entry-point matlab function, specified as a scalar integer greater than or equal to zero, or auto. by default, the statelength is zero frames, indicating that the algorithm is stateless.

if at least one entry of frameinputs is true, statelength is considered in samples.

for information on frames and samples, see sample- and frame-based concepts.

-s auto triggers automatic state length detection. in this mode, you must provide numeric inputs to the arguments cell array, because dspunfold uses these inputs to detect the state length of the algorithm. you can pass coder.Constant inputs but not coder.typeof inputs. when you invoke automatic state length detection, provide random inputs to the arguments array. see automatic state length detection.

  • dspunfold fcn -args {randn(10,1), randn(10,1), randn(10,1)} -s 3 -f [false, false, false]

    state length is three frames.

  • dspunfold fcn -args {randn(10,1), randn(10,1), randn(10,1)} -s 3 -f [true, false, false]

    state length is three samples. state length is considered in samples, because at least one entry of the -f option is true.

  • dspunfold fcn -args {randn(10,1), randn(10,1), randn(10,1)} -s auto

    automatic state length detection is invoked.

  • dspunfold fcn -args {coder.typeof(randn(10,1)), coder.typeof(randn(10,1)), coder.typeof(randn(10,1))} -s auto

    generates this error message: the input argument 1 is of type coder.primitivetype which is not supported when using -s auto

-f frameinputs

scalar logical

vector of logical values

frame status of input arguments for the entry-point matlab function, specified as one of true or false.

  • true — input is in frames and can be subdivided into samples without changing the system behavior.

  • false — input cannot be subdivided into samples without changing the system behavior. for example, you cannot subdivide the coefficients of a filter without changing the characteristics of the filter.

by default, frameinputs is false.

frameinputs set to a scalar logical value sets the frame status of all the inputs simultaneously.

to specify statelength in samples, set at least one entry of frameinputs to true.

if frameinputs is not specified, the unit of statelength is frames.

  • dspunfold fcn -args {randn(10,1), randn(10,1), randn(10,1)} -s 3 -f true

    all the inputs are marked as frames. state length is three samples.

  • dspunfold fcn -args {randn(10,1), randn(10,1), randn(10,1)} -s 3 -f [true, false, false]

    state length is three samples.

  • dspunfold fcn -args {randn(10,1), randn(10,1), randn(10,1)} -s 3

    the default value of frameinputs is false. state length is three frames.

-r repetition

positive integer

repetition factor used to generate the multi-threaded mex file, specified as a positive integer. the default value of repetition is 1. see repetition factor.

dspunfold fcn -args {randn(10,2), randn(20,2), randn(30,3)} -r 2

-t threads

positive integer

number of threads used by the multi-threaded mex file, specified as a positive integer. the default value of threads is the number of physical cpu cores present on your machine. see threads.

dspunfold fcn -args {randn(10,1), randn(20,2), randn(30,3)} -t 4

-v verbose

scalar logical

option to show verbose output during code generation, specified as true or false. the default is true.

  • dspunfold fcn -args {randn(10,1), randn(20,2), randn(30,3)} -v true

  • dspunfold fcn -args {randn(10,1), randn(20,2), randn(30,3)} -v false

file

entry-point matlab function from which dspunfold generates the multi-threaded mex file. the function must support code generation.

example: dspunfold fcn -args {randn(10,1),randn(10,2),randn(20,1)}

fcn is the entry-point matlab function and {randn(10,1),randn(10,2),randn(20,1)} are its input arguments.

output files

when you invoke dspunfold on an entry-point matlab function, dspunfold generates the following files.

file

value

description

examples

multi-threaded mex file

mex file

multi-threaded mex file generated from the entry-point matlab function. the mex file inherits the output name. if no output name is specified, the name of this file is inherited from the matlab function with an '_mt' suffix. a platform-specific extension is also added to the name.

  • dspunfold fcn -o foo generates foo.mexw64

  • dspunfold fcn generates fcn_mt.mexw64

help file for the multi-threaded mex file

matlab file

matlab help file for the multi-threaded mex file. the help file has the same name as the mex file, but with an '.m' extension. to display the help, type help followed by the name of the mex file at the matlab command prompt.

this help file displays information on how to invoke the mex file, its syntax, latency, and types (size, class, and complexity) of the inputs to the mex file. in addition, the help file documents the parameters used by dspunfold: threads, repetition, and state length. this information is useful when you are invoking the mex file. the syntax to invoke the mex file should be the same as the syntax shown in the help file.

  • help foo

  • help fcn_mt

single-threaded mex file

mex file

single-threaded mex file generated from the entry-point matlab function. the mex file inherits the output name with an '_st' suffix. if no output name is specified, the name of this file is inherited from the matlab function with an '_st' suffix. a platform-specific extension is also added to the name. use this file as a benchmark to compare against the speed of the multi-threaded mex file.

  • dspunfold fcn -o foo generates foo_st.mexw64

  • dspunfold fcn generates fcn_st.mexw64

help file for the single-threaded mex file

matlab file

matlab help file for the single-threaded mex file. the help file has the same name as the mex file, but with an '.m' extension. to display the help, type help followed by the name of the mex file at the matlab command prompt.

the help file displays information on how to invoke the mex file, its syntax, and types (size, class, and complexity) of the inputs to the mex file. the syntax to invoke the mex file should be the same as the syntax shown in the help file.

  • help foo_st

  • help fcn_st

self-diagnostic analyzer function

p-coded file

report = function_analyzer(input1, input2, ..., inputn) measures the difference in speed between the multi-threaded mex file and the single-threaded mex file. it also verifies that the output values of the two mex files match.

report = function_analyzer('latency') reports the latency of the multi-threaded mex file introduced by unfolding.

report contains the following fields:

  • latency — the value of the latency (in frames)

  • speedup — the speedup of the multi-threaded mex file relative to the single-threaded mex file. if you specify the 'latency' option, the value of this field is empty [].

  • pass — logical value that shows if the outputs match between the generated multi-threaded mex file and the single-threaded mex file. if you specify the 'latency' option, the value of this field is empty [].

the first dimension of the analyzer inputs must be a multiple of the first dimension of the corresponding inputs, given to the -args option. the other dimensions must match exactly.

the analyzer inherits the output name with an '_analyzer' suffix. if no output name is specified, the name of this file is inherited from the matlab function with an '_analyzer' suffix.

  • multiple frames with different values are specified along the first dimension

    example 1: report = foo_analyzer(randn(10*2,1), randn(20*2,2), randn(30*3,3))

    example 2: report = foo_analyzer([randn(10,1);randn(10,1)],[randn(20,1);randn(20,1)],[randn(30,1);randn(30,1);randn(30,1)])

  • report = foo_analyzer('latency')
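
the sizing rule for analyzer inputs can be sketched as follows. this is a minimal illustration, assuming the mex file was generated with -args {randn(10,1), randn(20,2), randn(30,3)}; the variable names argsizes, numframes, and inputs are hypothetical, and the analyzer call is left as a comment because it needs the files that dspunfold generates.

```matlab
% sizes of the inputs passed to -args (assumed for illustration)
argsizes = {[10 1], [20 2], [30 3]};

% number of frames to append along the first dimension;
% at least two different frames per input are recommended
numframes = 2;

inputs = cell(size(argsizes));
for k = 1:numel(argsizes)
    sz = argsizes{k};
    % first dimension is a multiple of the -args size, the second matches exactly
    inputs{k} = randn(sz(1) * numframes, sz(2));
end

% report = foo_analyzer(inputs{:});   % requires the generated analyzer file
```

because randn draws new values for every row, each appended frame differs from the previous one.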

help file for the self-diagnostic analyzer function

matlab file

help file for the self-diagnostic analyzer function. the help file has the same name as the analyzer function, but with an '.m' extension. to display the help, type help followed by the name of the analyzer function at the matlab command prompt.

the help file for the self-diagnostic analyzer function displays information on how to invoke the analyzer function, its syntax, and types (size, class, and complexity) of the inputs to the analyzer function. the syntax to invoke the analyzer function should be the same as the syntax shown in the help file.

help foo_analyzer

limitations

general limitations:

  • on windows and linux, you must use a compiler that supports the open multiprocessing (openmp) application interface. see .

  • on macos with xcode version 12.0 or later, the dspunfold function is not supported.

  • if the input matlab function has runtime errors, the errors are not caught when you run the multi-threaded mex file. before you use the dspunfold function, call codegen on the matlab function and make sure that the mex file is generated successfully.

  • if the generated code uses a large amount of memory to store the local variables, around 4 mb on windows platform, the generated multi-threaded mex file can have unexpected behavior. this limit varies with each platform. as a workaround, reduce the size of the input signals or restructure the matlab function to use less local memory.

  • dspunfold does not support:

    • and inside the matlab function

    • variable-size inputs and outputs

    • input signals with an arbitrary frame length to system objects that use the decimationfactor property. the input signal is considered to have an arbitrary frame length when its frame length is not a multiple of the decimation factor. when this is the case, the output of the object in the generated code is a variable-size signal, and dspunfold does not support variable-size output signals.

      in the case of the object, you can determine the decimation factor using the function.

    • p-coded entry-point matlab functions

    • cell arrays as inputs and outputs

analyzer limitations:

the following limitations apply to the analyzer function generated by the dspunfold function. for more information on the analyzer function, see 'self-diagnostic analyzer' in the 'more about' section of dspunfold.

  • if multiple frames of the analyzer input are identical, the analyzer might throw false positive pass results. it is recommended that you provide at least two different frames for each input of the analyzer.

  • if the algorithm in the entry-point matlab function chooses its state length based on the input values, the analyzer might provide different pass results for different input values. for an example, see the fir_mean function in .

  • if the input to the entry-point matlab function does not affect the output immediately, the analyzer might throw false positive pass results. for an example, see the input_output function in .

  • if the output results of the multi-threaded mex file and single-threaded mex file match statistically but do not match numerically, the analyzer does not pass. consider the filternoise function that follows, which filters a random noise signal with an fir filter. the function calls randn from within itself to generate random noise. hence, the output results of the filternoise function match statistically but do not match numerically.

    function output = filternoise(x)
    persistent firfilter
    if isempty(firfilter)
        % create the 12th-order fir filter once and keep its state between calls
        firfilter = dsp.FIRFilter('Numerator',fir1(12,0.4));
    end
    % add noise generated inside the function, then filter the sum
    output = firfilter(x + randn(1000,1));
    end
    
    when you run the automatic state length detection tool on filternoise, the tool detects an infinite state length. because the tool cannot find a numerical match for a finite state length, it chooses an infinite state length.
    dspunfold filternoise -args {randn(1000,1)} -s auto 
    analyzing input matlab function filternoise
    creating single-threaded mex file filternoise_st.mexw64
    searching for minimal state length (this might take a while)
    checking stateless ... insufficient
    checking 1 ... insufficient
    checking infinite ... sufficient
    checking 2 ... insufficient
    minimal state length is inf
    creating multi-threaded mex file filternoise_mt.mexw64
    warning: the multi-threading was disabled due to performance considerations. 
    this happens when the state length is greater than or
    equal to (threads-1)*repetition frames (3 frames in this case). 
    > in coder.internal.warning (line 8)
      in unfoldingengine/buildparallelsolution (line 25)
      in unfoldingengine/generate (line 207)
      in dspunfold (line 234) 
    creating analyzer file filternoise_analyzer

    the algorithm does not need an infinite state. the state length of the fir filter, and hence of the algorithm, is 12.

    call dspunfold with state length set to 12.

    dspunfold filternoise -args {randn(1000,1)} -s 12 -f true
    analyzing input matlab function filternoise
    creating single-threaded mex file filternoise_st.mexw64
    creating multi-threaded mex file filternoise_mt.mexw64
    creating analyzer file filternoise_analyzer

    run the analyzer function.

    filternoise_analyzer(randn(1000*4,1))
    analyzing multi-threaded mex file filternoise_mt.mexw64  ... 
    latency = 8 frames
    speedup = 0.5x
    warning: the output results of the multi-threaded mex file filternoise_mt.mexw64 do not 
    match the output results of the single-threaded mex file filternoise_st.mexw64. check that 
    you provided the correct state length value to the dspunfold function when you generated the 
    multi-threaded mex file filternoise_mt.mexw64. for best practices and possible solutions to 
    this problem, see the 'tips' section in the dspunfold function reference page. 
    > in coder.internal.warning (line 8)
      in filternoise_analyzer 
    ans = 
        latency: 8
        speedup: 0.4970
           pass: 0

    the analyzer looks for a numerical match and fails the verification, even though the generated multi-threaded mex file is valid.

speedup limitations:

  • if the entry-point matlab function contains code with low complexity, matlab overhead or multi-threaded mex overhead overshadow any performance gains. in such cases, do not use dspunfold.

  • if the number of operations in the input matlab function is small compared to the size of the input or output data, the multi-threaded mex file does not provide any speedup gain. sometimes, it can result in a speedup loss, even if the repetition value is increased. in such cases, do not use dspunfold.

more about

state length

state length of the algorithm.

most of the time, the state length used by dspunfold matches the state length of the algorithm in the entry-point matlab function. if the algorithm is simple, state length is easy to determine. for example, the state length of an fir filter is the number of taps in the filter – 1. in some scenarios, to optimize speedup, dspunfold chooses a state length that is different from the algorithm state length or the state length specified using the -s option. for example, when the state length is greater than (threads – 1) × repetition frames, dspunfold considers the state length to be infinite. also, multi-threading gets disabled due to performance considerations.
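
as a rough illustration of the arithmetic in this paragraph, the following sketch computes the state length of an fir filter from its coefficient count, and the frame limit beyond which dspunfold considers the state length infinite. the coefficient vector, the values of threads and repetition, and the frame-based state length are assumptions chosen for illustration; the sketch does not call dspunfold.

```matlab
% hypothetical 13-coefficient fir filter (same length as fir1(12,0.4))
numerator = ones(1, 13) / 13;
statelengthsamples = numel(numerator) - 1;   % number of taps - 1

% assumed dspunfold settings
threads = 4;
repetition = 1;
framelimit = (threads - 1) * repetition;     % limit expressed in frames

% hypothetical frame-based state length, clearly above the limit
statelengthframes = 5;
treatedasinfinite = statelengthframes > framelimit;

fprintf('fir state length: %d samples\n', statelengthsamples);
fprintf('frame limit: %d, treated as infinite: %d\n', framelimit, treatedasinfinite);
```

the example deliberately uses a state length well above the limit; it makes no claim about how dspunfold behaves exactly at the boundary.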

automatic state length detection

you can automatically detect the minimum state length for which the outputs of the multi-threaded mex and single-threaded mex match.

in complex algorithms, it is not easy to determine the state length analytically. in such scenarios, use the analyzer to compute the state length. when you set -s to auto, dspunfold invokes the analyzer. the analyzer computes the outputs for different state lengths and detects the minimum state length for which the outputs of the multi-threaded mex file and single-threaded mex file match. the analyzer uses the numeric value of the inputs given to -args. to detect the most efficient state length, provide random inputs to -args. in this mode, you cannot input coder.typeof to arguments. due to the extra analysis this tool requires, the time to generate the mex file increases.

when you use automatic state length detection on an algorithm with code paths that depend on the input values, use inputs that choose the code path with the longest state length. also, the inputs must have an immediate effect on the output. if inputs choose a code path that triggers runtime errors, automatic state length detection stops, and so does the analyzer. make sure that the matlab function supports code generation and does not have run-time errors for the inputs under test. before invoking dspunfold, call codegen on the entry-point matlab function. in addition, simulate the entry-point matlab function to make sure it has no run-time errors.

threads

the -t option specifies the number of threads used by the multi-threaded mex file.

increasing this value can improve the multi-threaded mex speedup, at the cost of a larger latency. decreasing this value reduces the latency and potentially decreases the multi-threaded mex speedup.

repetition factor

repetition factor is the number of consecutive frames processed by each thread in one processing step.

increasing this value reduces the overhead per frame of data, potentially improving the speedup at the cost of larger latency. decreasing this value reduces the latency, and potentially decreases the multi-threaded mex speedup.

self-diagnostic analyzer

the self-diagnostic analyzer function is a help tool that is generated with the mex file. this function measures the speedup gain of the multi-threaded mex file compared to the single-threaded mex file. the analyzer function also verifies that the outputs of the multi-threaded mex file and single-threaded mex file match.

if you specify an incorrect state length value, the outputs usually do not match. to check for the numerical match between the multi-threaded mex file and the single-threaded mex file, provide at least two different frames for each input argument of the analyzer. the frames are appended along the first dimension. the analyzer alternates between these frames while verifying that the outputs match. failure to provide multiple frames for each input can decrease the effectiveness of the analyzer and can lead to false positive verification results. in other words, the analyzer might produce pass = 1 results even when an incorrect state length value is specified. the analyzer alternates through a maximum of 3 × (2 × threads × repetition) frames. if your algorithm requires more than 3 × (2 × threads × repetition) frames to verify the results, then the analyzer cannot verify accurately.
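
the frame budget described above can be checked before calling the analyzer. the following sketch assumes threads = 4, repetition = 1, and a frame length of 10; these values and all variable names are hypothetical. it builds an input with several frames appended along the first dimension and confirms that the frames differ and fit within the number of frames the analyzer alternates through.

```matlab
% assumed dspunfold settings
threads = 4;
repetition = 1;
maxframes = 3 * (2 * threads * repetition);   % frames the analyzer alternates through

framelength = 10;    % first dimension of the input given to -args (assumed)
numframes = 4;       % frames supplied to the analyzer

% frames appended along the first dimension
x = randn(framelength * numframes, 1);

% view each frame as one column and check that no two frames are identical
frames = reshape(x, framelength, numframes);
alldistinct = size(unique(frames.', 'rows'), 1) == numframes;
withinlimit = numframes <= maxframes;
```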

tips

general

  • do not display plots, scopes, or execute other user interface operations from within the multi-threaded mex file. the generated mex file can have unexpected behavior.

  • do not use coder.extrinsic inside the input matlab function. the generated mex file can have unexpected behavior.

when the state length is less than or equal to (threads – 1) × repetition frames:

  • do not generate random numbers inside the matlab function. the outputs of the single-threaded mex file and the multi-threaded mex file might not match. also, the outputs of consecutive executions of the multi-threaded mex file might not match. the analyzer might not pass the numerical match verification.

    it is recommended that you generate the random number outside the entry-point matlab function and pass it as an argument to the function.

  • do not use global or persistent variables anywhere other than in the entry-point matlab function. for example, avoid using persistent variables in subfunctions. the generated mex file can produce inaccurate results. in general, global variables are not recommended.

  • do not access i/o resources from within the multi-threaded mex file. the generated mex file can have unexpected behavior. these resources include file writers and readers, udp sockets, and audio players and recorders.

  • do not use functions with interactive inputs (for example, the keyboard) inside the multi-threaded mex file. the generated mex file can have unexpected behavior.
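
the recommendation to generate random numbers outside the entry-point function can be sketched as follows. this is a minimal illustration, not the documented filternoise example: the function name filternoise_ext, the three-tap coefficients, and the use of the base matlab filter function with a persistent state are all assumptions made so that the sketch runs without additional toolboxes.

```matlab
function output = filternoise_ext(x, noise)
% hypothetical entry-point function: the caller supplies the noise,
% so repeated runs with the same arguments give the same output
persistent b z
if isempty(b)
    b = [0.25 0.5 0.25];          % assumed 3-tap fir coefficients
    z = zeros(numel(b) - 1, 1);   % filter state: taps - 1 samples
end
% filter the noisy signal and carry the state over to the next call
[output, z] = filter(b, 1, x + noise, z);
end
```

under these assumptions the state length is two samples, so a call such as dspunfold filternoise_ext -args {randn(1000,1), randn(1000,1)} -s 2 -f true would mark both inputs as frames, which also keeps their dimensions identical.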

workflow

  • to generate a valid multi-threaded mex file with the required speedup and latency, follow the .

  • before using dspunfold, call codegen on the entry-point matlab function and make sure that the function generates a mex file successfully.

  • after generating the multi-threaded mex file using dspunfold, run the analyzer function. make sure that the analyzer function passes. the exception to this rule is when the algorithm produces results that match statistically, but not numerically. in this exception, the analyzer function does not pass, even though the dspunfold function generates a valid multi-threaded mex file. see 'analyzer limitations' for an example.

  • for help on using the mex file and analyzer, at the matlab command prompt, enter help followed by the name of the mex file, or help followed by the name of the analyzer function.

state length

  • if you choose a state length that is greater than or equal to the value of the exact state length, the analyzer passes. if the analyzer fails, increase the state length, regenerate the mex file, and verify again.

  • if the state length is greater than 0, the inputs marked as frames (through -f option) must all have the same dimensions.

  • when generating the mex file and running the analyzer, use inputs that invoke the same state length.

automatic state length detection

when you set -s to auto:

  • if the algorithm in the entry-point matlab function chooses a code path based on the input values, use inputs that choose the code path with the longest state length.

  • provide random inputs to -args.

  • choose inputs that have an immediate effect on the output. see .

analyzer

  • make sure the outputs of the multi-threaded mex file and the single-threaded mex file do not contain nan or inf values. if they do, the analyzer cannot perform numeric checks and returns pass as false. the automatic state length detection tool detects an infinite state length and displays a warning.

    warning

    the output results of the multi-threaded mex file do not match the output results of the single-threaded mex file even for infinite state length. a possible reason is that input matlab function generates different output results between consecutive runs even for the same input values.

  • provide multiple frames with different values for each input of the analyzer. to improve the analyzer effectiveness, append successive frames along the first dimension.

  • provide inputs to the analyzer that lead to efficient code coverage.

speedup

  • to improve the speedup of the multi-threaded mex file, specify the exact state length in samples. you can specify the state length in samples by setting at least one entry of frameinputs to true. the use of samples reduces the overhead and increases the speedup.

  • to increase the speedup at the cost of larger latency, you can:

    • increase the repetition factor. use the -r option.

    • increase the number of threads. use the -t option.

  • for each input that can be divided into samples without altering the algorithm behavior, set frame status to true using the -f option. the input is then considered in samples, which can increase the speedup of the generated multi-threaded mex file.

algorithms

the multi-threaded mex file buffers multiple input signal frames into a buffer of 2 × threads × repetition frames, where threads is the number of threads, and repetition is the repetition factor. the mex file processes these frames simultaneously, using multiple cores. this process introduces a deterministic latency of 2 × threads × repetition frames. latency is traded off against the speedup you might gain by increasing the number of threads or the repetition factor.
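
the latency formula can be worked through with a short sketch. the values below are assumptions: threads = 4 and repetition = 1 are consistent with the filternoise analyzer output shown earlier (latency = 8 frames), and the frame length of 1000 samples matches the input size used in that example.

```matlab
% assumed settings, consistent with the filternoise example
threads = 4;
repetition = 1;

% latency introduced by unfolding, in frames
latency = 2 * threads * repetition;

% the same latency expressed in samples for 1000-sample frames
framelength = 1000;
latencysamples = latency * framelength;

fprintf('latency: %d frames (%d samples)\n', latency, latencysamples);
```

doubling either threads or repetition doubles the latency, which is the cost paid for the potential speedup.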

version history

introduced in r2015b
