All of the distributions that are provided in the Apache Commons Math project are supported here, in multiple forms.
Continuous or Discrete
These distributions break down into two main categories:
Continuous Distributions
These are distributions over real numbers like 23.4323, with continuity across the values. Each of the continuous distributions can provide samples that fall on an interval of the real number line. Continuous probability distributions include the Normal distribution, and the Exponential distribution, among many others.
Discrete Distributions
Discrete distributions, also known as integer distributions, have only whole-number valued samples. These distributions include the Binomial distribution, the Zipf distribution, and the Poisson distribution, among others.
Hashed or Mapped
hashed samples
Generally, you will want to "randomly sample" from a probability distribution. The functions below handle this
automatically if you do not override the defaults: the input is hashed first, and the resulting value is then used with
the sampling curve. VirtData calls this the 'hash' sampling mode, and it is the default sampling mode for probability
distributions. You can put 'hash' into the modifiers, as explained below, if you want to document it explicitly.
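The hash-then-sample idea can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not VirtData's implementation: the mixer is the MurmurHash3 64-bit finalizer used as a stand-in hash, the sampling curve is the closed-form inverse of the exponential distribution's cumulative function, and the class and method names are hypothetical.

```java
final class HashedSamplingSketch {
    // Stand-in 64-bit mixer (the MurmurHash3 finalizer). This is an assumption for
    // illustration only; it is not the hash function VirtData uses.
    static long mix(long x) {
        x ^= x >>> 33;
        x *= 0xff51afd7ed558ccdL;
        x ^= x >>> 33;
        x *= 0xc4ceb9fe1a85ec53L;
        x ^= x >>> 33;
        return x;
    }

    // Scale the top 53 bits of the hashed value into [0.0, 1.0).
    static double toUnit(long hashed) {
        return (hashed >>> 11) * 0x1.0p-53;
    }

    // Inverse cumulative function of the exponential distribution with the given mean.
    static double exponentialIcdf(double u, double mean) {
        return -mean * Math.log(1.0 - u);
    }

    // 'hash' mode, sketched: hash the input, then read the sampling curve at that point.
    static double sample(long input, double mean) {
        return exponentialIcdf(toUnit(mix(input)), mean);
    }
}
```

The same input always yields the same sample, which keeps generated data repeatable, while neighboring inputs land on unrelated points of the curve.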
mapped samples
The method used to sample from these distributions depends on a mathematical function called the cumulative
distribution function, or more specifically the inverse of it. Having this function computed over some interval allows
one to sample the shape of a distribution progressively if desired. In other words, it allows for a percentile-like view
of values within a given probability distribution. This mode of using the inverse cumulative distribution function is
known as the 'map' mode in VirtData, as it allows one to map a unit interval variate in a deterministic way to a density
sampling curve. To enable this mode, simply pass 'map' as one of the function modifiers for any function in this
category.
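As a rough sketch of what mapping means here, the following Java maps a non-negative long directly onto the unit interval, with no hashing, and reads the inverse cumulative function at that point. The names are hypothetical, the exponential distribution is used only because its inverse has a closed form, and the exact scaling VirtData applies is an assumption.

```java
final class MappedSamplingSketch {
    // Map a non-negative long onto [0.0, 1.0) without hashing, so that
    // larger inputs never produce a smaller unit value.
    static double toUnit(long input) {
        return (input >>> 10) * 0x1.0p-53;
    }

    // Inverse cumulative function of the exponential distribution with the given mean.
    static double exponentialIcdf(double u, double mean) {
        return -mean * Math.log(1.0 - u);
    }

    // 'map' mode, sketched: the input selects a percentile of the distribution.
    static double sample(long input, double mean) {
        return exponentialIcdf(toUnit(input), mean);
    }
}
```

With this scheme an input halfway through the positive longs returns the median of the distribution, and stepping the input upward walks through the percentiles in order.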
Interpolated or Computed Samples
When sampling from mathematical models of probability densities, performance between different densities can vary drastically. This means that you may end up perturbing the results of your test in an unexpected way simply by changing parameters of your testing distributions. Even worse, some densities have painful corner cases in performance, like 'Zipf', which can make tests unbearably slow and flawed as they chew up CPU resources.
NOTE: Functions like 'Zipf' can still take a long time to initialize for certain parameters. If you are seeing a workload that seems to hang while initializing, it might be computing complex integrals for large parameters of Zipf. We hope to pre-compute and cache these at a future time to avoid this type of impact. For now, just be aware that some parameters on some density curves can be expensive to compute even during initialization.
Interpolated Samples
For this reason, interpolation is built-in to these sampling functions. The default mode is 'interpolate'. This
means that the sampling function is pre-computed over 1000 equidistant points in the unit interval (0.0,1.0), and the
result is shared among all threads as a look-up table for interpolation. This makes all statistical sampling functions
perform nearly identically at runtime (after initialization, a one-time cost). This does have the minor side effect of a
little loss in accuracy, but the difference is generally negligible for nearly all performance testing cases.
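The look-up-table approach can be sketched as below. This is an illustrative assumption about how such a table might work, not VirtData's code: the class name is hypothetical, the curve is passed in as a plain function over the unit interval, and handling of infinite endpoint values is deliberately left out.

```java
import java.util.function.DoubleUnaryOperator;

final class InterpolatedSamplingSketch {
    private final double[] table;

    // Pre-compute the curve once at `points` equidistant positions in the unit interval.
    InterpolatedSamplingSketch(DoubleUnaryOperator curve, int points) {
        table = new double[points];
        for (int i = 0; i < points; i++) {
            table[i] = curve.applyAsDouble((double) i / (points - 1));
        }
    }

    // Linear interpolation between the two nearest pre-computed points.
    // Expects u in [0.0, 1.0]; no range checking is done in this sketch.
    double sample(double u) {
        double pos = u * (table.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = Math.min(lo + 1, table.length - 1);
        double frac = pos - lo;
        return table[lo] + frac * (table[hi] - table[lo]);
    }
}
```

After construction, every sample costs two array reads and a multiply regardless of how expensive the underlying curve is, which is why the different distributions perform alike at runtime.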
Infinite or Finite
For interpolated samples from continuous distributions, you also have the option of including or
excluding infinite values which may occur in some distributions. If you want to include them,
use 'infinite', or use 'finite' to explicitly avoid them (the default). Specifying 'infinite'
doesn't guarantee that you will see +Infinity or -Infinity, only that they are allowed. The
Normal distribution often contains -Infinity and +Infinity, for example, due to the function
used to estimate its cumulative distribution. These values can often be valuable in finding
corner cases which should be treated uniformly according to IEEE 754.
Clamp or Noclamp
For interpolated samples from continuous distributions, you also have the option of clamping the
allowed values to the valid range for the integral data type used as input. To clamp the output
values to the range (Long.MIN_VALUE,Long.MAX_VALUE) for long->double functions, or to
(Integer.MIN_VALUE,Integer.MAX_VALUE) for int->double functions, specify 'clamp', which is also
the default. Clamping is useful when you know the downstream functions will only work with a
certain range of values without truncating conversions. To explicitly disable it, use 'noclamp'.
When you are using double values natively in the downstream functions, use 'noclamp' to avoid
limiting the domain of values in your test data. (In this case, you might also consider 'infinite'.)
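A minimal sketch of what clamping a double sample to an integral range could look like is shown here. The class and method names are hypothetical, and this illustrates the idea only; how VirtData applies the limits internally is not described in this document.

```java
final class ClampSketch {
    // Limit a sample to the range representable by a long (for long->double functions).
    static double clampToLongRange(double v) {
        return Math.max((double) Long.MIN_VALUE, Math.min((double) Long.MAX_VALUE, v));
    }

    // Limit a sample to the range representable by an int (for int->double functions).
    static double clampToIntRange(double v) {
        return Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, v));
    }
}
```

Values already inside the range pass through unchanged, while out-of-range values, including infinities, are pinned to the nearest bound.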
Computed Samples
Conversely, 'compute' mode sampling calls the sampling function every time a sample is needed. This affords a little
more accuracy, but is generally not preferable to the default interpolated mode. You'll know if you need computed
samples. Otherwise, it's best to stick with interpolation so that you spend more time testing your target system and
less time testing your data generation functions.
Input Range
All of these functions take a long as the input value for sampling. This is similar to how the unit interval (0.0,1.0) is used in mathematics and statistics, but more tailored to modern system capabilities. Instead of using the unit interval, we simply use the interval of all positive longs. This provides more compatibility with other functions in VirtData, including hashing functions. Internally, this value is automatically converted to a unit interval variate as needed to work well with the distributions from Apache Commons Math.
Beta
@see Wikipedia: Beta distribution @see Commons JavaDoc: BetaDistribution
- long -> Beta(double: alpha, double: beta, String[]...: mods) -> double
- int -> Beta(double: alpha, double: beta, String[]...: mods) -> double
Binomial
@see Wikipedia: Binomial distribution @see Commons JavaDoc: BinomialDistribution
- int -> Binomial(int: trials, double: p, String[]...: modslist) -> int
- long -> Binomial(int: trials, double: p, String[]...: modslist) -> long
- int -> Binomial(int: trials, double: p, String[]...: modslist) -> long
- long -> Binomial(int: trials, double: p, String[]...: modslist) -> int
Cauchy
@see Wikipedia: Cauchy distribution @see Commons JavaDoc: CauchyDistribution
- long -> Cauchy(double: median, double: scale, String[]...: mods) -> double
- int -> Cauchy(double: median, double: scale, String[]...: mods) -> double
ChiSquared
@see Wikipedia: Chi-squared distribution @see Commons JavaDoc: ChiSquaredDistribution
- long -> ChiSquared(double: degreesOfFreedom, String[]...: mods) -> double
- int -> ChiSquared(double: degreesOfFreedom, String[]...: mods) -> double
CoinFunc
This is a higher-order function which takes an input value and flips a coin. The first parameter is used as the threshold for choosing a function. If the sample value derived from the input is lower than the threshold value, then the first of the following functions is used; otherwise the second is used. For example, if the threshold is 0.23, and the input value is hashed and sampled in the unit interval to 0.43, then the second of the two provided functions will be used. The input value does not need to be hashed beforehand, since the user may need to use the full input value before hashing as the input to one or both of the functions. This function will accept a LongFunction, a Function, or a LongUnaryOperator in either position. If necessary, use function.ToLongFunction to adapt other function forms to be compatible with these signatures.
- Long -> CoinFunc(double: threshold, Object: first, Object: second) -> Object
- example:
CoinFunc(0.15,NumberNameToString(),Combinations('A:1:B:23'))
- use the first function 15% of the time
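The selection logic described for CoinFunc can be sketched in Java roughly as follows. This is an illustration under assumptions, not the library's source: the class name is hypothetical, only LongFunction arguments are handled, and the MurmurHash3 finalizer stands in for whatever hash the real function applies.

```java
import java.util.function.LongFunction;

final class CoinFuncSketch implements LongFunction<Object> {
    private final double threshold;
    private final LongFunction<?> first;
    private final LongFunction<?> second;

    CoinFuncSketch(double threshold, LongFunction<?> first, LongFunction<?> second) {
        this.threshold = threshold;
        this.first = first;
        this.second = second;
    }

    // Stand-in hash (MurmurHash3 finalizer) scaled into [0.0, 1.0); an assumption.
    static double unitHash(long x) {
        x ^= x >>> 33;
        x *= 0xff51afd7ed558ccdL;
        x ^= x >>> 33;
        x *= 0xc4ceb9fe1a85ec53L;
        x ^= x >>> 33;
        return (x >>> 11) * 0x1.0p-53;
    }

    // The hash only decides which branch to take; the chosen function
    // still receives the original, unhashed input.
    @Override
    public Object apply(long input) {
        LongFunction<?> chosen = unitHash(input) < threshold ? first : second;
        return chosen.apply(input);
    }
}
```

With a threshold of 0.15, about 15% of inputs would be routed to the first function, and any given input is always routed the same way.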
ConstantContinuous
Always yields the same value. @see Commons JavaDoc: ConstantContinuousDistribution
- long -> ConstantContinuous(double: value, String[]...: mods) -> double
- int -> ConstantContinuous(double: value, String[]...: mods) -> double
Enumerated
Creates a probability density given the values and optional weights provided, in "value:weight value:weight ..." form. The weight can be elided for any value to use the default weight of 1.0d. @see Commons JavaDoc: EnumeratedRealDistribution
- int -> Enumerated(String: data, String[]...: mods) -> double
- example:
Enumerated('1 2 3 4 5 6')
- a fair six-sided die roll
- example:
Enumerated('1:2.0 2 3 4 5 6')
- an unfair six-sided die roll, where 1 has weight 2.0, and every other value has weight 1.0
- long -> Enumerated(String: data, String[]...: mods) -> double
- example:
Enumerated('1 2 3 4 5 6')
- a fair 6-sided die
- example:
Enumerated('1:2.0 2 3 4 5:0.5 6:0.5')
- an unfair 6-sided die, where ones are twice as likely, and fives and sixes are half as likely
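To make the "value:weight" form concrete, here is a small Java sketch that parses such a string into values and normalized probabilities. The parser and its names are hypothetical and written only for illustration; the actual parsing in Enumerated may differ in details such as error handling.

```java
final class EnumeratedSpecSketch {
    // Parses "value:weight value:weight ..." and returns { values, probabilities }.
    // A token without ":weight" gets the default weight of 1.0.
    static double[][] parse(String spec) {
        String[] tokens = spec.trim().split("\\s+");
        double[] values = new double[tokens.length];
        double[] probabilities = new double[tokens.length];
        double total = 0.0;
        for (int i = 0; i < tokens.length; i++) {
            String[] parts = tokens[i].split(":");
            values[i] = Double.parseDouble(parts[0]);
            probabilities[i] = parts.length > 1 ? Double.parseDouble(parts[1]) : 1.0;
            total += probabilities[i];
        }
        // Normalize the weights so that they sum to 1.0.
        for (int i = 0; i < probabilities.length; i++) {
            probabilities[i] /= total;
        }
        return new double[][] { values, probabilities };
    }
}
```

For '1:2.0 2 3 4 5:0.5 6:0.5' the weights sum to 6.0, so a one comes up with probability 2/6, each of two through four with 1/6, and five and six with 1/12 each.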
Exponential
@see Wikipedia: Exponential distribution @see Commons JavaDoc: ExponentialDistribution
- long -> Exponential(double: mean, String[]...: mods) -> double
- int -> Exponential(double: mean, String[]...: mods) -> double
F
@see Wikipedia: F-distribution @see Commons JavaDoc: FDistribution @see Mathworld: F-Distribution
- int -> F(double: numeratorDegreesOfFreedom, double: denominatorDegreesOfFreedom, String[]...: mods) -> double
- long -> F(double: numeratorDegreesOfFreedom, double: denominatorDegreesOfFreedom, String[]...: mods) -> double
Gamma
@see Wikipedia: Gamma distribution @see Commons JavaDoc: GammaDistribution
- long -> Gamma(double: shape, double: scale, String[]...: mods) -> double
- int -> Gamma(double: shape, double: scale, String[]...: mods) -> double
Geometric
@see Wikipedia: Geometric distribution @see Commons JavaDoc: GeometricDistribution
- int -> Geometric(double: p, String[]...: modslist) -> int
- long -> Geometric(double: p, String[]...: modslist) -> long
- int -> Geometric(double: p, String[]...: modslist) -> long
- long -> Geometric(double: p, String[]...: modslist) -> int
Gumbel
@see Wikipedia: Gumbel distribution @see Commons JavaDoc: GumbelDistribution
- long -> Gumbel(double: mu, double: beta, String[]...: mods) -> double
- int -> Gumbel(double: mu, double: beta, String[]...: mods) -> double
Hypergeometric
@see Wikipedia: Hypergeometric distribution @see Commons JavaDoc: HypergeometricDistribution
- int -> Hypergeometric(int: populationSize, int: numberOfSuccesses, int: sampleSize, String[]...: modslist) -> int
- int -> Hypergeometric(int: populationSize, int: numberOfSuccesses, int: sampleSize, String[]...: modslist) -> long
- long -> Hypergeometric(int: populationSize, int: numberOfSuccesses, int: sampleSize, String[]...: modslist) -> long
- long -> Hypergeometric(int: populationSize, int: numberOfSuccesses, int: sampleSize, String[]...: modslist) -> int
Laplace
@see Wikipedia: Laplace distribution @see Commons JavaDoc: LaplaceDistribution
- long -> Laplace(double: mu, double: beta, String[]...: mods) -> double
- int -> Laplace(double: mu, double: beta, String[]...: mods) -> double
Levy
@see Wikipedia: Lévy distribution @see Commons JavaDoc: LevyDistribution
- int -> Levy(double: mu, double: c, String[]...: mods) -> double
- long -> Levy(double: mu, double: c, String[]...: mods) -> double
LogNormal
@see Wikipedia: Log-normal distribution @see Commons JavaDoc: LogNormalDistribution
- long -> LogNormal(double: scale, double: shape, String[]...: mods) -> double
- int -> LogNormal(double: scale, double: shape, String[]...: mods) -> double
Logistic
@see Wikipedia: Logistic distribution @see Commons JavaDoc: LogisticDistribution
- int -> Logistic(double: mu, double: scale, String[]...: mods) -> double
- long -> Logistic(double: mu, double: scale, String[]...: mods) -> double
Nakagami
@see Wikipedia: Nakagami distribution @see Commons JavaDoc: NakagamiDistribution
- long -> Nakagami(double: mu, double: omega, String[]...: mods) -> double
- int -> Nakagami(double: mu, double: omega, String[]...: mods) -> double
Normal
@see Wikipedia: Normal distribution @see Commons JavaDoc: NormalDistribution
- long -> Normal(double: mean, double: sd, String[]...: mods) -> double
- int -> Normal(double: mean, double: sd, String[]...: mods) -> double
Pareto
@see Wikipedia: Pareto distribution @see Commons JavaDoc: ParetoDistribution
- int -> Pareto(double: scale, double: shape, String[]...: mods) -> double
- long -> Pareto(double: scale, double: shape, String[]...: mods) -> double
Pascal
@see Commons JavaDoc: PascalDistribution @see Wikipedia: Negative binomial distribution
- long -> Pascal(int: r, double: p, String[]...: modslist) -> int
- int -> Pascal(int: r, double: p, String[]...: modslist) -> long
- long -> Pascal(int: r, double: p, String[]...: modslist) -> long
- int -> Pascal(int: r, double: p, String[]...: modslist) -> int
Poisson
@see Wikipedia: Poisson distribution @see Commons JavaDoc: PoissonDistribution
- long -> Poisson(double: p, String[]...: modslist) -> long
- int -> Poisson(double: p, String[]...: modslist) -> long
- int -> Poisson(double: p, String[]...: modslist) -> int
- long -> Poisson(double: p, String[]...: modslist) -> int
T
@see Wikipedia: Student's t-distribution @see Commons JavaDoc: TDistribution
- int -> T(double: degreesOfFreedom, String[]...: mods) -> double
- long -> T(double: degreesOfFreedom, String[]...: mods) -> double
Triangular
@see Wikipedia: Triangular distribution @see Commons JavaDoc: TriangularDistribution
- long -> Triangular(double: a, double: c, double: b, String[]...: mods) -> double
- int -> Triangular(double: a, double: c, double: b, String[]...: mods) -> double
Uniform
@see Wikipedia: Uniform distribution (continuous) @see Commons JavaDoc: UniformContinuousDistribution
- int -> Uniform(double: lower, double: upper, String[]...: mods) -> double
- long -> Uniform(int: lower, int: upper, String[]...: modslist) -> int
- int -> Uniform(int: lower, int: upper, String[]...: modslist) -> int
- int -> Uniform(int: lower, int: upper, String[]...: modslist) -> long
- long -> Uniform(int: lower, int: upper, String[]...: modslist) -> long
- long -> Uniform(double: lower, double: upper, String[]...: mods) -> double
Weibull
@see Wikipedia: Weibull distribution @see Wolfram Mathworld: Weibull Distribution @see Commons JavaDoc: WeibullDistribution
- int -> Weibull(double: alpha, double: beta, String[]...: mods) -> double
- long -> Weibull(double: alpha, double: beta, String[]...: mods) -> double
WeightedFuncs
Allows for easy branching over multiple functions with specific weights.
- long -> WeightedFuncs(Object[]...: weightsAndFuncs) -> Object
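Weighted branching of this kind can be sketched in Java as shown below. Several things here are assumptions, since this document only gives the signature: the arguments are taken to alternate weight, function, weight, function; only LongFunction arguments are handled; the selection uses a stand-in hash (the MurmurHash3 finalizer); and the class name is hypothetical.

```java
import java.util.function.LongFunction;

final class WeightedFuncsSketch implements LongFunction<Object> {
    private final double[] cumulative;   // running sum of normalized weights
    private final LongFunction<?>[] funcs;

    // Assumed argument layout: weight, function, weight, function, ...
    WeightedFuncsSketch(Object... weightsAndFuncs) {
        if (weightsAndFuncs.length == 0 || weightsAndFuncs.length % 2 != 0) {
            throw new IllegalArgumentException("expected weight, function pairs");
        }
        int n = weightsAndFuncs.length / 2;
        cumulative = new double[n];
        funcs = new LongFunction<?>[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            total += ((Number) weightsAndFuncs[2 * i]).doubleValue();
            cumulative[i] = total;
            funcs[i] = (LongFunction<?>) weightsAndFuncs[2 * i + 1];
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("weights must sum to a positive value");
        }
        for (int i = 0; i < n; i++) {
            cumulative[i] /= total;
        }
    }

    // Stand-in hash (MurmurHash3 finalizer) scaled into [0.0, 1.0); an assumption.
    static double unitHash(long x) {
        x ^= x >>> 33;
        x *= 0xff51afd7ed558ccdL;
        x ^= x >>> 33;
        x *= 0xc4ceb9fe1a85ec53L;
        x ^= x >>> 33;
        return (x >>> 11) * 0x1.0p-53;
    }

    // Pick the first function whose cumulative weight exceeds the hashed unit value.
    @Override
    public Object apply(long input) {
        double u = unitHash(input);
        for (int i = 0; i < funcs.length; i++) {
            if (u < cumulative[i]) {
                return funcs[i].apply(input);
            }
        }
        return funcs[funcs.length - 1].apply(input);
    }
}
```

Each function is chosen in proportion to its share of the total weight, so weights of 1.0 and 3.0 would route roughly a quarter of inputs to the first function.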
Zipf
@see Wikipedia: Zipf's Law @see Commons JavaDoc: ZipfDistribution
- long -> Zipf(int: numberOfElements, double: exponent, String[]...: modslist) -> long
- int -> Zipf(int: numberOfElements, double: exponent, String[]...: modslist) -> long
- long -> Zipf(int: numberOfElements, double: exponent, String[]...: modslist) -> int
- int -> Zipf(int: numberOfElements, double: exponent, String[]...: modslist) -> int