Algorithms API
base_algorithm
This file contains the interface for the base algorithm used in the package.
SingularSubspaceAlgorithm
Bases: Algorithm
Source code in changepoynt/algorithms/base_algorithm.py
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 | |
covered_regions()
This function returns the size of the region of interest (ROI). The ROI is the total amount of samples covered for the score detection.
The covered region depends on the window size, the number of windows, and the lag between the two Hankel matrices (past and future Hankel matrices).
The function return two different sizes:
1) total_region: The total region covered by both matrices with lag. This is also the minimum number of samples that a time series must have before the subspace method can run.
2) matrix_region: The region covered by one Hankel matrix. Essentially the region that is sampled for subsequences that are used to extract the characteristics
Returns:
| Type | Description |
|---|---|
tuple[int, int]
|
total_region: int, matrix_region: int |
Source code in changepoynt/algorithms/base_algorithm.py
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 | |
estimate_runtime(signal, steps=30, verbose=False)
This function estimates the runtime for a given signal by running the initial steps.
Singular subspace algorithms grow linearly with the signal length of the time series, the overall runtime is number of steps times signal length.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
signal
|
ndarray
|
The signal that you will run the algorithm on. |
required |
steps
|
int
|
The number of estimation steps. Increasing this number increases the runtime but also increases the accuracy of the estimation. |
30
|
verbose
|
bool
|
If true, prints out the runtime of the algorithm. |
False
|
Returns:
| Type | Description |
|---|---|
[float, float]
|
Runtime estimation in seconds, standard deviation in seconds |
Source code in changepoynt/algorithms/base_algorithm.py
52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 | |
baseline
MovingWindow
Bases: Algorithm
This class implements moving window algorithms that are as simple as possible for sanity checking.
This class implements a similar ideas as in: Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress Renjie Wu and Eamonn J. Keogh IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 2023
Source code in changepoynt/algorithms/baseline.py
42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 | |
fit(time_series)
Check for the properties of the input.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
time_series
|
ndarray
|
|
required |
Returns:
| Type | Description |
|---|---|
None
|
|
Source code in changepoynt/algorithms/baseline.py
67 68 69 70 71 72 73 74 75 76 77 78 79 | |
ZERO
Bases: Algorithm
This class implements the ZERO baseline algorithm.
"An Evaluation of Change Point Detection Algorithms" Gerrit J. J. van den Burg and Christopher K. I. Williams arXiv preprint arXiv:2003.06222, 2020
Source code in changepoynt/algorithms/baseline.py
23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 | |
sst
SST
Bases: SingularSubspaceAlgorithm
This class implements all the utility and functionality necessary to compute the SST change point detection algorithm as described in:
[1] Idé, Tsuyoshi, and Keisuke Inoue. "Knowledge discovery from heterogeneous dynamic systems using change-point correlations." Proceedings of the 2005 SIAM international conference on data mining. Society for Industrial and Applied Mathematics, 2005.
[2] Idé, Tsuyoshi, and Koji Tsuda. "Change-point detection using krylov subspace learning." Proceedings of the 2007 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2007.
It also uses a technique called randomized singular value decomposition which is surveyed and described in
[3] Halko, Nathan, Per-Gunnar Martinsson, and Joel A. Tropp. "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions." SIAM review 53.2 (2011): 217-288.
For the option fast_hankel=True it uses and algorithm based on
[4] L. Weber and R. Lenz. "Accelerating Singular Spectrum Transformation for Scalable Change Point Detection," in IEEE Access, Volume 11, 2025. doi: 10.1109/ACCESS.2025.3640386.
There will be a parameter specifying, whether to use the implicit krylov approximation from [2]. This significantly speeds up the computation but can reduce accuracy as the "eigensignals" are only approximated indirectly using a krylov subspace with the possible change point signal as seed. This method also deploys a feedback of the dominant "eigensignal" as seed into the power method (dominant eigenvector estimation) with additive gaussian noise.
!NOTE!: Most computational heavy functions are implemented as standalone functions, even if they require instance variables as this enables us to use jit compiled code provided by the numba compiler for faster calculations.
Also, we decided to do type hinting but not type checking as it requires too much boilerplate code. We recommend to input the specified types as the program will break in unforeseen ways otherwise.
Source code in changepoynt/algorithms/sst.py
10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 | |
__init__(window_length, n_windows=None, lag=None, rank=5, scale=True, method='ika', lanczos_rank=None, random_rank=None, feedback_noise_level=0.001, scoring_step=1, use_fast_hankel=False, mitigate_offset=False)
Initializing the Singular Spectrum Transformation (SST) requires setting a lot of parameters. See the parameters explanation for some intuition into the right choices. Currently, there are two SST methods from [1] and [2] available for use.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
window_length
|
int
|
This specifies the length of the time series (in samples), which will be used to extract the representative "eigensignals" to be compared before and after the lag. The window length should be big enough to cover any wanted patterns (e.g., bigger than the periodicity of periodic signals). |
required |
n_windows
|
int
|
This specifies the number of consecutive time windows used to extract the "eigensignals" from the given time series. It should be big enough to cover different parts of the target behavior. If one does not specify this value, we use the rule of thumb and take as many time windows as you specified the length of the window (rule of thumb). |
None
|
lag
|
int
|
This parameter specifies the distance of the comparison for behaviors. In easy terms it tells the algorithms how far it should look into the future to find change in behavior to the current signal. If you do not specify this parameter, we use a rule of thumb and look ahead half of the window length you specified to cover the behavior. |
None
|
rank
|
int
|
This parameter specifies the amount of "eigensignals" which will be used to measure the dissimilarity of the signal in the future behavior. As a rule of thumb, we take the five most dominant "eigensignals" if you do not specify otherwise. |
5
|
scale
|
bool
|
Due to numeric stability, we REALLY RECOMMEND scaling the signal into a restricted value range. Per default, we use a min max scaling to ensure a restricted value range. In the presence of extreme outliers, this could cause problems, as the signal will be squished. |
True
|
method
|
str
|
Currently, "svd" [1], "ika" [2] and "rsvd" are available. ika corresponds to IKA-SST in [2] which will speed up the computation significantly. |
'ika'
|
lanczos_rank
|
int
|
In order to use the implicit approximation of "eigensignals" by using "ika" [2] method, one needs to decide the rank of the implicit approximation of each "eigensignal". As a rule of thumb, we determine this value as being twice the amount of specified "eigensignals" to span the subspace (parameter rank). This is also recommended by the author of IKA-SST in [2]. |
None
|
random_rank
|
int
|
In order to use the randomized singular value decomposition, one needs to provide a randomized rank (size of second matrix dimension for the randomized matrix) as specified in [3]. The lower this value, the faster the computation but the higher the error (as the approximation gets worse). |
None
|
feedback_noise_level
|
float
|
This specifies the amplitude of additive white gaussian noise added to the dominant "eigensignal" of the future behavior when shifting forward. This idea is noted in [2] and initializes the seed of the power method for dominant eigenvector estimation with the precious dominant eigenvector plus the noise level specified here. The noise level should just be a small fraction of the value range of the signal. |
0.001
|
scoring_step
|
int
|
the distance between scoring steps in samples (e.g., 2 would half the computation). |
1
|
use_fast_hankel
|
bool
|
Use the O(N*logN) version for the decomposition. This is likely to have large speed improvements for window sizes > 150 |
False
|
mitigate_offset
|
bool
|
Use a sliding mean window to mitigate the constant offset of time series. |
False
|
Source code in changepoynt/algorithms/sst.py
56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 | |
covered_regions()
This function returns the size of the region of interest (ROI). The ROI is the total amount of samples covered for the score detection.
The covered region depends on the window size, the number of windows, and the lag between the two Hankel matrices (past and future Hankel matrices).
The function return two different sizes:
1) total_region: The total region covered by both matrices with lag. This is also the minimum number of samples that a time series must have before the subspace method can run.
2) matrix_region: The region covered by one Hankel matrix. Essentially the region that is sampled for subsequences that are used to extract the characteristics
Returns:
| Type | Description |
|---|---|
tuple[int, int]
|
total_region: int, matrix_region: int |
Source code in changepoynt/algorithms/base_algorithm.py
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 | |
estimate_runtime(signal, steps=30, verbose=False)
This function estimates the runtime for a given signal by running the initial steps.
Singular subspace algorithms grow linearly with the signal length of the time series, the overall runtime is number of steps times signal length.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
signal
|
ndarray
|
The signal that you will run the algorithm on. |
required |
steps
|
int
|
The number of estimation steps. Increasing this number increases the runtime but also increases the accuracy of the estimation. |
30
|
verbose
|
bool
|
If true, prints out the runtime of the algorithm. |
False
|
Returns:
| Type | Description |
|---|---|
[float, float]
|
Runtime estimation in seconds, standard deviation in seconds |
Source code in changepoynt/algorithms/base_algorithm.py
52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 | |
transform(time_series)
This function calculates the anomaly score for each sample within the time series, starting from an initial sample being the first sample of fitting the past hankel matrix (window_length + n_windows samples) and the new future hankel matrix (+lag). Values before that will be zero
It also does some assertions regarding the specified parameters and whether they fit the time series.
This function builds the interface to the jit compiled static function used to iterate over the array.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
time_series
|
ndarray
|
1D array containing the time series to be scored |
required |
Returns:
| Type | Description |
|---|---|
ndarray
|
anomaly score |
Source code in changepoynt/algorithms/sst.py
204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 | |
main()
This function is not intended for users, but for quick testing during development.
Source code in changepoynt/algorithms/sst.py
594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 | |
esst
ESST
Bases: SingularSubspaceAlgorithm
This class implements an own idea for change point detection.
It has been published together with Sarah Boelter (UMN) Boelter, Sarah, Lucas Weber, et al. "Fault Prediction in Planetary Drilling Using Subspace Analysis Techniques." International Conference on Intelligent Autonomous Systems (IAS-19). 2025. (* is equal contribution) https://ntrs.nasa.gov/citations/20250002705
It is a research algorithm, please use with caution.
Source code in changepoynt/algorithms/esst.py
12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 | |
__init__(window_length, n_windows=None, lag=None, rank=5, scale=True, method='fbrsvd', random_rank=None, scoring_step=1, use_fast_hankel=False, mitigate_offset=False)
Experimental change point detection method evaluation the prevalence of change points within a signal by comparing the difference in eigenvectors between to points in time.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
window_length
|
int
|
This specifies the length of the time series (in samples), which will be used to extract the representative "eigensignals" to be compared before and after the lag. The window length should be big enough to cover any wanted patterns (e.g., bigger than the periodicity of periodic signals). |
required |
n_windows
|
int
|
This specifies the number of consecutive time windows used to extract the "eigensignals" from the given time series. It should be big enough to cover different parts of the target behavior. If one does not specify this value, we use the rule of thumb and take as many time windows as you specified the length of the window (rule of thumb). |
None
|
lag
|
int
|
This parameter specifies the distance of the comparison for behaviors. In easy terms it tells the algorithms how far it should look into the future to find change in behavior to the current signal. If you do not specify this parameter, we use a rule of thumb and look ahead half of the window length you specified to cover the behavior. |
None
|
rank
|
int
|
This parameter specifies the amount of "eigensignals" which will be used to measure the dissimilarity of the signal in the future behavior. As a rule of thumb, we take the five most dominant "eigensignals" if you do not specify otherwise. |
5
|
scale
|
bool
|
Due to numeric stability, we REALLY RECOMMEND scaling the signal into a restricted value range. Per default, we use a min max scaling to ensure a restricted value range. In the presence of extreme outliers, this could cause problems, as the signal will be squished. |
True
|
random_rank
|
int
|
To use the randomized singular value decomposition, one needs to provide a randomized rank (size of second matrix dimension for the randomized matrix) as specified in [3]. The lower this value, the faster the computation but the higher the error (as the approximation gets worse). |
None
|
scoring_step
|
int
|
The distance between scoring steps in samples (e.g., 2 would half the computation). |
1
|
use_fast_hankel
|
bool
|
Whether to deploy the fast hankel matrix product. |
False
|
mitigate_offset
|
bool
|
Use a sliding mean window to mitigate the constant offset of time series. |
False
|
Source code in changepoynt/algorithms/esst.py
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 | |
covered_regions()
This function returns the size of the region of interest (ROI). The ROI is the total amount of samples covered for the score detection.
The covered region depends on the window size, the number of windows, and the lag between the two Hankel matrices (past and future Hankel matrices).
The function return two different sizes:
1) total_region: The total region covered by both matrices with lag. This is also the minimum number of samples that a time series must have before the subspace method can run.
2) matrix_region: The region covered by one Hankel matrix. Essentially the region that is sampled for subsequences that are used to extract the characteristics
Returns:
| Type | Description |
|---|---|
tuple[int, int]
|
total_region: int, matrix_region: int |
Source code in changepoynt/algorithms/base_algorithm.py
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 | |
estimate_runtime(signal, steps=30, verbose=False)
This function estimates the runtime for a given signal by running the initial steps.
Singular subspace algorithms grow linearly with the signal length of the time series, the overall runtime is number of steps times signal length.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
signal
|
ndarray
|
The signal that you will run the algorithm on. |
required |
steps
|
int
|
The number of estimation steps. Increasing this number increases the runtime but also increases the accuracy of the estimation. |
30
|
verbose
|
bool
|
If true, prints out the runtime of the algorithm. |
False
|
Returns:
| Type | Description |
|---|---|
[float, float]
|
Runtime estimation in seconds, standard deviation in seconds |
Source code in changepoynt/algorithms/base_algorithm.py
52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 | |
transform(time_series)
This function calculates the anomaly score for each sample within the time series.
It also does some assertion regarding time series length.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
time_series
|
ndarray
|
1D array containing the time series to be scored |
required |
Returns:
| Type | Description |
|---|---|
ndarray
|
anomaly score |
Source code in changepoynt/algorithms/esst.py
121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 | |
left_entropy(hankel, rank, random_rank, method)
Entropy Singular Spectrum Transformation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
hankel
|
ndarray
|
the hankel matrix of the signal |
required |
rank
|
int
|
the number of (approximated) eigenvectors as subspace of H1 |
required |
random_rank
|
int
|
the sampling parameter for the randomized svd |
required |
method
|
str
|
which rsvd method to use |
required |
Returns:
| Type | Description |
|---|---|
float
|
the change point score, the input vector x0 |
Source code in changepoynt/algorithms/esst.py
215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 | |
skewness(pdf_matrix, x)
Compute the skewness of a matrix of probability density functions given as a numpy array.
Args: pdf_matrix (numpy array): Matrix of probability density functions, where each row is a probability density function x (numpy array): Array of values for which the pdfs are defined
Returns: Skewness of each probability density function as a numpy array
Source code in changepoynt/algorithms/esst.py
192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 | |
msst
MSST
Bases: SingularSubspaceAlgorithm
This class implements all the utility and functionality necessary to compute the Multivariate Singular Spectrum Transformation (MSST).
This algorithm is an extension of the SST change point detection algorithm as described in:
[1] Idé, Tsuyoshi, and Keisuke Inoue. "Knowledge discovery from heterogeneous dynamic systems using change-point correlations." Proceedings of the 2005 SIAM international conference on data mining. Society for Industrial and Applied Mathematics, 2005.
[2] Idé, Tsuyoshi, and Koji Tsuda. "Change-point detection using krylov subspace learning." Proceedings of the 2007 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2007.
It is basically a merger of techniques from [1, 2] and: [3] Alanqary, Arwa, Abdullah Alomar, and Devavrat Shah. "Change point detection via multivariate singular spectrum analysis." Advances in Neural Information Processing Systems 34 (2021): 23218-23230.
It also uses a technique called randomized singular value decomposition which is surveyed and described in
[4] Halko, Nathan, Per-Gunnar Martinsson, and Joel A. Tropp. "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions." SIAM review 53.2 (2011): 217-288.
For the option fast_hankel=True it uses and algorithm based on
[5] L. Weber and R. Lenz. "Accelerating Singular Spectrum Transformation for Scalable Change Point Detection," in IEEE Access, Volume 11, 2025. doi: 10.1109/ACCESS.2025.3640386.
There will be a parameter specifying, whether to use the implicit krylov approximation from [2]. This significantly speeds up the computation but can reduce accuracy as the "eigensignals" are only approximated indirectly using a krylov subspace with the possible change point signal as seed. This method also deploys a feedback of the dominant "eigensignal" as seed into the power method (dominant eigenvector estimation) with additive gaussian noise.
!NOTE!: Most computational heavy functions are implemented as standalone functions, even if they require instance variables as this enables us to use jit compiled code provided by the numba compiler for faster calculations.
Also, we decided to do type hinting but not type checking as it requires too much boilerplate code. We recommend to input the specified types as the program will break in unforeseen ways otherwise.
Source code in changepoynt/algorithms/msst.py
12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 | |
__init__(window_length, n_windows=None, lag=None, rank=5, scale=True, method='ika', lanczos_rank=None, random_rank=None, feedback_noise_level=0.001, scoring_step=1, use_fast_hankel=False)
Initializing the Singular Spectrum Transformation (SST) requires setting a lot of parameters. See the parameters explanation for some intuition into the right choices. Currently, there are two SST methods from [1] and [2] available for use.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
window_length
|
int
|
This specifies the length of the time series (in samples), which will be used to extract the representative "eigensignals" to be compared before and after the lag. The window length should be big enough to cover any wanted patterns (e.g., bigger than the periodicity of periodic signals). |
required |
n_windows
|
int
|
This specifies the number of consecutive time windows used to extract the "eigensignals" from the given time series. It should be big enough to cover different parts of the target behavior. If one does not specify this value, we use the rule of thumb and take as many time windows as you specified the length of the window (rule of thumb). |
None
|
lag
|
int
|
This parameter specifies the distance of the comparison for behaviors. In easy terms it tells the algorithms how far it should look into the future to find change in behavior to the current signal. If you do not specify this parameter, we use a rule of thumb and look ahead half of the window length you specified to cover the behavior. |
None
|
rank
|
int
|
This parameter specifies the amount of "eigensignals" which will be used to measure the dissimilarity of the signal in the future behavior. As a rule of thumb, we take the five most dominant "eigensignals" if you do not specify otherwise. |
5
|
scale
|
bool
|
Due to numeric stability, we REALLY RECOMMEND scaling the signal into a restricted value range. Per default, we use a min max scaling to ensure a restricted value range. In the presence of extreme outliers, this could cause problems, as the signal will be squished. |
True
|
method
|
str
|
Currently, "svd" [1], "ika" [2] and "rsvd" are available. ika corresponds to IKA-SST in [2] which will speed up the computation significantly. |
'ika'
|
lanczos_rank
|
int
|
In order to use the implicit approximation of "eigensignals" by using "ika" [2] method, one needs to decide the rank of the implicit approximation of each "eigensignal". As a rule of thumb, we determine this value as being twice the amount of specified "eigensignals" to span the subspace (parameter rank). This is also recommended by the author of IKA-SST in [2]. |
None
|
random_rank
|
int
|
In order to use the randomized singular value decomposition, one needs to provide a randomized rank (size of second matrix dimension for the randomized matrix) as specified in [3]. The lower this value, the faster the computation but the higher the error (as the approximation gets worse). |
None
|
feedback_noise_level
|
float
|
This specifies the amplitude of additive white gaussian noise added to the dominant "eigensignal" of the future behavior when shifting forward. This idea is noted in [2] and initializes the seed of the power method for dominant eigenvector estimation with the precious dominant eigenvector plus the noise level specified here. The noise level should just be a small fraction of the value range of the signal. |
0.001
|
scoring_step
|
int
|
the distance between scoring steps in samples (e.g., 2 would half the computation). |
1
|
use_fast_hankel
|
bool
|
Use the O(N*logN) version for the decomposition. This is likely to have large speed improvements for window sizes > 150 |
False
|
Source code in changepoynt/algorithms/msst.py
66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 | |
covered_regions()
This function returns the size of the region of interest (ROI). The ROI is the total amount of samples covered for the score detection.
The covered region depends on the window size, the number of windows, and the lag between the two Hankel matrices (past and future Hankel matrices).
The function return two different sizes:
1) total_region: The total region covered by both matrices with lag. This is also the minimum number of samples that a time series must have before the subspace method can run.
2) matrix_region: The region covered by one Hankel matrix. Essentially the region that is sampled for subsequences that are used to extract the characteristics
Returns:
| Type | Description |
|---|---|
tuple[int, int]
|
total_region: int, matrix_region: int |
Source code in changepoynt/algorithms/base_algorithm.py
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 | |
estimate_runtime(signal, steps=30, verbose=False)
This function estimates the runtime for a given signal by running the initial steps.
Singular subspace algorithms grow linearly with the signal length of the time series, the overall runtime is number of steps times signal length.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
signal
|
ndarray
|
The signal that you will run the algorithm on. |
required |
steps
|
int
|
The number of estimation steps. Increasing this number increases the runtime but also increases the accuracy of the estimation. |
30
|
verbose
|
bool
|
If true, prints out the runtime of the algorithm. |
False
|
Returns:
| Type | Description |
|---|---|
[float, float]
|
Runtime estimation in seconds, standard deviation in seconds |
Source code in changepoynt/algorithms/base_algorithm.py
52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 | |
transform(time_series)
This function calculates the anomaly score for each sample within the time series, starting from an initial sample being the first sample of fitting the past hankel matrix (window_length + n_windows samples) and the new future hankel matrix (+lag). Values before that will be zero
It also does some assertions regarding the specified parameters and whether they fit the time series.
This function builds the interface to the jit compiled static function used to iterate over the array.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
time_series
|
ndarray
|
1D array containing the time series to be scored |
required |
Returns:
| Type | Description |
|---|---|
ndarray
|
anomaly score |
Source code in changepoynt/algorithms/msst.py
176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 | |
main()
This function is not intended for users, but for quick testing during development.
Source code in changepoynt/algorithms/msst.py
256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 | |
messt
MESST
Bases: SingularSubspaceAlgorithm
This class implements the Multivariate Enhanced Singular Spectrum Transformation (MESST)
The univariate algorithm ESST has been published together with Sarah Boelter (UMN) Boelter, Sarah, Lucas Weber, et al. "Fault Prediction in Planetary Drilling Using Subspace Analysis Techniques." International Conference on Intelligent Autonomous Systems (IAS-19). 2025. (* is equal contribution) https://ntrs.nasa.gov/citations/20250002705
It is a research algorithm, please use with caution.
Source code in changepoynt/algorithms/messt.py
12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 | |
__init__(window_length, n_windows=None, lag=None, rank=5, scale=True, method='rsvd', random_rank=None, scoring_step=1, use_fast_hankel=False)
Experimental change point detection method evaluation the prevalence of change points within a signal by comparing the difference in eigenvectors between to points in time.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
window_length
|
int
|
This specifies the length of the time series (in samples), which will be used to extract the representative "eigensignals" to be compared before and after the lag. The window length should be big enough to cover any wanted patterns (e.g., bigger than the periodicity of periodic signals). |
required |
n_windows
|
int
|
This specifies the number of consecutive time windows used to extract the "eigensignals" from the given time series. It should be big enough to cover different parts of the target behavior. If one does not specify this value, we use the rule of thumb and take as many time windows as you specified the length of the window (rule of thumb). |
None
|
lag
|
int
|
This parameter specifies the distance of the comparison for behaviors. In easy terms it tells the algorithms how far it should look into the future to find change in behavior to the current signal. If you do not specify this parameter, we use a rule of thumb and look ahead half of the window length you specified to cover the behavior. |
None
|
rank
|
int
|
This parameter specifies the amount of "eigensignals" which will be used to measure the dissimilarity of the signal in the future behavior. As a rule of thumb, we take the five most dominant "eigensignals" if you do not specify otherwise. |
5
|
scale
|
bool
|
Due to numeric stability, we REALLY RECOMMEND scaling the signal into a restricted value range. Per default, we use a min max scaling to ensure a restricted value range. In the presence of extreme outliers, this could cause problems, as the signal will be squished. |
True
|
random_rank
|
int
|
To use the randomized singular value decomposition, one needs to provide a randomized rank (size of second matrix dimension for the randomized matrix) as specified in [3]. The lower this value, the faster the computation but the higher the error (as the approximation gets worse). |
None
|
scoring_step
|
int
|
The distance between scoring steps in samples (e.g., 2 would half the computation). |
1
|
use_fast_hankel
|
bool
|
Whether to deploy the fast hankel matrix product. |
False
|
Source code in changepoynt/algorithms/messt.py
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 | |
covered_regions()
This function returns the size of the region of interest (ROI). The ROI is the total amount of samples covered for the score detection.
The covered region depends on the window size, the number of windows, and the lag between the two Hankel matrices (past and future Hankel matrices).
The function return two different sizes:
1) total_region: The total region covered by both matrices with lag. This is also the minimum number of samples that a time series must have before the subspace method can run.
2) matrix_region: The region covered by one Hankel matrix. Essentially the region that is sampled for subsequences that are used to extract the characteristics
Returns:
| Type | Description |
|---|---|
tuple[int, int]
|
total_region: int, matrix_region: int |
Source code in changepoynt/algorithms/base_algorithm.py
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 | |
estimate_runtime(signal, steps=30, verbose=False)
This function estimates the runtime for a given signal by running the initial steps.
Singular subspace algorithms grow linearly with the signal length of the time series, the overall runtime is number of steps times signal length.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
signal
|
ndarray
|
The signal that you will run the algorithm on. |
required |
steps
|
int
|
The number of estimation steps. Increasing this number increases the runtime but also increases the accuracy of the estimation. |
30
|
verbose
|
bool
|
If true, prints out the runtime of the algorithm. |
False
|
Returns:
| Type | Description |
|---|---|
[float, float]
|
Runtime estimation in seconds, standard deviation in seconds |
Source code in changepoynt/algorithms/base_algorithm.py
52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 | |
transform(time_series)
This function calculates the anomaly score for each sample within the time series.
It also does some assertion regarding time series length.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
time_series
|
ndarray
|
1D array containing the time series to be scored |
required |
Returns:
| Type | Description |
|---|---|
ndarray
|
anomaly score |
Source code in changepoynt/algorithms/messt.py
100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 | |
ulsif
ULSIF
Bases: Algorithm
This class implements change point detection based on density ration estimation optimizing a least squares approach from:
[1] "A least-squares approach to direct importance estimation" T. Kanamori, S. Hido, and M. Sugiyama. Journal of Machine Learning Research, 10:1391–1445, 2009.
and can be implemented as a special version of with alpha = 0
[2] Liu, Song, et al. "Change-point detection in time-series data by relative density-ratio estimation." Neural Networks 43 (2013): 72-83.
which we are doing here.
Source code in changepoynt/algorithms/ulsif.py
6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | |
__init__(window_length=10, n_windows=50, lag=None, estimation_lag=None, scoring_step=1, n_kernels=100, symmetric=True)
This defines all necessary parameters for the uLSIF to work. As uLSIF is similar to RuLSIF only with a zero alpha, we put all parameters through to RuLSIF.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
window_length
|
int
|
the length of the windows we want to compare the densities for (k in the paper) |
10
|
n_windows
|
int
|
the amount of windows (n in the paper) |
50
|
lag
|
int
|
the difference in time between the past and the future comparison window |
None
|
estimation_lag
|
int
|
how often we should re-estimate lambda and sigma for the gaussian kernels |
None
|
scoring_step
|
int
|
the number of samples between each change score value (e.g. 2 would half the computations) |
1
|
n_kernels
|
int
|
the number of kernels for the density ration estimation |
100
|
symmetric
|
specifies whether to use two processes to compute a forward and backward pass simultaneously |
True
|
Source code in changepoynt/algorithms/ulsif.py
26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 | |
rulsif
RuLSIF
Bases: Algorithm
This class implements change point detection based on relative density ration estimation optimizing the least squares approach (RuLSIF) from:
[1] Liu, Song, et al. "Change-point detection in time-series data by relative density-ratio estimation." Neural Networks 43 (2013): 72-83.
The code has been created looking at: - https://github.com/hoxo-m/densratio_py - https://pypi.org/project/densratio/ - http://www.makotoyamada-ml.com/RuLSIF.html (Matlab implementation original author) - https://github.com/anewgithubname/change_detection (Matlab implementation co-author)
Source code in changepoynt/algorithms/rulsif.py
8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 | |
__init__(window_length=10, n_windows=50, lag=None, estimation_lag=None, scoring_step=1, n_kernels=100, alpha=0.01, symmetric=True, parallel=False)
This defines all necessary parameters for the RuLSIF to work.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
window_length
|
int
|
the length of the windows we want to compare the densities for (k in the paper) |
10
|
n_windows
|
int
|
the amount of windows (n in the paper) |
50
|
lag
|
int
|
the difference in time between the past and the future comparison window |
None
|
estimation_lag
|
int
|
how often we should re-estimate lambda and sigma for the gaussian kernels |
None
|
scoring_step
|
int
|
the number of samples between each change score value (e.g. 2 would half the computations) |
1
|
n_kernels
|
int
|
the number of kernels for the density ration estimation |
100
|
alpha
|
float
|
the smoothing parameter for the RELATIVE in RuLSIF |
0.01
|
symmetric
|
bool
|
specifies whether to use two processes to compute a forward and backward pass simultaneously |
True
|
Source code in changepoynt/algorithms/rulsif.py
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 | |
fluss
FLUSS
Bases: Algorithm
This class uses the FLUSS algorithm described in:
[1] Gharghabi, Shaghayegh, et al. "Matrix profile VIII: domain agnostic online semantic segmentation at superhuman performance levels." 2017 IEEE international conference on data mining (ICDM). IEEE, 2017.
This class essentially wraps the stumpy library which implements a fast approach to calculate the matrix profile. https://stumpy.readthedocs.io/en/latest/index.html
TODO: Implement GPU support and streaming? (FLOSS)
Source code in changepoynt/algorithms/fluss.py
6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 | |
__init__(window_length)
We initialize the matrix profile and the subsequent floss using only the window length used for comarisons.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
window_length
|
int
|
the length of the window used for the distance comparisons in the matrix profile. |
required |
Source code in changepoynt/algorithms/fluss.py
22 23 24 25 26 27 28 29 30 | |
transform(time_series)
This function calculate the anomaly score for each sample within the time series.
It also does some assertion regarding time series length.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
time_series
|
ndarray
|
1D array containing the time series to be scored |
required |
Returns:
| Type | Description |
|---|---|
ndarray
|
anomaly score |
Source code in changepoynt/algorithms/fluss.py
32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 | |
floss
FLOSS
Bases: Algorithm
This class uses the FLOSS algorithm described in:
[1] Gharghabi, Shaghayegh, et al. "Matrix profile VIII: domain agnostic online semantic segmentation at superhuman performance levels." 2017 IEEE international conference on data mining (ICDM). IEEE, 2017.
This class essentially wraps the stumpy library which implements a fast approach to calculate the matrix profile. https://stumpy.readthedocs.io/en/latest/index.html
TODO: Implement GPU support and streaming? (FLOSS)
Source code in changepoynt/algorithms/floss.py
6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 | |
__init__(window_length, initial_length=None)
We initialize the matrix profile and the subsequent floss using only the window length used for comarisons.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
window_length
|
int
|
the length of the window used for the distance comparisons in the matrix profile. |
required |
initial_length
|
int
|
the length for the initial matrix profile length |
None
|
Source code in changepoynt/algorithms/floss.py
22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 | |
transform(time_series)
This function calculate the anomaly score for each sample within the time series.
It also does some assertion regarding time series length.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
time_series
|
ndarray
|
1D array containing the time series to be scored |
required |
Returns:
| Type | Description |
|---|---|
ndarray
|
anomaly score |
Source code in changepoynt/algorithms/floss.py
42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 | |
bocpd
============================================================================ Author: Gregory Gundersen
Python implementation of Bayesian online changepoint detection for a normal model with unknown mean parameter. For algorithm details, see
Adams & MacKay 2007
"Bayesian Online Changepoint Detection"
https://arxiv.org/abs/0710.3742
For Bayesian inference details about the Gaussian, see:
Murphy 2007
"Conjugate Bayesian analysis of the Gaussian distribution"
https://www.cs.ubc.ca/~murphyk/Papers/bayesGauss.pdf
This code is associated with the following blog posts:
https://gregorygundersen.com/blog/2019/08/13/bocd/
https://gregorygundersen.com/blog/2020/10/20/implementing-bocd/
============================================================================
GaussianUnknownMean
Source code in changepoynt/algorithms/bocpd.py
188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 | |
var_params
property
Helper function for computing the posterior variance.
__init__(mean0, var0, varx)
Initialize model.
meanx is unknown; varx is known p(meanx) = N(mean0, var0) p(x) = N(meanx, varx)
Source code in changepoynt/algorithms/bocpd.py
190 191 192 193 194 195 196 197 198 199 200 201 | |
log_pred_prob(t, x)
Compute predictive probabilities \pi, i.e. the posterior predictive for each run length hypothesis.
Source code in changepoynt/algorithms/bocpd.py
203 204 205 206 207 208 209 210 | |
update_params(t, x)
Upon observing a new datum x at time t, update all run length hypotheses.
Source code in changepoynt/algorithms/bocpd.py
212 213 214 215 216 217 218 219 220 221 | |
bocd(data, model, hazard)
Return run length posterior using Algorithm 1 in Adams & MacKay 2007.
Source code in changepoynt/algorithms/bocpd.py
133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 | |
generate_data(varx, mean0, var0, T, cp_prob)
Generate partitioned data of T observations according to constant
changepoint probability cp_prob with hyperpriors mean0 and prec0.
Source code in changepoynt/algorithms/bocpd.py
232 233 234 235 236 237 238 239 240 241 242 243 244 | |
clasp
CLASP
Bases: Algorithm
This class uses the ClaSP algorithm published in
[1] Schäfer, Patrick, Arik Ermshaus, and Ulf Leser. "Clasp-time series segmentation." Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2021.
This class essentially wraps the claspy library https://github.com/ermshaua/claspy https://sites.google.com/view/ts-clasp/
In the current configuration it is recommended to set no parameters, as CLASP is able to extract it from the data. TODO: Own implementation?
Source code in changepoynt/algorithms/clasp.py
6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 | |