What's new in JMSL Numerical Library
The JMSL Java Numerical Library is tuned for high-performance data analysis, with tools for big data – providing 100% Java analytics that simplify the development of complex code.
Java Numerical Library 2018 Highlights
New and Improved Features
Major changes in the new release include performance improvements, enhancements to linear regression, and some architectural upgrades behind the scenes.
The first area with improved performance is time series outlier detection. Outliers in a data set are observations that are unusually large or small compared to the rest of the data. In a time series, outliers may occur as single points, as a short run of anomalous values, as a temporary change, or as a permanent level shift. The underlying causes of the different outlier types vary; while they can be instructive, they are generally not predictable. In most cases, outliers should be filtered out of the data before producing model estimates and forecasts.
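To make the outlier types concrete, the sketch below injects each shape into a flat synthetic series. This is purely illustrative plain Java, not the JMSL API; the class and method names are hypothetical, and `omega` and `delta` stand for the outlier magnitude and decay rate.

```java
// Illustrative sketch (not JMSL API): the classic time series outlier
// shapes, injected into a synthetic series so their differences are visible.
public class OutlierShapes {

    // Additive outlier: a single spiked point at time t0.
    static double[] additive(double[] y, int t0, double omega) {
        double[] out = y.clone();
        out[t0] += omega;
        return out;
    }

    // Temporary change: a spike at t0 that decays geometrically at rate delta.
    static double[] temporaryChange(double[] y, int t0, double omega, double delta) {
        double[] out = y.clone();
        for (int t = t0; t < y.length; t++) {
            out[t] += omega * Math.pow(delta, t - t0);
        }
        return out;
    }

    // Level shift: a permanent step change starting at t0.
    static double[] levelShift(double[] y, int t0, double omega) {
        double[] out = y.clone();
        for (int t = t0; t < y.length; t++) {
            out[t] += omega;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] base = new double[10]; // flat zero series for clarity
        double[] ao = additive(base, 4, 5.0);
        double[] tc = temporaryChange(base, 4, 5.0, 0.5);
        double[] ls = levelShift(base, 4, 5.0);
        System.out.println(ao[4] + " " + ao[5]); // spike at t0, gone at t0+1
        System.out.println(tc[4] + " " + tc[5]); // spike at t0, decayed at t0+1
        System.out.println(ls[4] + " " + ls[9]); // shift persists to the end
    }
}
```

A short run of anomalous values can be seen as several additive outliers in succession, which is why detection procedures treat the single-point case as the basic building block.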
JMSL ARMAOutlierIdentification implements the procedure described in Chen and Liu (1993) for automatic detection of outliers in time series. The algorithm detects potential outliers and classifies them as one of the types mentioned above. At one stage, the algorithm solves a sequence of least squares regressions. Instead of computing each of these problems from scratch, it is possible to solve the initial problem and then update the solution at each step via Givens rotations. This saves substantial computation time, especially when many potential outliers are initially detected. After these changes, the method AutoARIMA.compute, which uses ARMAOutlierIdentification, is 80% to 99% faster, depending on the size of the problem.
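The Givens-rotation update is the standard trick behind this kind of speedup: once a least squares problem has been reduced to an upper-triangular factor R, a new observation row can be folded into R with a short sequence of plane rotations instead of refactoring the whole matrix. The sketch below is a minimal plain-Java illustration of that idea, not JMSL's internal code; the method names are hypothetical.

```java
// Illustrative sketch (not JMSL internals): updating a least squares
// solution with Givens rotations. R holds the upper-triangular factor of
// the design matrix and rhs the transformed right-hand side; addRow folds
// in one new observation without recomputing the factorization.
public class GivensUpdate {

    // Rotate the new row (row, y) into R and rhs, restoring triangularity.
    static void addRow(double[][] R, double[] rhs, double[] row, double y) {
        int n = R.length;
        for (int k = 0; k < n; k++) {
            double a = R[k][k], b = row[k];
            double r = Math.hypot(a, b);
            if (r == 0.0) continue;           // nothing to annihilate
            double c = a / r, s = b / r;      // rotation that zeros row[k]
            for (int j = k; j < n; j++) {
                double t1 = R[k][j], t2 = row[j];
                R[k][j] = c * t1 + s * t2;
                row[j] = -s * t1 + c * t2;
            }
            double u = rhs[k];
            rhs[k] = c * u + s * y;
            y = -s * u + c * y;
        }
    }

    // Back-substitution: solve R x = rhs for the least squares coefficients.
    static double[] solve(double[][] R, double[] rhs) {
        int n = R.length;
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = rhs[i];
            for (int j = i + 1; j < n; j++) s -= R[i][j] * x[j];
            x[i] = s / R[i][i];
        }
        return x;
    }
}
```

Each call to `addRow` costs O(n²) for n coefficients, versus O(mn²) to refactor all m rows from scratch, which is why the savings grow with the number of candidate outliers the algorithm must evaluate.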
The second area with performance improvements is decision trees. Decision tree algorithms build a model by recursively splitting the data on values of the "best" predictor variable (i.e., the variable that best explains the values of the target variable in that subset of data). The process is repeated in each new subset until the stopping criteria are met. By better exploiting these data partitions at each stage, the decision tree algorithms (C45, ALACART, CHAID, and QUEST) are 10% to 50% faster than before, depending on the size of the data. Class methods that use the decision tree algorithms, such as GradientBoosting.fitModel, are comparably faster as a result.
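To illustrate what "best split" means in this recursive partitioning, the sketch below scores candidate thresholds on one numeric predictor by weighted Gini impurity, a common splitting criterion for classification trees. This is a hypothetical plain-Java example, not JMSL's implementation; C45 and CHAID, for instance, use different criteria (information gain and chi-square, respectively).

```java
import java.util.Arrays;

// Illustrative sketch (not JMSL code): choosing the split threshold on one
// numeric predictor that minimizes weighted Gini impurity for binary labels.
public class BestSplit {

    // Gini impurity of a subset with `pos` positives out of `total`.
    static double gini(int pos, int total) {
        if (total == 0) return 0.0;
        double p = (double) pos / total;
        return 2.0 * p * (1.0 - p);
    }

    // Scan all midpoints between consecutive sorted values of x and return
    // the threshold whose left/right partition has the lowest weighted Gini.
    static double bestThreshold(double[] x, int[] label) {
        int n = x.length;
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(x[a], x[b]));

        int totalPos = 0;
        for (int l : label) totalPos += l;

        double best = Double.MAX_VALUE, bestT = x[idx[0]];
        int leftPos = 0;
        for (int i = 0; i < n - 1; i++) {
            leftPos += label[idx[i]];
            int leftN = i + 1, rightN = n - leftN;
            double w = (leftN * gini(leftPos, leftN)
                    + rightN * gini(totalPos - leftPos, rightN)) / n;
            if (w < best) {
                best = w;
                bestT = (x[idx[i]] + x[idx[i + 1]]) / 2.0;
            }
        }
        return bestT;
    }
}
```

A tree builder repeats this scan over every predictor at every node, so reusing the sorted order and running counts across recursion levels, rather than recomputing them per node, is exactly the kind of partition bookkeeping that yields the speedups described above.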
In the process of making these performance improvements, a few methods were also added or updated.
IMSL has been around for almost 50 years, so there are fewer bugs than one might find in less mature libraries; even so, together with our customers, we continue to find and fix a few, helping to continuously improve the robustness of the IMSL library. Details can be found in the product change logs: 2018.0 and 2018.1.
Another key component of JMSL 2018.0.0 is a set of improvements to internal JMSL tools and processes that enable faster defect-patching cycles and Java platform certification going forward. With these updates and additional planned improvements, the development team will deliver new product releases on the most widely adopted Java versions first and then respond to customer requests for platform support. Additional platforms will be made available as demand warrants.