ASA Conference on Statistical Practice
The 2018 American Statistical Association Conference on Statistical Practice aims to bring together hundreds of statistical practitioners and data scientists—including data analysts, researchers, and scientists—who engage in the application of statistics to solve real-world problems on a daily basis.
Abstract: Stochastic gradient boosting (SGB) is a popular and effective predictive model, but scaling it to distributed data is complicated by the fact that the algorithm is inherently sequential. Computing the exact estimate requires, in effect, implementing the algorithm across the network so that every iteration touches all of the data, a level of engineering that may be out of reach for many practitioners in the short term. We propose an approximate method that fits SGB on each processor individually but passes partitions of the data from one node to the next, which are used to initialize the SGB run on the next processor. We compare this method to the exact method and to one other approximate method on several data sets. The advantage of the approach is that it can serve as a proof of concept, establishing whether the problem is predictable from the data before investing considerably more effort for potentially greater accuracy.
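One plausible reading of the chained scheme described above can be sketched in a few lines of Python. The sketch below is illustrative only, not the authors' implementation: it simulates the "processors" as data partitions in a single process, uses decision stumps as the base learners, and interprets the hand-off between nodes as warm-starting each partition's boosting run from the running ensemble's predictions. All function names and parameter values here are assumptions chosen for the example.

```python
import numpy as np

def fit_stump(X, residual):
    """Fit a one-split regression stump to the residuals (least squares)."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue
            lm, rm = residual[left].mean(), residual[~left].mean()
            sse = ((residual - np.where(left, lm, rm)) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]  # (feature, threshold, left mean, right mean)

def stump_predict(stump, X):
    j, t, lm, rm = stump
    return np.where(X[:, j] <= t, lm, rm)

def boost_partition(X, y, init_pred, n_trees=20, lr=0.1, subsample=0.5, rng=None):
    """SGB on one partition, warm-started from init_pred (the hand-off)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    pred = init_pred.copy()
    stumps = []
    for _ in range(n_trees):
        # stochastic step: fit each stump on a random subsample of the partition
        idx = rng.choice(len(y), size=max(1, int(subsample * len(y))), replace=False)
        stump = fit_stump(X[idx], (y - pred)[idx])
        stumps.append(stump)
        pred += lr * stump_predict(stump, X)
    return stumps

# Simulated distributed data: three "processors", each holding one partition.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=300)

ensemble, lr = [], 0.1
for idx in np.array_split(np.arange(300), 3):
    # each node initializes from the ensemble accumulated at earlier nodes
    init = (sum(lr * stump_predict(s, X[idx]) for s in ensemble)
            if ensemble else np.zeros(len(idx)))
    ensemble.extend(boost_partition(X[idx], y[idx], init, rng=rng))

# evaluate the chained ensemble on the pooled data
pred = sum(lr * stump_predict(s, X) for s in ensemble)
mse = ((y - pred) ** 2).mean()
```

Because each node trains only on its own partition, the result is an approximation to the exact all-data fit; the point, as in the abstract, is that a cheap chained run like this can reveal whether the response is predictable at all before committing to a full distributed implementation.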
February 16, 2018
2:00 PM (Local Time / Pacific)