Data Mining, Quant, Statistics, Computer Science: Jobs, Resumes, Directory

Precision Recruiting

Statistical Software

Robust Multivariate Ridge and Linear Regression with Bootstrap
- Description
- Download Perl source code (462 lines)
- Download C source code (540 lines; fast, well documented)
Stock Market Simulator
- Description
- C source code (127 lines)
Simulation of Clustered Data
- Produces simulated clusters.
- Description: The seed routine creates a cluster of 1000 points, saved in cluster.txt. Each row corresponds to a point: the first column is the cluster number, and the next two columns are the x and y coordinates. The cluster number is incremented automatically on each new call to seed, so each call creates a new cluster. The distance routine randomly selects 100 points from the data set previously created (cluster.txt) and computes the distance between each pair of them. The output is a file dist.txt with one row per pair of points and two fields: the first is an indicator equal to 1 if both points belong to the same cluster; the second is the distance between the two points. This script illustrates how to check whether a data set contains one or two clusters by looking at the distribution of distances: a gap in the distribution indicates distinct clusters. It also suggests that, with sampling techniques, the computational complexity of determining whether a data set contains one or more clusters is well below O(n), possibly O(n^0.5).
- Perl source code (77 lines)
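
The seed and distance routines described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Perl script offered for download: the points are assumed to be uniformly distributed in a square around a cluster center, the cluster number is passed explicitly rather than auto-incremented, the results are kept in memory instead of being written to cluster.txt and dist.txt, and the names histogram, has_gap, half_width and bin_width are hypothetical helpers introduced here.

```python
import math
import random


def seed(cluster_id, center, n_points=1000, half_width=1.0, rng=random):
    """Return n_points rows (cluster_id, x, y).

    Assumption: points are uniform in a square of side 2*half_width
    around center; the original description does not state the distribution.
    """
    cx, cy = center
    return [
        (cluster_id,
         rng.uniform(cx - half_width, cx + half_width),
         rng.uniform(cy - half_width, cy + half_width))
        for _ in range(n_points)
    ]


def sample_distances(points, n_sample=100, rng=random):
    """Pick n_sample points at random and return one row per pair:
    (1 if both points share a cluster number else 0, Euclidean distance)."""
    sample = rng.sample(points, n_sample)
    rows = []
    for i in range(len(sample)):
        for j in range(i + 1, len(sample)):
            ci, xi, yi = sample[i]
            cj, xj, yj = sample[j]
            rows.append((1 if ci == cj else 0, math.hypot(xi - xj, yi - yj)))
    return rows


def histogram(rows, bin_width=0.5):
    """Count distances per bin; the key is the bin index floor(d / bin_width)."""
    counts = {}
    for _, d in rows:
        b = int(d // bin_width)
        counts[b] = counts.get(b, 0) + 1
    return counts


def has_gap(counts):
    """True if at least one empty bin lies between two occupied bins."""
    occupied = sorted(counts)
    return any(b2 - b1 > 1 for b1, b2 in zip(occupied, occupied[1:]))


if __name__ == "__main__":
    rng = random.Random(0)
    # Two well-separated clusters: centers 10 apart, each of half-width 1.
    data = seed(1, (0.0, 0.0), rng=rng) + seed(2, (10.0, 0.0), rng=rng)
    rows = sample_distances(data, rng=rng)
    print("pairs:", len(rows), "gap found:", has_gap(histogram(rows)))
```

In this sketch the cost of the distance step depends on the sample size (100 points, 4950 pairs) rather than on the number of points in the data set, which is the idea behind the sub-linear complexity suggested in the description. A gap test this crude only works when the clusters are well separated; overlapping clusters would need a more careful look at the distance distribution.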
