[Noisebridge-discuss] Build advice for a new system / heavy cluster GPU AI processing?
sai at saizai.com
Mon Jul 11 20:27:01 PDT 2011
On Mon, Jul 11, 2011 at 22:15, Mike Schachter <mike at mindmech.com> wrote:
> The grid search is your problem! It's unavoidable when you're
> doing cross validation though, because you definitely want the
> parameters that give you the lowest generalization error. You're
> doing cross validation, right?
Of course. That's kinda the main point - I want to know
a) what the best performance is across the various binning parameters,
vectorization methods, etc.
b) whether there's some trend that may be interesting in the C/G
params over that, such as narrowness of optimum params, relationship
to bin size, or the like
Cross-validation results are the primary datum. ;-)
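
For concreteness, here is a minimal sketch of the kind of cross-validated grid search I mean, written in Python rather than MATLAB just to keep it short. Everything in it is an assumption for illustration: the synthetic 1-D data, the function names, and especially the classifier, which is a simple RBF-weighted vote standing in for the SVM (so `gamma` plays the role of G, and there is no C). The point is only the shape of the loop: one model rebuild per fold per grid point.

```python
import itertools
import math
import random


def make_data(n=60, seed=0):
    """Synthetic two-class 1-D data; a placeholder for the real features."""
    rng = random.Random(seed)
    return [(rng.gauss(2.0 * (i % 2), 0.7), i % 2) for i in range(n)]


def bin_value(x, bin_size):
    """Quantize x to the centre of the bin it falls in."""
    return (math.floor(x / bin_size) + 0.5) * bin_size


def predict(train, x, gamma):
    """Stand-in classifier (NOT an SVM): RBF-weighted vote over training points."""
    votes = {}
    for xi, yi in train:
        votes[yi] = votes.get(yi, 0.0) + math.exp(-gamma * (x - xi) ** 2)
    return max(votes, key=votes.get)


def cv_error(data, bin_size, gamma, k=5):
    """Pooled k-fold error; the training set is rebuilt for every fold."""
    binned = [(bin_value(x, bin_size), y) for x, y in data]
    errors = 0
    for fold in range(k):
        test = binned[fold::k]
        train = [p for i, p in enumerate(binned) if i % k != fold]
        errors += sum(predict(train, x, gamma) != y for x, y in test)
    return errors / len(binned)


def grid_search(data, bin_sizes, gammas, k=5):
    """Map every (bin_size, gamma) grid point to its cross-validation error."""
    return {
        (b, g): cv_error(data, b, g, k)
        for b, g in itertools.product(bin_sizes, gammas)
    }


data = make_data()
results = grid_search(data, bin_sizes=[0.1, 0.5, 2.0], gammas=[0.1, 1.0, 10.0])
best = min(results, key=results.get)
print("best (bin_size, gamma):", best, "cv error:", results[best])
```

Keeping the whole `results` table, rather than just `best`, is what lets you look at trends afterwards, such as how narrow the optimum is for each bin size.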
> Although a GPU will help individual instances of training the
> SVM classifier, in general you should parallelize the grid search
> across cores.
Sorry, I should've been clearer: I can easily use all 4 of my cores
using matlabpool (and, for that matter, multiple remote cores if it's
set up correctly). I just reported the single-core timings for
> Specifically, train an SVM classifier per hyperparameter
> combination (kernel, bin size, etc).
As in one training per hyperparam combo? If that were possible — i.e.
if I didn't have to retrain the damn thing from scratch for every step
in the grid search — that would drastically cut down my optimization
> Also, SVM kind of sucks for multi-class classification. Have you
> considered random forests?
I'm not familiar with that. Could you give me a pointer?
Ideally I would like to be able to compare multiple different
classifier methods, as that's a large part of what interests me in the
question - e.g. maybe there's some interesting case where some
classifiers are better in one kind of binning and another set are
better for another kind.
Which of course means I still need to run even the slow ones. :-/
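
A rough sketch of the comparison I have in mind, again in Python and again under loud assumptions: the data is synthetic, and the two classifiers (nearest centroid and 1-nearest-neighbour) are hypothetical stand-ins for whatever methods actually get compared, not SVMs or random forests. It just builds a table of cross-validation error per (bin size, classifier) and reports which classifier wins for each binning.

```python
import math
import random


def make_data(n=60, seed=1):
    """Synthetic two-class 1-D data; a placeholder for the real features."""
    rng = random.Random(seed)
    return [(rng.gauss(2.0 * (i % 2), 0.7), i % 2) for i in range(n)]


def bin_value(x, bin_size):
    """Quantize x to the centre of the bin it falls in."""
    return (math.floor(x / bin_size) + 0.5) * bin_size


def nearest_centroid(train, x):
    """Predict the class whose mean feature value is closest to x."""
    sums = {}
    for xi, yi in train:
        total, count = sums.get(yi, (0.0, 0))
        sums[yi] = (total + xi, count + 1)
    return min(sums, key=lambda y: abs(x - sums[y][0] / sums[y][1]))


def one_nearest_neighbour(train, x):
    """Predict the label of the single closest training point."""
    return min(train, key=lambda p: abs(x - p[0]))[1]


# Hypothetical set of methods to compare; swap in real classifiers here.
CLASSIFIERS = {
    "nearest_centroid": nearest_centroid,
    "1-NN": one_nearest_neighbour,
}


def cv_error(binned, classify, k=5):
    """Pooled k-fold error for one classifier on already-binned data."""
    errors = 0
    for fold in range(k):
        test = binned[fold::k]
        train = [p for i, p in enumerate(binned) if i % k != fold]
        errors += sum(classify(train, x) != y for x, y in test)
    return errors / len(binned)


def compare(data, bin_sizes):
    """Return {(bin_size, classifier name): cv error} for every pairing."""
    table = {}
    for b in bin_sizes:
        binned = [(bin_value(x, b), y) for x, y in data]
        for name, classify in CLASSIFIERS.items():
            table[(b, name)] = cv_error(binned, classify)
    return table


bin_sizes = [0.1, 0.5, 2.0]
table = compare(make_data(), bin_sizes)
for b in bin_sizes:
    row = {name: table[(b, name)] for name in CLASSIFIERS}
    winner = min(row, key=row.get)
    print("bin size", b, "->", row, "best:", winner)
```

Every cell of that table is a full cross-validation run, which is exactly why the slow classifiers can't be skipped.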