Gemma Lynn / @ellotheth
What if people could run network diagnostics from our servers?
Launched in March 2013 with four tests:
lookup
ping
dig
trace
Founders were unhappy with Lithium:
Is there a better option?
We're happy with Gearman, and we're still using it today.
$gmworker = new GearmanWorker();
$gmworker->addServer();
$gmworker->addFunction("dig", "gearman_dig");
// three more of these for lookup, ping and trace

$m = new Mongo();
$collection = $m->wheresitup->results;

while ($gmworker->work()) {
    if ($gmworker->returnCode() != GEARMAN_SUCCESS) {
        echo "return_code: " . $gmworker->returnCode() . "\n";
        break;
    }
}

function gearman_dig($job) {
    global $collection;
    list($server, $url, $workID) = unserialize($job->workload());
    $result = shell_exec("scripts/whereisitup-dig.sh $server $url");
    $collection->update(/* add $result to the $workID job */);
}
#!/bin/sh
ssh "$1" dig "$2"
[program:wheresitup]
command=/usr/bin/php /var/local/wheresitup/worker.php
numprocs=25
process_name=%(program_name)s_%(process_num)02d
directory=/tmp
stdout_logfile=/var/log/supervisor/wheresitup.log
autostart=true
autorestart=true
Restarting the workers was ugly.
... wait for a lull ...
$ ps ax|grep wheresitup
$ kill -9 [pid] [pid] [pid] ...
25 workers was not cutting it.
* 9a229d5 (2014-04-03 01:43:08 +0000) Will Roberts
| and bump to 190 workers
* 152f3a7 (2014-04-03 00:07:00 +0000) Will Roberts
bump number of workers up to 70
* df6ea55 (2013-07-18 19:20:40 +0000) Will Roberts
bump max hops to 50
* d8ec934 (2013-04-20 20:24:28 +0000) Will Roberts
bump to 50 workers
Memory became a problem after 50 workers.
+ $done = 0;
  while ($gmworker->work()) {
+     $done++;
      if ($gmworker->returnCode() != GEARMAN_SUCCESS) {
          echo "return_code: " . $gmworker->returnCode() . "\n";
          break;
+     } else if ($done > 200) {
+         echo "quitting after 200 jobs\n";
+         break;
      }
  }
Supervisord was limited to 1024 file descriptors.
1024 file descriptors = ~200 workers
(This may not be a problem anymore!)
class worker {
    public $process;
    public $readPipe;

    function run($cmd) {
        $descriptorspec = array(
            array("pipe", "r"),                          // STDIN
            array("pipe", "w"),                          // STDOUT
            array("file", "/tmp/error-output.txt", "a"), // STDERR
        );
        $this->process = proc_open($cmd, $descriptorspec, $pipes);
        $this->readPipe = $pipes[1];
        stream_set_blocking($this->readPipe, 0);
    }
}
$pipes = array();
foreach ($workers as $k => $worker) { // global array of workers
    $pipes[$k] = $worker->readPipe;
}

$write = $except = null;
stream_select($pipes, $write, $except, 1, 0);

if ($pipes) { // readable pipes, re-indexed (thanks, PHP)
    foreach ($pipes as $stdout) {
        foreach ($workers as $index => $worker) {
            if ($stdout === $worker->readPipe) {
                // munge and save the result
                // release the worker and the pipe
            }
        }
    }
}
Concurrency makes everything faster! Yay!
A bunch of the old problems still exist. Boo.
We hit some walls really hard.
I was not enthusiastic about introducing another language into our stack.
Every other "beginning Go" tutorial was "how to build a worker queue"
package main

import "log"

func main() {
    pipe := make(chan int)
    go doThing(pipe)
    pipe <- 7           // blocks until read
    log.Println(<-pipe) // prints 8
}

func doThing(pipe chan int) {
    i := <-pipe // blocks until written
    i += 1
    pipe <- i
}
Because X is terrible.
April 2015: Initial prototype
func NewManager(maxConcurrent int, timeout int) *Manager {
    manager := &Manager{}
    manager.isFinished = make(chan bool)
    manager.stop = make(chan bool)
    manager.newTasks = make(chan Taskable, maxConcurrent)
    manager.doneTasks = make(chan Taskable, maxConcurrent)

    go manager.start(timeout)
    go manager.finish()

    return manager
}
func (mgr *Manager) start(taskTimeout int) {
    // pull tests off the job queue
    for task := range mgr.newTasks {
        // non-blocking: set up stdout and stderr, start the process
        task.Start()
        // blocking: read stdout and stderr, wait for finish
        go task.Process(taskTimeout, mgr.doneTasks)
    }
    // when there are no more tests, stop
    mgr.stop <- true
}
Go is now part of WonderProxy's toolkit.
Recognize non-nail-like problems.
Find the right tools, and they'll make you better.
Gemma Lynn / @ellotheth