before code : loop should run @ least 143,792,640,000 times , create table @ least produce 563,760 rows without duplicated want know how speed or parallel computing hadoop accelerate between php , mysql.
code below:
mysql connection
$link=mysql_connect($servername,$username,$password); mysql_select_db($dbname); $sql= "insert em (source,target) values "; for loop read data mysql check function if duplicate not insert , update count=count+1
for($i=0;$i<$combine_arr_size;$i++){ for($j=0;$j<$combine_arr_size;$j++){ //below check if find duplicated a,b recognize b,a same thing
if(check($combine_words_array[$i],$combine_words_array[$j])) { $update_query="update em set count = count+1 (source='$combine_words_array[$i]' , target='$combine_words_array[$j]') or (source='$combine_words_array[$j]' , target='$combine_words_array[$i]');"; mysql_query($update_query); } else { if (!$link) { die("connection failed: " . mysql_error()); } //else using insert table () value concatenate string
$sql.="('$combine_words_array[$i]','$combine_words_array[$j]'),"; mysql_query(substr($sql,0,-1)); $sql= "insert em (source,target) values "; } } } read vector align comebine_word_array[] combine_word_array[]
below check function , check if find pair return value
function check($src, $trg) { $query = mysql_query("select * em (source='$src' , target='$trg') or (source='$trg' , target='$src');"); if (mysql_num_rows($query) > 0) { return 1; } else { return 0; } } table
+--------+--------------+------+-----+---------+-------+ | field | type | null | key | default | | +--------+--------------+------+-----+---------+-------+ | source | varchar(255) | yes | | null | | | target | varchar(255) | yes | | null | | | count | int(11) | no | | 0 | | | prob | double | no | | 0 | | +--------+--------------+------+-----+---------+-------+ now php code influence source ,target , count
it difficult know want duplicate combinations. example getting every combination of array, going lots of duplicates count twice.
however tempted load words table (possibly temp table) , cross join of table against every combination, , use insert on duplicate key clause.
very crudely, this:-
<?php $sql = "create temporary table words ( word varchar(255), primary key (`word`), )"; $link = mysql_connect($servername,$username,$password); mysql_select_db($dbname); $sql = "insert words (word) values "; $sql_parm = array(); foreach($combine_words_array $combine_word) { $sql_parm[] = "('".mysql_real_escape_string($combine_word)."')"; if (count($sql_parm) > 500) { mysql_query($sql.implode(',', $sql_parm)); $sql_parm = array(); } } if (count($sql_parm) > 0) { mysql_query($sql.implode(',', $sql_parm)); $sql_parm = array(); } $sql = "insert em(source, target) select w1.word, w2.word words w1 cross join words w2 on duplicate key update `count` = `count` + 1 "; mysql_query($sql); this rely on having unique key covering both source , target columns.
but whether option depends on details of records. example current code if there 2 words (say , b) find combination / b , combination b / a. both combinations update same records
Comments
Post a Comment