I was looking for a SQL-only solution. I currently do not have enough extension building experience to pick the right solution. I did find a link to a related example from MS but left that alone for now. I have a small working example and it seems to work... with only two test cases: ``` drop table if exists sample; create table sample(name,val); insert into sample values ('ten', 10); insert into sample values ('fifty', 50); insert into sample values ('fifty', 50); insert into sample values ('fifty', 50); insert into sample values ('fifty', 50); insert into sample values ('fifty', 50); insert into sample values ('hundred', 100); insert into sample values ('hundred', 101); insert into sample values ('hundred', 102); insert into sample values ('hundred', 103); insert into sample values ('hundred', 150); insert into sample values ('2hundred', 200); insert into sample values ('2hundred', 225); insert into sample values ('2hundred', 250); insert into sample values ('3hundred', 300); select * from sample; select ''; -- k-mean -- The algorithm works as follow: -- Step 1: Choose groups in the feature plan randomly -- Step 2: Minimize the distance between the cluster center and the different observations (centroid). It results in groups with observations -- Step 3: Shift the initial centroid to the mean of the coordinates within a group. -- Step 4: Minimize the distance according to the new centroids. New boundaries are created. Thus, observations will move from one group to another -- Repeat until no observation changes groups -- Step 1: Choose groups in the feature plan randomly drop table if exists centers; create table centers (k, center); insert into centers (k,center) select k, avg(val) as center from (select ntile(4) over r as k, val from sample window r as (order by val)) group by k ; select * from centers order by k; -- Step 2: Minimize the distance between the cluster center and the different observations (centroid). It results in groups with observations -- Step 3: Shift the initial centroid to the mean of the coordinates within a group. -- Step 4: Minimize the distance according to the new centroids. New boundaries are created. Thus, observations will move from one group to another update centers as ce set center=n.center from ( select k, avg(val) center from ( select * from ( select c.k, s.val, abs(s.val-c.center) d, row_number() over r as rowno from sample s, centers c window r as (partition by s.val order by abs(s.val-c.center) ) ) as x where rowno=1 ) as y group by k ) as n where ce.k=n.k; select * from centers order by k; -- print the results select k, min(val) minval, max(val) maxval from ( select c.k, s.val, abs(s.val-c.center) d, row_number() over r as rowno from sample s, centers c window r as (partition by s.val order by abs(s.val-c.center) ) ) as x where rowno=1 group by k order by 1; ``` In step 1 above I used `ntile()` to select the "random" centroids. I updated the centers table with: ``` delete from centers; insert into centers values (1,10),(2,50),(3,75),(4,100); ``` And after repeating steps 2-4 I go to the same set from the first run. [a] this could be converted to a recursive CTE [b] `k` was just a guess but maybe it should be slightly more deterministic [c] and it needs more testing thanks again