Friday, February 24, 2012

Alphanumeric sorting

This is a problem for almost every project that I work on. I get a list of
location for water and/soil sampling which look like this
mw-1
mw-2
mw-3
mw-4
mw-10
mw-11
mw-20
mw-31
so I must use a character/text field. But when I sort this varchar filed, I
get
mw-1
mw-10
mw-11
mw-2
mw-20
mw-3
mw-31
mw-4
but I want a natural alphanumeric sorting order, hence
mw-1
mw-2
mw-3
mw-4
mw-10
mw-11
mw-20
mw-31
How can I acheive this in SQL?
Thanks.
Archerbagman3rd wrote:
...
> mw-1
> mw-2
> mw-3
> mw-4
> mw-10
> mw-11
> mw-20
> mw-31
> How can I acheive this in SQL?
How about ORDER BY CAST(SUBSTRING(ColName, 4, LEN(ColName)-3) AS INT) ?|||Assuming that "mw-" is always present...
Try this:
order by cast(replace([column_name], 'mw-', '') as int)
An even better solution would be to split this string into the two
individual pieces of information that it holds and store them in separate
columns.
This will definitely save you a lot of trouble in the future.
ML|||What do these codes represent? Assuming there is a predetermined set of
codes you can put them in their own table (I expect you would have a
table of them anyway) and add a "sequence" column to define the order.
If in fact they are not codes but numbers representing some numeric
value then you have a more fundamental design problem - that is, why
are you storing numeric values concatenated as strings in the first
place? If you can't fix the design you can in fact order on an
expression, like:
CAST(SUBSTRING(col,4,10) AS INT)
or
CAST(SUBSTRING(col,CHARINDEX('-',col)+1,10) AS INT)
David Portas
SQL Server MVP
--|||> What do these codes represent? Assuming there is a predetermined set of
> codes you can put them in their own table (I expect you would have a
> table of them anyway) and add a "sequence" column to define the order.
Adding a "sequence" column to define the order has been my workaround for
now, but it is long and tedious, especially when I have hundreds of wells
They are monitoring wells, and I have no control over the naming
conventions. Sometimes they use dashes, sometimes they dont. Sometimes the
y
call them emw(extraction monitoring well), and I also have soil borings too.
sb-1
sb-2
sometimes they use
esb-1
esb-2
So, essentially, I cannot rely on a certain number of characters being text,
then numbers, etc.
I have any idea of how to do this, but it seems awkward and kludgian.
Essentially, I would set up a cursor to loop through every character of ever
y
row, and break up the rows into sections every time it changed from text to
numeric and vice versa, then order by the sections that I created.
Has anyone done this before and/or is there a better way?
Thanks.
Archer
"David Portas" wrote:

> What do these codes represent? Assuming there is a predetermined set of
> codes you can put them in their own table (I expect you would have a
> table of them anyway) and add a "sequence" column to define the order.
> If in fact they are not codes but numbers representing some numeric
> value then you have a more fundamental design problem - that is, why
> are you storing numeric values concatenated as strings in the first
> place? If you can't fix the design you can in fact order on an
> expression, like:
> CAST(SUBSTRING(col,4,10) AS INT)
> or
> CAST(SUBSTRING(col,CHARINDEX('-',col)+1,10) AS INT)
> --
> David Portas
> SQL Server MVP
> --
>|||How about :
ORDER BY
LEFT(col, PATHINDEX('%[0-9]%', col) - 1),
CAST(SUBSTRING(col, PATHINDEX('%[0-9]%', col), 8000) AS INT)
BG, SQL Server MVP
www.SolidQualityLearning.com
"bagman3rd" wrote:
> Adding a "sequence" column to define the order has been my workaround for
> now, but it is long and tedious, especially when I have hundreds of wells
> They are monitoring wells, and I have no control over the naming
> conventions. Sometimes they use dashes, sometimes they dont. Sometimes t
hey
> call them emw(extraction monitoring well), and I also have soil borings to
o.
> sb-1
> sb-2
> sometimes they use
> esb-1
> esb-2
> So, essentially, I cannot rely on a certain number of characters being tex
t,
> then numbers, etc.
> I have any idea of how to do this, but it seems awkward and kludgian.
> Essentially, I would set up a cursor to loop through every character of ev
ery
> row, and break up the rows into sections every time it changed from text t
o
> numeric and vice versa, then order by the sections that I created.
> Has anyone done this before and/or is there a better way?
> Thanks.
> Archer
> "David Portas" wrote:
>|||That is a step in the right direction, but as I said before, I have no
control over the naming of these wells. So, they sometimes switch from text
to numeric to numeric to text, etc. i.e.
mw-1-d-050505
The d is for duplicate and 050505 is a date. So,
ORDER BY
LEFT(col, PATHINDEX('%[0-9]%', col) - 1),
CAST(SUBSTRING(col, PATHINDEX('%[0-9]%', col), 8000) AS INT)
works, but not if I have text following my number. What would I do to take
this to the next level of being able to handle strings with multiple
text-num-text-num formats?
Thanks.
Archer
"Itzik Ben-Gan" wrote:
> How about :
> ORDER BY
> LEFT(col, PATHINDEX('%[0-9]%', col) - 1),
> CAST(SUBSTRING(col, PATHINDEX('%[0-9]%', col), 8000) AS INT)
> --
> BG, SQL Server MVP
> www.SolidQualityLearning.com
>
> "bagman3rd" wrote:
>|||So now you need alpha, numeric and date sorting from your code!
Obviously the encoding is badly designed. Unless you can tell us the
logic by which you would sort it I don't see how you can expect to code
that logic. I.e. which characters count as delimiters? which way do
dates sort? how do you distinguish a date from a number? More
importantly, since a human being would probably have great difficulty
doing the same thing your sorting is probably going to be very
difficult to read even if you can achieve it - thus defeating the
purpose of the sort in the first place.
Itzik's example should give you a start. You can extend that kind of
logic to as many parts and sub-parts as you like - assuming there is
some logic to be found in the code. I would use something like that to
populate your sequence column in a table each time you find a new code.
That way, if the sort sequence doesn't suit you can easily tweak it to
make the order more user-friendly.
David Portas
SQL Server MVP
--|||You can treat the date as a number. As I stated before, I have NO control
over how the well/soil boring sampling locations are named. I just want the
m
sorted in natural sorting order. I can find functions in scripting language
s
to do this, such as
http://maconlinux.net/php-online-ma...on.natsort.html
natsort
(PHP 4 , PHP 5)
natsort -- Sort an array using a "natural order" algorithm
Description
bool natsort ( array &array )
This function implements a sort algorithm that orders alphanumeric strings
in the way a human being would while maintaining key/value associations. Thi
s
is described as a "natural ordering". An example of the difference between
this algorithm and the regular computer string sorting algorithms (used in
sort()) can be seen below:
I cant believe that no one has run into this problem before, and someone
does not already have this figured out.
As stated, I would like to order alphanumeric strings in the way a human
being would.
I will try to extend the Itzik's example.
Thanks.
Archer
"David Portas" wrote:

> So now you need alpha, numeric and date sorting from your code!
> Obviously the encoding is badly designed. Unless you can tell us the
> logic by which you would sort it I don't see how you can expect to code
> that logic. I.e. which characters count as delimiters? which way do
> dates sort? how do you distinguish a date from a number? More
> importantly, since a human being would probably have great difficulty
> doing the same thing your sorting is probably going to be very
> difficult to read even if you can achieve it - thus defeating the
> purpose of the sort in the first place.
> Itzik's example should give you a start. You can extend that kind of
> logic to as many parts and sub-parts as you like - assuming there is
> some logic to be found in the code. I would use something like that to
> populate your sequence column in a table each time you find a new code.
> That way, if the sort sequence doesn't suit you can easily tweak it to
> make the order more user-friendly.
> --
> David Portas
> SQL Server MVP
> --
>|||As people have shown you, you can pull out the substring or set up a
look-up table for the sorting order. However, I would suggest
designing the codes to have leading zeroes in the right places. People
forget it takes time to design codes and often just whip up a
sequential list or a varyng lengh, mixed alphanumeric string.

No comments:

Post a Comment