Converting a string to slug with JavaScript

090503

Recently I've been working on implementing slugs in my CMS to be able to generate nicer URLs. In order to do so I've created a little JavaScript function that converts a string to a slug. I'll first give you the code and then explain it a bit. [update: 2.07.10 Fixed IE issue][update: 18.09.10 Minor improvements]

Note that you should not trust any JavaScript validation or processing: the submitted data should always be validated on the server. JavaScript validation or processing is there to provide a better user experience; it is not a security measure.
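Here is a minimal sketch of what that server-side check might look like in plain JavaScript, for example inside a Node.js request handler. The function name isValidSlug is hypothetical. The pattern assumes slugs are made only of lowercase ASCII letters and digits separated by single dashes. It is stricter than the function below, which can leave a dash at the start or end of a slug.

```javascript
// Hypothetical server-side check: lowercase ASCII letters and digits,
// separated by single dashes, with no leading or trailing dash.
function isValidSlug(slug) {
  return typeof slug === 'string' && /^[a-z0-9]+(?:-[a-z0-9]+)*$/.test(slug);
}

console.log(isValidSlug('my-fine-title'));  // true
console.log(isValidSlug('my--fine-title')); // false (double dash)
console.log(isValidSlug('-my-title'));      // false (leading dash)
```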

function string_to_slug(str) {
  str = str.replace(/^\s+|\s+$/g, ''); // trim
  str = str.toLowerCase();
  
  // remove accents, swap ñ for n, etc
  var from = "àáäâèéëêìíïîòóöôùúüûñç·/_,:;";
  var to   = "aaaaeeeeiiiioooouuuunc------";
  for (var i=0, l=from.length ; i<l ; i++) {
    str = str.replace(new RegExp(from.charAt(i), 'g'), to.charAt(i));
  }

  str = str.replace(/[^a-z0-9 -]/g, '') // remove invalid chars
    .replace(/\s+/g, '-') // collapse whitespace and replace by -
    .replace(/-+/g, '-'); // collapse dashes

  return str;
}

Here's a step by step description:

The first thing we do is trim the string, that is, remove any whitespace at the beginning and end. The regular expression /^\s+|\s+$/g does exactly that:
- / marks the start of the regular expression
- ^\s+ means "one or more white-space characters at the beginning of the string"
- | means "or"
- \s+$ means "one or more white-space characters at the end of the string"
- /g ends the regular expression, and sets the global flag (otherwise only one substitution would be performed)
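The following sketch runs the same regular expression with and without the g flag, so you can see what the flag does. The sample string is arbitrary. Browsers that support ES5 also have String.prototype.trim(), which does the same job, but older versions of Internet Explorer don't have it.

```javascript
const input = '   Hello World   ';

// With the g flag, both alternatives can match: leading and trailing spaces are removed.
const withG = input.replace(/^\s+|\s+$/g, '');

// Without it, replace() stops after the first match (the leading spaces).
const withoutG = input.replace(/^\s+|\s+$/, '');

console.log(JSON.stringify(withG));    // "Hello World"
console.log(JSON.stringify(withoutG)); // "Hello World   "
console.log(input.trim() === withG);   // true
```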
Next, we convert the string to lower case, so from here on we only have to deal with lowercase letters.
We are going to remove any invalid characters, but first we'll replace any 'special' letters with their 'plain' versions. For example, in Spanish we have á, é and so on. These are not valid characters in a slug, but we don't want to simply remove them, so instead we replace them with a, e, etc. There's nothing fancy in the JavaScript here: we loop over the from string and replace every occurrence of each character with the character at the same position in to.

Note that I also chose to replace ·/_,:; with dashes (the first one is the middle dot, used for example in Catalan). I think this generates better slugs than simply removing these characters.

You might need/want to adjust this part of the function to suit your needs (your language might have other symbols that I haven't included here).
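Unicode normalization is a different technique from the from/to table used above. String.prototype.normalize('NFD') splits a letter like á into a base letter plus a combining accent, and the combining marks (U+0300 to U+036F) can then be stripped. This covers many more accented letters than a hand-written list. It won't touch characters that don't decompose, though, such as ł or the middle dot, so those still need explicit mapping. Here is a sketch with the hypothetical helper name removeAccents:

```javascript
// Alternative to the from/to table: decompose accented letters (NFD) and
// drop the combining diacritical marks (U+0300 to U+036F).
function removeAccents(str) {
  return str.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

console.log(removeAccents('àáäâ èéëê ñ ç')); // "aaaa eeee n c"
console.log(removeAccents('łódź'));          // "łodz" (ł has no decomposition)
```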
Now we're ready to remove any remaining invalid characters. The regular expression /[^a-z0-9 -]/g will match any character that is not a lowercase letter, a digit, a space or a dash. I won't explain this regexp in detail; this post is getting way too long! :) Search for "character classes" and you'll find plenty of info.

Note that we keep spaces as valid characters. Don't worry, we'll get rid of them in the next step. We can't just remove them here, because we want to replace them with dashes.
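Here is that character class applied on its own to a sample string. In the real function, commas, colons and similar characters would already have been turned into dashes by the previous step. The dash goes last inside the brackets, so it's read as a literal dash rather than as part of a range:

```javascript
// Remove everything that is not a lowercase ASCII letter, a digit, a space or a dash.
// The leading ^ negates the class; the trailing - is a literal dash.
const cleaned = "what's new? (2010) - version 1.0".replace(/[^a-z0-9 -]/g, '');
console.log(cleaned); // "whats new 2010 - version 10"
```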
Now it's time to replace any spaces with dashes. But we'll collapse any whitespace as well, so multiple spaces will be converted to a single dash. The expression /\s+/g should be easy if you understood the one about trimming the string.
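The + is what does the collapsing. This compares a single \s with \s+ on a sample string containing a double space:

```javascript
const sample = 'whats new  2010 - version 10'; // note the double space

// \s alone: every whitespace character becomes its own dash.
console.log(sample.replace(/\s/g, '-'));  // "whats-new--2010---version-10"

// \s+: each run of whitespace becomes a single dash.
console.log(sample.replace(/\s+/g, '-')); // "whats-new-2010---version-10"
```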
Almost there! The expression /-+/g matches any series of consecutive dashes (which may occur as a result of the previous substitutions), so we replace each series with a single dash. Job done!
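Those consecutive dashes typically come from a dash that had spaces around it, because the whitespace on each side becomes a dash of its own. Here is a quick check on a sample string:

```javascript
// " - " became "---" in the whitespace step; /-+/g turns any run of dashes into one.
const dashed = 'whats-new-2010---version-10';
console.log(dashed.replace(/-+/g, '-')); // "whats-new-2010-version-10"
```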

There's room for improvement. For instance, we could replace the & sign with "and", but that causes problems on multilingual sites. One could detect the language being used and replace it with the appropriate word, but that seems a bit overkill to me... As it is, this should generate nice slugs in most cases.
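If you did want to go down that road, the replacement has to happen before the invalid-character step, otherwise the & is already gone. Here is a minimal sketch. The AMPERSAND_WORDS table and the replaceAmpersand name are hypothetical, and the language code is assumed to come from elsewhere in your application:

```javascript
// Hypothetical per-language words for "&"; extend as needed.
const AMPERSAND_WORDS = { en: 'and', es: 'y', ca: 'i', fr: 'et', de: 'und' };

// Replace every "&" with the word for the given language, padded with
// spaces so it becomes its own slug segment. Unknown languages fall back to English.
function replaceAmpersand(str, lang) {
  const word = Object.prototype.hasOwnProperty.call(AMPERSAND_WORDS, lang)
    ? AMPERSAND_WORDS[lang]
    : AMPERSAND_WORDS.en;
  return str.replace(/&/g, ' ' + word + ' ');
}

console.log(replaceAmpersand('Salt & Pepper', 'en')); // "Salt  and  Pepper"
```

The extra spaces are harmless, since the whitespace step collapses them into single dashes.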

Posted in: English, Web, JavaScript
Tags: regular expressions


24 comments to “Converting a string to slug with JavaScript”

#01 By Tiago S., 090829 at 08:22
Hi, thanks for your post. It saved me a lot of time and work, since otherwise I would have had to convert my similar function from Ruby to JS: http://snippets.dzone.com/posts/show/2384 Thanks again for sharing,
Tiago
#02 By David Prek, 091007 at 06:31
Looks great, except it does not work in IE browsers.
#03 By dense13, 091007 at 09:32
You're right David, I'll have to look into that. The strange thing is, I remember it working in my CMS... I'll try to sort it out soon.
#04 By Shane, 100610 at 01:48
Awesome! Thank you. Saves me having to try to get my head around more regex... ;)
#05 By dense13, 100610 at 11:13
Glad you find it useful Shane. But make sure you test it in IE (see comments 2 and 3). I still haven't gotten around to checking that out.
#06 By dense13, 100702 at 13:39
@David Prek, I managed to fix that, there was a problem with Regular expressions that was triggering an Out of memory message. In fact I don't know why I was using a regular expression in the loop, it's not necessary at all. I must have done it for a reason, but obviously it was not a _good_ reason. :)
#07 By Thomas Lopes, 100917 at 09:16
Hi! Thanks for sharing this snippet. Here are two little fixes for it. In the line:
str = str.replace(from[i], to[i]);
replace it with:
str = str.replace(new RegExp(from[i], 'gi'), to[i]);
since the original one will only replace the first occurrence of the given char. Also, with the 'gi' modifiers you don't need to repeat UPPER and lower case chars, so you can change:
var from = "ÀÁÄÂÈÉËÊÌÍÏÎÒÓÖÔÙÚÜÛàáäâèéëêìíïîòóöôùúüûÑñÇç·/_,:;";
to:
var from = "ÀÁÄÃÂÈÉËÊ?ÌÍÏÎ?ÒÓÖÔÕÙÚÜÛ?ÑÇ·/_,:;";
(and its pair, to, respectively). Enjoy!
#08 By dense13, 100918 at 12:54
@Thomas Lopes: actually, you can't do that, it will crash IE6/7 (see comment #2 by David Prek). In fact that was my first version of the script, but I later got rid of the regular expression (see comment #6, btw thanks to Pat Allan for the fix). Another consideration is that RegExp replacement might be less efficient (although in this case that probably wouldn't be a problem, this is not likely to be a function that's executed repeatedly).

But your comment made me think about the function, and I've made it a bit more compact (post has been updated): now it doesn't need to replace upper and lower case characters.
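Another option is to keep a regular expression but run only one instead of one per character. You build a single character class from the from string and look up each match in a callback. This is an alternative sketch, not the version in the post, and it hasn't been tried in IE6/7. The name replaceSpecialChars is hypothetical, and the code assumes from contains none of the characters that are special inside a character class (], \, ^, -), which holds for the string used here:

```javascript
const from = 'àáäâèéëêìíïîòóöôùúüûñç·/_,:;';
const to   = 'aaaaeeeeiiiioooouuuunc------';

// Map each "from" character to its replacement.
const map = {};
for (let i = 0; i < from.length; i++) {
  map[from.charAt(i)] = to.charAt(i);
}

// One global regex matching any character in "from"; the callback looks
// up the replacement for each match.
const specialChars = new RegExp('[' + from + ']', 'g');

function replaceSpecialChars(str) {
  return str.replace(specialChars, function (ch) { return map[ch]; });
}

console.log(replaceSpecialChars('àààñ_ç')); // "aaan-c"
```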
#09 By dense13, 101004 at 18:40
Hey, thanks to myself for writing this, now I can reuse it in my current Rails project. :)
#10 By Paulius, 101123 at 22:30
This script does not replace repeated non-Latin letters, e.g. xxxààà will become xxxa because of str.replace behaviour. You should use a RegExp object to replace these strings globally: for (var i=0, l=from.length ; i
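The behaviour Paulius describes is easy to reproduce. When the first argument to replace() is a string, only the first occurrence is replaced; a regular expression with the g flag replaces them all. Here is a small sketch:

```javascript
const word = 'xxxààà';

// String pattern: only the first "à" is replaced.
console.log(word.replace('à', 'a'));                  // "xxxaàà"

// Global regular expression: every "à" is replaced.
console.log(word.replace(new RegExp('à', 'g'), 'a')); // "xxxaaa"
```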
#11 By dense13, 101124 at 09:07
@Paulius: you're absolutely right, gotta fix that. The problem is that using a RegExp crashes IE6 (see comment 6 - it was a good reason after all).
#12 By dense13, 101124 at 09:23
@Paulius: in fact it crashes both IE6 and IE7 (but not IE8). Here's the code I'm using:

slug = slug.replace(new RegExp(from[i], 'g'), to[i]);

Maybe there is a more efficient way to do this? Is this definitely an IE bug?

Btw, sorry for the lack of proper formatting in comments, I know it's annoying. It works well when I post comments though, another thing to look at...
#13 By dense13, 101124 at 12:02
@Paulius: found it! The problem wasn't the RegExp, but the square bracket notation for strings. Now it works, with:

str = str.replace(new RegExp(from.charAt(i), 'g'), to.charAt(i));

Maybe you were already suggesting that, since your comment got cut half-way through.
#14 By DOgi, 110118 at 01:25
Polish language support added: var from = "àáäâèéëêìíïîòóöôùúüûñç·/_,:;??????ó??"; var to = "aaaaeeeeiiiioooouuuunc------aceslnozz";
#15 By dense13, 110118 at 14:03
@DOgi: sorry, the polish characters got all messed up. Feel free to email me (blog AT dense13 DOT com), and I'll fix it.

Edit: Mmh, not easy, I can't just add them here, some of the characters you sent still get ignored (not sure if it's a WordPress or a browser issue). I'll try to sort that out.
#16 By Wiseman, 110809 at 20:13
Thanks a lot! Nice snippet
#17 By Flavia, 110901 at 01:52
Thank you so much for this, I was going mad trying to achieve exactly this result in sanitizing a string (but didn't succeed of course...). You saved my day :) Many, many thanks!! Flavia
#18 By Drew, 120913 at 14:29
Replace line 8 with:
str = str.split(from.charAt(i)).join(to.charAt(i));
and you've cut down on most of the regexp. It turns out that it's faster too.
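Here is a sketch of the loop with Drew's split/join suggestion applied, using the from/to strings from the post. The name replaceAccents is hypothetical. Splitting on a character and joining with its replacement substitutes every occurrence without building a RegExp. Whether it's actually faster will depend on the browser.

```javascript
const from = 'àáäâèéëêìíïîòóöôùúüûñç·/_,:;';
const to   = 'aaaaeeeeiiiioooouuuunc------';

// split/join replaces every occurrence of each character, no RegExp needed.
function replaceAccents(str) {
  for (let i = 0; i < from.length; i++) {
    str = str.split(from.charAt(i)).join(to.charAt(i));
  }
  return str;
}

console.log(replaceAccents('xxxààà café_niño')); // "xxxaaa cafe-nino"
```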
#19 By dense13, 120913 at 15:24
@Drew: thanks! I'll try that when I get a chance.
#20 By Paulo, 130304 at 07:21
I've written a very extensive "slugify.js" that binds directly to the String object within Javascript. It's quite robust because it handles any character, in any language (see the comments in the link below):

https://gist.github.com/demoive/4249710
#21 By pid, 130711 at 23:41
For the record, there is a module available for browser & server (nodejs,...): https://github.com/pid/speakingurl
#22 By Dimitar Raev, 160622 at 20:47
You can also trim leading/trailing dashes as a final step: "my fine title!" -> "my-fine-title-"
#23 By dense13, 160623 at 16:20
@Dimitar Raev: that's a good idea! I'll try to add it to the script, but no promises, doing very different things these days and not sure if I'll find the time. Thanks!
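Dimitar's suggestion amounts to one more replacement at the end of the chain. Here is a minimal sketch with the hypothetical helper name trimDashes. In string_to_slug it would be one more .replace(/^-+|-+$/g, '') after the dashes are collapsed:

```javascript
// Final step: strip any dashes left at the start or end of the slug.
function trimDashes(slug) {
  return slug.replace(/^-+|-+$/g, '');
}

console.log(trimDashes('-my-fine-title-')); // "my-fine-title"
```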
#24 By Hossein, 190222 at 17:50
Great! But it has a problem.
It doesn't work for Persian or Arabic!
Any ideas, please?
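Here is one possible direction, which the function in the post does not take. Persian and Arabic letters have no natural a-z equivalent, so instead of stripping everything outside a-z you can keep letters and numbers from any script. Unicode property escapes do this: \p{L} for letters, \p{M} for combining marks, \p{N} for numbers. They need the u flag and an ES2018-capable engine. The resulting slugs contain non-ASCII characters, which get percent-encoded when sent in a URL. Here is a sketch with the hypothetical name unicodeSlug:

```javascript
function unicodeSlug(str) {
  return str
    .trim()
    .toLowerCase()
    .replace(/[^\p{L}\p{M}\p{N}\s-]/gu, '') // keep letters, marks and numbers of any script
    .replace(/\s+/g, '-')                   // whitespace runs to a single dash
    .replace(/-+/g, '-')                    // collapse dashes
    .replace(/^-+|-+$/g, '');               // no leading/trailing dash
}

console.log(unicodeSlug('سلام دنیا')); // "سلام-دنیا"
```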
