Regex

January 27, 2005 | PHP Scripting

Regular expressions have always been (and will most likely continue to be) a challenge to construct. This site uses a system for parsing BBCode in posts that I swore to write myself, and I'm only letting myself seek help when I'm absolutely stuck. I guess that's why it's taking so damn long.

But there was a breakthrough of sorts yesterday: with the help of Google, I got the URL parsing to work correctly. The only bad part, and a slightly embarrassing one, is that I'm not completely sure how it works. I went looking for help, found a few examples of people doing similar things, and pretty much read their code to see what they were doing. Then I came across something that ended up helping. So now my URL parsing looks like this:

    <?php
    // replace URLs: [url=address]link text[/url] becomes <a href="address">link text</a>
    $text = eregi_replace("\\[url=([^\\[]*)\\]([^\\[]*)\\[/url\\]", "<a href=\"\\1\">\\2</a>", $text);
    ?>
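For anyone reading this on a current PHP: the ereg functions, eregi_replace() included, were deprecated in PHP 5.3 and removed in PHP 7.0. Below is a rough sketch of the same URL rule using preg_replace(); the function name replace_urls() is made up for illustration and is not the code running on this site. With a single-quoted pattern PHP leaves the backslashes alone, the # delimiters mean the slash in [/url] needs no escaping, and the i modifier stands in for the case-insensitivity of eregi.

```php
<?php
// Sketch: the [url=...]...[/url] rule ported from eregi_replace() to preg_replace().
// replace_urls() is a hypothetical helper name used only for this example.
function replace_urls(string $text): string
{
    return preg_replace(
        '#\[url=([^\[]*)\]([^\[]*)\[/url\]#i', // single quotes: backslashes reach PCRE as written
        '<a href="$1">$2</a>',                 // $1 = address, $2 = link text
        $text
    );
}

echo replace_urls('See [url=http://example.com]my site[/url].'), "\n";
// prints: See <a href="http://example.com">my site</a>.
```

Writing the backreferences as $1 and $2 instead of \\1 and \\2 is a small design choice: it avoids any question of what PHP's string parsing does to a backslash before the regex engine sees it.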

It works beautifully, and making a variant that inserts off-site links opening in a new window was terribly easy. What I'm not sure of is why the backslashes have to be doubled. So I'm gonna have to break out the Regex manual and figure out just why it works.
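The doubled backslashes most likely come from two layers of parsing: PHP processes the double-quoted string literal first, turning each \\ into a single \, and eregi_replace() only ever sees the result. The short script below checks that with the pattern from the post, and also shows the one spot where a single backslash is not harmless: in a double-quoted string, \1 is an octal escape for the byte chr(1), not a regex backreference. The variable names are just for illustration.

```php
<?php
// Layer 1: PHP parses the string literal. Layer 2: the regex engine gets the result.
$doubleQuoted = "\\[url=([^\\[]*)\\]([^\\[]*)\\[/url\\]"; // as written in the post
$singleQuoted = '\[url=([^\[]*)\]([^\[]*)\[/url\]';       // same pattern in single quotes

var_dump($doubleQuoted === $singleQuoted); // bool(true): each "\\" became one backslash
echo $doubleQuoted, "\n";                  // prints: \[url=([^\[]*)\]([^\[]*)\[/url\]

// "\[" also survives, because \[ is not a recognized escape in double quotes,
// so PHP keeps the backslash as-is.
var_dump("\[" === '\[');                   // bool(true)

// Backreferences are where it bites: "\1" is an octal escape for chr(1),
// while "\\1" is a backslash followed by the digit 1.
var_dump("\1" === chr(1));                 // bool(true)
var_dump("\\1" === '\1');                  // bool(true)
```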

Other parts of the parser, like replacing bold, italic, and bold italic text and line breaks with the proper formatting, were easy.

    <?php
    // replace [b], [/b], [i], [/i] and [br] with <b>, </b>, <i>, </i> and <br />
    // "\\1" (not "\1") so eregi_replace() receives the backreference \1
    $text = eregi_replace("\\[(/?i|/?b)\\]", "<\\1>", $text);
    $text = eregi_replace("\\[br\\]", "<br />", $text);
    ?>
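The same preg_replace() port works for the formatting rules. Again this is only a sketch for PHP 7 and later, and replace_formatting() is a hypothetical name. Like the original, it keeps the matched letter's case, so [B] becomes <B>.

```php
<?php
// Sketch: bold/italic and line-break rules ported to preg_replace().
function replace_formatting(string $text): string
{
    // [b], [/b], [i], [/i] -> <b>, </b>, <i>, </i> (case-insensitive via "i")
    $text = preg_replace('#\[(/?i|/?b)\]#i', '<$1>', $text);
    // [br] -> <br />; the first rule cannot match [br], so it survives to here
    return preg_replace('#\[br\]#i', '<br />', $text);
}

echo replace_formatting('[b]bold[/b] and [i]italic[/i][br]'), "\n";
// prints: <b>bold</b> and <i>italic</i><br />
```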

(yes, I’m still using <b> and <i> tags – get over it!)

I'm digging Regex as a tool, but man, is it one hell of a brain buster. Every time I write some Regex and think I'm getting a handle on it, I go a while without needing it, and then when something else complex comes up I feel like I have to start all over again.

Don't get me wrong, I am proud of what I've been able to learn. Sometimes I wonder if I'm still capable of learning new things, and this proves that this old dog can still hunt. This entry is a prime example – being able to figure that out makes me feel good about myself and tells me my brain is still capable of grasping complex concepts – even if that complex concept is just an unordered list.

Well, now that I've looked at the entry above, I realize that I haven't formatted blockquotes since I redesigned this sucker, so I'd better get on that before anyone actually reads this…

9 Responses

  • Seems I need to do a little bit of addslashes on my posts…

    shawn, January 27, 2005 6:39 am

  • fwhew! I thought it was only newbs like me who have a brain warp on occasion.

    shawna, January 27, 2005 2:36 pm

  • I’m always in a brain warp… achk thpt!

    shawn, January 28, 2005 12:06 am

  • If I ever get around to writing a CMS for imamis, I'm just going to leave the whole BBCode thing alone and just make it ignore HTML besides , , , , and , and possibly a few more. I think there's a PHP function that does that… Anyway, it'll make it easier to code. Or is there something wrong with letting people use HTML?

    Max, January 28, 2005 4:59 am

  • It's not so much that it is bad to use HTML; it's more that BBCode is more accommodating for future expansion. If standards change or a new technique arises, you can change the BBCode parser to reflect it. Otherwise, if you allow HTML, you've got a lot of stored markup that you either have to ignore or have to change.

    I feel that it's just safer to use BBCode.

    But that's just me.

    shawn, January 28, 2005 6:35 am

  • And as you can tell – I'm outright stripping HTML code from comments. I might play with converting it to display characters, but for now I think stripping is just fine.

    shawn, January 28, 2005 6:36 am

  • (yes, I'm still using <b> and <i> tags – get over it!)

    Was that little remark directed at me? 😛

    Anyhoo. This page has some great, relevant info. The gist of it is that you need to escape your backslashes. I confess to having limited hands-on experience with regexes in PHP, but, well, it seems to make some sense to me.

    Here’s a snip:

    In your source code, you have to keep in mind which characters get special treatment inside strings by your programming language. That is because those characters will be processed by the compiler, before the regex library sees the string. So the regex 1\+1=2 must be written as "1\\+1=2" in C++ code. The C++ compiler will turn the escaped backslash in the source code into a single backslash in the string that is passed on to the regex library. To match c:\temp, you need to use the regex c:\\temp. As a string in C++ source code, this regex becomes "c:\\\\temp". Four backslashes to match a single one indeed.

    jp, January 29, 2005 11:37 pm

  • yeah… you do have something of a slashing issue with the entries.

    jp, January 28, 2005 7:29 am

  • Actually – the escaping issue was fixed right after I posted that… those regex statements are how they are supposed to look and how they work. Yep, it looked odd to me too when I first tried it, but what you see now is the correct code.

    I guess I should also post a legend of the codes allowed in comments: b, i, code, img, url, offsiteurl (opens in a new window), list, and quote. I'm actually rather proud of the list function – I haven't tested it with embedded lists, but it works extremely well for single-level lists.

    And all HTML code gets stripped (I'm eventually gonna just convert it to HTML entities, but I haven't had enough time to implement that yet).

    ———-

    Back to Regex – I seem to be getting better at it, but I still run into problems.

    The other day I was working on some .htaccess stuff and got two rules that each worked on their own, but not together. Odd stuff. I'm convinced Regex is akin to mysticism and that I am out of favor with the gods.

    shawn, January 28, 2005 11:20 am
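The two HTML strategies discussed in the comments map onto two built-in PHP functions. The function Max had in mind is most likely strip_tags(), whose optional second argument lists tags to keep; the tags he named were lost from his comment, so the <b><i> whitelist below is an assumption. Converting HTML to entities ("display characters") is what htmlspecialchars() does. The sample comment text is made up.

```php
<?php
// Same input, two treatments. $comment is an invented example.
$comment = '<b>bold</b> <u>underlined</u> & <i>italic</i>';

// 1. Whitelist: keep <b> and <i> (assumed list), drop every other tag but keep its text.
echo strip_tags($comment, '<b><i>'), "\n";
// prints: <b>bold</b> underlined & <i>italic</i>

// 2. Escape: show all markup as literal text instead of interpreting it.
echo htmlspecialchars($comment, ENT_QUOTES, 'UTF-8'), "\n";
// prints: &lt;b&gt;bold&lt;/b&gt; &lt;u&gt;underlined&lt;/u&gt; &amp; &lt;i&gt;italic&lt;/i&gt;
```

One caveat on the whitelist route: strip_tags() does not touch attributes on the tags it allows, so an allowed tag can still carry something like an onmouseover handler. That is one practical argument for escaping all HTML and offering BBCode instead.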
