Regular Expression for Parsing Hashtags in Java

Posted by: admin, December 28, 2021

Questions:

Here is the Twitter message I am trying to parse (as you can see, some of these are not tags, just part of URLs):

#anothertag Arrogance and bad PR http://www.adobe.com/index.html#anchor1. John 
Nack on &#Adobe: Information about Photoshop© CS3 on Snow Leopard 
#fail #design

This regular expression is what I have so far, but it still picks up some of the URL anchors:

[##]+([A-Za-z0-9-_]+)

Answers:

Isn’t it ironic: as soon as I post this, I find an answer. If you are looking for a pattern that does this, the following seems to work:

(?:\s|\A)[##]+([A-Za-z0-9-_]+)

I am going to test this a lot more to see whether any edge cases fall outside the scope of this expression, and I will report back if I find any.
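
As a quick check, here is a minimal Java sketch that runs both the original unanchored pattern and the anchored one against the sample tweet from the question. The class name HashtagExtractor, the extract helper, and the constant names are made up for illustration; only java.util.regex from the standard library is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class HashtagExtractor {
    // The sample tweet from the question, joined onto one line.
    static final String SAMPLE =
        "#anothertag Arrogance and bad PR http://www.adobe.com/index.html#anchor1. John "
        + "Nack on &#Adobe: Information about Photoshop\u00a9 CS3 on Snow Leopard "
        + "#fail #design";

    // Pattern from the question: matches '#' anywhere, including inside URLs.
    static final Pattern UNANCHORED = Pattern.compile("[##]+([A-Za-z0-9-_]+)");

    // Pattern from the answer: '#' must follow whitespace or the start of input.
    static final Pattern ANCHORED = Pattern.compile("(?:\\s|\\A)[##]+([A-Za-z0-9-_]+)");

    // Collects capture group 1 (the tag text without '#') for every match.
    static List<String> extract(Pattern pattern, String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(extract(UNANCHORED, SAMPLE)); // [anothertag, anchor1, Adobe, fail, design]
        System.out.println(extract(ANCHORED, SAMPLE));   // [anothertag, fail, design]
    }
}
```

The anchored version skips #anchor1 and #Adobe because the # in each is preceded by a non-whitespace character (l and &, respectively).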

###

I think this

(\s|\A)#(\w+)

works as well and is a little more precise.
BTW, when working with Java regular expressions I always use regexplanet to test my expressions; it is much faster than testing in Java.
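
For anyone who wants to try the \w variant in Java anyway, here is a small sketch. The class name WordHashtagExtractor and its extract method are hypothetical. Note that the tag text is in group 2 here, because the leading (\s|\A) is a capturing group, and that \w does not include the hyphen, so this pattern cuts a tag like #foo-bar short and ignores tags written with a doubled ##.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class WordHashtagExtractor {
    // Exactly one '#', preceded by whitespace or start of input, then word characters.
    static final Pattern WORD_TAG = Pattern.compile("(\\s|\\A)#(\\w+)");

    // Collects the tag text for every match.
    static List<String> extract(String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = WORD_TAG.matcher(text);
        while (m.find()) {
            // Group 1 is the leading whitespace (empty at start of input); group 2 is the tag.
            tags.add(m.group(2));
        }
        return tags;
    }

    public static void main(String[] args) {
        // "#foo-bar" yields only "foo"; "##double" is skipped entirely.
        System.out.println(extract("#foo-bar ##double #ok_1")); // [foo, ok_1]
    }
}
```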

–Hardy

###

@Daniel Dura: the doubled # in [##] is pointless. The brackets [] denote a character class, and repeating a character inside one changes nothing, so [##] is the same as [#], and [##]+ is the same as [#]+.
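
A short way to confirm this in Java is to run both classes over the same input and compare the matches. This is only a sketch; the class name CharClassCheck, the matches helper, and the test string are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class CharClassCheck {
    // Returns every full match of the given regex in the text.
    static List<String> matches(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "a#b##c###";
        System.out.println(matches("[##]+", text)); // [#, ##, ###]
        System.out.println(matches("[#]+", text));  // [#, ##, ###]
    }
}
```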