Tuesday, March 28, 2017

An indexOfSubList(...) to Find all Matching SubLists

Quick Tip

Java comes with two useful methods for finding a sublist in a list: Collections.indexOfSubList(List<?> source, List<?> target) and Collections.lastIndexOfSubList(List<?> source, List<?> target).
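As a quick illustration of what these two built-in methods return, here is a minimal sketch. The class name BuiltInSubListDemo and the sample data are made up for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class BuiltInSubListDemo {
    // Hypothetical sample data: the target [1, 2] occurs at indexes 0, 3 and 6.
    static final List<Integer> SOURCE = Arrays.asList(1, 2, 9, 1, 2, 9, 1, 2);
    static final List<Integer> TARGET = Arrays.asList(1, 2);

    // Index of the first occurrence of TARGET in SOURCE
    static int first() {
        return Collections.indexOfSubList(SOURCE, TARGET);
    }

    // Index of the last occurrence of TARGET in SOURCE
    static int last() {
        return Collections.lastIndexOfSubList(SOURCE, TARGET);
    }

    public static void main(String[] args) {
        System.out.println(first()); // 0
        System.out.println(last());  // 6
    }
}
```

The occurrence at index 3 is not reachable with either method.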

These are useful methods; however, with these two methods I can only find the first matching sublist and the last matching sublist. What about all the other sublists in between? What if there are 3 matching sublists, or 10, or 100? Here is a quick example (Listing 1) of a new, overloaded indexOfSubList(List<?> source, List<?> target, int fromIndex) that has an extra parameter, int fromIndex. This extra parameter tells the search where to start, which lets you step through the list and find every matching sublist.

NOTE This code comes directly from the source code of the existing Collections.indexOfSubList(List<?> source, List<?> target) method.

Listing 1 - Find all Matching SubLists

public static int indexOfSubList(List<?> source, List<?> target, int fromIndex) {
    int sourceSize = source.size();
    int targetSize = target.size();
    int maxCandidate = sourceSize - targetSize;

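    // Treat a negative start as 0; stop early if no candidate can fit
    if (fromIndex < 0) {
        fromIndex = 0;
    }
    if (fromIndex > maxCandidate) {
        return -1;
    }

    // Position the source iterator at the first candidate
    ListIterator<?> si = source.listIterator(fromIndex);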
    nextCand:
        for (int candidate = fromIndex; candidate <= maxCandidate; candidate++) {
            ListIterator<?> ti = target.listIterator();
            for (int i=0; i<targetSize; i++) {
                if (!eq(ti.next(), si.next())) {
                    // Back up source iterator to next candidate
                    for (int j=0; j<i; j++)
                        si.previous();
                    continue nextCand;
                }
            }
            return candidate;
        }

    return -1;  // No candidate matched the target
}

private static boolean eq(Object o1, Object o2) {
    return o1==null ? o2==null : o1.equals(o2);
}
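One way to use the extra fromIndex parameter to collect every match is sketched below. This is only an illustration: the class name SubListFinder and the helper findAll are hypothetical, the search method is a condensed variant of Listing 1, and treating a negative fromIndex as 0 is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;
import java.util.Objects;

class SubListFinder {
    // Condensed variant of Listing 1; a negative fromIndex is treated as 0 (assumption).
    static int indexOfSubList(List<?> source, List<?> target, int fromIndex) {
        int targetSize = target.size();
        int maxCandidate = source.size() - targetSize;
        int start = Math.max(fromIndex, 0);
        if (start > maxCandidate) {
            return -1; // not enough elements left for a match
        }
        ListIterator<?> si = source.listIterator(start);
        nextCand:
        for (int candidate = start; candidate <= maxCandidate; candidate++) {
            ListIterator<?> ti = target.listIterator();
            for (int i = 0; i < targetSize; i++) {
                if (!Objects.equals(ti.next(), si.next())) {
                    // Back up the source iterator to the next candidate
                    for (int j = 0; j < i; j++) {
                        si.previous();
                    }
                    continue nextCand;
                }
            }
            return candidate;
        }
        return -1;
    }

    // Hypothetical helper: start index of every match, overlapping ones included.
    static List<Integer> findAll(List<?> source, List<?> target) {
        List<Integer> matches = new ArrayList<>();
        int index = indexOfSubList(source, target, 0);
        while (index >= 0) {
            matches.add(index);
            // Resume one position past the previous match
            index = indexOfSubList(source, target, index + 1);
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("a", "b", "a", "b", "a", "c");
        System.out.println(findAll(source, Arrays.asList("a", "b"))); // [0, 2]
    }
}
```

Resuming at index + 1 reports overlapping matches too; resuming at index + target.size() instead would skip them.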

Monday, March 13, 2017

Regex Match HTML/XML with Laziness to get Tag Contents

Abstract

Regular expressions are extremely powerful. Figuring out how to get them to match what you want, though, can be a challenge. One of the tougher matches is with HTML/XML content. Often the match covers more than you want; that’s because you are being greedy. Be lazy! You’ll get a better match.

Disclaimer

This post is solely informative. Critically think before using any information presented. Learn from it but ultimately make your own decisions at your own risk.

Problem

Suppose you have the following bit of HTML.

<p> this is a <span>very</span> <b>cool</b> regex tip </p>

You want to use a capturing group to get the contents of the <span> tag. So you put together a regular expression that looks like this:

<span>(.+)<

But unfortunately this doesn’t get you the contents of the tag. The regular expression is too greedy and matches too much of the string. It matches from the opening <span> tag all the way to the start of the closing paragraph tag. Here is the portion of the string it matches.

<span>very</span> <b>cool</b> regex tip <

So what’s the problem here? The + quantifier is greedy by default: .+ grabs as many characters as it can and gives back only enough to reach the last < in the string. Let’s make it less greedy and a bit more lazy.

Solution

The solution is to put together a regular expression that is a bit lazier. A lazy quantifier stops at the first < that follows the tag contents instead of running on to the last < in the string. Adding a ? after the + makes the quantifier lazy. Here is the lazier regular expression.

<span>(.+?)<

Now this will match to the start of the closing </span> tag like you might expect it to. Plus, now that the matching is working as expected, the capturing group gets just the contents of the tag: very. Here is the portion of the string the regular expression matches now.

<span>very<
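To see the difference in code, here is a minimal Java sketch that runs both patterns against the HTML from the Problem section. The class name LazyRegexDemo and the helper firstGroup are hypothetical names made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyRegexDemo {
    static final String HTML =
        "<p> this is a <span>very</span> <b>cool</b> regex tip </p>";

    // Returns capturing group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Greedy: the group runs on to the last '<' in the string
        System.out.println("[" + firstGroup("<span>(.+)<", HTML) + "]");
        // Lazy: the group stops at the first '<' after the tag contents
        System.out.println("[" + firstGroup("<span>(.+?)<", HTML) + "]");
    }
}
```

The greedy pattern captures very</span> <b>cool</b> regex tip followed by a trailing space, while the lazy pattern captures only very.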

Summary

That’s it. Be a little more lazy and a little less greedy. I hope this helps you figure out your regular expression matching problem.

References

Goyvaerts, J. (2016, December 08). Laziness Instead of Greediness. Regular-Expressions.info. Retrieved from http://www.regular-expressions.info/repeat.html.

Java NIO, Files & Paths - Single Statement to Read File as a String

Quick Tip

Here is a quick example (Listing 1) of a single Java statement to read the contents of a file into a single String instance.

NOTE Don’t forget to specify the charset! It’s essential when working with text data.

Listing 1 - Single Statement File to String

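import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Read every byte of the file, then decode the bytes as UTF-8
String content = new String(
  Files.readAllBytes(
    Paths.get("File_To_Read.txt")
  ),
  StandardCharsets.UTF_8
);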
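For a version that can be run as-is, here is a minimal sketch that wraps the statement in a method and tries it against a temporary file. The class name ReadFileDemo and the methods readAsString and roundTrip are hypothetical names made up for this example, and wrapping IOException in UncheckedIOException is just one way to handle the checked exception.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadFileDemo {
    // Reads the whole file into one String, decoding the bytes as UTF-8.
    static String readAsString(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    // Writes the text to a temporary file, reads it back, and deletes the file.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("read-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readAsString(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // \u00e9 is a non-ASCII character, so the charset matters here
        System.out.println(roundTrip("caf\u00e9 au lait"));
    }
}
```

Writing and reading with the same explicit charset is what keeps the non-ASCII character intact.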