Sometimes programming is difficult: you have to make large architectural decisions with far-reaching implications for maintainability, performance and security. Other times, programming is simply frustrating: something 'simple' doesn't work the way you expected, you start seriously questioning your logic, and finally you bring out the pen and paper for some good old pseudocode!
I've included some source code showing 3 different approaches.
The problem...
I was doing some pro bono processing work on an Excel file, removing duplicate rows based on some complicated criteria. Preferring to keep my sanity, I abandoned VBA and exported the spreadsheet to CSV. The catch is that some of the cells were blank, and this produced unpredictable results in my code. Here's a sample of the data:
- ,,dsafdf,,15,,,,
- ,,fdsfjladsjf,,13,,,,
- df,,sdff,,bemail,1,1,,
- dsf,,sffdsf,,bsgemail,1,1,,
- token1,token2,token3,token4,token5,token6,token7,token8,token9
I've added the fifth row to show the number of tokens. In a complete row we expect 9 tokens, and their position is important!
Source Code
- import java.util.StringTokenizer;
-
-
- public class Processor {
-
- public static void main (String[] args) {
-
- final String _DELIM = ",";
-
- //An array of the strings
- String[] sourceStrings = {
- ",,dsafdf,,15,,,,",
- ",,fdsfjladsjf,,13,,,,",
- "df,,sdff,,bemail,1,1,,",
- "dsf,,sffdsf,,bsgemail,1,1,,",
- "token1,token2,token3,token4,token5,token6,token7,token8,token9"
- };
-
- System.out.println("Approach One: String.split()");
- for(String s: sourceStrings)
- {
- //do the processing on the strings using
- //the delimiter as a regular expression;
- String[] split = s.split(_DELIM);
-
- System.out.println("Number of tokens : " + split.length);
- }
-
- System.out.println("\r\nApproach Two: StringTokenizer");
- for(String s: sourceStrings)
- {
- StringTokenizer st = new StringTokenizer(s, _DELIM, false);
-
- System.out.println("Number of tokens : " + st.countTokens());
- }
-
- System.out.println("\r\nApproach Three: StringTokenizer returning delimiter - Take One");
- for(String s: sourceStrings)
- {
-
- StringTokenizer st = new StringTokenizer(s, _DELIM, true);
-
- System.out.println("Number of tokens : " + st.countTokens());
- }
- }
- }
Output
A listing of the output of the code above is shown below:
- Approach One: String.split()
- Number of tokens : 5
- Number of tokens : 5
- Number of tokens : 7
- Number of tokens : 7
- Number of tokens : 9
-
- Approach Two: StringTokenizer
- Number of tokens : 2
- Number of tokens : 2
- Number of tokens : 5
- Number of tokens : 5
- Number of tokens : 9
-
- Approach Three: StringTokenizer returning delimiter - Take One
- Number of tokens : 10
- Number of tokens : 10
- Number of tokens : 13
- Number of tokens : 13
- Number of tokens : 17
Approach One: String.split()
The first approach uses the String.split() method, passing in the delimiter as a regular expression. As can be seen, this fails the test: the counts range from 5 to 9, because split() discards trailing empty strings. The correct answer should be 9 for every row.
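To see where the missing tokens go, here is a minimal sketch that prints the array split() returns for the first sample row. The class name SplitDemo and the helper fields() are made up for this illustration; only String.split() and Arrays.toString() come from the JDK.

```java
import java.util.Arrays;

class SplitDemo {
    // Hypothetical helper: a plain split on the comma, as in Approach One.
    static String[] fields(String row) {
        return row.split(",");
    }

    public static void main(String[] args) {
        String row = ",,dsafdf,,15,,,,";
        String[] split = fields(row);

        // Leading and inner blanks survive as empty strings,
        // but everything after "15" is dropped.
        System.out.println(Arrays.toString(split)); // prints [, , dsafdf, , 15]
        System.out.println(split.length);           // prints 5
    }
}
```

The blank cells at the start and in the middle keep their positions; only the empty strings at the end of the row disappear, which is why the count depends on where the last non-blank cell sits.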
Approach Two: StringTokenizer
The second approach uses StringTokenizer to split up the strings. Again, we get unpredictable results: only 2 tokens for the first two lines, because it picks up only the items that are not the delimiter. The correct answer here would be 9, with a null token for each blank item.
Approach Three: StringTokenizer returning the delimiter
The third approach uses StringTokenizer with a different constructor that also returns the delimiters. Again, lots of wrong results. The correct number of tokens should be 17 in every row: 8 commas and 9 non-comma items, with the blank items preferably returned as null.
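One way to make use of the returned delimiters is to treat two delimiters in a row (or a delimiter at the very start or end of the row) as a blank item. The sketch below does that; the class DelimiterAwareSplit and its fields() method are hypothetical names for this illustration, it assumes a single-character delimiter, and it returns empty strings rather than nulls for the blanks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DelimiterAwareSplit {
    // Hypothetical helper: rebuilds every field, blank ones included,
    // from a StringTokenizer that also returns its delimiters.
    // Assumes delim is a single character.
    static List<String> fields(String row, String delim) {
        List<String> result = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(row, delim, true);

        // The start of the row behaves like "just after a delimiter".
        boolean lastWasDelim = true;
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            if (token.equals(delim)) {
                if (lastWasDelim) {
                    result.add(""); // nothing between two delimiters: a blank field
                }
                lastWasDelim = true;
            } else {
                result.add(token);
                lastWasDelim = false;
            }
        }
        if (lastWasDelim) {
            result.add(""); // the row ended on a delimiter: one last blank field
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> f = fields(",,dsafdf,,15,,,,", ",");
        System.out.println(f.size()); // prints 9
        System.out.println(f.get(4)); // prints 15
    }
}
```

With this bookkeeping the value 15 lands at index 4, its position in the original row.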
So there you have it: a rather simple task (or so I would have thought), but I did bang my head against the wall trying to find the right solution. If I ever get to be a teacher, this would be great for an exam! There's the challenge. I'll post the solution soon.
UPDATE!
- public class NonCollapsingStringTokenizer {
- private String str;
- private String delim;
- private int currentPosition;
- private NonCollapsingStringTokenizer() {
- }
- public NonCollapsingStringTokenizer(String str, String delimStr) {
- this.str = str;
- this.delim = delimStr;
- }
- public String nextToken() {
- int nextDelimPosition = str.length();
- int delimPosition = str.indexOf(delim, currentPosition);
- if (delimPosition >= 0 && delimPosition < nextDelimPosition) {
- nextDelimPosition = delimPosition;
- }
- String token = str.substring(currentPosition, nextDelimPosition);
- currentPosition = nextDelimPosition + delim.length(); // skip the whole delimiter, not just one character
- return token;
- }
- public boolean hasMoreTokens() {
- // <= rather than <, so that a row ending in a delimiter still yields its final empty token
- return (currentPosition <= str.length());
- }
- public int countTokens() {
- int count = 0;
- NonCollapsingStringTokenizer clone = (NonCollapsingStringTokenizer) this.clone();
- while(clone.hasMoreTokens())
- {
- clone.nextToken();
- count++;
- }
- return count;
- }
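- public Object clone() {
- // Copy every field, including the read position, and return the copy itself
- NonCollapsingStringTokenizer copy = new NonCollapsingStringTokenizer();
- copy.str = str;
- copy.delim = delim;
- copy.currentPosition = currentPosition;
- return copy;
- }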
- }
The class above handles the challenge accordingly. This can be tested with the code below:
- public static void main(String[] args) {
- String s = ",,dsafdf,,15,,,,";
- final String _DELIM = ",";
- //s = "token1,token2,token3,token4,token5,token6,token7,token8,token9";
- s = ",,sffdsf,,bsgemail,1,1,5,dsf";
- NonCollapsingStringTokenizer ncst = new NonCollapsingStringTokenizer(s, _DELIM);
- int x = 1;
- while(ncst.hasMoreTokens()) {
- String nextToken = ncst.nextToken();
- System.out.println(Integer.toString(x++) + " " + nextToken);
- }
- }
This produces the listing below:
- 1
- 2
- 3 sffdsf
- 4
- 5 bsgemail
- 6 1
- 7 1
- 8 5
- 9 dsf
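For comparison, the JDK can also produce 9 tokens per row without a custom class: String.split() takes an optional limit argument, and a negative limit keeps the trailing empty strings. A minimal sketch, with SplitKeepTrailing and fields() as made-up names for this illustration:

```java
class SplitKeepTrailing {
    // Hypothetical helper: a negative limit tells split() to keep
    // trailing empty strings instead of discarding them.
    static String[] fields(String row) {
        return row.split(",", -1);
    }

    public static void main(String[] args) {
        String[] sourceStrings = {
            ",,dsafdf,,15,,,,",
            ",,fdsfjladsjf,,13,,,,",
            "df,,sdff,,bemail,1,1,,",
            "dsf,,sffdsf,,bsgemail,1,1,,",
            "token1,token2,token3,token4,token5,token6,token7,token8,token9"
        };
        for (String s : sourceStrings) {
            // Every sample row has 8 commas, so each line prints 9.
            System.out.println("Number of tokens : " + fields(s).length);
        }
    }
}
```

Blank cells come back as empty strings rather than nulls. Since split() treats its first argument as a regular expression, a delimiter such as "|" or "." would need escaping, for example with Pattern.quote().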