QlikView strings – an adventure in parsing email

Here is the background – I was working on a dashboard on top of a trouble ticket system and we wanted to look at open tickets/cases and determine the last time somebody contacted the customer.

Most of it is straight forward with the exception of the CRM’s ability to have users drag and drop emails into the system, where the contents of the email are saved as text in the notes for whatever trouble ticket it was dropped on.

The problem is that although you know who dropped it in, you don’t know who sent the email, or who it was sent to.

So, having only the raw text of the email, how do you determine if it was an outbound communication to the customer that it was logged against?

(I’ll warn you in advance – this is a long post.)

So – i figure if it came FROM our domain and went TO the customers DOMAIN – then it was outbound. Simple. Sounded like a winner. So the first order of business is to figure out where the different pieces of the email start and more difficult, where they end. Although FROM: is always at the beginning, how do you find the end of the TO: field? It could either be ended by a CC:, a BCC:, or Subject:

And YES, there were a few more twists – but let’s just stick to the basics for now.

So – how do you parse out who it was to? – assuming a table called email_Hist, and field called WORKINGNOTES I did the following:

Left Join (email_Hist)
Replace(Replace(Replace(Replace(Replace(Replace(Replace(WORKINGNOTES, ‘FROM : ‘, ‘|F|’), ‘TO :’, ‘|T|’), ‘CC :’, ‘|C|’), ‘BCC :’, ‘|B|’), Chr(13), ‘~’), Chr(10), ”) , ‘Subject:’, ‘|S|’)
as delimNotes
Resident email_Hist

What this does is change each of the headings of From / To / etc into a standard format delimited by the pipe character, both front and back. The replace operation above can be a bit hard to read only because it is nested through seven replace operations. I also flatten out anything with carriage returns by replace those with tildes. The reason for the pipe characters is to create “book ends” where I can use the TextBetween and not have to worry about what the next field is. This looks like this:

Left Join (email_Hist)
LOAD EventDetailsID,
Replace(TextBetween(dLimNOTES, ‘|F|’, ‘|’), ‘~’, ”)) as fromName,
Replace(TextBetween(dLimNOTES, ‘|T|’, ‘|’), ‘~’, ”) as ToNames
resident email_Hist;

The beauty of this is that now that I have my own delimiters in place – I don’t care what the next field is anymore because I know it starts with a pipe.

Next hurdle – domain match.

In order to use what we have so far to do the outbound domain match you need to have the domain(s) handy. This is turned out to be more complex than I had anticipated because not all company email addresses match the URL of their website. I decided nothing works as well as brute force so for each contact at an account, I stripped the email domain from the address and turned that into a mask to match against.

I started with a raw list of domains, and then turn them into a single string



LOAD ACCOUNTID, ‘*’ & Concat(EDOMAIN, ‘*,*’) & ‘*’ AS DomainMatchList
Resident AccountDomains Group by ACCOUNTID

DROP Table AccountDomains;

Left Join (email_Hist)
Resident DomainList;

DROP Table DomainList;


The idea behind this is that I only care about what comes after the @ sign. I am actually being generous here with my tech support folks because I am saying that if you emailed anybody at the company – I’ll give you credit as an outbound. The DomainList table results in a single entry per account with the DomainMatchList field ready to use with a Wildmatch string operation, since it creates a string that might look like this ‘*domain*’, ‘*domain2*’, ‘*domain3*’
(Yes – I realize I cheated and did some of the string work directly in the SQL, but it was fast and easy to test)

The email_Hist table has a few more fields such as the contact’s last name, email address, alternate contacts last name and email. So taking all the pieces and combining them..

Left Join (TicketActivity)
if(WildMatch(ToNames, ‘*’&ContactLastName&’*’, ‘*’&ContactEmail&’*’, ‘*’&AltContactLastName&’*’, ‘*’&AltContactEmail&’*’)>0 OR WildMatch(ToNames, DomainMatchList) > 0, ‘Yes’, ‘No’) as ToCustomer,
if(WildMatch(fromName, ‘*@ourdomain.com*’, ‘*ourAlternateDomain*’) > 0, ‘Yes’, ‘No’) as isFromUs,
if(WildMatch(fromName, ‘*’&UserLastName&’*’) > 0, ‘Yes’, ‘No’) as isFromUser
Resident email_Hist;

There is some more processing that goes on and then email_Hist gets dropped.

The logic is very generous for giving credit – i.e. if you sent it to the company and/or either of the ticket/case contacts – it counts.

A few notes before the criticism starts.

1. This processing is limited to Open tickets, which cuts down on the data volumes dramatically.

2. The code shown here is optimized for clarity, not performance.

3. I am using wildmatch() > 0 here really just to detect whether what/who i am looking for is in the mix. TO: addresses often have numerous recipients, and I decided NOT to separate those out and look for exact matches. You could certainly use this as the starting point to make it more restrictive.

Hope you find this useful