It’s all in the numbers

About a year ago I was hired by a word-of-mouth marketing agency to work on a social media pilot project for a client. The agency had typically done event marketing, and this project was quite different: it was, in effect, a social media research project. It wasn’t about “brand monitoring” or using social media as some kind of new broadcast medium. It was about using social media to develop real, accurate insights and to have an effect on how a company thinks about its industry, its consumers, and its marketing. This was, without a doubt, one of the most interesting projects I’ve ever worked on.
The client asked us how they should be allocating brand resources and budgets to create the most effective word-of-mouth campaigns. They wanted us to find new opportunities for them. Today, almost any effective word-of-mouth campaign will reach the Internet, so it stands to reason that one could make recommendations based on learning about the larger marketplace, where the brand fits within it, and how the consumer interacts with both the brand and the market.
The project consisted of four key elements:
- A survey and audience segmentation.
- Scraping tons of data from different social media outlets.
- Regression analysis involving social media and sales data.
- Alignment of brand messaging with particular retail locations.
I’m going to talk a little bit about each part of the project, and then talk about the outputs and results. Due to contractual obligations, I can’t mention clients or agencies by name… although a truly clever person (with way too much time on their hands) could piece it all together.

1. Survey and audience segmentation
Advertising agencies traditionally learn about their audience by using third-party research firms or hosting small focus groups. In this case, the client needed to learn about an area that was far too large to develop any kind of actionable insights through traditional methodology (the entire State of California, and its nearly 25 million persons over the age of 21). We also had a very small budget.
Instead of using a third-party firm to field surveys or flying to all the major cities in California to host focus groups, we did the entire audience segmentation through a combination of Facebook and Craigslist. Not exactly the most random sample, but it turned out to be much better than expected. We ended up hyper-targeting six major cities within the State, and the demographic information we got back from our survey was very close, proportionally, to what the U.S. Census reports for each of those cities. There were some problems with under-representation of certain demographics (the Hispanic community in particular), but not in all cities. Cities like San Francisco, Los Angeles and San Diego were almost perfectly represented. There were some problems with cities like Fresno, but that was to be expected, and we kept it top of mind throughout the rest of the project.
About 85% of the responses to our survey came from Facebook. We were able to get over 2,000 responses, with very little incentive. And we did it for less than half of our original survey budget. How?
I spent about a week copy testing ads to find the most effective copy and creative. I was able to get the average CPC insanely low, despite the fairly modest incentive: a chance to win an iPod. Facebook also let me cut my target audience into small chunks by age and gender, and this is what allowed us to run such an accurate survey. I calculated how many men aged 21-24 I needed to take the survey to represent the population proportionately, and ran the survey against that group until I hit my quota. Then I filled the corresponding quota for women. Then I did it again for the next age chunk.
At first I thought this methodology would give me extremely skewed results, but I started looking at the responses as they came in and everything was matching up. I had the right percentage of people who self-identified as Asian. I had almost exactly the number of people in the $35-$50k income bracket in Los Angeles that I would expect to have. I had the right percentage of married people.
Once our survey was complete, it was very easy to look at people and put them into buckets. We went about it in a very simple way: segmentation by level of education, based on an insight we had gathered that people who go out drinking together tend to drink with people of a similar education, and thus a similar income level.
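The quota arithmetic described above is proportional allocation across age/gender cells. Below is a minimal Python sketch, assuming population counts per cell are taken from the Census; the cell labels and counts are made-up illustrations, not the project’s figures, and the largest-remainder rounding (so quotas add up exactly to the target sample) is one reasonable choice, not a documented part of the original process.

```python
from math import floor


def allocate_quotas(population, total_sample):
    """Split total_sample across cells in proportion to each cell's population.

    Uses largest-remainder rounding so the integer quotas sum to total_sample.
    """
    pop_total = sum(population.values())
    exact = {cell: total_sample * n / pop_total for cell, n in population.items()}
    quotas = {cell: floor(x) for cell, x in exact.items()}
    shortfall = total_sample - sum(quotas.values())
    # Hand the leftover slots to the cells with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda c: exact[c] - quotas[c], reverse=True)
    for cell in by_remainder[:shortfall]:
        quotas[cell] += 1
    return quotas


population = {  # illustrative counts only, not Census data
    ("21-24", "M"): 1000, ("21-24", "F"): 900,
    ("25-34", "M"): 2100, ("25-34", "F"): 2000,
}
print(allocate_quotas(population, 100))
# {('21-24', 'M'): 17, ('21-24', 'F'): 15, ('25-34', 'M'): 35, ('25-34', 'F'): 33}
```

Each quota then becomes the stopping point for the ad targeted at that cell.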
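Checking whether responses are “matching up” amounts to comparing each category’s share of the sample with its share in a benchmark such as the Census. A minimal sketch, assuming each response is reduced to a single category label and the benchmark is given as proportions; the function name and toy numbers are hypothetical.

```python
from collections import Counter


def share_gaps(responses, benchmark):
    """Return (sample share - benchmark share) per category, in percentage points.

    responses: list of category labels, one per respondent.
    benchmark: category -> expected proportion (e.g. from Census tables).
    """
    counts = Counter(responses)
    n = len(responses)
    return {
        cat: round(100 * counts[cat] / n - 100 * share, 1)
        for cat, share in benchmark.items()
    }


# Toy example: 48 of 100 respondents married, against a 50% benchmark.
responses = ["married"] * 48 + ["not married"] * 52
print(share_gaps(responses, {"married": 0.5, "not married": 0.5}))
# {'married': -2.0, 'not married': 2.0}
```

Gaps near zero suggest the quota approach is tracking the population; a consistently negative gap for one group flags the kind of under-representation noted for some cities.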
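Bucketing respondents by education is little more than a lookup from survey answer to segment. A minimal sketch, assuming a free-text or drop-down education answer per respondent; the answer wording and segment names in `EDUCATION_SEGMENTS` are hypothetical, not the survey’s actual options.

```python
from collections import defaultdict

# Hypothetical mapping from (normalized) survey answers to education segments.
EDUCATION_SEGMENTS = {
    "some high school": "no degree",
    "high school graduate": "high school",
    "some college": "some college",
    "associate degree": "some college",
    "bachelor's degree": "college graduate",
    "graduate degree": "postgraduate",
}


def segment_by_education(respondents):
    """Group respondent ids by education segment; unmapped answers go to "unknown"."""
    buckets = defaultdict(list)
    for r in respondents:
        answer = r["education"].strip().lower()
        buckets[EDUCATION_SEGMENTS.get(answer, "unknown")].append(r["id"])
    return dict(buckets)
```

Each bucket can then be profiled (income, habits, preferences) to build a persona.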

2. Scraping data from different social media outlets
There are dozens of “social media monitoring” solutions out there, but they aren’t a perfect science. At best, the good ones return data sets that you can use to create moderately interesting pie charts. You can learn about how the Internet perceives your brand, but you can’t use them to learn about the larger market. We needed something a little more custom, and a lot more structured, so we built it ourselves. We weren’t interested in a single brand. What we needed was a complete database of retail locations; in our case, a database of bars, restaurants, and venues.
We created a database of over 1,000 on-premise locations across California. We pulled from APIs like Yelp and Yahoo Upcoming to get an idea of ratings, reviews, and the number of events organized at different locations. We pulled all the other description, address, location, and contact information. Then we went to Google and started pulling the number of search results that each location returned, to get a better sense of how well they index and how many people might be talking about them.
But we weren’t done. We needed to know which locations had a license to serve liquor, so we had to merge that database (which was available) with our own. Then we merged that with the client’s last two years of sales data for each location. Now we had a real data set.
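Merging the scraped venues with a liquor-license list and the client’s sales data requires a join key shared across sources. The sketch below assumes a hypothetical key built from the normalized venue name and street address; real-world matching is usually messier (suite numbers, abbreviations, renamed venues), so treat this as an illustration of the join, not the method actually used.

```python
import re


def match_key(name, address):
    """Hypothetical join key: lowercase name + address, punctuation and extra spaces removed."""
    text = f"{name} {address}".lower()
    text = re.sub(r"[^a-z0-9 ]", "", text)
    return " ".join(text.split())


def merge_sources(venues, licensed, sales):
    """Attach liquor-license status and two-year sales totals to each scraped venue."""
    licensed_keys = {match_key(l["name"], l["address"]) for l in licensed}
    sales_by_key = {match_key(s["name"], s["address"]): s["total_sales"] for s in sales}
    merged = []
    for v in venues:
        key = match_key(v["name"], v["address"])
        merged.append({
            **v,
            "has_liquor_license": key in licensed_keys,
            # None means no sales record, e.g. a venue without the client's distribution.
            "sales_2yr": sales_by_key.get(key),
        })
    return merged
```

Keeping venues with no sales record (rather than dropping them) is what makes it possible to spot highly social locations where the client has no distribution.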

3. Regression analysis
Now that we had a real data set, we could do some real analysis. We were able to use regression analysis to determine what kind of impact social media has on sales. We found that certain variables are very well correlated with sales, and others not so much. The total number of reviews on Yelp, for example, doesn’t really matter. The rating, however, does.
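A regression of sales on social media indicators can be sketched with ordinary least squares in NumPy. This is a minimal illustration, assuming one row per venue and indicator columns such as Yelp rating and review count; it reports coefficients and R² only, and does not include the significance testing a full analysis would need.

```python
import numpy as np


def fit_ols(features, sales):
    """Fit sales ~ intercept + features by ordinary least squares.

    features: (n_venues, n_indicators) array, e.g. Yelp rating, review count.
    sales:    (n_venues,) array of sales per venue.
    Returns (intercept, coefficients, r_squared).
    """
    features = np.asarray(features, dtype=float)
    sales = np.asarray(sales, dtype=float)
    X = np.column_stack([np.ones(len(sales)), features])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
    residuals = sales - X @ coef
    ss_res = residuals @ residuals
    ss_tot = ((sales - sales.mean()) ** 2).sum()
    return coef[0], coef[1:], 1.0 - ss_res / ss_tot
```

Because raw coefficients depend on each indicator’s units (stars versus review counts), comparing their importance usually means standardizing the columns first.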
We needed to run a regression because we wanted to do three things:
- Find a way to weight different social media indicators so that we could create a composite score in order to rank each venue.
- Be able to justify why the client needs to seriously look at certain highly social locations where they have no distribution.
- Take a good look at our outliers and figure out what’s going on.
Once this was complete, we were able to create a composite score for each venue, rank them, and then take a deeper dive into the top 100.
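A composite score of this kind can be sketched as a weighted sum of standardized indicators. The sketch assumes the weights come from somewhere like the regression coefficients; the indicator names and weights are hypothetical. Z-scoring puts a star rating and an event count on a comparable scale before weighting.

```python
from statistics import mean, pstdev


def composite_scores(venues, weights):
    """Score each venue as a weighted sum of z-scored indicators, highest first.

    venues:  list of dicts with a "name" key plus one numeric key per indicator.
    weights: indicator name -> weight (e.g. derived from regression coefficients).
    """
    stats = {}
    for ind in weights:
        values = [v[ind] for v in venues]
        # Fall back to 1.0 for a constant column to avoid dividing by zero.
        stats[ind] = (mean(values), pstdev(values) or 1.0)
    scored = []
    for v in venues:
        score = sum(w * (v[ind] - stats[ind][0]) / stats[ind][1] for ind, w in weights.items())
        scored.append((v["name"], score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def top_n(venues, weights, n=100):
    """Return the n highest-scoring (name, score) pairs."""
    return composite_scores(venues, weights)[:n]
```

With n=100 this yields the shortlist for the deeper qualitative review.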

4. Alignment of brand messaging
This was actually the most precarious part of the entire project. We wanted to align each of the top venues with a particular person and a particular brand. We had developed our own personas as a result of the segmentation we had done, and used them in this alignment because we actually had an idea of who the consumer was and what their preferences were.
However, the client did not like our personas. They had developed personas of their own that were rooted in emotional attachment to the brand and how their brand is represented as a lifestyle product. Our personas, on the other hand, were rooted in our survey. We would make statements like, “Your consumer is twice as likely to smoke as the average California male.” The client would respond with, “I don’t know about that. We’ve developed our own personas, and our brand isn’t really aligned with smoking. Smoking is disgusting.”
The real problem is that the client didn’t like the consumer they actually had. The personas they had developed had more to do with aligning their brand image with a certain kind of person than with understanding who is actually out there consuming their products.
But I digress. This was a minor hiccup in communication in an otherwise good relationship. Hopefully the client will be more willing to take another look at the data and analysis we provided them, and reconsider it.
The rest of the brand messaging alignment consisted of looking at the top locations in our database by hand and doing a qualitative assessment of each, to determine how well a particular location might fit a particular brand-sponsored word-of-mouth program. We used Google Docs to crowdsource this work, which allowed us to finish it in less than a day, when it otherwise might have taken a week.
In terms of specific deliverables, we provided the client with three things:
- A 120-page book showcasing the top bars and venues by social media composite score, in six major markets in California, complete with actual consumer reviews and sentiment.
- A sortable database with over 1,000 locations and a ton of queryable information attached to each location.
- A KML file which allowed the client to look at the database overlaid onto a real-world landscape using Google Maps or Google Earth.
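KML is an XML format, so a venue export needs little more than one `Placemark` per location. A minimal sketch using Python’s standard `xml.etree.ElementTree`, assuming each venue record carries a name, latitude, and longitude; the field names are hypothetical, and the element structure follows the basic KML 2.2 `Document`/`Placemark`/`Point` layout.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def venues_to_kml(venues):
    """Build a KML document string with one Placemark per venue.

    Each venue dict needs "name", "lat", "lon"; "description" is optional.
    """
    kml = ET.Element("kml", xmlns=KML_NS)
    doc = ET.SubElement(kml, "Document")
    for v in venues:
        pm = ET.SubElement(doc, "Placemark")
        ET.SubElement(pm, "name").text = v["name"]
        if v.get("description"):
            ET.SubElement(pm, "description").text = v["description"]
        point = ET.SubElement(pm, "Point")
        # KML coordinates are longitude,latitude[,altitude], in that order.
        ET.SubElement(point, "coordinates").text = f"{v['lon']},{v['lat']}"
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(kml, encoding="unicode")
```

ElementTree escapes characters such as `&` in venue names, which hand-built string templates often get wrong.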
I thought this was a remarkable project and an interesting way to use social media beyond just feeding an RSS feed into a Twitter stream. This is the kind of project I’d love to work on again. In fact, I’d love to build a custom, real-time engine that can track and analyze these kinds of market trends. It would be incredible for a beer, wine, or spirits brand to actually be able to see what people are saying about programs as they happen, and to better measure just how effective (or ineffective) word-of-mouth programs can be.