Tumblelog by Soup.io

November 04 2014

User ID linking and privacy

If you dive into the Piwik tracking code, you'll probably notice this snippet: _paq.push(['setUserId', '32952541e7c65511b49']);

Yes, this is (if you are logged in) your obfuscated user ID. Why obfuscated? After all, we're using the real user ID in JavaScript anyway! Well, this is just so that we don't accidentally "see" users we know when we look at analytics. Of course we could reverse the encoding (by brute-force encoding all user IDs the same way and then looking up the hash), but that is something we can do in any case, as can any maintainer of any similar social media/blog platform. So, it's kinda sorta assumed that you trust us, and the only reason this is obfuscated is so that we can't accidentally abuse this trust.
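
To make the brute-force remark concrete, here is a minimal sketch. The post does not say how the ID is actually encoded, so the keyed SHA-256 hash, the `SECRET` value, the 19-character truncation and both function names are assumptions made purely for illustration.

```javascript
// Hypothetical sketch: NOT Soup's actual scheme, which the post does not describe.
const crypto = require("crypto");

// Assumed server-side secret; any value works for the illustration.
const SECRET = "example-secret";

// Turn a numeric user ID into an opaque hex token.
function obfuscateUserId(userId) {
  return crypto
    .createHmac("sha256", SECRET)
    .update(String(userId))
    .digest("hex")
    .slice(0, 19); // same length as the token in the snippet above
}

// Whoever knows the secret can encode every known ID and invert the mapping.
function buildReverseLookup(maxUserId) {
  const lookup = new Map();
  for (let id = 1; id <= maxUserId; id++) {
    lookup.set(obfuscateUserId(id), id);
  }
  return lookup;
}

const token = obfuscateUserId(1234);
const lookup = buildReverseLookup(5000);
console.log(token, "->", lookup.get(token));
```

The point of such a scheme is only that the raw ID never shows up in a report by accident. Reversing it takes deliberate effort, but it is not hard for whoever runs the platform.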

Ok, "But why do you link my visit to my user id at all?!" you ask. Weeeell. Because we don't know how many actual users we have. Yes, seriously. In a world with spammers, signup numbers and visit stats mean exactly nothing.

Also, without this separation it is incredibly hard, bordering on divination, to find out how actual users use Soup. The same goes for questions such as "How many devices does a Soup user surf Soup on?" and similar.
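
As a rough illustration of why the ID link matters, the sketch below counts distinct logged-in users and the number of devices each one was seen on. The record shape (`userId`, `deviceId`), the function name and the sample data are all made up; Piwik's own reports are organised differently.

```javascript
// Hypothetical visit records; field names and values are invented for illustration.
const visits = [
  { userId: "a1", deviceId: "laptop-1" },
  { userId: "a1", deviceId: "phone-1" },
  { userId: "a1", deviceId: "laptop-1" },
  { userId: "b2", deviceId: "tablet-7" },
  { userId: null, deviceId: "bot-42" }, // visit without a logged-in user
];

// Group visits by user ID and count the distinct devices seen per user.
function devicesPerUser(records) {
  const byUser = new Map();
  for (const { userId, deviceId } of records) {
    if (userId === null) continue; // anonymous visits cannot be attributed
    if (!byUser.has(userId)) byUser.set(userId, new Set());
    byUser.get(userId).add(deviceId);
  }
  return new Map([...byUser].map(([user, devices]) => [user, devices.size]));
}

const counts = devicesPerUser(visits);
console.log("distinct logged-in users:", counts.size);
console.log("devices per user:", Object.fromEntries(counts));
```

Without a stable ID per user, the three visits by "a1" would look like two or three unrelated visitors.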

Ironically, we could have all that information in the backend (so we wouldn't be required to use a tool such as piwik), but we're not collecting it. Not because it is technically impossible to do it - but because we do not wish to design our database and user data in a fashion that would make it possible.

Yeah, two souls and all that...


The basic set of data we're collecting in-house

As mentioned in another post, we're now making use of Piwik for in-depth user and blog related data. Currently, the implementation mainly just collects interesting data, and we're still working on a good way to segment it (i.e. write a Piwik plugin).

Anyhow, here it goes:

Visitor related

  • Login status of the user - all following flags apply to a logged in visitor (= Soup user) only
  • Blog privacy - the privacy status the visitor configured for their blog
  • NSFW toggle - this pertains to an upcoming release that lets the visitor toggle whether they want to see NSFW material in /everyone, /friends etc.
  • Exports - tells us which exports the visitor has configured (currently this can only be facebook)
  • Reported someone - did the visitor report posts for anything, like spam? This may pertain to the visitor's engagement level.
  • Email - did the visitor supply an email with their registration?
  • Which imports did the visitor configure?
  • Did the visitor connect their account to facebook, either via export or signup?
  • How long has the visitor had their account with Soup, in days?
  • Which pool does the visitor belong to? Currently there is only A, which are all members of @testkitchen, and B, which is the default for everyone. We may use this to do split-testing in the future.
  • Is the visitor using an adblocker?
  • How many feeds is the visitor importing to their blog?
  • How many original (non-imported) posts does the visitor have on their blog?
  • Days since the last original post of the visitor
  • Number of groups the visitor is a member of

So why are we collecting this data? Because we don't know. We have no idea what's relevant. Are users with an email address more likely to come back? Or facebook users? Do we convert people coming from facebook? Does an NSFW feature make people use Soup more at work? (chichichichi...) And in general, what weird patterns show up that should be investigated?

FIXME: page attributes description

You can also have an up-to-date look into our internal analytics spreadsheet slash scratchpad where we keep track of what we're keeping track of. This is what we ourselves are referring to when building new queries or reports, so it's pretty definitive.

You can use it to decipher the javascript section in every page that looks like this:

_paq.push(['setCustomVariable', 2, 'rd', '2674', 'visit']);
_paq.push(['setCustomVariable', 3, 'fr', '170', 'visit']);
_paq.push(['setCustomVariable', 4, 'fe', '6', 'visit']);
_paq.push(['setCustomVariable', 5, 'p', '6767', 'visit']);
_paq.push(['setCustomVariable', 6, 'o', '0', 'visit']);
_paq.push(['setCustomVariable', 7, 'fo', '215', 'visit']);
_paq.push(['setCustomVariable', 8, 'g', '33', 'visit']);
_paq.push(['setCustomVariable', 1, "u", "-l-B-i-w-exfb-x-fb-e-ob-om-p-898-8af-8b1-119-bf1-249-744-fl-" + window.SOUP_test_ab, "visit"]);
_paq.push(['setCustomVariable', 1, "v", "-o-ga-mp-", "page"]);
_paq.push(['setCustomVariable', 2, 'rd', '4', 'page']);
_paq.push(['setCustomVariable', 3, 'fr', '0', 'page']);
_paq.push(['setCustomVariable', 4, 'fe', '0', 'page']);
_paq.push(['setCustomVariable', 5, 'p', '0', 'page']);
_paq.push(['setCustomVariable', 6, 'o', '-1', 'page']);
_paq.push(['setCustomVariable', 7, 'm', '1', 'page']);
_paq.push(['setCustomVariable', 8, 'fo', '0', 'page']);
_paq.push(['setCustomVariable', 9, 'g', '0', 'page']);

That is what a visit to whatweknow.soup.io generates for me, which translates to:

"Visitor has been registered on Soup for 2674 days, has (follows) 170 friends, has 215 followers, imports six feeds, created 6767 original posts (including deleted ones), 0 days have gone by without creating original content, and he is a member of 33 groups. Additionally, we know that he is logged in, in pool B, has configured imports, some of which work, exports posts to facebook, exports posts, has a facebook user, supplied an email address, has created bookmarklet posts, has created mobile posts, has created original posts, has a few (specific, I'm not gonna look them up now) imports configured, and reported other users for something. It also lets us know whether the browser uses an ad blocker."

"The page he is visiting is his own, it's a group and he has group admin privileges, and the group moderation policy is public (crap, I need to change that asap). The group has existed for four days, has zero friends, zero feeds, zero posts (I have written a few since I opened the page, and those statistics only get regenerated every 24 hours), -1 days since the last original post (= never made an original post), the group has one member, no followers, and the owner is not a member of any other group."

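The translation above can be sketched as a small decoder. The label for each short name is inferred by matching the numbers in the snippet against the quoted description. It is not taken from the spreadsheet, and the function and constant names are made up. The flag string is only split into tokens, since the post does not spell out what every token means.

```javascript
// Labels inferred from the example in this post; treat them as assumptions.
const LABELS = {
  rd: "days since registration",
  fr: "friends (accounts followed)",
  fe: "imported feeds",
  p: "original posts",
  o: "days since last original post (-1 = never)",
  m: "group members",
  fo: "followers",
  g: "group memberships",
};

// Read setCustomVariable commands from a _paq-style queue for one scope
// ("visit" or "page") and return { numbers, flags }.
function decodeCustomVariables(queue, scope) {
  const result = { numbers: {}, flags: [] };
  for (const [command, , name, value, varScope] of queue) {
    if (command !== "setCustomVariable" || varScope !== scope) continue;
    if (Object.prototype.hasOwnProperty.call(LABELS, name)) {
      result.numbers[LABELS[name]] = Number(value);
    } else {
      // Flag strings look like "-o-ga-mp-": split on "-" and drop empty pieces.
      result.flags = String(value).split("-").filter(Boolean);
    }
  }
  return result;
}

// A few commands copied from the snippet above.
const sampleQueue = [
  ["setCustomVariable", 1, "v", "-o-ga-mp-", "page"],
  ["setCustomVariable", 2, "rd", "4", "page"],
  ["setCustomVariable", 6, "o", "-1", "page"],
  ["setCustomVariable", 7, "m", "1", "page"],
  ["setCustomVariable", 2, "rd", "2674", "visit"],
];

console.log(decodeCustomVariables(sampleQueue, "page"));
```

Keeping the labels in one table mirrors how the spreadsheet is meant to be used: the page only ever carries the short names.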

What we're using to collect data about you

We are using three tools:

  • quantcast, which we initially began using as a replacement for google analytics
  • google analytics, which we wanted to get rid of, but that effort has been on hold since we came to understand that everything and everyone who does analytics wants to use GA, and historical data would be a good thing to have, just in case
  • piwik, which is a free/libre analytics platform that aims to replace google analytics. It's quite mature and we're using it to do analytics on user data - something we do not want to delegate to a 3rd party such as google. Also, piwik is very flexible and can be extended in ways that are simply not possible with google analytics. We host piwik ourselves.

Additionally, ghostery (which I highly recommend to any privacy-concerned netizen) will report "Facebook Connect" if you connected with facebook or are using facebook export, and "Doubleclick", which seems to be related to the google analytics integration. The author has the best intentions of getting rid of the google stuff eventually, but cannot promise anything, because $business reasons.
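
For the self-hosted piwik part, the command queue a page hands to the tracker typically looks something like the sketch below. The URL, the site ID and the function name are placeholders, not Soup's real configuration. Only the command names (`setTrackerUrl`, `setSiteId`, `setUserId`, `trackPageView`, `enableLinkTracking`) come from Piwik's JavaScript tracker.

```javascript
// Build the command queue that piwik.js reads from the global `_paq` array.
// baseUrl and siteId are placeholders for illustration only.
function buildPiwikQueue(baseUrl, siteId, obfuscatedUserId) {
  const queue = [];
  queue.push(["setTrackerUrl", baseUrl + "piwik.php"]); // your own server, no 3rd party
  queue.push(["setSiteId", siteId]);
  if (obfuscatedUserId) {
    queue.push(["setUserId", obfuscatedUserId]); // only for logged-in visitors
  }
  queue.push(["trackPageView"]);
  queue.push(["enableLinkTracking"]);
  return queue;
}

const exampleQueue = buildPiwikQueue("https://piwik.example.org/", 1, "32952541e7c65511b49");
console.log(exampleQueue);
```

In a real page the same commands would be pushed onto `_paq` before piwik.js is loaded from the same host, so the tracking requests never leave your own infrastructure.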
