Tumblelog by Soup.io

November 08 2014

November 04 2014

User ID linking and privacy

If you dive into the Piwik tracking code, you'll probably notice this snippet: _paq.push(['setUserId', '32952541e7c65511b49']);

Yes, this is (if you are logged in) your obfuscated user id. Why obfuscated? After all, we're using the real user id in javascript anyway! Well, this is just so that we, when we look at analytics, don't accidentally "see" users we know. Of course we could reverse the encoding (by brute-force encoding all user ids the same way and then looking up the hash), but that is something we can do in any case, as can any maintainer of any similar social media/blog platform. So, it's kinda sorta assumed that you trust us, and the only reason this is obfuscated is so that we can't accidentally abuse that trust.
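To make the "obfuscate, and reverse by brute force" idea concrete, here is a minimal sketch. The post does not say which encoding Soup actually uses, so the scheme below (SHA-256 over a secret plus the user id, cut to 19 hex characters to match the length of the id in the snippet) is purely an assumption, and the names obfuscateUserId, buildReverseLookup and the secret value are made up for illustration.

```javascript
// Hypothetical sketch: the real encoding used by Soup is not documented here.
const { createHash } = require("node:crypto");

// Assumed scheme: SHA-256 over a site-wide secret plus the numeric user id,
// truncated to 19 hex characters (the length of the id in the snippet).
function obfuscateUserId(userId, secret) {
  return createHash("sha256")
    .update(`${secret}:${userId}`)
    .digest("hex")
    .slice(0, 19);
}

// Brute-force reversal: encode every known user id the same way and
// build a lookup table from obfuscated id back to real id.
function buildReverseLookup(userIds, secret) {
  const table = new Map();
  for (const id of userIds) {
    table.set(obfuscateUserId(id, secret), id);
  }
  return table;
}

const secret = "example-secret"; // placeholder, not a real value
const table = buildReverseLookup([1, 2, 3, 42], secret);
const hidden = obfuscateUserId(42, secret);
console.log(table.get(hidden)); // 42
```

The lookup table is exactly why the obfuscation only protects against accidental recognition: whoever holds the full list of user ids and the secret can always rebuild it.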

Ok, "But why do you link my visit to my user id at all?!" you ask. Weeeell. Because we don't know how many actual users we have. Yes, seriously. In a world with spammers, signup numbers and visit stats mean exactly nothing.

Also, it is incredibly hard and borders on divination to find out how actual users use Soup without this separation. The same goes for questions such as "How many devices does a Soup user surf Soup on?" and similar.
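As a rough illustration of why the link matters, the devices question boils down to grouping visits by user id. The record shape below is an assumption: userId stands for the obfuscated id sent via setUserId, and deviceId stands in for whatever per-browser identifier the analytics tool stores.

```javascript
// Hypothetical visit records; field names are invented for this sketch.
function devicesPerUser(visits) {
  const devices = new Map();
  for (const { userId, deviceId } of visits) {
    if (!userId) continue; // anonymous visits cannot be attributed to a user
    if (!devices.has(userId)) devices.set(userId, new Set());
    devices.get(userId).add(deviceId);
  }
  // Collapse each set of device ids into a count.
  return new Map([...devices].map(([id, set]) => [id, set.size]));
}

const counts = devicesPerUser([
  { userId: "32952541e7c65511b49", deviceId: "laptop" },
  { userId: "32952541e7c65511b49", deviceId: "phone" },
  { userId: "32952541e7c65511b49", deviceId: "laptop" },
  { userId: null, deviceId: "unknown" },
]);
console.log(counts.get("32952541e7c65511b49")); // 2
```

Without the user id, the three attributed visits above would look like two unrelated visitors.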

Ironically, we could have all that information in the backend (so we wouldn't be required to use a tool such as piwik), but we're not collecting it. Not because it is technically impossible to do it - but because we do not wish to design our database and user data in a fashion that would make it possible.

Yeah, two souls and all that...


The basic set of data we're collecting in-house

As mentioned in another post, we're now making use of Piwik for in-depth user and blog related data. Currently, the implementation mainly just collects interesting data, and we're still working on a good way to segment it (i.e. write a Piwik plugin).

Anyhow, here it goes:

Visitor related

  • Login status of the user - all following flags apply to a logged in visitor (= Soup user) only
  • Blog privacy - the privacy status the visitor configured for their blog
  • NSFW toggle - this pertains to an upcoming release that lets the visitor toggle whether they want to see NSFW material in /everyone, /friends etc.
  • Exports - tells us which exports the visitor has configured (currently this can only be facebook)
  • Reported someone - did the visitor report posts for anything, like spam? This may pertain to the visitor's engagement level.
  • Email - did the visitor supply an email with their registration?
  • Which imports did the visitor configure?
  • Did the visitor connect their account to facebook, either via export or signup?
  • How long has the visitor had their account with Soup, in days?
  • Which pool does the visitor belong to? Currently there are only two: A, which is all members of @testkitchen, and B, which is the default for everyone. We may use this to do split-testing in the future.
  • Is the visitor using an adblocker?
  • How many feeds is the visitor importing to their blog?
  • How many original (non-imported) posts does the visitor have on their blog?
  • Days since the last original post of the visitor
  • Number of groups the visitor is a member of

So why are we collecting this data? Because we don't know. We have no idea what's relevant. Are users with an email address more likely to come back? Or facebook users? Do we convert people coming from facebook? Does an NSFW feature make people use Soup more at work? (chichichichi...) And in general, which weird patterns worth investigating show up?

FIXME: page attributes description

You can also have an up-to-date look into our internal analytics spreadsheet slash scratchpad where we keep track of what we're keeping track of. This is what we ourselves are referring to when building new queries or reports, so it's pretty definitive.

You can use it to decipher the javascript section in every page that looks like this:

_paq.push(['setCustomVariable', 2, 'rd', '2674', 'visit']);
_paq.push(['setCustomVariable', 3, 'fr', '170', 'visit']);
_paq.push(['setCustomVariable', 4, 'fe', '6', 'visit']);
_paq.push(['setCustomVariable', 5, 'p', '6767', 'visit']);
_paq.push(['setCustomVariable', 6, 'o', '0', 'visit']);
_paq.push(['setCustomVariable', 7, 'fo', '215', 'visit']);
_paq.push(['setCustomVariable', 8, 'g', '33', 'visit']);
_paq.push(['setCustomVariable', 1, "u", "-l-B-i-w-exfb-x-fb-e-ob-om-p-898-8af-8b1-119-bf1-249-744-fl-" + window.SOUP_test_ab, "visit"]);
_paq.push(['setCustomVariable', 1, "v", "-o-ga-mp-", "page"]);
_paq.push(['setCustomVariable', 2, 'rd', '4', 'page']);
_paq.push(['setCustomVariable', 3, 'fr', '0', 'page']);
_paq.push(['setCustomVariable', 4, 'fe', '0', 'page']);
_paq.push(['setCustomVariable', 5, 'p', '0', 'page']);
_paq.push(['setCustomVariable', 6, 'o', '-1', 'page']);
_paq.push(['setCustomVariable', 7, 'm', '1', 'page']);
_paq.push(['setCustomVariable', 8, 'fo', '0', 'page']);
_paq.push(['setCustomVariable', 9, 'g', '0', 'page']);

That is what a visit to whatweknow.soup.io generates for me, which translates to:

"Visitor has been registered on Soup for 2674 days, follows 170 friends, has 215 followers, imports six feeds, has created 6767 original posts (including deleted ones), made his last original post 0 days ago, and is a member of 33 groups. Additionally, we know that he is logged in, in pool B, has configured imports, some of which work, exports posts to facebook, exports posts, has a facebook user, supplied an email address, has created bookmarklet posts, has created mobile posts, has created original posts, has a few (specific, I'm not gonna look them up now) imports configured, and reported other users for something. Also, it lets us know if the browser uses an ad blocker."

"The page he is visiting is his own, it's a group and he has group admin privileges, and the group moderation policy is public (crap, I need to change that asap). The group has existed for four days, has zero friends, zero feeds, zero posts (I have written a few since I opened the page, also those statistics only get regenerated every 24 hours), -1 days since the last original post (= never made an original post), the group has one member, no followers and the owner is not a member of any other group."
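If you want to do this deciphering yourself, something along these lines could work. It is only a sketch: the input is assumed to be an array shaped like the _paq.push(...) arguments from the snippet, the names decodeCustomVariables, splitFlags and LABELS are invented here, and the labels are inferred by matching the values against the translation above, so treat the spreadsheet as the real key. The meaning of the individual flag tokens is deliberately left out.

```javascript
// Labels inferred from the translation above; a reading aid, not an official key.
const LABELS = {
  rd: "days registered",
  fr: "friends followed",
  fe: "feeds imported",
  p: "original posts",
  o: "days since last original post",
  fo: "followers",
  g: "groups",
  m: "members",
};

// Group setCustomVariable commands by scope ("visit" or "page").
function decodeCustomVariables(commands) {
  const result = { visit: {}, page: {} };
  for (const [method, index, name, value, scope] of commands) {
    if (method !== "setCustomVariable") continue;
    if (!Object.prototype.hasOwnProperty.call(result, scope)) continue;
    result[scope][name] = {
      index,
      value: String(value),
      label: LABELS[name] ?? null, // flag strings like "u" and "v" have no label
    };
  }
  return result;
}

// Flag strings such as "-o-ga-mp-" are dash-separated tokens.
function splitFlags(flagString) {
  return flagString.split("-").filter((token) => token.length > 0);
}

const sample = [
  ["setCustomVariable", 2, "rd", "2674", "visit"],
  ["setCustomVariable", 7, "fo", "215", "visit"],
  ["setCustomVariable", 1, "v", "-o-ga-mp-", "page"],
  ["setCustomVariable", 2, "rd", "4", "page"],
];
const decoded = decodeCustomVariables(sample);
console.log(decoded.visit.rd.value, decoded.visit.rd.label); // 2674 days registered
console.log(splitFlags(decoded.page.v.value)); // [ 'o', 'ga', 'mp' ]
```

Note that the same short name (for example rd) appears in both scopes, which is why the sketch keeps visit and page variables apart.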


What we're using to collect data about you

We are using three tools:

  • quantcast, which we initially began using as a replacement for google analytics
  • google analytics, which we wanted to get rid of, but that effort has been on hold since we came to understand that everything and everyone who does analytics wants to use GA, and historical data would be a good thing to have, just in case
  • piwik, which is a free/libre analytics platform that aims to replace google analytics. It's quite mature and we're using it to do analytics on user data - something we do not want to delegate to a 3rd party such as google. Also, piwik is very flexible and can be extended in ways that are simply not possible with google analytics. We host piwik ourselves.

Additionally, ghostery (which I highly recommend to any privacy concerned netizen) will report "Facebook Connect" if you connected with facebook or are using facebook export, and "Doubleclick", which seems to be related to the google analytics integration. The author has the best intentions of getting rid of the google stuff eventually, but cannot promise anything, because $business reasons.

About tracking and privacy

We've been using quantcast and google analytics for quite some time (I think since the birth of Soup?) to collect minimal meaningful data about the visitors, where they come from and their approximate demographics.

We've not even been using the extended feature sets of those platforms, because that would've meant putting user data into the hands of a third party.

But what do we know from these solutions? Yes, we have a lot of Polish users. And we're very proud of being in the top 30 something Polish websites - at least according to quantcast - despite not understanding how we got there.

And that's kinda sorta the point - insight into how we got anywhere. And this means metrics.

Now, metrics is a very loaded topic. Everyone does it, but nobody talks about it, except in their Terms Of Service legalese. People immediately think of spying, and it's kinda ... somehow ... well, it's not exactly not spying. But I personally don't think it's really spying either, though that may be because I'm the one running the platform and have many years of experience of what it means to know jack shit about your users and what they're doing.

So my view on things is biased. Thus, I decided to go the pragmatic route and just commit myself and the company to explaining what we're doing - hopefully in a way that people who are not into metrics can get a grasp too. If I can't explain it, or it feels creepy to explain it, well, that should be enough to make me think it over. And I'm taking a huge flaming pile of dung on legalese, because, you know, fuck you - not.

So my take on it is, metrics should be more like a census and what's called "city development data". To be able to plan what areas to develop, a city needs to know where people go to work, when and where they buy things, what the average income is etc.

It's the same with a web platform like Soup, except that instead of jobs and shopping centers we have blogs and groups, and instead of cars and net worth of the individual we have posts and days since registration.

Anyhow, @whatweknow is dedicated from now on to explaining what we're collecting, what we're doing with it, and maybe even posting some results. If you have an interesting question that we should be able to answer with metrics, don't be afraid to ask @kitchen or reply to a post here! Sometimes we may be unable to give answers, or unable to give them in absolute numbers, but we should be able to crank out at least a percentage or a general vicinity every once in a while. And if not, you may at least get an explanation why :)

Last but not least, a word to the wise: be inquisitive about your privacy on the internets, and use tools such as ghostery or disconnect to understand what sites know about you.


Don't be the product, buy the product!