The Personal Analytics of My Life

March 8, 2012

One day I’m sure everyone will routinely collect all sorts of data about themselves. But because I’ve been interested in data for a very long time, I started doing this long ago. I actually assumed lots of other people were doing it too, but apparently they were not. And so now I have what is probably one of the world’s largest collections of personal data.

Every day—in an effort at “self awareness”—I have automated systems send me a few emails about the day before. But even though I’ve been accumulating data for years—and always meant to analyze it—I’ve never actually gotten around to doing it. But with Mathematica and the automated data analysis capabilities we just released in Wolfram|Alpha Pro, I thought now would be a good time to finally try taking a look—and to use myself as an experimental subject for studying what one might call “personal analytics”.

Let’s start off talking about email. I have a complete archive of all my email going back to 1989—a year after Mathematica was released, and two years after I founded Wolfram Research. Here’s a plot with a dot showing the time of each of the third of a million emails I’ve sent since 1989:

Plot with a dot showing the time of each of the third of a million pieces of email

The first thing one sees from this plot is that, yes, I’ve been busy. And for more than 20 years, I’ve been sending emails throughout my waking day, albeit with a little dip around dinner time. The big gap each day comes from when I was asleep. And for the last decade, the plot shows I’ve been pretty consistent, going to sleep around 3am ET, and getting up around 11am (yes, I’m something of a night owl). (The stripe in summer 2009 is a trip to Europe.)

But what about the 1990s? Well, that was when I spent a decade as something of a hermit, working very hard on A New Kind of Science. And the plot makes it very clear why in the late 1990s when one of my children was asked for an example of “being nocturnal” they gave me. The rather dramatic discontinuity in 2002 is the moment when A New Kind of Science was finally finished, and I could start leading a different kind of life.

So what about other features of the plot? Some line up with identifiable events and trends in my life, sometimes reflected in my online scrapbook or timeline. Others at first I don’t understand at all—until a quick search of my email archive jogs my memory. It’s very convenient that I can always drill down and read a raw email. Because as with essentially any long-timescale data project, there are all kinds of glitches (here like misformatted email headers, unset computer clocks, and untagged automated mailings) that have to be found and systematically corrected for before one has consistent data to analyze. And before, in this case, I can trust that any dots in the middle of the night are actually times I woke up and sent email (which is nowadays very rare).
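To make the cleanup step concrete, here is a minimal Python sketch of turning raw messages into the (date, time-of-day) dots of a plot like the one above. It is not the pipeline actually used for this post. It assumes a fixed UTC-5 offset as a stand-in for Eastern time (ignoring daylight saving), treats any timestamp before 1989 or in the future as an unset clock, and uses the standard `Auto-Submitted` and `Precedence` headers as a rough test for automated mail.

```python
from datetime import datetime, timedelta, timezone
from email.message import Message
from email.utils import parsedate_to_datetime

# Assumption: fixed UTC-5 as a stand-in for US Eastern time (DST ignored).
EASTERN = timezone(timedelta(hours=-5))
# Assumption: the archive starts in 1989, so anything earlier is a clock glitch.
EARLIEST = datetime(1989, 1, 1, tzinfo=timezone.utc)


def is_automated(msg: Message) -> bool:
    """Heuristic: flag mail marked as auto-generated or bulk by standard headers."""
    auto = (msg.get("Auto-Submitted") or "no").strip().lower()
    precedence = (msg.get("Precedence") or "").strip().lower()
    return auto != "no" or precedence in {"bulk", "list", "junk"}


def diurnal_point(msg: Message):
    """Return (local date, hour of day as a float), or None if the message is dropped."""
    if is_automated(msg):
        return None
    try:
        sent = parsedate_to_datetime(msg.get("Date"))
    except (TypeError, ValueError):
        return None  # missing or misformatted Date header
    if sent.tzinfo is None:
        # A "-0000" offset parses to a naive datetime; treat it as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    if not EARLIEST <= sent <= datetime.now(timezone.utc):
        return None  # implausible timestamp, e.g. from an unset computer clock
    local = sent.astimezone(EASTERN)
    return local.date(), local.hour + local.minute / 60
```

Dropping a message rather than guessing a correction keeps the sketch simple; a real cleanup would want to inspect the dropped messages by hand, as the paragraph above describes.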

The plot above suggests that there’s been a progressive increase in my email volume over the years. One can see that more explicitly if one just plots the total number of emails I’ve sent as a function of time:

Daily outgoing emails and monthly outgoing emails

Again, there are some life trends visible. The gradual decrease in the early 1990s reflects me reducing my involvement in day-to-day management of our company to concentrate on basic science. The increase in the 2000s is me jumping back in, and driving more and more company projects. And the peak in early 2009 reflects the final preparations for the launch of Wolfram|Alpha. (The individual spikes, including the all-time winner August 27, 2006, are mostly weekend or travel days specifically spent “grinding down” email backlogs.)
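Both panels of that figure are simple aggregations of send dates. As a sketch, assuming each sent message has already been reduced to a local `datetime.date`, the daily counts, monthly counts, and each month's average per day could be computed like this:

```python
import calendar
from collections import Counter
from datetime import date


def email_volume(send_dates):
    """Count emails per calendar day and per month, plus each month's average per day."""
    daily = Counter(send_dates)
    monthly = Counter((d.year, d.month) for d in send_dates)
    per_day_avg = {
        (y, m): n / calendar.monthrange(y, m)[1]  # monthrange()[1] is the number of days
        for (y, m), n in monthly.items()
    }
    return daily, monthly, per_day_avg
```

Dividing by the length of the calendar month (rather than by the number of days with any email) keeps quiet days in the average, which is what a plot of overall volume needs.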

The plots above seem to support the idea that “life’s complicated”. But if one aggregates the data a bit, it’s easy to end up with plots that seem like they could just be the result of some simple physics experiment. Like here’s the distribution of the number of emails I’ve sent per day since 1989:

Distribution of emails per day

What is this distribution? Is there a simple model for it? I don’t know. Wolfram|Alpha Pro tells us that the best fit it finds is to a geometric distribution. But it officially rejects that fit. Still, at least the tail seems—as so often—to follow a power law. And perhaps that’s telling me something about myself, though I have to say I don’t know what.
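For anyone who wants to try this on their own counts, here is a rough Python sketch of the two quantities the paragraph mentions. It is not what Wolfram|Alpha Pro does internally. It computes the maximum-likelihood geometric parameter for counts on {0, 1, 2, ...}, and a continuous-approximation maximum-likelihood estimate of a power-law tail exponent, alpha = 1 + n / sum(ln(x_i / x_min)), above a hand-picked cutoff `x_min` (an assumption; choosing the cutoff well is a hard problem in its own right).

```python
import math


def geometric_mle(counts):
    """MLE of p for a geometric distribution on {0, 1, 2, ...}, P(k) = (1 - p)**k * p."""
    mean = sum(counts) / len(counts)
    return 1 / (1 + mean)


def power_law_tail_exponent(counts, x_min):
    """Continuous-approximation MLE of alpha for a tail p(x) ~ x**(-alpha), x >= x_min > 0.

    alpha = 1 + n / sum(ln(x_i / x_min)) over the n values at or above x_min.
    """
    tail = [x for x in counts if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need tail values strictly above x_min")
    return 1 + len(tail) / log_sum
```

A geometric model's tail decays exponentially, so it cannot also describe a power-law tail; that tension is one reason to estimate the two separately.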

Monthly distinct email recipients

The vast majority of these recipients are people or mailgroups within our company. And I suspect the overall growth is a reflection of both the increasing number of people at the company, and the increasing number of projects in which I and our company are involved. The peaks are often associated with intense early-stage projects, where I am directly interacting with lots of people, and there isn’t yet a well-organized management structure in place. I don’t quite understand the recent decrease, considering that the number of projects is at an all-time high. I’m just hoping it reflects better organization and management…
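Counting distinct recipients per month is a small set-building exercise. A sketch in Python, assuming each sent message has been reduced to a (date, To header, Cc header) tuple and that addresses should be compared case-insensitively (both assumptions):

```python
from collections import defaultdict
from datetime import date
from email.utils import getaddresses
from typing import Dict, Iterable, Optional, Tuple


def distinct_recipients_by_month(
    sent: Iterable[Tuple[date, Optional[str], Optional[str]]],
) -> Dict[Tuple[int, int], int]:
    """Return {(year, month): number of distinct addresses mailed that month}."""
    recipients = defaultdict(set)
    for day, *headers in sent:
        fields = [h for h in headers if h]  # skip missing or empty To/Cc headers
        for _name, address in getaddresses(fields):
            if address:
                recipients[(day.year, day.month)].add(address.lower())
    return {month: len(addresses) for month, addresses in recipients.items()}
```

A mailgroup address counts once here, however many people it reaches, which matches treating “people or mailgroups” as recipients.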

OK, so all of that is about email I’ve sent. What about email I’ve received? Here’s a plot comparing my incoming and outgoing email:

Average daily emails

The peaks in 1996 and 2009 are both associated with the later phases of big projects (Mathematica 3 and the launch of Wolfram|Alpha) where I was watching all sorts of details, often using email-based automated systems.

OK. So email is one kind of data I’ve systematically archived. And there’s a huge amount that can be learned from that. Another kind of data that I’ve been collecting is keystrokes. For many years, I’ve captured every keystroke I’ve typed—now more than 100 million of them:

Diurnal plot of keystrokes

Daily keystrokes, averaged by month

There are all kinds of detailed facts to extract: like that the average fraction of keys I type that are backspaces has consistently been about 7% (I had no idea it was so high!). Or how my habits in using different computers and applications have changed. And looking at the daily totals, I can see spikes of writing activity—typically associated with creating longer documents (including blog posts). But at least at an overall level things like the plots above look similar for keystrokes and email.
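A statistic like the backspace fraction is easy to compute once keystrokes are logged. As a hypothetical sketch, assuming the log can be read as (year, key name) pairs and that the backspace key is recorded as the string `"BackSpace"` (both assumptions about the logger's format):

```python
from collections import defaultdict


def backspace_fraction_by_year(events, backspace="BackSpace"):
    """events: iterable of (year, key_name) pairs from a hypothetical keystroke log."""
    totals = defaultdict(int)
    backspaces = defaultdict(int)
    for year, key in events:
        totals[year] += 1
        if key == backspace:
            backspaces[year] += 1
    return {year: backspaces[year] / totals[year] for year in totals}
```

Computing the fraction per year, rather than once over everything, is what makes it possible to say whether it has stayed consistent.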

What about other measures of activity? My automated systems have been quietly archiving lots of them for years. And for example this shows the times of events that have appeared in my calendar:

Diurnal plot of calendar events

The changes over the years reflect quite directly things going on in my life. Before 2002 I was doing a lot of solitary work, particularly on A New Kind of Science, and having only a few scheduled meetings. But then as I initiated more and more new projects at our company, and took a more and more structured approach to managing them, one can see more and more meetings getting filled in. Though my “family dinner stripe” remains clearly visible.

Here’s a plot of the daily average total number of meetings (and other calendar events) that I’ve done over the years:

Average events per day

The trend is pretty clear. And it reflects the fact that in the past decade or so I’ve gradually learned to work better “in public”, efficiently figuring things out while interacting with groups of people—which I’ve discovered makes me much more effective both at using other people’s expertise and at delegating things that have to be done.

It often surprises people when I tell them this, but since 1991 I’ve been a remote CEO, interacting with my company almost exclusively just by email and phone (usually with screensharing). (No, I don’t find videoconferencing with the company very useful, and the telepresence robot I got recently has mostly been standing idle.)

So phone calls are another source of data for me. And here’s a plot of the times of calls I’ve made (the gray regions are missing data):

Diurnal plot of phone calls

Yes, I spend many hours on the phone each day:

Daily hours on the phone and monthly hours on the phone

And this shows how the probability to find me on the phone varies during the day:

On-phone probability

This is averaged over all days for the last several years, and in fact I’m guessing that the “peak weekday probability” would actually be even higher than 70% if the average excluded days when I’m away for one reason or another.
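One way to compute a curve like this, sketched in Python: mark every minute during which a call was in progress, count on how many days each minute of the day was occupied, and divide by the number of days observed. The inputs (naive local `datetime` pairs for each call, and a count of observed days) are assumptions about how the call records are stored.

```python
from datetime import datetime, timedelta

MINUTES_PER_DAY = 24 * 60


def on_phone_probability(calls, n_days, bin_minutes=30):
    """Probability of being on a call at a random moment in each time-of-day bin.

    calls: list of (start, end) naive local datetimes; n_days: days observed.
    bin_minutes must divide 1440 evenly.
    """
    occupied = [set() for _ in range(MINUTES_PER_DAY)]  # minute of day -> dates on a call
    for start, end in calls:
        t = start.replace(second=0, microsecond=0)
        while t < end:
            occupied[t.hour * 60 + t.minute].add(t.date())
            t += timedelta(minutes=1)
    per_minute = [len(dates) / n_days for dates in occupied]
    return [
        sum(per_minute[i:i + bin_minutes]) / bin_minutes
        for i in range(0, MINUTES_PER_DAY, bin_minutes)
    ]
```

Using sets of dates means two overlapping calls on the same day count only once, so the result stays a probability between 0 and 1.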

Here’s another way to look at the data—this shows the probability for calls to start at a given time:

Call start times

There’s a curious pattern of peaks—near hours and half-hours. And of course those occur because many phone calls are scheduled at those times. Which means that if one plots meeting start times and phone call start times one sees a strong correlation:

Calls and meetings

Differences between meeting and phone call start times

I was curious just how strong this correlation is: in effect just how scheduled all those calls are. And looking at the data I found that for my external phone meetings at least half of them do indeed start within 2 minutes of their appointed times. For internal meetings—which tend to involve more people, and which I normally have scheduled back-to-back—there’s a somewhat broader distribution, shown in the plot above.
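Measuring how punctual calls are amounts to matching each call with the nearest scheduled start. Here is a sketch, assuming call and meeting start times are comparable `datetime` values, and treating any call more than 15 minutes from every meeting start as unscheduled (an arbitrary cutoff chosen for illustration):

```python
from bisect import bisect_left
from datetime import datetime


def start_offsets(call_starts, meeting_starts, max_gap_minutes=15):
    """Signed minutes from each call to its nearest meeting start; unmatched calls are skipped."""
    meetings = sorted(meeting_starts)
    offsets = []
    for call in call_starts:
        i = bisect_left(meetings, call)
        candidates = meetings[max(i - 1, 0):i + 1]  # the meetings just before and after
        if not candidates:
            continue
        nearest = min(candidates, key=lambda m: abs((call - m).total_seconds()))
        offset = (call - nearest).total_seconds() / 60
        if abs(offset) <= max_gap_minutes:
            offsets.append(offset)
    return offsets


def fraction_within(offsets, minutes=2):
    """Fraction of matched calls starting within `minutes` of the scheduled time."""
    return sum(abs(o) <= minutes for o in offsets) / len(offsets) if offsets else 0.0
```

Binary search over the sorted meeting list keeps the matching fast even for years of calendar entries.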

Call durations

When one looks at the distribution of call durations one sees a kind of “physics-like” background shape, but on top of that there’s the “obviously human” peak at the 1-hour mark, associated with meetings that are scheduled to be an hour long.

So far everything we’ve talked about has measured intellectual activity. But I’ve also got data on physical activity. Like for the past couple of years I’ve been wearing a little digital pedometer that measures every step I take:

Diurnal plot of steps taken

Daily steps averaged by month

And once again, this shows quite a bit of consistency. I take about the same number of steps every day. And many of them are taken in a block early in my day (typically coinciding with the first couple of meetings I do). There’s no mystery to this: years ago I decided I should take some exercise each day, so I set up a computer and phone to use while walking on a treadmill. (Yes, with the correct ergonomic arrangement one can type and use a mouse just fine while walking on a treadmill, at least up to—for me—a speed of about 2.5 mph.)

OK, so let’s put all this together. Here are my “average daily rhythms” for the past decade (or in some cases, slightly less):

Graphs of incoming emails, outgoing emails, keystrokes, meetings and events, calls, and steps as a function of time

The overall pattern is fairly clear. It’s meetings and collaborative work during the day, a dinner-time break, more meetings and collaborative work, and then in the later evening more work on my own. I have to say that looking at all this data I am struck by how shockingly regular many aspects of it are. But in general I am happy to see it. For my consistent experience has been that the more routine I can make the basic practical aspects of my life, the more I am able to be energetic—and spontaneous—about intellectual and other things.
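Showing different activities on one set of axes requires putting them on a common scale. A minimal sketch, assuming each activity is available as a list of local `datetime` timestamps: average the counts per hour of day over the observed days, then divide by the peak so every curve tops out at 1 (one possible normalization among several).

```python
from collections import Counter
from datetime import datetime


def hourly_profile(timestamps, n_days):
    """Average number of events in each hour of the day over n_days observed days."""
    by_hour = Counter(t.hour for t in timestamps)
    return [by_hour[h] / n_days for h in range(24)]


def peak_normalized(profile):
    """Scale a profile so its maximum is 1, making different activities comparable."""
    peak = max(profile)
    return [v / peak for v in profile] if peak else list(profile)
```

Peak normalization hides absolute volume (hundreds of emails versus a handful of meetings) in exchange for making the shapes of the daily rhythms directly comparable.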

And for me one of the objectives is to have ideas, and hopefully good ones. So can personal analytics help me measure the rate at which that happens?

It might seem very difficult. But as a simple approximation, one can imagine seeing at what rate one starts using new concepts, by looking at when one starts using new words or other linguistic constructs. Inevitably there are tricky issues in identifying genuine new “words” etc. (though for example I have managed to determine that when it comes to ordinary English words, I’ve typed about 33,000 distinct ones in the past decade). If one restricts to a particular domain, things become a bit easier, and here for example is a plot showing when names of what are now Mathematica functions first appeared in my outgoing email:

First email appearance of Mathematica functions

The spike at the beginning is an artifact, reflecting pre-existing functions showing up in my archived email. And the drop at the end reflects the fact that one doesn’t yet know future Mathematica names.  But it’s interesting to see elsewhere in the plot little “bursts of creativity”, mostly but not always correlated with important moments in Mathematica history—as well as a general increase in density in recent times.
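The first-appearance plot reduces to scanning the archive in date order and recording the first date each known name shows up. A rough sketch, assuming outgoing mail is available as (date, body) pairs and the list of function names is supplied separately. The tokenizer is a simplification; it treats `$` as part of a name, since some Mathematica names such as `$Version` begin with it.

```python
import re
from datetime import date

# Simplified tokenizer: letters, digits and "$", starting with a letter or "$".
TOKEN = re.compile(r"[A-Za-z$][A-Za-z0-9$]*")


def first_appearances(emails, names):
    """Map each name in `names` to the date of the earliest email that mentions it."""
    vocabulary = set(names)
    first_seen = {}
    for sent, body in sorted(emails, key=lambda e: e[0]):
        for token in set(TOKEN.findall(body)) & vocabulary:
            first_seen.setdefault(token, sent)
    return first_seen
```

Names that are also ordinary English words (like `Table` or `Map`) will match everyday usage too, which is part of why identifying genuinely new words is tricky.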

As a quite different measure of creative progress, here’s a plot of when I modified the text of chapters in A New Kind of Science:

Plot of when chapters were modified in A New Kind of Science

I don’t have data readily at hand from the beginning of the project. And in 1995 and 1996 I continued to do research, but stopped editing text, because I was pulled away to finish Mathematica 3 (and the book about it). But otherwise one sees inexorable progress, as I systematically worked out each chapter and each area of the science. One can see the time it took to write each chapter (Chapter 12 on the Principle of Computational Equivalence took longest, at almost 2 years), and which chapters led to changes in which others. And with enough effort, one could drill down to find out when each discovery was made (it’s easier with modern Mathematica automatic history recording). But in the end—over the course of a decade—from all those individual keystrokes and file modifications there gradually emerged the finished A New Kind of Science.
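In the simplest version, reading off how long each chapter took means taking the span between its first and last recorded modification. A sketch, assuming the edit history is available as (chapter number, date) records:

```python
from datetime import date


def chapter_spans(modifications):
    """Return {chapter: (first edit, last edit, elapsed days)} from (chapter, date) records."""
    first, last = {}, {}
    for chapter, day in modifications:
        first[chapter] = min(day, first.get(chapter, day))
        last[chapter] = max(day, last.get(chapter, day))
    return {ch: (first[ch], last[ch], (last[ch] - first[ch]).days) for ch in first}
```

This measures elapsed time, not effort: a pause like the one in 1995 and 1996 would be counted inside any chapter whose edits spanned it.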

It’s amazing how much it’s possible to figure out by analyzing the various kinds of data I’ve kept. And in fact, there are many additional kinds of data I haven’t even touched on in this post. I’ve also got years of curated medical test data (as well as my not-yet-very-useful complete genome), GPS location tracks, room-by-room motion sensor data, endless corporate records—and much much more.

And as I think about it all, I suppose my greatest regret is that I did not start collecting more data earlier. I have some backups of my computer filesystems going back to 1980. And if I look at the 1.7 million files in my current filesystem, there’s a kind of archeology one can do, looking at files that haven’t been modified for a long time (the earliest is dated June 29, 1980).

Here’s a plot of the latest modification times of all my current files:

Modification dates of all current files

The colors represent different file types. In the early years, there’s a mixture of plain text files (blue dots) and C language files (green). But gradually there’s a transition to Mathematica files (red)—with a burst of page layout files (orange) from when I was finishing A New Kind of Science. And once again the whole plot is a kind of engram—now of more than 30 years of my computing activities.
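The file-archeology plot needs only a directory walk and one `stat` call per file. A minimal sketch using Python's standard library; grouping by lowercase file extension (so `.nb` for notebooks, `.c` for C source) is an assumed stand-in for however file types were actually classified here.

```python
import os
from collections import defaultdict
from datetime import datetime, timezone


def modification_times_by_type(root):
    """Walk a directory tree and group each file's last-modification time by extension."""
    by_type = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # broken symlink, permission problem, etc.
            ext = os.path.splitext(name)[1].lower() or "(none)"
            by_type[ext].append(datetime.fromtimestamp(mtime, tz=timezone.utc))
    return by_type
```

Only the latest modification time survives in the filesystem, so a plot built this way shows when each file was last touched, not when it was created.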

So what about things that were never on a computer? It so happens that years ago I also started keeping paper documents, pretty much on the theory that it was easier just to keep everything than to worry about what specifically was worth keeping. And now I’ve got about 230,000 pages of my paper documents scanned, and when possible OCR’ed. And as just one example of the kind of analysis one can do, here’s a plot of the frequency with which different 4-digit “date-like sequences” occur in all these documents:

Occurrence of years in scanned documents

Of course, not all these 4-digit sequences refer to dates (especially for example “2000”)—but many of them do. And from the plot one can see the rather sudden turnaround in my use of paper in 1984—when I turned the corner to digital storage.
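Counting date-like sequences takes a single regular expression over the OCR’ed text. A sketch, assuming each page is available as a plain string. Restricting matches to 19xx and 20xx, and rejecting matches embedded in longer runs of digits, are choices made here for illustration; as noted above, a sequence like 2000 will still pick up plenty of non-dates.

```python
import re
from collections import Counter

# 19xx or 20xx, not preceded or followed by another digit.
YEAR_LIKE = re.compile(r"(?<!\d)(19\d{2}|20\d{2})(?!\d)")


def year_like_counts(pages):
    """Count occurrences of each 4-digit year-like sequence across OCR'ed pages."""
    counts = Counter()
    for text in pages:
        counts.update(int(y) for y in YEAR_LIKE.findall(text))
    return counts
```

The digit lookarounds keep numbers like 12000 from contributing a spurious 2000.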

What is the future for personal analytics? There is so much that can be done. Some of it will focus on large-scale trends, some of it on identifying specific events or anomalies, and some of it on extracting “stories” from personal data.

And in time I’m looking forward to being able to ask Wolfram|Alpha all sorts of things about my life and times—and have it immediately generate reports about them. I’d like it not only to act as an adjunct to my personal memory, but also to do automatic computational history—explaining how and why things happened—and then to make projections and predictions.

As personal analytics develops, it’s going to give us a whole new dimension to experiencing our lives. At first it all may seem quite nerdy (and certainly as I glance back at this blog post there’s a risk of that). But it won’t be long before it’s clear how incredibly useful it all is—and everyone will be doing it, and wondering how they could have ever gotten by before. And wishing they had started sooner, and hadn’t “lost” their earlier years.

90 comments to “The Personal Analytics of My Life”


    Really inspiring! Thanks!

    Hampus Jakobsson

    This is awesome. I love data. It has a huge reflective power when you collect it about yourself. You discover patterns and correlations you wouldn’t otherwise.

    I even asked people to graph their life and comment on the correlations they discover. Granted, human memory yields no accurate data, but the reflective power stays. I was pleased to see my requests had transformed into a half-successful meme that spread to a number of sites:

    I have been collecting data about myself for a year now. I have weight, jogging duration and average speed, number of new people I meet, satisfaction with the work I produce, progress in my project, and more data that is *rather personal*.

    You say you wish you had collected more data. What data would you recommend someone in his mid twenties start collecting?

    Ben P

    Would love to hear what tools are used to capture all this data (email metadata, keystrokes, file metadata etc)


    Dear Stephen,
    Thanks for such an interesting and informative article! I’ve been collecting data for a year, but I analyse only a few types, i.e. mood and number of steps, and I want to research some more. So could you tell me what kind of programs you use to collect keystroke and phone call data? Could Wolfram|Alpha do all this?


    The occurrences of years in his emails (e.g. 1980, 1990, etc.) reminded me of one of my favorite visualizations that shows the frequency of numbers on the internet.

    Steve Souza

    Oh, and by the way really cool and nice writeup. You are one busy guy! Reviewing your graphs it looks like each day you

    * Are on the phone 12 hours
    * Have 10 events
    * Receive 400 emails
    * Send 100 emails
    * Type 20,000 keystrokes

    In addition you have a family, and are a CEO. Wow.

    Steve Souza

    Cool to share that information with us, thanks.
    How many devices and probes did you have to wear and maintain for all these years :-) .
    Smart phones should simplify your life in the future.
    It is quite fascinating to collect data about yourself, but it could also be dangerous if it suddenly ends up in the hands of parties with different interests, such as health insurers.
    Hopefully each individual will keep control of their own data.

    Guillaume Aubert

    I liked the post very much, creative and smart :)


    Fascinating read. Now that you have the temporal aspect covered, start keeping track of your spatial properties. I would be really interested in learning about how your spatial patterns emerge. More than likely there will be an autocorrelation; first law of geography.


    This might help you figure out your distributions a bit!


    @StatsTrade It’s already written: “The rather dramatic discontinuity in 2002 is the moment when A New Kind of Science was finally finished, and I could start leading a different kind of life.”


    StatsTrade, here is the explanation taken from the text:

    “The rather dramatic discontinuity in 2002 is the moment when A New Kind of Science was finally finished, and I could start leading a different kind of life.”


    It’s an amazing analysis, thanks for sharing and describing all that – nice effort!



    >What is this distribution? Is there a simple model for it? I don’t know.

    Stephen, it is a Poisson distribution.


    @StatsTrade – he says, “The rather dramatic discontinuity in 2002 is …”


    Wow, this is a really important analysis, and the most thorough analysis of a knowledge worker; thank you. Out of curiosity, how many hours on average, as a function of time, did you spend on email, phone, meetings and writing, excluding/including weekends?


    Fascinating. Have you found any correlations between creativity (new ideas, as opposed to just keystrokes) and controllable lifestyle factors (walking further, sleeping early, skipping dinner, avoiding phone calls…)?



    From the article:
    “The rather dramatic discontinuity in 2002 is the moment when A New Kind of Science was finally finished, and I could start leading a different kind of life.”


    I’m curious about the software you used to collect keystrokes? Is it some custom made software? What kind of information does it collect? Only keystrokes, or some additional information?


    “But it won’t be long before it’s clear how incredibly useful it all is …”

    Can someone offer a hypothesis with examples of usefulness?

    Frank Ch. Eigler

    Have you looked at days of the week? I’ve written some code to analyse my gmail account and found that people tend to be extremely inactive on a Tuesday – by almost half as much as measured against any other weekday.


    Impressive dataset and analysis indeed! When this type of data collection gets popular (and for many other reasons as well) there will be a need for data structures and even programming languages that match human aspects of behavior. The present ones don’t do it.

    I’ve written a quick blog on the data structure topic:

    and there is an inspiring talk by Rich Hickey on similar needs of programming languages:

    This is a most inspiring topic now and in near future!


    It would also be interesting to see a breakdown (and shift) of hits to top N web sites visited over the years. Of course, being a programmer, his list would look nothing like a “normal” person’s list, but it would still be interesting to watch the rise and fall of certain web sites (which can take up a significant portion of someone’s time).


    Very interesting material.

    Have you thought about tying in personal health diagnostics? Folks in the “measured human” effort like Larry Smarr would love to be able to input personal health metric tracking data at some granularity and be able to extrapolate all sorts of prognostications.

    avi weiss

    Impressive. With muuuuuuuuuuuch less data, I have similar trends. One wonders: how much could this continue growing? I mean, we’re humans, right?


    Very interesting. I’ve been collecting this data in some areas, but should do it more. Pedometer values, and distance travelled (via GPS) would be interesting. Smartphones could definitely be used more, apps exist that use the accelerometer to determine time slept, sleep cycles, etc.


    The rise and fall of the information tides of a creative genius. Amazing.


    Amazing work, sir. Do you type all your emails or use voice input? I was just wondering how the dynamics of the activity would change if you were using a voice-to-text feature. Obviously the number of emails would not change, but the time you spend on emails might.



    Very interesting! I am an electrical engineer (Drexel University) and I work for the US Navy, where a good portion of my work is data analysis. I love data. I collect data on all sorts of things. I love aggregating data from things like my facebook, and then trying different statistical analysis of it. I recently ran all sorts of causality analysis between various items from my facebook timeline, web history, etc. against recorded weather data over the timespan. I love finding correlations and patterns in the results, especially when they can show me how to be more efficient, save money, etc.

    I have a few suggestions/notes on two of the graphs, above:

    1. In the graph showing Incoming vs Outgoing emails, I feel that you need to refine the data a bit. Outgoing emails are fine as is; however, incoming emails may be improperly inflated. Each time someone else hits “reply all” to an email where you’re included, even if the message isn’t addressed to you (i.e. Dear Stephen…) it’s being included as an incoming email. It’d be more appropriate to aggregate “emails you send” vs “emails sent to you,” instead of incoming vs outgoing. You could take a big whack at this by differentiating between incoming emails with your address in the “To” field and the “CC” field. I would also imagine that when you’re in the “Bcc” field, it is more like it was a message directed towards you, however you would likely not reply to that (given the nature of Bcc). Obviously there is a lot this does not account for (if you’re in CC but someone writes “Dear Stephen…”), but it may yet return interesting results.

    2. In the distribution of emails per day, where you weren’t sure if there was a simple model for it… I couldn’t help but recognize it as very similar to a graph of a capacitor’s discharge. See and


    Robert Cull

    From Colombia: Congratulations! Nice work and data analysis!


    Visually it seems that the “distribution of emails per day” is a Poisson distribution.


    Fascinating! I would rather like an analysis of my life since the 80’s, what an entirely useful tool in learning about patterns and outcomes in my world. Thank you for sharing your work.


    @StatsTrade: Second paragraph after the chart: “The rather dramatic discontinuity in 2002 is the moment when A New Kind of Science was finally finished, and I could start leading a different kind of life.”


    >What is this distribution? Is there a simple model for it? I don’t know.

    It’s definitely a Poisson distribution, and possibly one with time varying intensity – with the time varying intensity being serially correlated. You could use the following models for forecasting purposes:

    Send me an email, and I’ll be happy to send you a chapter from my PhD dissertation where Heinen (2003)’s Autoregressive Conditional Poisson model is applied to forecasting the number of news items for each of 28 large publicly listed US corporations.

    … and by the way, great post!


    While this data collection is quite impressive, I can’t help but wonder how your family and son (the one who notes you are nocturnal) fit into all of your time. The measure of success is not in the work we do, but in the love we share. Have you passed this love on to your family?


    How did you keep this data? How do you keep track of emails and phone calls and keystrokes etc.? I would think you don’t do it manually, since that would waste a lot of time. Can you talk about how you record your daily data?

    James Almeida

    Can I ask what you use to track all these things? I think it would be fun to try to set some of this up on my system to just see what type of raw data I can produce over a period of time.


    I’d really just like to see the design / schematics / arrangement for the workstation that can be used while walking on the treadmill, including typing!


    It looks like there are fewer keystrokes per email earlier in the day. I wonder if you type more detailed emails later in the day/night and if there is a correlation with thoughtfulness of responses over the day.
    I’m currently working on analyzing how birdsong changes over the course of the day (the fundamental frequency of some notes rises during the morning and then peaks and falls again at night) and have heard from colleagues that there is a similar change in the velocity of non-human primate reaching behaviors over the course of the day. It would be interesting to see how your typing speed varies over the course of the day; maybe you could analyze typing speed as the speed with which you type particular words or phrases that you would be unlikely to pause during, like “thanks for your email” or other phrases that you commonly use. I love the fact that I’m learning things about myself that I didn’t know before using statistics, well done!

    Bill Wood

    Awesome way to play with data…data can tell us so much while telling us absolutely nothing. :)

