What Dataset Would You Love To Have?

Human Trafficking

I think a dataset related to human trafficking would be interesting. It would need to contain when and where each kidnapping occurred and the age of the person kidnapped. It could also contain the eventual location of the victim. I don’t know whether any organisation has this data. Often the kidnappings go unnoticed, or the people involved are not allowed to speak about them. I think this data could be used to predict when and where kidnappings for human trafficking are likely to occur, which could help prevent the crime.

My Life

Also, I would love a dataset all about my life. I would love to know what factors constitute a better day for me. I would like the dataset to contain the foods I eat, what I accomplish, sleep (including how often I wake up), exercise, devotion time, a rating of how good I thought the day was, and possibly anything else. I know books and experts say that good food and exercise make people feel better. I would really like to know which factors are most important for me. The problem is: I don’t want to take the time and effort to track all this data. I bet there is an app for it.
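
If I did have that data, a first pass at "which factors matter most for me" could be as simple as correlating each tracked factor with the daily rating. Here is a minimal sketch; the factor names and the five days of numbers are made up purely for illustration, and the `pearson` helper is written by hand so nothing beyond the standard library is assumed.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: raises ZeroDivisionError if either list has no variation.
    return cov / sqrt(var_x * var_y)

# Hypothetical log of five days (invented values, not real tracking data).
factors = {
    "sleep_hours": [6, 8, 7, 5, 8],
    "exercise_minutes": [0, 30, 20, 0, 45],
    "wake_ups": [3, 1, 1, 4, 0],
}
day_rating = [4, 8, 7, 3, 9]  # my own 1-10 rating of each day

correlations = {name: pearson(values, day_rating) for name, values in factors.items()}

# List the factors from most to least strongly related to the rating.
for name, r in sorted(correlations.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name}: r = {r:+.2f}")
```

A strong correlation would only be a hint, not proof, that a factor makes a day better, and a handful of days is far too few to trust. It would at least show where to look first.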

Chinese Gender Predictor

This one would just be for fun. Currently, I would enjoy a large dataset with information about child births. The dataset would need to contain the conception date (or due date), mother’s birth date, and child’s gender. I know that hospitals have this type of data, but HIPAA prevents the sharing of medical records. Here is why I would like it. There are numerous Chinese Gender Predictors around. They claim to be able to accurately predict the gender of a baby. Given enough data, this would be a fairly simple thing to validate or invalidate. Just apply the Chinese Gender Predictor to each birth and see how often it is correct. If it is correct significantly more than 50% of the time, then the early Chinese may have known something we do not. Otherwise, the Chinese Gender Predictor is not a useful tool. This data would have little impact on bettering the world, but it just sounds like a fun little project.
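
To make "significantly more than 50%" concrete, here is a rough sketch of a one-sided exact binomial test using only the Python standard library. The counts are hypothetical placeholders, not real birth data, and the 50% baseline is the assumption from the paragraph above (the real boy/girl split at birth is not exactly even, so a careful check would adjust for that).

```python
from math import comb

def binomial_p_value(correct: int, total: int, chance: float = 0.5) -> float:
    """One-sided exact binomial test.

    Returns the probability of getting at least `correct` right answers
    out of `total` if the predictor were only guessing with success
    probability `chance`.
    """
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )

# Hypothetical counts, for illustration only.
births = 1000
correct_predictions = 520

accuracy = correct_predictions / births
p_value = binomial_p_value(correct_predictions, births)
print(f"accuracy = {accuracy:.1%}, one-sided p-value = {p_value:.3f}")
```

A small p-value (below 0.05 is a common cutoff) would suggest the predictor does better than guessing; a large one would mean the result is consistent with a coin flip. For much larger datasets the plain floating-point sum here can underflow, so a statistics library would be the safer choice.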

Whether it exists or not, what dataset would you love to access?







7 responses to “What Dataset Would You Love To Have?”

  1. neilmillshill

    A dataset of people at the point of ‘I need to find a new job’, preferably one that could be sliced and diced by skill/keyword.

    I’m thinking of a filter over all Tweets per day, collated by keyword/phrase triggers, e.g. ‘Grrr, I hate my job’ or ‘commuting to my job on a packed train, again’. I don’t know if this would work, though, as it would need to be savvy enough to extract the right data with the right meaning/sentiment behind the keywords/phrases, e.g. ‘Yeah, like I really hate my job’ in a sarcastic ‘yeah, like I’d ever leave my job’ sense, as opposed to ‘Yeah, I really hate my job’ in an ‘I definitely mean it’ sense.

    Doesn’t this kind of point really boil down to the golden nirvana of big data, i.e. extracting the meaning/sentiment behind words/data to give a useful end analysis?

    1. Ryan Swanstrom

      That would be interesting, but I am not sure if people would want to be identified as a job searcher after one tweet. In any case, the analysis to identify those people would be really neat.

      1. neilmillshill

        Very valid point, Ryan. Maybe it’s a combination of Tweets, web posts, etc. that would indicate a propensity to move?

  2. V Jam

    I actually might be of assistance for the first data set you mentioned. How did you become interested in this issue?

    This data is obviously highly confidential, but perhaps we could talk to the organization that collects this data and we’ll see if you can be granted access. Send me an email, so I can learn more about your interest in this topic!

    1. Ryan Swanstrom

      Thanks, I will send you an email.

  3. Ed

    Unfortunately, there’s no catch-all life tracking app. There’s a bunch of stuff you can mash together, like Bodymedia, Daily Burn, Trackr, etc., but something like you want is just about the holy grail of personal informatics. Along those lines, you should check out the Quantified Self movement; it’s all about using data to make your life better.

    Personally, I’d love to have the OKCupid dataset. The insights their data science guys have pulled from that thing are awesome.

    Also, great site. I found out about the UW Data Science certificate through here, which hopefully will give me a leg-up in applying to the analytics master’s programs next year!

    1. Ryan Swanstrom

      All great points. I am looking into Quantified Self right now. Life-tracking is difficult, but it would be fun if I could magically have all the data.
      OKCupid does find some great insights, and their blog posts are entertaining.
      Thanks for the comment, and I am glad you are enjoying the site.
