So, you have identified a fascinating new problem to solve with data. You correctly started with the problem and not the data, and solving it seems both beneficial and interesting. Now where do you get the data? Here are 4 steps, in order, for finding data.
1. Existing Data
The best place to start is the data you currently have. What data does your organization currently collect? How can you get access to that? Start there.
2. Open Data
Then look for industry-specific open data (data that is freely available). Many industries publish data monthly or yearly. Data is also frequently available from government-funded research. If industry-specific data is not available, what other related data is openly available? It is often beneficial to augment your existing data with open data. Here are some lists of open data: Open Data, Part 1 and Open Data, Part 2. There are also many others available.
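As a rough illustration of augmenting existing data with open data, here is a minimal Python sketch that joins two small tables on a shared column. The column names (region, sales, population) and all of the values are made up for the example; real open data will usually need cleanup before the keys line up.

```python
import csv
import io

# Made-up sample data, for illustration only.
INTERNAL_CSV = """region,sales
North,1200
South,800
West,950
"""

OPEN_CSV = """region,population
North,40000
South,20000
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def augment(internal_rows, open_rows, key):
    """Left-join open data onto internal rows using a shared key column."""
    lookup = {row[key]: row for row in open_rows}
    merged = []
    for row in internal_rows:
        extra = lookup.get(row[key], {})  # empty when open data has no match
        merged.append({**extra, **row})   # internal values win on conflicts
    return merged


combined = augment(read_rows(INTERNAL_CSV), read_rows(OPEN_CSV), key="region")
for row in combined:
    print(row)
```

Every internal row is kept, even when the open data has no match for it (West in this sample), so the augmentation never silently drops your own records.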
3. APIs
Next, explore the opportunity of using an API to access data. An API (Application Programming Interface) allows a person to write some computer code to pull machine-readable data from an existing system. Many applications already offer API access, so check with the applications you currently use. Some APIs are freely available; others have associated costs. Many make the data available in near real-time. There are numerous APIs available for pulling in data, including ones for weather, stocks, news, social media, web analytics, and many more.
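To make "write some computer code to pull machine-readable data" concrete, here is a minimal Python sketch using only the standard library. The endpoint URL, the query parameters, and the "results" key in the response are all hypothetical; a real API documents its own URL, parameters, authentication, and response layout.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint, used only for illustration; substitute a real API.
BASE_URL = "https://api.example.com/v1/observations"


def build_url(base_url, params):
    """Append query parameters to the endpoint URL."""
    return f"{base_url}?{urlencode(params)}"


def parse_records(payload):
    """Turn a JSON response body into a list of Python dicts.

    Assumes the (hypothetical) API wraps its rows in a "results" key.
    """
    return json.loads(payload)["results"]


def fetch_records(params, opener=urlopen):
    """Request data from the API and return the parsed records.

    The opener argument exists so the network call can be swapped out
    when testing; by default it is urllib's urlopen.
    """
    with opener(build_url(BASE_URL, params), timeout=10) as response:
        return parse_records(response.read().decode("utf-8"))
```

Calling fetch_records({"city": "Austin", "limit": 2}) would request the URL with those parameters and return the rows as a list of dicts, ready to be combined with your other data. Paid or rate-limited APIs typically also require an API key, which this sketch leaves out.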
4. Create The Data
The last resort is to create the data yourself. An obvious choice is to run a survey. Be careful, because surveys are trickier than they first appear: the people who respond are often not representative of the group you care about, and the result is biased data. Another way to collect data is to change your application so it begins capturing the desired information. You may even have to build a new application. Sometimes an entire process needs to be created or modified to include methods for collecting the data. This last step usually takes the longest and costs the most money.