Crowdsourcing and Its Applications on Data Mining: A Brief Survey


  • K. Karthika  Department of Computer Applications, Saradha Gangadharan College, Puduchery, Tamil Nadu, India
  • R. Durga Devi  Department of Computer Applications, Saradha Gangadharan College, Puduchery, Tamil Nadu, India


Clustering, Crowdsourcing, Data mining, Sampling, Quality control


Crowdsourcing allows large-scale, flexible use of human input for data gathering and analysis, introducing a new paradigm for the data mining process. Traditional data mining methods often require domain experts to annotate the data, which is expensive and usually takes a long time. Crowdsourcing draws on the heterogeneous background knowledge of volunteers and splits the annotation process into small portions of effort from different contributors. This paper reviews the state of the art in crowdsourcing for data mining in recent years. We first review the challenges and opportunities of data mining tasks that use crowdsourcing and summarize their common framework. We then highlight several exemplary works for each component of the framework, including question design, data mining, and quality control. Finally, we conclude with the limitations of crowdsourcing for data mining and suggest related areas for future research.



Research Articles

How to Cite

K. Karthika, R. Durga Devi, "Crowdsourcing and Its Applications on Data Mining: A Brief Survey," International Journal of Scientific Research in Science and Technology (IJSRST), Online ISSN: 2395-602X, Print ISSN: 2395-6011, Volume 4, Issue 2, pp. 24-29, January-February 2018.