How do the Pros do Data Analysis?

By: Diego Bustamante and William-Elijah Clark

INTRODUCTION

As technology continues to evolve, the infrastructure needed to run this technology gets more and more sophisticated. Processes and tasks carried out by personal computers, smartphones, and appliances are increasingly automated and run with minimal input from the user. This is made possible through code that is developed with one or more computer programming languages.  However, with the increase in the quantity of software and programming applications, the demand for programmers and the number of languages they are required to learn has increased.  Furthermore, many employers now require skills in data analysis and computer programming as prerequisites for job applications.  In this blog post, we will discuss the most in demand languages in the market and give a brief explanation of each.  (Grand Canyon University 2020; Jiidee 2020; Meinke 2020; University of California – Berkeley, n.d.) 

PROGRAMMING LANGUAGES

1. Python

Designed by Guido Van Rossum in 1991, Python is an “object oriented and high-performance programming language used for general purpose programming”. Python currently ranks as the number one programming language in average yearly salary ($119,000) and is the most in-demand language on job postings per year (50,000).  Python is most commonly used for web development, data science, and scripting, although it has applicability in a variety of disciplines and fields. 

Pros: Python is easily extendable to other programming languages (i.e. C++) and its simpler code can often be read and used by other programming languages.  It also contains extensive libraries and packages, resulting in less coding required for the user. Python is also free to install as an open source software.

Cons: Due to its simpler code, Python utilizes a large amount of memory. This renders it disadvantageous for developing mobile applications or applications where memory optimization is needed (such as web browsers). There are also speed limitations with Python due to it being an interpreted language, meaning it will have a higher execution time than other software.  (Data Flair, n.d.; Levy 2020; Python Software Foundation 2001; Sugi 2018)

2. R

R is an open-source programming language that is specifically designed for complex statistical analysis. It was originally developed by Ross Ihaka and Robert Gentleman in the early 1990s at the University of Auckland in New Zealand. There are currently over 10,000 different packages that can be installed into it. You can code directly into R’s default graphical user interface (GUI), or you can use R within an integrated development environment (IDE) such as RStudio. R will also work with an assortment of code editors such as Notepad++, Visual Code Studio, or Jupyter Notebooks. R is widely used in academic research, although it has become more popular in the private sector within the past decade.

Pros: R is open-source and can be modified with self-coded packages if you can code in C.  R is also good at generating statistical graphs.  There is also extensive documentation on how to utilize R for different types of statistical analysis.

Cons: Some more obscure R packages may not be as functional as more well-known ones. Also, possible mathematical discrepancies in results may (very rarely) occur between different program consoles for R. Finally, the processing speed of R is slower than MATLAB and Python. (Data Flair, n.d.; Hornik 2021; The R Foundation, n.d.)

3. SQL

SQL (Structured Query Language) is a programming language used to access information from databases and pull information out of the databases that it interfaces with. It was originally developed by IBM in the early 1970s. SQL can also be used to concatenate multiple forms of data pulled from a database to create a single dataset, as well as do some minor data manipulation such as adding variables together.

Pros: SQL is an open source software that boasts easy-to-learn syntax and extensive help documentation. It requires very few lines of code to implement functions and can be called and used within several other programming languages (e.g., Python, R, SAS). SQL is used in almost every single industry.

Cons: SQL may not interface with certain proprietary databases without additional code modifications. SQL also cannot analyze data on its own and requires the use of another programming language for data analysis. (International Organization for Standardization 2016; Laine, n.d.; Srivastava 2021)

4.TensorFlow

TensorFlow is an open-source machine learning library/engine that was developed by Google researchers and released in 2015. The engine can be implemented in other programming languages (for example, JavaScript, Python, and R).

Pros: Tensorflow is open source and is usable in several different programming environments. It can be used for complex applications in artificial intelligence and machine learning.

Cons: Tensorflow requires knowledge of at least one other programming language (e.g JavaScript or Python) to implement. There are also inconsistencies with its syntax due to it being a newer programming language. (Abadi et. al. 2015; Falbel et. al. 2021; Metz 2015; Tech Vidvan 2021) 

5. Julia

Julia is a free, open source, and high-performance programming language released in 2012 by Jeff Bezanson and others. Julia was designed to combine all of the best qualities of well established programming languages, including C, R, MATLAB, and Python. Its features are especially well suited for statistical models, risk analysis, economical models, machine learning, computer science, and even spacecraft dynamics simulations. 

Pros: Julia boasts fast compiling speeds due to having a JIT (Just-In-Time) compiler which runs code generated during the execution of the program. Also, it has a dynamic typing style that gives it the feel of  a scripting language such as C. It also has an interactive command-line interface like Python and has math friendly syntax. 

 Cons: Julia consumes a large amount of memory (about 150MB) and is unable to easily integrate into other programming languages. There have also been many bugs reported for Julia, likely due to its relative newness as a programming software.  (Bezanson et al 2012; Bezanson 2017; Nissen 2021). 

6. Java

Java is a free to use programming language that was developed by James Gosling in 1995. This language is present in more than three billion smartphones and on approximately 97% of enterprise computers. The applications for Java are very extensive, and include mobile, web-based, scientific, gaming, business, and cloud based applications (among many others).

Pros: Some of the benefits of using this language are its simplicity, constant updates, and the security of its data. 

Cons: Some of the drawbacks that Java presents are that, although it possesses a simple language, it also has a learning curve that new users normally go through due to its syntax being somewhat wordy, cryptic and rather lengthy compared to other programming languages. (Britannica, n.d.; Chand 2019; Oracle Corporation, n.d.; Zaręba 2020) 

7. MATLAB

MATLAB, created in 1984 by Cleve Moler, is used by scientists and engineers for a variety of applications in industry and academia. It is most commonly used for machine learning, data analytics, statistics, curve fitting, image processing, financial analysis, and control systems.

Pros: MATLAB possesses a large database of built-in algorithms for imaging and programming. The software also allows testing for algorithms without recompilation and the ability to call external libraries. Finally, MATLAB boasts extensive online help resources and clearly labeled documentation. 

Cons: MATLAB is an interpreted language, meaning that it will have higher run times than other coding software. MATLAB is also proprietary and relatively expensive, requires a fast computer, and doesn’t feature real time apps. (RF Wireless World 2012; Moler, n.d.; Patil 2019)

8. SAS

SAS (Statistical Analysis System) is a proprietary, statistical programming language from the SAS Institute. It was originally developed at North Carolina State University by Anthony James Barr between 1966 and 1976. SAS has additional functionality that allows users to code in SQL within SAS, making it easy to analyze data pulled from an SQL database inside of SAS.

Pros: SAS has a consistent coding syntax and well documented functions. It also has the ability to process big data sets faster than R or Python. Access to SAS and hands-on learning opportunities are often available at academic institutions (FSU even has enough courses to offer a SAS certificate).

Cons: SAS is a proprietary software, with fees for using the software often in the range of $8000. It is also phasing out of use in some industries and is now primarily limited to specific industries (i.e; government, big banks, medical fields). Furthermore, graph generation in SAS can be clunky, and its procedural programming might be unfamiliar to newer users. (Kent State University 2021; SAS Institute, n.d.; Southern Illinois University n.d.)

HONORABLE MENTIONS

Go(Golang)

Go (also known as “Golang”) is an open source programming language that was designed by Google’s Robert Griesemer, Rob Pike, and Ken Thompson in 2007. The software was invented by Google to manage its growing complex infrastructure, with a focus on simplicity and quick performance (A Cloud Guru). Go was released publicly in 2012 and since then has become widely used, with companies such as Uber and Soundcloud utilizing it to improve their mobile and web-based applications. Furthermore, Go has been used to develop banking, ecommerce, and dating apps.

Pros: Go is ideal for saving money and time while developing apps. Also, it is useful in developing apps used for systems and network programming, big data, machine learning, audio and video editing. 

Cons: Due to the relative newness of the software, there are less add-on libraries or packages available. There is also less help documentation compared to more established programming languages. Finally, Go’s simplicity can lead to “undisciplined” coding, which can be detrimental if a given project grows and becomes bigger.  

Microsoft Excel

Microsoft Excel is a widely used proprietary spreadsheet software program utilized across a variety of industries. Excel is often used by analysts, surveyors, accountants, data entry clerks, marketers, and more. Excel has the ability to output data as .csv spreadsheets, which are readable by almost any other statistical analysis programming language. MS Excel is often used for tracking data for experiments, survey responses, financial information, and many other forms of data. It can be used to do tasks such as constructing pivot tables, solving systems of linear equations, and (with the right external libraries installed) generating basic linear regression models. While Microsoft Excel itself is proprietary, there are several open source alternatives, such as Google Sheets, Apache OpenOffice: Calc, and the pyspread library for Python.

Pros: Microsoft Excel boasts a very intuitive interface for making data tables and has well documented formulas and syntax. It is also useful for creating usable data files for a variety of programming languages and software. Finally, Excel is ubiquitous to most computers and is used in almost every industry.

Cons: Some computers might not have Excel and need an open-source alternative. Data analysis in Excel can be clunky, and more complex analysis may be difficult, time consuming, and/or impossible. Number formatting can also be counterintuitive for non-programmers new to Excel. Furthermore, some Excel add-ins cost additional money to use. (Google Inc, n.d.; Martin Manns, n.d.; Microsoft Corporation, n.d.; The Apache Software Foundation, n.d.)

CONCLUSION

There are many different programming languages utilized across different industries, and it can be difficult to determine the best language to learn or use. One’s choice of programming language is not only determined by its functionality, but also by budgetary restraints, industry (e.g. SPSS and STATA are almost exclusively used in social sciences academia and research), and even the personal preferences of an already established team. It should also be mentioned that most professional data scientists know multiple programming languages, even if they do have personal preferences. Regardless, once you know at least one programming language and understand the fundamentals of statistical analysis, picking up additional programming languages for data science becomes a lot easier. 

RESOURCES

Thankfully, there are a number of resources available for learning a new programming language, many of which are free.  These resources include, but are not limited to:

REFERENCES

Abadi, Martin, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems, 2015. tensorflow.org.

Data Flair. “Advantages and Disadvantages of Python – How It Is Dominating Programming World.” .Training. Accessed December 7, 2021. https://data-flair.training/blogs/advantages-and-disadvantages-of-python/.

RF Wireless World. “Advantages of MATLAB | Disadvantages of MATLAB.” .Com, 2012. https://www.rfwireless-world.com/Terminology/Advantages-and-Disadvantages-of-MATLAB.html.

Apache Software Foundation. Apache OpenOffice (version 4.1.11). Wilmington, DE: Apache Software Foundation, 2021. https://www.openoffice.org/.

Benzanson, Jeff, Alan Edelman, Stefan Karpinski, and Viral B Shah. “Julia: A Fresh Approach to Numerical Computing.” SIAM Review 59, no. 1 (January 2017): 65–98. https://doi.org/10.1137/141000671.

Benzanson, Jeff, Stefan Karpinski, Viral B Shah, and Alan Edelman. “Why We Created Julia.” .Org. Julialang.Org/Blog (blog), February 14, 2012. https://julialang.org/blog/2012/02/why-we-created-julia/.

Chand, Swatee. “Top 10 Applications of Java Programming Language.” .Com. Edureka (blog), September 16, 2019. https://medium.com/edureka/applications-of-java-11e64f9588b0.

Codecademy. “Codecademy.” .Com. Codecademy, 2012. https://www.codecademy.com/.

Falbel, Daniel, JJ Allaire, Yuan Tang, Dirk Eddelbuettel, Nick Golding, Tomasz Kalinowski, RStudio, and Google Inc. “Tensorflow: R Interface to ‘TensorFlow.’” .Org. R-Project, November 9, 2021. https://cran.r-project.org/package=tensorflow.

Florida State University. “Florida State University Libraries – Research Data Services.” .Edu. Research Data Services. Accessed December 9, 2021. https://www.lib.fsu.edu/service/research-data-services.

Google Inc. “Effective Go.” .Dev. Go. Accessed December 9, 2021. https://go.dev/doc/effective_go.

———. “Google Sheets.” .Com. Google Sheets: Free Online Spreadsheet Editor. Accessed December 9, 2021. https://www.google.com/sheets/about/.

Grand Canyon University. “Why Is Programming Important?” .Edu. Engineering & Technology (blog), September 25, 2020. https://www.gcu.edu/blog/engineering-technology/computer-programming-importance#:~:text=Programming%20languages%20use%20classes%20and,do%20it%20automatically%20and%20accurately.

Hornik, Kurt. “R FAQ – What Is  R?” .Org. R Project – R FAQ, November 25, 2021. https://cran.r-project.org/doc/FAQ/R-FAQ.html#What-is-R_003f.

International Organization for Standardization. “ISO/IEC 9075-1:2016 Technology — Database Languages — SQL — Part 1: Framework (SQL/Framework).” .Org. International Organization for Standardization, December 2016. https://www.iso.org/standard/63555.html.

Irizarry, Rafael, and Michael Love. “Statistics and R.” .Edu. Harvard University, October 1, 2020. https://pll.harvard.edu/course/statistics-and-r?delta=0.

———. “Statistics and R.” .Edu. Harvard University – Professional and Lifelong Learning. Accessed December 9, 2021. https://online-learning.harvard.edu/course/statistics-and-r?delta=0.

Kaggle. “Kaggle Courses.” .Com. Kaggle, 2010. https://www.kaggle.com/learn.

Kent State University. “Statistical & Qualitative Data Analysis Software: About SAS.” .Edu. Kent State University – University Libraries, September 29, 2021. https://libguides.library.kent.edu/statconsulting/SAS.

Khan, Sal. “Khan Academy.” .Org. Khan Academy, 2005. https://www.khanacademy.org/.

Laine, Harri. “History of SQL.” .Fi. History of SQL, January 14, 2002. https://www.cs.helsinki.fi/u/laine/tuelip/sql_material/sql_history.html.

Levy, Rod. “The Best Paying and Most In-Demand Programming Languages in 2020.” .Org. Code Platoon, June 30, 2020. https://www.codeplatoon.org/best-paying-most-in-demand-programming-languages-2020/.

LinkedIn Corporation. “LinkedIn Learning.” .Com. LinkedIn. Accessed January 6, 2022. https://www.linkedin.com/learning/.

Data-Flair. “List of R Packages – Master All the Core Packages of R Programming!” .Training. Accessed December 7, 2021. https://data-flair.training/blogs/list-of-r-packages/.

Manns, Martin, and The pyspread team. Pyspread (version 2.0.1). Python, 2021. https://pyspread.gitlab.io/.

Meinke, Hannah. “Why Learn to Code? The Surprisingly Broad Benefits of Coding.” .Edu. Rasmussen University Technology Blog (blog), August 24, 2020. https://www.rasmussen.edu/degrees/technology/blog/why-learn-to-code/.

Metz, Cade. “Google Just Open Sourced TensorFlow, Its Artificial Intelligence Engine.” Wired Magazine, November 9, 2015. https://www.wired.com/2015/11/google-open-sources-its-artificial-intelligence-engine/.

Microsoft Corporation. Microsoft Excel (version 2019). Microsoft Excel. Accessed December 9, 2021. https://www.microsoft.com/en-us/microsoft-365/excel.

Miquido. “Best Golang Applications: 6 Companies Using the Go Language.” Miquido, June 5, 2020. https://www.miquido.com/blog/top-golang-apps-best-apps-made-with-golang/. 

Mishra, Bipin. “To Go or Not to: The Pros and Cons of Programming in Golang.” Top Mobile App Development Company India & USA, January 11, 2022. https://www.mindinventory.com/blog/pros-and-cons-programming-in-golang/. 

Moler, Cleve, and MathWorks. “A Brief History of MATLAB.” .Com. MathWorks(R) Technical Articles and Newsletter, 2018. https://www.mathworks.com/company/newsletters/articles/a-brief-history-of-matlab.html.

Nissen, Jakob Nybo. “What’s Bad About Julia?” .Com. Viralinstruction (blog), July 25, 2021. https://viralinstruction.com/posts/badjulia/.

Oracle and/or its affiliates. “Java.” .Com. Oracle Java Documentation. Accessed December 9, 2021. https://docs.oracle.com/javase/8/docs/technotes/guides/language/index.html.

Osadchiy, Victor. “Why Use the Go Language for Your Project?” What Is Go Language and Its Pros and Cons for Your Project.” Accessed January 13, 2022. https://yalantis.com/blog/why-use-go/. 

Patil, Devaji. “Applications of MATLAB.” .Org. GeeksforGeeks (blog), December 9, 2019. https://www.geeksforgeeks.org/applications-of-matlab/.

Pletcher, Scott. “What Is Go? An Intro to Google’s Go Programming Language (Aka Golang).” A Cloud Guru, September 28, 2021. https://acloudguru.com/blog/engineering/what-is-go-an-intro-to-googles-go-programming-language-aka-golang. 

Potomac IT. “10 Reasons Why You Should Learn Programming.” .Edu. 10 Reasons Why You Should Learn Programming (blog), May 23, 2020. https://potomac.edu/why-learn-programming/.

Data-Flair. “Pros and Cons of R Programming Language – Unveil the Essential Aspects!” .Training. Accessed December 7, 2021. https://data-flair.training/blogs/pros-and-cons-of-r-programming-language/.

Python Software Foundation. Python (version 3.10). Python Software Foundation, 1991. https://www.python.org/.

R Core Team. R: A Language and Environment for Statistical Computing (version 4.1.2). Vienna, Austria: R Foundation for Statistical Computing, 2021. https://www.r-project.org/.

Refsnes Data. “W3 Schools.” .Com. W3 Schools, 1999. https://www.w3schools.com/.

SAS Institute. “Company History | SAS.” .Com. SAS. Accessed December 19, 2021. https://web.archive.org/web/20131023182559/http://www.sas.com/company/about/history.html.

Southern Illinois University. “SAS – Salukitech.” .Edu. oit.siu.edu, August 30, 2021. https://oit.siu.edu/salukitech/software/software-applications/sas.php.

Srivastava, Simran. “Advantages and Disadvantages of SQL.” ..Org. GeeksforGeeks (blog), July 7, 2021. https://www.geeksforgeeks.org/advantages-and-disadvantages-of-sql/.

Statler, Tim. “Is Java Hard to Learn for a Beginner? The Hard Facts.” Comp Sci Central, February 5, 2020. https://compscicentral.com/is-java-hard-to-learn/. 

Sugi, YK. “What Exactly Can You Do with Python? Here Are Python’s 3 Main Applications.” .Org. FreeCodeCamp (blog), June 15, 2018. https://www.freecodecamp.org/news/what-can-you-do-with-python-the-3-main-applications-518db9a68a78/.

Techvidvan. “Advantages and Disadvantages of TensorFlow.” .Com. Techvidvan, 2021. https://techvidvan.com/tutorials/pros-and-cons-of-tensorflow/.

The Editors of Encyclopedia Britannica. “Java: Computer Programming Language.” In Encyclopedia Britannica. Computers. Accessed December 9, 2021. https://www.britannica.com/technology/Java-computer-programming-language.

University of California – Berkeley. “What Is Coding? 5 Key Advantages of Learning How to Code.” .Edu. Berkeley Bootcamps Coding (blog). Accessed December 7, 2021. https://bootcamp.berkeley.edu/blog/what-is-coding-key-advantages/.Zaręba, Adam. “What Is Java Used for in Today’s Programming?” .Com. Software Hut (blog), July 16, 2021. https://softwarehut.com/blog/business/what-is-java-used-for.

Leave a Reply

Powered by WordPress.com.

Up ↑

Discover more from FSULIB

Subscribe now to keep reading and get access to the full archive.

Continue reading