Difference between revisions of "Data Analysis"

From TeleCafeWiki
(Added See Also and References sections.)
(Deposited selected content from Computer Productivity Hacks.)
{{RightTOC}}
 
== Business Intelligence ==
 
* [http://2012books.lardbucket.org/books/getting-the-most-out-of-information-systems-a-managers-guide-v1.1/s15-06-the-business-intelligence-tool.html The Business Intelligence Toolkit]
 
: A section from the Creative Commons book [http://2012books.lardbucket.org/books/getting-the-most-out-of-information-systems-a-managers-guide-v1.1/index.html Getting the Most Out of Information Systems: A Manager's Guide].
 
== Data Analysis Tools ==
* [http://d3js.org/ D3.js - Data-Driven Documents]
: '''D3.js''' is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS. Its emphasis on web standards gives you the full capabilities of modern browsers without tying you to a proprietary framework, combining powerful visualization components with a data-driven approach to DOM manipulation.

* [https://www.rstudio.com/ide/download/ Download RStudio]
: RStudio is an integrated development environment for R. It is available in open source and commercial editions and runs on the desktop (Windows, Mac, and Linux) or over the web with RStudio Server.

* [http://openrefine.org/ OpenRefine] (Formerly [http://code.google.com/p/google-refine/ Google Refine].)
: '''[[wikipedia:OpenRefine|OpenRefine]]''' is a standalone open source desktop application for data cleanup and transformation to other formats, an activity known as [[wikipedia:Data wrangling|data wrangling]]. It is similar to [[wikipedia:Spreadsheet|spreadsheet]] applications (and can work with spreadsheet file formats), but it behaves more like a database.

* [http://vis.stanford.edu/wrangler/ Data Wrangler] (Stanford Visualization Group)
: Wrangler allows interactive transformation of messy, real-world data into the data tables that analysis tools expect. Data can be exported for use in Excel, R, Tableau, Protovis, and other tools.

* [http://htsql.org/ HTSQL: A Database Query Language]
: HTSQL is designed for data analysts and other ''accidental programmers'' who have complex business inquiries to solve and need a productive tool for writing and sharing database queries. HTSQL is ''free and open source'' software.

* [http://www.cc.gatech.edu/gvu/ii/jigsaw/ Jigsaw: Visual Analytics for Exploring and Understanding Document Collections]
: Jigsaw is a visual analytics system that helps analysts and researchers explore, analyze, and make sense of document collections.
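
As a rough illustration of the kind of cleanup that data wrangling tools such as OpenRefine and Data Wrangler are built for, the sketch below normalizes a small, messy CSV table using only the Python standard library. The sample data, the column names, and the <tt>wrangle</tt> function are hypothetical and are not taken from any of the tools listed above.

```python
import csv
import io

# Hypothetical messy input: inconsistent case, stray whitespace, a blank row,
# and a thousands separator inside a numeric column.
MESSY_CSV = """Name , City,Sales
 alice ,  new york ,"1,200"

BOB,Boston, 950
"""


def wrangle(text):
    """Return a list of cleaned row dicts from messy CSV text."""
    reader = csv.reader(io.StringIO(text))
    # Normalize header names: trim whitespace and lowercase.
    header = [h.strip().lower() for h in next(reader)]
    rows = []
    for raw in reader:
        cells = [c.strip() for c in raw]
        if not any(cells):  # skip blank rows
            continue
        record = dict(zip(header, cells))
        record["name"] = record["name"].title()
        record["city"] = record["city"].title()
        # Drop the thousands separator before converting to an integer.
        record["sales"] = int(record["sales"].replace(",", ""))
        rows.append(record)
    return rows


cleaned = wrangle(MESSY_CSV)
```

Each entry in <tt>cleaned</tt> is a dictionary with trimmed, consistently cased text fields and an integer <tt>sales</tt> value, which is the kind of regular table that analysis tools expect.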

== Regular Expressions ==
* [[wikipedia:Regular expression|Regular expression]] (Wikipedia)
: In theoretical computer science and formal language theory, a regular expression (abbreviated <tt>regex</tt> or <tt>regexp</tt> and sometimes called a ''rational expression'') is a sequence of characters that forms a search pattern, used mainly for pattern matching on strings, as in "find and replace" operations.
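
A minimal sketch of a "find and replace" operation, using Python's standard <tt>re</tt> module. The sample text and the date pattern are made up for illustration.

```python
import re

# Hypothetical sample text; the dates are made up for illustration.
text = "Revised 2015-05-14, first posted 2014-01-18."

# Pattern: four digits, two digits, two digits, separated by hyphens.
DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# "Find": list every match as a (year, month, day) tuple of strings.
found = DATE.findall(text)

# "Replace": rewrite each date as DD/MM/YYYY using backreferences.
replaced = DATE.sub(r"\3/\2/\1", text)

print(found)
print(replaced)
```

The parentheses in the pattern define capture groups, which <tt>findall</tt> returns as tuples and which the replacement string refers to as <tt>\1</tt>, <tt>\2</tt>, and <tt>\3</tt>.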

=== RegEx Tutorials ===
* [http://www.regular-expressions.info/tutorial.html Regular Expressions Tutorial: Learn How to Use and Get The Most out of Regular Expressions]
: Any non-trivial <tt>regex</tt> looks daunting to anybody not familiar with regular expressions. But with just a bit of experience, you will soon be able to craft your own as if you had never done anything else.

* [http://www.rexegg.com/regex-uses.html The Many Uses of Regex]
: Regex is the gift that keeps giving. Once you learn it, you discover it comes in handy in many places where you hadn't planned to use it.

=== RegEx Tools ===
* [https://regex101.com Regex101: Online regex tester and debugger: JavaScript, Python, PHP, and PCRE]
: Regex101 lets you create, debug, and test expressions, and have them explained, for PHP, PCRE, JavaScript, and Python. The website also features a community where you can share useful expressions.

* [http://www.regexr.com/ RegExr: Learn, Build, & Test RegEx]
: Regular expression tester with syntax highlighting, contextual help, video tutorial, reference, and searchable community patterns.

* [http://pythex.org/ Pythex: a Python regular expression editor]
: Pythex is a real-time regular expression editor for Python and a quick way to test your regular expressions.

* [http://regexpal.com/ Regex Tester]
: JavaScript regex tester. Highlights matches on the fly.

== Text Extraction ==
=== PDF Conversion ===
* [http://pdftotext.org/ pdftotext.org]
: pdftotext.org is an online service for extracting text from PDF files. Conversion from PDF to TXT runs locally in your browser, so it is fast, your PDF files are never uploaded to the Internet, and you can even convert while offline. No registration or sign-up is needed, and the site states that the service will always be free to use.

* [https://github.com/coolwanglu/pdf2htmlEX pdf2htmlEX]
: Convert PDF to HTML without losing text or format.

* [http://stackoverflow.com/questions/6187250/pdf-text-extraction PDF TEXT Extraction]
: A Stack Overflow question whose answers list several options.

* [http://capture2text.sourceforge.net/ Capture2Text]
: Capture2Text enables users to do the following:
# Optical Character Recognition (OCR)
# Speech Recognition

* [https://sourceforge.net/projects/doctotext/ SILVERCODERS DocToText]
: Extracts plain text from documents in all popular formats.

* [https://sourceforge.net/projects/detexter/ Detexter]
: Detexter is an app designed to extract text from PDF files.

=== Data Scrape ===
* [https://docs.google.com/a/evolvnet.com/document/d/18Q2THQvYCG2_n6nKVsZRHlaPG9iJ9NvLezOOQbEuAJs/edit?hl=en Almost Scraping: Web Scraping for Non-Programmers]
: Tools and tips compiled by journalists from PBS and the Omaha World-Herald.

* [http://blog.scrapinghub.com/2014/01/18/open-source-at-scrapinghub/ Open Source at Scrapinghub]
: Scrapinghub's list of open source scraping projects.

* [http://sitestalker.net/ Sitestalker]
: Monitor website links with ease. Sitestalker supervises websites and notifies you when your desired content hits the web. Stop wasting your time constantly refreshing websites.
: Sitestalker is great for:
:: Finding jobs
:: Searching for an apartment
:: Getting the best bargains
:: Clipping

* [http://www.notprovided.eu/six-tools-web-scraping-use-data-journalism-creating-insightful-content/ Six tools for web scraping – To use for data journalism & creating insightful content]
: Tools for gathering data from public sources.
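
For readers curious what scraping looks like underneath tools like these, here is a minimal sketch that pulls link targets and link text out of an HTML page with Python's standard <tt>html.parser</tt> module. The sample HTML, the <tt>LinkCollector</tt> class, and the <tt>extract_links</tt> function are hypothetical; a real scraper would first download the page and should respect the site's terms of use.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this first.
SAMPLE_HTML = """
<html><body>
  <h1>Listings</h1>
  <a href="/jobs/1">Data analyst</a>
  <a href="/jobs/2">Editor</a>
  <a>No link here</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a linked <a> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) tuples found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


links = extract_links(SAMPLE_HTML)
```

Anchors without an <tt>href</tt> attribute are ignored, so <tt>links</tt> holds only the two job entries from the sample page.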

=== Text Search ===
* [http://geekdadaji.com/ geekDadaji - A SEARCH INITIATIVE]
: Makes tools to search text content, including:
# [https://sourceforge.net/projects/falcontextsearch/ FALCON - Text Search Java Project]: a JSON-based text search project written in Java
# [https://sourceforge.net/projects/hawksearch/ HAWK - PDF Text Search Java Project]: a Java project for searching text in PDF documents

* [http://www.foolabs.com/xpdf/home.html Xpdf: A PDF Viewer for X]
: Xpdf is an open source viewer for Portable Document Format (PDF) files.
: Windows installer: [http://www.compgeom.com/~piyush/scripts/scripts.html Short Programs/Scripts] (Look for the xpdf3.exe / poppler.exe links in the left sidebar.)
  
 
== See Also ==
 

Revision as of 13:44, 14 May 2015
