Difference between revisions of "Computer Productivity Hacks"

From TeleCafeWiki
(→‎Data Scrape: Open Source at Scrapinghub)
(More wget tools and links; OpenRefine, Data Wrangler.)
Line 13: Line 13:
 
* [http://lifehacker.com/161202/geek-to-live--mastering-wget Geek to Live: Mastering Wget]
 
 
: Say you want to back up your blog or create a local copy of an entire directory of a web site for archiving or reading later. The command: <code>wget -m http://website.tld</code>
 
 +
 +
* [http://fosswire.com/post/2008/04/more-advanced-wget-usage/ More advanced wget usage]
 +
: Tips for mirroring specific directories, updating only changed files, etc.
 +
 +
* [https://addons.mozilla.org/en-US/firefox/addon/cliget/ cliget] (A Firefox add-on.)
 +
: Download files using curl or wget. This add-on generates curl/wget commands that emulate the request as though it were coming from your browser, allowing you to download protected files directly to a separate machine (e.g. a server).
  
 
=== Windows Command Line ===
 
Line 85: Line 91:
  
 
== Data Analysis Tools ==
 
 +
* [http://openrefine.org/ OpenRefine] (Formerly [http://code.google.com/p/google-refine/ Google Refine].)
 +
'''[[wikipedia:OpenRefine|OpenRefine]]''' is a standalone open source desktop application for data cleanup and transformation to other formats, an activity known as [[wikipedia:Data wrangling|data wrangling]]. It is similar to [[wikipedia:Spreadsheet|spreadsheet]] applications (and can work with spreadsheet file formats); however, it behaves more like a database.
 +
 +
* [http://vis.stanford.edu/wrangler/ Data Wrangler] (Stanford Visualization Group)
 +
: Wrangler allows interactive transformation of messy, real-world data into the data tables analysis tools expect. Export data for use in Excel, R, Tableau, Protovis, ...
 +
 
* [http://htsql.org/ HTSQL&mdash;A Database Query Language]
 
 
: HTSQL is designed for data analysts and other ''accidental programmers'' who have complex business inquiries to solve and need a productive tool to write and share database queries. HTSQL is ''free and open source'' software.
 

Revision as of 13:43, 29 March 2014

Command Line

Mounting shared drives and connecting to remote resources is something you can easily do from the Windows GUI. With our quick guide to the command prompt, however, you can more easily automate large tasks.
A quick crash course in using the command line. It is intended to be completed in a day or two and is not meant to teach advanced shell usage.

WGET

GNU Wget is a free network utility for retrieving files from the World Wide Web using HTTP and FTP, the two most widely used Internet protocols. It works non-interactively, so it can keep working in the background after you have logged off.
Say you want to back up your blog or create a local copy of an entire directory of a web site for archiving or reading later. The command: wget -m http://website.tld
Tips for mirroring specific directories, updating only changed files, etc.
Download files using curl or wget. This add-on generates curl/wget commands that emulate the request as though it were coming from your browser, allowing you to download protected files directly to a separate machine (e.g. a server).
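
The mirroring command above can also be assembled from a script. The following is a minimal Python sketch, not something taken from the linked articles: the helper name build_mirror_command is hypothetical, and besides the -m flag shown above it assumes you also want --no-parent, a standard wget option that keeps the mirror from climbing above the starting directory. The function only builds the argument list; nothing is downloaded.

```python
def build_mirror_command(url, stay_below_start=True):
    """Return a wget argument list that mirrors `url`.

    Hypothetical helper for illustration; only -m comes from the text above.
    """
    # -m (--mirror) turns on recursion and timestamp checks, so a repeated
    # run only re-downloads files that changed on the server.
    command = ["wget", "-m"]
    if stay_below_start:
        # --no-parent stops wget from following links above the start directory.
        command.append("--no-parent")
    command.append(url)
    return command
```

To run the download, pass the returned list to subprocess.run(command, check=True); this assumes wget is installed and on the PATH.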

Windows Command Line

Cmdlets are the heart and soul of Windows PowerShell, Microsoft's latest command shell/scripting language.
Robocopy: A robust file copy command for the Windows command line.
Each command is linked to more info about the particular command.
The xcopy command is a Command Prompt command that copies one or more files and/or folders from one location to another.

Generate File List

Example: C:\Users\me\Downloads\MyFolder> dir /b > filelist.txt
(The command is the part after the prompt, dir /b > filelist.txt; run it once you've navigated into the directory from which you want to generate the list of file names.)
This tutorial contains several working answers for using Windows PowerShell to list files and folders.
Works with cmd.exe, but doesn't seem to work with Windows PowerShell.
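
For a version that does not depend on which Windows shell is in use, the same idea can be sketched in Python. This is an illustrative equivalent of dir /b > filelist.txt, not part of the linked tutorials; the function name write_file_list is hypothetical. One difference to be aware of: os.listdir also returns hidden files, which a plain dir /b leaves out.

```python
import os


def write_file_list(folder, output_name="filelist.txt"):
    """Write the bare names of everything in `folder` to a text file inside it."""
    # Like `dir /b`, list both files and subfolders, names only.
    names = sorted(os.listdir(folder))
    # Leave out a list file left over from a previous run.
    names = [name for name in names if name != output_name]
    output_path = os.path.join(folder, output_name)
    with open(output_path, "w", encoding="utf-8") as handle:
        for name in names:
            handle.write(name + "\n")
    return names
```

Calling write_file_list(r"C:\Users\me\Downloads\MyFolder") would create filelist.txt in that folder, one name per line.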

Folder & File Compression

Example (batch-file syntax; at an interactive prompt use %X in place of %%X): for /d %%X in (*) do "c:\Program Files\7-Zip\7z.exe" a "%%X.zip" "%%X\"
How to compress a folder without installing any dedicated compression software.
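
The per-folder loop in the example can also be written without 7-Zip. The sketch below uses Python's standard shutil module as a stand-in; it is not the method from either linked page, and the function name zip_each_subfolder is hypothetical. Like the 7-Zip command, it produces one .zip per immediate subfolder, with the folder name kept as the top-level entry inside the archive.

```python
import os
import shutil


def zip_each_subfolder(parent):
    """Create <name>.zip next to every immediate subfolder of `parent`."""
    archives = []
    for entry in sorted(os.listdir(parent)):
        if not os.path.isdir(os.path.join(parent, entry)):
            continue  # like `for /d`, only folders are processed
        # base_name has no extension; make_archive appends ".zip" itself.
        archive = shutil.make_archive(
            base_name=os.path.join(parent, entry),
            format="zip",
            root_dir=parent,
            base_dir=entry,
        )
        archives.append(archive)
    return archives
```

The function returns the paths of the archives it created, so a caller can log or verify them.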

Text Extraction

Capture2Text enables users to do the following:
  1. Optical Character Recognition (OCR)
  2. Speech Recognition
Lists several options.
Extracts plain text from documents in all popular formats.
Detexter is an app designed to extract text from PDF files.

Data Scrape

Tools and tips compiled by journalists from PBS and the Omaha World-Herald.
Scrapinghub's list of open source scraping projects.
Tools for gathering data from public sources.

Text Search

Makes tools to search text content, including:
  1. FALCON - Text Search Java Project: JSON based text search Java Project
  2. HAWK - PDF Text Search Java Project: Taking initiative for Document Text Search
Xpdf is an open source viewer for Portable Document Format (PDF) files.
Windows installer: Short Programs/Scripts (Look for the xpdf3.exe / poppler.exe links in left sidebar.)

PDF Conversion

Convert PDF to HTML without losing text or format.

Data Analysis Tools

OpenRefine is a standalone open source desktop application for data cleanup and transformation to other formats, an activity known as data wrangling. It is similar to spreadsheet applications (and can work with spreadsheet file formats); however, it behaves more like a database.

Wrangler allows interactive transformation of messy, real-world data into the data tables analysis tools expect. Export data for use in Excel, R, Tableau, Protovis, ...
HTSQL is designed for data analysts and other accidental programmers who have complex business inquiries to solve and need a productive tool to write and share database queries. HTSQL is free and open source software.
Jigsaw is a visual analytics system to help analysts and researchers better explore, analyze, and make sense of collections of text documents.

Maintenance

Boot Disks

Rufus is a utility that helps format and create bootable USB flash drives, such as USB keys/pendrives, memory sticks, etc.

Network Issues

A post reviewing various "fixes" found all over the web and identifying which "fix" actually worked for the post's author.
Path MTU Discovery (PMTUD) in Windows sometimes fails to figure out the MTU for a given path, so Windows uses the default. For the most part this doesn’t affect anyone. But failure of PMTUD can result in some websites not loading correctly, trouble connecting to normally reliable online services, and general Internet weirdness.

Google Drive

Select the text you want to strike through and press Alt+Shift+5 (Option+Shift+5 on a Mac).
Press Ctrl+? to see other keyboard shortcuts.

See Also