Tuesday, April 29, 2014

Lecture about Big Data

Business: SAS
What do we need?
Statistics, data mining, forecasting, text analysis, optimization, visualization

Who owns data?
Companies own data but are not using it: dark data.
earned, paid, open

How to analyze?
traditional : database -> analysis
in database 
in memory

What to do?
Invest in and nurture the concept of data analytics

What's the value?
help improve business
machine learning helps retrieve info

Hal Varian 

Technology
NELL: Never-Ending Language Learning
Watson: a demonstration of machine learning capability, namely the ability to understand natural language and to make associations

data visualization helps understanding
power of the plane

THE WORLD'S TOP 10 MOST INNOVATIVE COMPANIES IN BIG DATA





Sunday, April 27, 2014

Python on Windows & Notepad++ configuration

1, Download & install Python
https://www.python.org/downloads/
Don't forget to add Python to PATH when installing.

2, Add HTML tag plugin to notepad++


3, Set the Notepad++ run command
-------------------------------------------------------------
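The following describes how to configure the Python environment.
1. Start Notepad++ and choose "Run".
2. In the dialog that appears, enter: cmd /k cd "$(CURRENT_DIRECTORY)" &  python "$(FULL_CURRENT_PATH)" & ECHO. & PAUSE & EXIT
3. Click Save and enter a name for the command (any name will do): Run Python.
4. Define a shortcut key for the command (make sure it does not conflict with an existing shortcut) and save. Choose "Run" again and a new "Run Python" entry appears.
  Next, you can configure the Python syntax highlighting scheme in Notepad++ to your liking.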
-------------------------------------------------------------
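One way to check that the Run Python command works is to save a small script and launch it with the new shortcut. The sketch below is only an illustration: the function name describe_environment is made up here. It prints the interpreter version, the interpreter path, and the working directory, which should normally be the script's folder because of the cd "$(CURRENT_DIRECTORY)" part of the command.

```python
import os
import sys


def describe_environment():
    """Return a few facts about the running interpreter as a dict."""
    return {
        "version": sys.version.split()[0],  # version number at the start of sys.version
        "executable": sys.executable,       # path of the interpreter that was launched
        "cwd": os.getcwd(),                 # directory the run command switched into
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print("{}: {}".format(key, value))
```

If the window reports that python is not recognized, Python is probably not on PATH. Note also that on Windows cd without /d does not switch drives, so cwd can differ from the script's folder when the file lives on another drive.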
Install extension packages
http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy
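After installing extension packages, a quick check like the one below can confirm that Python is able to find them. This is a sketch under assumptions: the package list (numpy, scipy) is guessed from the link above and the numpy notes that follow, the helper name find_missing is made up for illustration, and importlib.util.find_spec requires Python 3.4 or later.

```python
import importlib.util


def find_missing(package_names):
    """Return the names from package_names that Python cannot locate."""
    return [name for name in package_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(["numpy", "scipy"])
    if missing:
        print("Not installed: " + ", ".join(missing))
    else:
        print("All packages found.")
```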

Dimension and axis

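For a numpy array, dimensionality refers to the number of axes needed to index the array, not the dimensionality of any geometrical space. For example, you can describe the locations of points in 3D space with a 2D array:
# The axes can be seen as the levels that make up the data, a bit like a tree: if the first level has 2 branches and each of those branches has 3 leaves at the second level, then the shape is (2, 3).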
array([[0, 0, 0],
       [1, 2, 3],
       [2, 2, 2],
       [9, 9, 9]])
This array has shape (4, 3) and 2 dimensions. But it can describe 3D space because the length of each row (axis 1) is three, so each row can hold the x, y, and z components of a point's location. The length of axis 0 indicates the number of points (here, 4). However, that is a matter of how the code is applied to the math it describes, not an attribute of the array itself. In mathematics, the dimension of a vector is its length (e.g., the x, y, and z components of a 3D vector), but in numpy any "vector" is just a 1D array of varying length. The array doesn't care what the dimension of the space being described (if any) is.
You can play around with this, and see the number of dimensions and shape of an array like so:
In [262]: a = np.arange(9)

In [263]: a
Out[263]: array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [264]: a.ndim    # number of dimensions
Out[264]: 1

In [265]: a.shape
Out[265]: (9,)

In [266]: b = np.array([[0,0,0],[1,2,3],[2,2,2],[9,9,9]])

In [267]: b
Out[267]: 
array([[0, 0, 0],
       [1, 2, 3],
       [2, 2, 2],
       [9, 9, 9]])

In [268]: b.ndim
Out[268]: 2

In [269]: b.shape
Out[269]: (4, 3)
Arrays can have many dimensions, but they become hard to visualize above two or three:
In [276]: c = np.random.rand(2,2,3,4)

In [277]: c
Out[277]: 
array([[[[ 0.33018579,  0.98074944,  0.25744133,  0.62154557],
         [ 0.70959511,  0.01784769,  0.01955593,  0.30062579],
         [ 0.83634557,  0.94636324,  0.88823617,  0.8997527 ]],

        [[ 0.4020885 ,  0.94229555,  0.309992  ,  0.7237458 ],
         [ 0.45036185,  0.51943908,  0.23432001,  0.05226692],
         [ 0.03170345,  0.91317231,  0.11720796,  0.31895275]]],


       [[[ 0.47801989,  0.02922993,  0.12118226,  0.94488471],
         [ 0.65439109,  0.77199972,  0.67024853,  0.27761443],
         [ 0.31602327,  0.42678546,  0.98878701,  0.46164756]],

        [[ 0.31585844,  0.80167337,  0.17401188,  0.61161196],
         [ 0.74908902,  0.45300247,  0.68023488,  0.79672751],
         [ 0.23597218,  0.78416727,  0.56036792,  0.55973686]]]])

In [278]: c.ndim
Out[278]: 4

In [279]: c.shape
Out[279]: (2, 2, 3, 4)
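To connect the axis idea to actual operations, the sketch below reuses the 4 x 3 array of points from above and reduces along each axis. The variable names (points, centroid, row_sums, tree) are only for illustration and are not part of the original session.

```python
import numpy as np

# Four points in 3D space: axis 0 indexes the points, axis 1 indexes x, y, z.
points = np.array([[0, 0, 0],
                   [1, 2, 3],
                   [2, 2, 2],
                   [9, 9, 9]])

# Reducing along axis 0 collapses the 4 points: one value per coordinate.
centroid = points.mean(axis=0)   # mean x, y, z = 3.0, 3.25, 3.5

# Reducing along axis 1 collapses x, y, z: one value per point.
row_sums = points.sum(axis=1)    # 0, 6, 6, 27

print(points.ndim, points.shape)
print(centroid)
print(row_sums)

# The tree picture: 2 branches at the first level, 3 leaves on each branch.
tree = np.arange(6).reshape(2, 3)
print(tree.shape)
```

The axis argument names the axis that disappears from the result, which is why reducing a (4, 3) array along axis 0 leaves shape (3,) and reducing along axis 1 leaves shape (4,).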

VI Keyboard Shortcuts

Ref: http://www.keyxl.com/aaab462/105/VIM-Text-Editor-keyboard-shortcuts.htm

  :w       Write buffer to file.
Insert
  i        Inserts text to the left of the cursor.
  I        Inserts text at the beginning of the line, no matter where the cursor is positioned on the current line.
Append
  a        Begins inserting after the character (append) on which the cursor is positioned.
  A        Begins inserting at the end of the current line, no matter where the cursor is positioned on that line.
Open
  o        Begins inserting text on a new, empty line that is opened for you, below the current line. This is the only command that will allow you to insert text BELOW the LAST line of the file.
  O        Begins inserting text on a new, empty line that is opened for you, above the current line. This is the only command that will allow you to insert text ABOVE the FIRST line of the file.
Deleting, copying and changing
  d        Delete text. Used as a prefix before a cursor movement command (e.g. dw).
  y        Copy text (that is, yank it into a holding area for later use). Used as a prefix before a cursor movement command.
  c        Change text from one thing to another, which you will type. Used as a prefix before a cursor movement command.
  !        Filter text through a program.
  <        Shift a region of text to the left.
  >        Shift a region of text to the right.
Single Key Movements
  h        Move cursor to the left one character.
  l        Move cursor to the right one character.
  j        Move cursor down one line.
  k        Move cursor up one line.
  ^        Move cursor to the first non-blank character of the current line.
  $        Move cursor to the end of the current line.
  1G       Move cursor to the first line of your document. Other numbers move to the line with that number (e.g. 50G goes to the 50th line).
  G        Move cursor to the last line of your file.
  CTRL U   Scroll up in the file, half a screen by default. Hold down the key marked CTRL (stands for control) and type U. CTRL is like another shift key.
  CTRL D   Scroll down in the file, half a screen by default.
  w        Move cursor forward to the next word, stopping at punctuation.
  W        Move cursor forward to the next word, ignoring punctuation.
  e        Move cursor forward to the end of the word, stopping at punctuation.
  E        Move cursor forward to the end of the word, ignoring punctuation.
  b        Move cursor backwards to the previous word, stopping at punctuation.
  B        Move cursor backwards to the previous word, ignoring punctuation.
  H        Move cursor to the top line of the screen (as opposed to the top of the document, which may not be the same place).
  M        Move cursor to the middle of the screen.
  L        Move cursor to the last line on the screen.
  %        Move cursor to the matching parenthesis, bracket or brace. Great for debugging programs.
  (        Move cursor to the beginning of the previous sentence (where a punctuation mark and two spaces define a sentence).
  )        Move cursor to the beginning of the next sentence.
  {        Move cursor to the beginning of the current paragraph.
  }        Move cursor to the beginning of the next paragraph.
  ;        Repeat the last f or F command (see below).
Almost Single Key Movements
  '        Move cursor to a previously marked location in the file (e.g. ma marks the location with the letter a, so 'a (apostrophe a) moves back to that location).
  f        Find the character corresponding to the next keystroke typed. Moves the cursor to the next occurrence of that character (on the current line only).
  F        Same as f, but movement is backwards.
Useful
  x        Delete character(s) to the right of the cursor, starting with the one beneath it.
  r        Replace the character under the cursor with the next character you type. For example, to split a line between two words, put the cursor on the blank space before the word that should start the new line and type r followed by RETURN. This replaces the space with a line break and puts the rest of the line onto a new line.
  J        Join lines; the opposite of the line-splitting operation above. This joins the current line with the next line in your file.
  R        Replace mode; works like INSERT mode, but what you type overwrites the characters already on the current line.
  p        Paste line(s) you deleted (or yanked) back into the file. To move a few lines somewhere else in your file, type 3dd to delete three lines, for example, then move to where you want those lines to be and type p to paste them below the cursor.
  .        Repeats the last text modification command, whatever it may have been (insert, deletion, etc.).
  :r filename RETURN  Read a file into the file being edited. The added file is placed below the current cursor position. Note the colon : before the r in this command.
  CTRL L   Redraw the screen. If somebody writes to you while you are in the middle of vi and junk appears all over your screen, don't panic: it did not hurt your file, but you will have to hold down the CTRL key and type L to clean it up.
  d$       Delete to the end of the line, including the current character.
  d^       Delete to the beginning of the line, excluding the current character.
  dw       Delete a word (or words), stopping at punctuation.
  dW       Delete a word (or words), ignoring punctuation.
  de       Delete to the end of the word.
  dd       Delete a line (or lines).
  dG       Delete from the current line to the end of the document. CAREFUL: slightly dangerous.
  dH       Delete from the current line to the line shown at the top of the screen.
Search and Replace
  /the     Finds the next occurrence of "the". This will also find their, them, another, etc.
  ?the     Finds the previous occurrence of "the".
  n        Repeats the last search command. Finds the next occurrence.
  d/the    Deletes until the next occurrence of "the". This demonstrates how the delete prefix can be used with any cursor movement command.
  :g/oldword/s//newword/gc  Finds all occurrences of oldword and replaces them with newword. The optional c at the end of the command tells vi that you would like to confirm each change: type y to make the change or n to skip that replacement. Great for spelling fixes.
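The remark that /the also finds their, them and another is plain substring matching. The Python sketch below mimics that behaviour with the re module. It is an analogy only, not how vi is implemented, and the sample words and sentence are made up. In vi the whole-word form of the search is /\<the\>, which corresponds to \bthe\b here.

```python
import re

words = ["the", "their", "them", "another", "then", "that"]

# Plain pattern: matches "the" anywhere inside a word, like /the in vi.
substring_hits = [w for w in words if re.search(r"the", w)]

# Word-boundary pattern: matches only the standalone word, like /\<the\> in vi.
whole_word_hits = [w for w in words if re.search(r"\bthe\b", w)]

# Replace every match, roughly what :g/oldword/s//newword/g does without the c flag.
fixed = re.sub(r"oldword", "newword", "oldword and oldword again")

print(substring_hits)
print(whole_word_hits)
print(fixed)
```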
Exit
  ESC :wq RETURN  Save and exit vi.
  ESC :q! RETURN  Exit WITHOUT saving changes.