scipy interview questions
Top scipy frequently asked interview questions
They both seem exceedingly similar and I'm curious as to which package would be more beneficial for financial data analysis.
Source: (StackOverflow)
How do I install SciPy on my system?
For the NumPy part (that SciPy depends on) there is actually an installer for 64 bit Windows: numpy-1.3.0.win-amd64-py2.6.msi (is direct download URL, 2310144 bytes).
Running the SciPy superpack installer results in this
message in a dialog box:
Cannot install. Python version 2.6 required, which was not found in the registry.
I already have Python 2.6.2 installed (and a working Django installation
in it), but I don't know about any Registry story.
The registry entries seem to already exist:
REGEDIT4
[HKEY_LOCAL_MACHINE\SOFTWARE\Python]
[HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore]
[HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\2.6]
[HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\2.6\Help]
[HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\2.6\Help\Main Python Documentation]
@="D:\\Python262\\Doc\\python262.chm"
[HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\2.6\InstallPath]
@="D:\\Python262\\"
[HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\2.6\InstallPath\InstallGroup]
@="Python 2.6"
[HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\2.6\Modules]
[HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\2.6\PythonPath]
@="D:\\Python262\\Lib;D:\\Python262\\DLLs;D:\\Python262\\Lib\\lib-tk"
What I have done so far:
Step 1
Downloaded the NumPy superpack installer
numpy-1.3.0rc2-win32-superpack-python2.6.exe
(direct download URL, 4782592 bytes). Running this installer
resulted in the same message, "Cannot install. Python
version 2.6 required, which was not found in the registry.".
Update: there is actually an installer for NumPy that works - see beginning of the question.
Step 2
Tried to install NumPy in another way. Downloaded the zip
package numpy-1.3.0rc2.zip (direct download URL, 2404011 bytes),
extracted the zip file in a normal way to a temporary
directory, D:\temp7\numpy-1.3.0rc2 (where setup.py and
README.txt is). I then opened a command line window and:
d:
cd D:\temp7\numpy-1.3.0rc2
setup.py install
This ran for a long time and also included use of cl.exe
(part of Visual Studio). Here is a nearly 5000 lines long
transcript (230 KB).
This seemed to work. I can now do this in Python:
import numpy as np
np.random.random(10)
with this result:
array([ 0.35667511, 0.56099423, 0.38423629, 0.09733172, 0.81560421,
0.18813222, 0.10566666, 0.84968066, 0.79472597, 0.30997724])
Step 3
Downloaded the SciPy superpack installer, scipy-0.7.1rc3-
win32-superpack-python2.6.exe (direct download URL, 45597175
bytes). Running this installer resulted in the message
listed in the beginning
Step 4
Tried to install SciPy in another way. Downloaded the zip
package scipy-0.7.1rc3.zip (direct download URL, 5506562
bytes), extracted the zip file in a normal way to a
temporary directory, D:\temp7\scipy-0.7.1 (where setup.py
and README.txt is). I then opened a command line window and:
d:
cd D:\temp7\scipy-0.7.1
setup.py install
This did not achieve much - here is a transcript (about 95
lines).
And it fails:
>>> import scipy as sp2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named scipy
Platform: Python 2.6.2 installed in directory D:\Python262,
Windows XP 64 bit SP2, 8 GB RAM, Visual Studio 2008
Professional Edition installed.
The startup screen of the installed Python is:
Python 2.6.2 (r262:71605, Apr 14 2009, 22:46:50) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
Value of PATH, result from SET in a command line window:
Path=D:\Perl64\site\bin;D:\Perl64\bin;C:\Program Files (x86)\PC Connectivity Solution\;D:\Perl\site\bin;D:\Perl\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Program Files (x86)\ATI Technologies\ATI.ACE\Core-Static;d:\Program Files (x86)\WinSCP\;D:\MassLynx\;D:\Program Files (x86)\Analyst\bin;d:\Python262;d:\Python262\Scripts;D:\Program Files (x86)\TortoiseSVN\bin;D:\Program Files\TortoiseSVN\bin;C:\WINDOWS\system32\WindowsPowerShell\v1.0;D:\Program Files (x86)\IDM Computer Solutions\UltraEdit\
Source: (StackOverflow)
At the heart of an application (written in Python and using NumPy) I need to rotate a 4th order tensor. Actually, I need to rotate a lot of tensors many times and this is my bottleneck. My naive implementation (below) involving eight nested loops seems to be quite slow, but I cannot see a way to leverage NumPy's matrix operations and, hopefully, speed things up. I've a feeling I should be using np.tensordot
, but I don't see how.
Mathematically, elements of the rotated tensor, T', are given by: T'ijkl = Σ gia gjb gkc gld Tabcd with the sum being over the repeated indices on the right hand side. T and Tprime are 3*3*3*3 NumPy arrays and the rotation matrix g is a 3*3 NumPy array. My slow implementation (taking ~0.04 seconds per call) is below.
#!/usr/bin/env python
import numpy as np
def rotT(T, g):
Tprime = np.zeros((3,3,3,3))
for i in range(3):
for j in range(3):
for k in range(3):
for l in range(3):
for ii in range(3):
for jj in range(3):
for kk in range(3):
for ll in range(3):
gg = g[ii,i]*g[jj,j]*g[kk,k]*g[ll,l]
Tprime[i,j,k,l] = Tprime[i,j,k,l] + \
gg*T[ii,jj,kk,ll]
return Tprime
if __name__ == "__main__":
T = np.array([[[[ 4.66533067e+01, 5.84985000e-02, -5.37671310e-01],
[ 5.84985000e-02, 1.56722231e+01, 2.32831900e-02],
[ -5.37671310e-01, 2.32831900e-02, 1.33399259e+01]],
[[ 4.60051700e-02, 1.54658176e+01, 2.19568200e-02],
[ 1.54658176e+01, -5.18223500e-02, -1.52814920e-01],
[ 2.19568200e-02, -1.52814920e-01, -2.43874100e-02]],
[[ -5.35577630e-01, 1.95558600e-02, 1.31108757e+01],
[ 1.95558600e-02, -1.51342210e-01, -6.67615000e-03],
[ 1.31108757e+01, -6.67615000e-03, 6.90486240e-01]]],
[[[ 4.60051700e-02, 1.54658176e+01, 2.19568200e-02],
[ 1.54658176e+01, -5.18223500e-02, -1.52814920e-01],
[ 2.19568200e-02, -1.52814920e-01, -2.43874100e-02]],
[[ 1.57414726e+01, -3.86167500e-02, -1.55971950e-01],
[ -3.86167500e-02, 4.65601977e+01, -3.57741000e-02],
[ -1.55971950e-01, -3.57741000e-02, 1.34215636e+01]],
[[ 2.58256300e-02, -1.49072770e-01, -7.38843000e-03],
[ -1.49072770e-01, -3.63410500e-02, 1.32039847e+01],
[ -7.38843000e-03, 1.32039847e+01, 1.38172700e-02]]],
[[[ -5.35577630e-01, 1.95558600e-02, 1.31108757e+01],
[ 1.95558600e-02, -1.51342210e-01, -6.67615000e-03],
[ 1.31108757e+01, -6.67615000e-03, 6.90486240e-01]],
[[ 2.58256300e-02, -1.49072770e-01, -7.38843000e-03],
[ -1.49072770e-01, -3.63410500e-02, 1.32039847e+01],
[ -7.38843000e-03, 1.32039847e+01, 1.38172700e-02]],
[[ 1.33639532e+01, -1.26331100e-02, 6.84650400e-01],
[ -1.26331100e-02, 1.34222177e+01, 1.67851800e-02],
[ 6.84650400e-01, 1.67851800e-02, 4.89151396e+01]]]])
g = np.array([[ 0.79389393, 0.54184237, 0.27593346],
[-0.59925749, 0.62028664, 0.50609776],
[ 0.10306737, -0.56714313, 0.8171449 ]])
for i in range(100):
Tprime = rotT(T,g)
Is there a way to make this go faster? Making the code generalise to other ranks of tensor would be useful, but is less important.
Source: (StackOverflow)
When I use matplotlib's imshow() method to represent a small numpy matrix, it ends up doing some smoothing between pixels. Is there any way to disables this? It makes my figure's misleading in presentations.
The figure above is a 28x28 image, so I should be seeing large squares of single colors representing each pixel (as matlab would display it when using imagesc()). But Instead, the pixels seem to be blurred with neighboring pixels. Is there a way to disable this behavior?
Source: (StackOverflow)
Does anyone have successful experience reading binary Matlab .mat files in Python?
(I've seen that scipy
has alleged support for reading .mat files, but I'm unsuccessful with it. I installed scipy
version 0.7.0, and I can't find the loadmat()
method)
Source: (StackOverflow)
scipy
appears to provide most (but not all [1]) of numpy
's functions in its own namespace. In other words, if there's a function named numpy.foo
, there's almost certainly a scipy.foo
. Most of the time, the two appear to be exactly the same, oftentimes even pointing to the same function object.
Sometimes, they're different. To give an example that came up recently:
numpy.log10
is a ufunc that returns NaNs for negative arguments;
scipy.log10
returns complex values for negative arguments, and doesn't appear to be a ufunc.
The same can be said about log
, log2
and logn
, but not about log1p
[2].
On the other hand, numpy.exp
and scipy.exp
appear to be different names for the same ufunc. This is also true of scipy.log1p
and numpy.log1p
.
Another example is numpy.linalg.solve
vs scipy.linalg.solve
. They're similar, but the latter offers some additional features over the former.
Why the apparent duplication? If this is meant to be a wholesale import of numpy
into the scipy
namespace, why the subtle differences in behaviour and the missing functions? Is there some overarching logic that would help clear up the confusion?
[1] numpy.min
, numpy.max
, numpy.abs
and a few others have no counterparts in the scipy
namespace.
[2] Tested using numpy
1.5.1 and scipy
0.9.0rc2.
Source: (StackOverflow)
It is possible to install NumPy with pip using pip install numpy
.
Is there a similar possibility with SciPy? (Doing pip install scipy
does not work.)
Update
The package SciPy is now available to be installed with pip
, so the question is not relevant anymore.
Source: (StackOverflow)
I wonder if there is a direct way to import the contents of a csv file into a record array, much in the way that R's read.table(), read.delim(), and read.csv() family imports data to R's data frame? Or is the best way to use csv.reader() and then apply something like numpy.core.records.fromrecords()?
Source: (StackOverflow)
Lets assume we have a dataset which might be given approximately by
import numpy as np
x = np.linspace(0,2*np.pi,100)
y = np.sin(x) + np.random.random(100) * 0.2
Therefore we have a variation of 20% of the dataset. My first idea was to use the UnivariateSpline function of scipy, but the problem is that this does not consider the small noise in a good way. If you consider the frequencies, the background is much smaller than the signal, so a spline only of the cutoff might be an idea, but that would involve a back and forth fourier transformation, which might result in bad behaviour.
Another way would be a moving average, but this would also need the right choice of the delay.
Any hints/ books or links how to tackle this problem?

Source: (StackOverflow)
Is there any general-purpose form of short-time Fourier transform with corresponding inverse transform built into SciPy or NumPy or whatever?
There's the pyplot specgram
function in matplotlib, which calls ax.specgram()
, which calls mlab.specgram()
, which calls _spectral_helper()
:
#The checks for if y is x are so that we can use the same function to
#implement the core of psd(), csd(), and spectrogram() without doing
#extra calculations. We return the unaveraged Pxy, freqs, and t.
but
This is a helper function that implements the commonality between the
204 #psd, csd, and spectrogram. It is
NOT meant to be used outside of mlab
I'm not sure if this can be used to do an STFT and ISTFT, though. Is there anything else, or should I translate something like these MATLAB functions?
I know how to write my own ad-hoc implementation; I'm just looking for something full-featured, which can handle different windowing functions (but has a sane default), is fully invertible with COLA windows (istft(stft(x))==x
), tested by multiple people, no off-by-one errors, handles the ends and zero padding well, fast RFFT implementation for real input, etc.
Source: (StackOverflow)
I write scientific software in Numpy/Scipy/Matplotlib. Having developed applications on my home computer, I am now interested in writing simple web applications. Example: user uploads image or audio file, my program processes it using Numpy/Scipy, and output is displayed on the browser using Matplotlib, or perhaps the user can download a processed file.
I already pay for hosting that does have Python 2.4.3 installed, but no Numpy/Scipy. I don't have shell access via command line, either. Just drag-and-drop FTP. Pretty limited, but I can get simple Python/CGI scripts working.
Surprisingly, a web search revealed few suitable options for web hosting with these capabilities already built in. (Please guide me if I am wrong.) I am learning about the Google App Engine, but I still don't have a full understanding about its tools and limitations. What the web did tell me is that others have similar concerns.
Hoping for solutions, I thought I would ask these simple questions to the awesome SO community:
Is there a simple way of installing numpy (or any third-party package/library) onto my already hosted space? I know the Python path on my hosted space, and I know the relevant Python/Numpy directories on my home computer. Can I simply copy files over and have it work? Both local and remote systems run Ubuntu.
What hosting sites exist (either free or paid) which have Numpy/Matplotlib installed or, if not installed, the possibility of installing it? Are there any documented sites that you can reference with working applications, no matter how simple?
Can Google App Engine help me in any way? Or is it totally for something else? Have you or others used it to write scientific applications in Python/Numpy? If so, could you reference them?
Thank you for your help.
EDIT: After the useful answers below, I bought the $20 plan at Slicehost, and I love it so far! (I first tried Amazon EC2. I must be stupid, but I just couldn't get it to work.) Setting up the Ubuntu server with Apache took mere hours (and I'm an Apache novice). It allows me to do exactly what I wanted with Python plus much more. I now have my own remote repository for version control, too. Thanks again!
EDIT 2: Nearly two years later, I tried Linode and EC2 (again). Linode is great. EC2 seemed easier this time around -- maybe it's just added experience, or maybe it's the improvements that Amazon made to the AWS management console. For those interested in Numpy/Scipy/Matplotlib/Audiolab, here is my Ubuntu cheat sheet whenever I launch an EC2 instance:
ec2:~$ sudo aptitude install build-essential python-scipy ipython
python-matplotlib python-dev python-setuptools libsndfile-dev
libasound2-dev mysql-server python-mysqldb
Upload scikits.audiolab-0.11.0
ec2:~/scikits.audiolab-0.11.0$ sudo python setup.py install
ec2:~$ sudo rm -rf scikits.audiolab-0.11.0
ec2:~$ nano .ipython/ipy_user_conf.py
ip.ex('import matplotlib; matplotlib.use("Agg"); import scipy, pylab,
scipy.signal as sig, scipy.linalg as lin, scipy.sparse as spar,
os, sys, MySQLdb, boto; from scikits import audiolab')
import ipy_greedycompleter
import ipy_autoreload
Source: (StackOverflow)
After doing some processing on an audio or image array, it needs to be normalized within a range before it can be written back to a file. This can be done like so:
# Normalize audio channels to between -1.0 and +1.0
audio[:,0] = audio[:,0]/abs(audio[:,0]).max()
audio[:,1] = audio[:,1]/abs(audio[:,1]).max()
# Normalize image to between 0 and 255
image = image/(image.max()/255.0)
Is there a less verbose, convenience function way to do this? matplotlib.colors.Normalize()
doesn't seem to be related.
Source: (StackOverflow)
I can't seem to find any python libraries that do multivariate regression. The only things I find only do simple regression. I need to regress my dependent variable (y) against several independent variables (x1, x2, x3, etc.).
For example, with this data:
print 'y x1 x2 x3 x4 x5 x6 x7'
for t in texts:
print "{:>7.1f}{:>10.2f}{:>9.2f}{:>9.2f}{:>10.2f}{:>7.2f}{:>7.2f}{:>9.2f}" /
.format(t.y,t.x1,t.x2,t.x3,t.x4,t.x5,t.x6,t.x7)
(output for above:)
y x1 x2 x3 x4 x5 x6 x7
-6.0 -4.95 -5.87 -0.76 14.73 4.02 0.20 0.45
-5.0 -4.55 -4.52 -0.71 13.74 4.47 0.16 0.50
-10.0 -10.96 -11.64 -0.98 15.49 4.18 0.19 0.53
-5.0 -1.08 -3.36 0.75 24.72 4.96 0.16 0.60
-8.0 -6.52 -7.45 -0.86 16.59 4.29 0.10 0.48
-3.0 -0.81 -2.36 -0.50 22.44 4.81 0.15 0.53
-6.0 -7.01 -7.33 -0.33 13.93 4.32 0.21 0.50
-8.0 -4.46 -7.65 -0.94 11.40 4.43 0.16 0.49
-8.0 -11.54 -10.03 -1.03 18.18 4.28 0.21 0.55
How would I regress these in python, to get the linear regression formula:
Y = a1x1 + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + +a7x7 + c
Source: (StackOverflow)
In a project using scipy and numpy, should I use scipy.pi
, numpy.pi
, or math.pi
?
Ten minutes on google and Stack Overflow doesn't give an answer.
Source: (StackOverflow)
is there a more efficient way to take an average of an array in prespecified bins? for example, i have an array of numbers and an array corresponding to bin start and end positions in that array, and I want to just take the mean in those bins? I have code that does it below but i am wondering how it can be cut down and improved. thanks.
from scipy import *
from numpy import *
def get_bin_mean(a, b_start, b_end):
ind_upper = nonzero(a >= b_start)[0]
a_upper = a[ind_upper]
a_range = a_upper[nonzero(a_upper < b_end)[0]]
mean_val = mean(a_range)
return mean_val
data = rand(100)
bins = linspace(0, 1, 10)
binned_data = []
n = 0
for n in range(0, len(bins)-1):
b_start = bins[n]
b_end = bins[n+1]
binned_data.append(get_bin_mean(data, b_start, b_end))
print binned_data
Source: (StackOverflow)