
I apologize upfront if this question is too broad. I come from the MATLAB world and have relatively little experience with Python.

After having spent some time reading about several Python-based environments and distributions for scientific computing, I feel that I still don't fully understand the landscape of solutions or the precise relationship between some notable packages, including:

- Do any of the above packages provide similar functionality? Do they complement each other?
- Does the installation of any of them include or require the installation of any of the others? If so, which ones include or require which?

Less importantly, are there any other packages similar to the ones above that provide similar functionality?

Thanks in advance

Scientific computing with Python amounts to taking a plain vanilla language and bolting on a bunch of modules, each of which implements some aspect of MATLAB's functionality. As such, the experience of scientific programming in Python is a little less cohesive than in MATLAB. However, Python as a language is much cleaner. So it goes.

The basic necessary modules for scientific computing in Python are NumPy, Matplotlib, and SciPy, plus Mayavi/VTK if you are doing 3D plotting. These modules all depend on NumPy.

*Numpy* implements a new array type that behaves similarly to MATLAB arrays (i.e. fast vector calculations). It also defines a load of functions to do these calculations, usually named the same as the corresponding MATLAB functions.
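A minimal sketch of that MATLAB-like flavor (the array values here are arbitrary):

```python
import numpy as np

# Elementwise ("vectorized") arithmetic, as in MATLAB -- no explicit loops.
x = np.linspace(0.0, 1.0, 5)        # like MATLAB's linspace(0, 1, 5)
y = np.sin(x) ** 2 + np.cos(x) ** 2  # identically 1, computed on the whole array

# Many functions share their MATLAB names: zeros, ones, eye, dot, ...
A = np.eye(3)                        # 3x3 identity matrix
b = A.dot(np.array([1.0, 2.0, 3.0]))
```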

*Matplotlib* allows 2D plotting with commands very similar to MATLAB's. Matplotlib also defines *pylab*, a module that, with a single import, brings most of the NumPy and Matplotlib functions into the global namespace. This is useful for rapid/interactive scripting where you don't want to type lots of namespace prefixes.

*SciPy* is a collection of Python modules arranged under the SciPy umbrella that are useful to scientists; fitting routines, for example, are supplied in SciPy modules. SciPy depends on NumPy.
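As an illustration of those fitting routines, a least-squares fit with `scipy.optimize.curve_fit` (the model and synthetic data below are made up for the example):

```python
import numpy as np
from scipy.optimize import curve_fit

# Fit a decaying exponential to noise-free synthetic "measurements".
def model(t, a, tau):
    return a * np.exp(-t / tau)

t = np.linspace(0.0, 5.0, 50)
data = model(t, 2.5, 1.3)                       # true parameters: a=2.5, tau=1.3
popt, pcov = curve_fit(model, t, data, p0=(1.0, 1.0))
# popt should recover (2.5, 1.3); pcov is the parameter covariance estimate
```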

*Spyder* is a desktop IDE (based on Qt) that loosely tries to emulate the MATLAB IDE. It is part of the Python(x,y) distribution.

*IPython* provides an enhanced interactive Python shell which is useful for trying out code, running your scripts, and interacting with the results. It can now be served through a web interface as well as the traditional console, and it is also embedded in the Spyder IDE.

Getting all these modules running on your computer can be time consuming and so there are a few distributions that package them (plus many other modules) up for you.

*Python(x,y)*, *WinPython*, *Enthought*, and more recently *Anaconda* are all full package distributions that include all the core modules, although Enthought does not come with Spyder.

*Sage* is another programming environment, served over the web or via a command line, that also comes as a full package including lots of other modules. Traditionally it shipped as a VMware image based on a Linux install. Although you are writing Python in the Sage environment, it is a little different from ordinary Python programming: Sage effectively defines its own language and methodology on top of Python.

If you are using Windows, I would install WinPython. It installs everything you need, including SciPy and Spyder (which IMHO is the best MATLAB replacement for Python), and because it is designed to be standalone it will not interfere with other Python installs on your system. If you are on OS X, Enthought is probably the best way to go; Spyder can be installed separately using e.g. MacPorts. On Linux you can install the components (NumPy, SciPy, Spyder, Matplotlib) separately.

I personally don't like the Sage way of working with Python 'hidden under the hood' but you may prefer that.

This link may be useful: https://www.cfa.harvard.edu/

It's the page of an astrophysicist at Harvard. It gives the point of view of someone switching from ITT-VIS IDL to Python on OS X (but most tips also work on other operating systems).

*EDIT:* It seems the page was taken down. An alternative good introduction to Python for a scientist/engineer is in this document (big PDF warning): http://stsdas.stsci.edu/perry/pydatatut.pdf Hope this one will not be taken down!

*Size:* 2.7 MB *Date:* Jan 3, 2007

1 Analytic and Numeric Solutions; Chaos

Many equations that describe the behavior of physical systems cannot be solved analytically. In fact, it is said that "most" cannot. Numerical methods enable us to obtain solutions that would otherwise elude us, and the results may be valuable not only because they deliver quantitative answers; they can also provide new insight. A pocket calculator or a short computer program suffices for a simple demonstration.

If we repeatedly take the sine function starting with an arbitrary value, x_{n+1} = sin(x_n), the numbers decrease and slowly approach zero. For example, x = 1.000, 0.841, 0.746, 0.678, 0.628, ... (values rounded to three digits). The sequence decreases because sin(x)/x < 1 for any x ≠ 0; hence, with each iteration the value becomes smaller and smaller and approaches a constant.

But if we try instead x_{n+1} = sin(2.5 x_n), the iteration is no longer driven toward a constant. For example, x = 1.000, 0.598, 0.997, 0.604, 0.998, 0.602, 0.998, 0.603, 0.998, 0.603, ... The iteration settles into periodic behavior.

There is no reason for the iteration to approach anything at all. For example, x_{n+1} = sin(3 x_n) produces x = 1.000, 0.141, 0.411, 0.943, 0.307, 0.796, 0.685, 0.885, 0.469, 0.986, 0.181, 0.518. One thousand iterations later, x = 0.538, 0.999, 0.144, 0.418, 0.951, 0.286. This sequence does not approach a constant value, it does not grow indefinitely, and it is not periodic, even when continued over many more iterations. A behavior of this kind is called "chaotic."

Can it be true that the iteration settles neither to a constant nor into a periodic pattern, or is this an artifact of numerical inaccuracies? Consider the simple iteration y_{n+1} = 1 − |2y_n − 1|, known as the "tent map."
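The three iterations described above are easy to reproduce in a few lines of Python:

```python
import math

def iterate(f, x0, n):
    """Apply f repeatedly, returning the iterates x0, f(x0), ..., f^n(x0)."""
    xs = [x0]
    for _ in range(n):
        xs.append(f(xs[-1]))
    return xs

# x_{n+1} = sin(x_n): decays slowly toward zero
decay = iterate(math.sin, 1.0, 1000)

# x_{n+1} = sin(2.5 x_n): settles into a period-2 cycle (~0.998, ~0.603)
cycle = iterate(lambda x: math.sin(2.5 * x), 1.0, 1000)

# x_{n+1} = sin(3 x_n): bounded but neither constant nor periodic ("chaotic")
chaos = iterate(lambda x: math.sin(3.0 * x), 1.0, 1000)
```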

Table 2-I: Newton's method applied to sin(3x) − x = 0 with two diﬀerent starting values; the column shown is for y1 = 1./3.

| n  | y_n         |
|----|-------------|
| 0  | 1           |
| 1  | 0.333333    |
| 2  | 0.111111    |
| 3  | 0.0370371   |
| 4  | 0.0123458   |
| 5  | 0.00411542  |
| 6  | 0.00137213  |
| 7  | 0.000458031 |
| 8  | 0.000153983 |
| 9  | 5.39401E-05 |
| 10 | 2.32047E-05 |
| 11 | 1.81843E-05 |
| 12 | 2.69602E-05 |
| 13 | 5.07843E-05 |
| 14 | 0.000100523 |

3 Roundoff and Number Representation

In a computer every real number is represented by a sequence of bits, most commonly 32 bits (4 bytes). One bit is for the sign, and the distribution of bits between mantissa and exponent can be platform dependent. Almost universally, however, a 32-bit number will have 8 bits for the exponent and 23 bits for the mantissa, leaving one bit for the sign (as illustrated in figure 1). In the decimal system this corresponds to a maximum/minimum exponent of ±38 and approximately 7 decimal digits (at least 6 and at most 9). For a 64-bit number (8 bytes) there are 11 bits for the exponent (±308) and 52 bits for the mantissa, which gives around 16 decimal digits of precision (at least 15 and at most 17).

[Figure 1: the bit layout |0|01011110|00111000100010110000010| labeled sign, exponent, mantissa, and the decomposition of +1.23456E-6 into sign, mantissa, and exponent.]

|                      | single  | double   |
|----------------------|---------|----------|
| bytes                | 4       | 8        |
| bits for mantissa    | 23      | 52       |
| bits for exponent    | 8       | 11       |
| significant decimals | 6–9     | 15–17    |
| maximum finite       | 3.4E38  | 1.8E308  |
| minimum normal       | 1.2E-38 | 2.2E-308 |
| minimum subnormal    | 1.4E-45 | 4.9E-324 |
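The single-precision field widths can be checked directly by unpacking a float's bit pattern (a small sketch using Python's standard `struct` module):

```python
import struct

def float32_fields(x):
    """Split a number's IEEE-754 single-precision representation into
    its sign (1 bit), exponent (8 bits, biased by 127), and mantissa (23 bits)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, exponent, mantissa

# 1.0 = +1.0 * 2^0, so the biased exponent is 127 and the mantissa field is 0
sign, exponent, mantissa = float32_fields(1.0)
```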

It is helpful to reserve a few bit patterns for "exceptions": a bit pattern for Inf (infinity), one for -Inf, and one for NaN (not a number). For example, 1./0. will produce Inf. An overflow is also an Inf. There is a positive and a negative zero. If a zero is produced as an underflow of a tiny negative number, it carries a negative sign.
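These exceptional values are easy to produce from ordinary floating-point arithmetic:

```python
import math

inf = float("inf")

overflow = 1e308 * 10        # exceeds the largest finite double -> inf
nan = inf - inf              # no meaningful result -> NaN ("not a number")
neg_zero = -1e-323 / 1e10    # underflow of a tiny negative number -> -0.0

# NaN compares unequal even to itself; -0.0 compares equal to 0.0 but keeps
# its sign bit. (Note: plain Python raises ZeroDivisionError for 1./0.
# rather than returning Inf, unlike the Fortran/C behavior quoted above.)
```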

Problem: The convergence test indicates that ||u_{2N} − u_N|| → 0 as the resolution N goes to infinity (roundoff ignored). Does this mean ||u_N − u|| → 0 as N → ∞, where u is the exact, correct answer?

Code Sources.

- The Guide to Available Mathematical Software, http://math.nist.gov, maintains a directory of subroutines from numerous public and proprietary repositories.
- NETLIB at www.netlib.org offers free sources by various authors and of varying quality.
- A more specialized, refereed set of routines is available to the public from the Collected Algorithms of the ACM at www.acm.org/calgo.
- Numerical Recipes, www.nr.com, explains and provides a broad and selective collection of reliable subroutines. (Sporadic weaknesses in the first edition are corrected in the second.) Each program is available in C, C++, Fortran 77, and Fortran 90.

Recommended References: Patterson & Hennessy, Computer Organization and Design: The Hardware/Software Interface.

Stars indicate nonzero elements and blank elements are zero. Eliminating the first column takes about N² floating-point operations, the second column (N−1)², the third column (N−2)², and so on. This yields a total of about N³/3 floating-point operations. (One way to see that is to approximate the sum by an integral, and the integral of N² is N³/3.) Once triangular form is reached, the value of one variable is known and can be substituted in all other equations, and so on. These substitutions require only O(N²) operations. A count of N³/3 is less than the
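The N³/3 estimate can be checked by counting the operations directly:

```python
# Count the elementwise operations in forward elimination for an N x N
# system and compare with the N^3/3 estimate from the text.
def elimination_ops(N):
    # eliminating column k updates the remaining (N-k) x (N-k) block
    return sum((N - k) ** 2 for k in range(1, N))

N = 200
ops = elimination_ops(N)     # exact sum: 199^2 + 198^2 + ... + 1^2
estimate = N ** 3 / 3        # integral approximation of the same sum
```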

Recommended References: For generation and testing of random numbers see Knuth, The Art of Computer Programming, Vol. 2. Methods for generating probability distributions are found in Devroye, Non-Uniform Random Variate Generation, which is also available on the web at http://cg.scs.carleton.ca/~luc/rnbookindex.html.

Figure 3 shows the magnetization as a function of temperature obtained with such a program. Part (a) is for the one-dimensional Ising model, with the spins initialized in random orientations. The scatter of points at low temperatures arises from insufficient equilibration and averaging times. In one dimension the magnetization vanishes for any temperature above zero.

Entertainment: a good online applet demonstrates the spin fluctuations in the two-dimensional Ising model.

Example DNA sequences from Chapter 12:

AGTGGACTTTGACAGA
AGTGGACTTAGATTTA
TGGATCTTGACAGATT
AGTTGACTTACGTGCA
ATCGATCTATTCACCG

There are two major distinct types of PDEs. One type describes the evolution over time, or any other variable, starting from an initial configuration. Physical examples are the propagation of sound waves (wave equation) and the spread of heat in a medium (diffusion equation or heat equation). These are "initial value problems." The other group are static solutions constrained by boundary conditions. Examples are the electric field of charges at rest (Poisson equation) and the charge distribution of electrons in an atom (time-independent Schrödinger equation). These are "boundary value problems." The same distinction can already be made for ordinary differential equations. For example, −f''(x) = f(x) with f(0) = 1 and f'(0) = −1 is an initial value problem, while the same equation with f(0) = 1 and f(1) = −1 is a boundary value problem.
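For this particular ODE both problems can be solved in closed form, since the general solution of −f''(x) = f(x) is f(x) = A cos x + B sin x; a quick check in Python:

```python
import math

# Initial value problem: f(0) = 1, f'(0) = -1  =>  A = 1, B = -1
def f_ivp(x):
    return math.cos(x) - math.sin(x)

# Boundary value problem: f(0) = 1, f(1) = -1
# f(0) = 1 gives A = 1; f(1) = -1 then fixes B from cos(1) + B*sin(1) = -1
B = (-1.0 - math.cos(1.0)) / math.sin(1.0)
def f_bvp(x):
    return math.cos(x) + B * math.sin(x)
```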


basic linear algebra, or introductory physics. The last two and a half chapters involve multivariable calculus and can be omitted by anyone who does not have this background. Prior knowledge of numerical analysis and a programming language is optional.

The book can be roughly divided into two parts: the first half deals with small computations and the second mainly with large computations. The reader is exposed to a wide range of approaches, conceptual ideas, and practical issues. Although the book is focused on physicists, all but a few chapters are accessible to and relevant for a much broader audience in the physical sciences. Sections with a ∗ symbol are specifically intended for physicists and chemists.

For better readability, references within the text are entirely omitted. Figure and table numbers are prefixed with the chapter number, unless the reference occurs in the text of the same chapter. Bits of entertainment, problems, dialogs, and quotes are used for variety of exposition. Problems at the end of several of the chapters do not require paper and pencil, but should stimulate thinking.

Numerical results are commonly viewed with suspicion, and often rightly so, but it all depends on how well they are done. The following anecdote is appropriate. Five physicists carried out a challenging analytic calculation and obtained five different results. They discussed their work with each other to resolve the discrepancies. Three realized mistakes in their analysis, but the others still ended up with two different answers. Soon after, the calculation was done numerically, and the result did not agree with any of the five analytic calculations. The numeric result turned out to be the only correct answer.

Norbert Schörghofer
Honolulu, Hawaii
August, 2006


Recommended References: The "father" of the IEEE 754 standard is William Kahan, who has a description of the standard and other roundoff-related notes online at www.cs.berkeley.edu/~wkahan. A technical summary is provided by David Goldberg, What Every Computer Scientist Should Know About Floating Point Arithmetic. It can be found all over the internet, for example at http://docs-pdf.sun.com/800-7895/800-7895.pdf.


Random number generators are not truly random, but use deterministic rules to generate "pseudorandom" numbers, for example x_{i+1} = (23 x_i) mod (10^8 + 1), meaning the remainder of 23 x_i / 100000001. The starting value x_0 is called the "seed."

Pseudorandom number generators can never ideally satisfy all desired statistical properties. For example, since there are only finitely many computer-representable numbers, they will ultimately always be periodic, though the period can be extremely long. Random number generators are said to be responsible for many wrong computational results. Particular choices of the seed can lead to short periods. Likewise, the coefficients in formulas like the one above need to be chosen carefully. Many implementations of pseudorandom number generators were simply badly chosen or faulty. The situation has however improved, and current random number generators suffice for almost any practical purpose. Source code routines seem to be universally better than the built-in random number generators provided by libraries.

Pseudorandom number generators produce a uniform distribution of numbers in an interval, typically either integers or real numbers in the interval from 0 to 1 (perhaps without one or both of the endpoints). How do we obtain a different distribution? A new probability distribution, p(x), can be related to a given one, q(y), by a transformation y = y(x). The probability to be between x and x + dx is p(x) dx. By construction, this equals the probability to be between y and y + dy. Hence, |p(x) dx| = |q(y) dy|, where the absolute values are needed because y could be a decreasing function of x.
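The toy generator from the text, plus the standard inverse-transform example of producing a nonuniform distribution (the exponential transform is a common illustration, not taken from the text):

```python
import math

M = 10 ** 8 + 1    # modulus of the toy rule x_{i+1} = (23 x_i) mod (10^8 + 1)

def toy_generator(seed, n):
    """Return n pseudorandom reals in [0, 1) from the toy multiplicative rule."""
    x = seed
    out = []
    for _ in range(n):
        x = (23 * x) % M
        out.append(x / M)
    return out

uniform = toy_generator(seed=12345, n=10000)

# Transformation method: if u is uniform on [0, 1), then y = -ln(1 - u)
# follows the exponential distribution p(y) = exp(-y).
exponential = [-math.log(1.0 - u) for u in uniform]
```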


Trees, which we have encountered in the heapsort algorithm, are a "data structure." Arrays are another, simpler data structure. A further possibility is to store pointers to data, that is, every data entry includes a reference to where the next entry is stored. Such a storage arrangement is called a "list." Inserting an element into a sequence of data is faster when the data are stored as a list rather than as an array. On the other hand, accessing the last element is faster in an array than in a list. Lists cause cache misses (described in chapter 9), because sequential elements are not stored contiguously in memory.
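A minimal singly linked list makes the insertion trade-off concrete: inserting after a known node relinks two pointers in constant time, whereas an array must shift every later element:

```python
class Node:
    """One entry of a singly linked list: a value plus a pointer to the next entry."""
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def insert_after(node, value):
    node.next = Node(value, node.next)   # O(1): relink two pointers

def to_python_list(node):
    out = []
    while node is not None:
        out.append(node.value)
        node = node.next
    return out

# Build the list 1 -> 3, then insert 2 after the head, giving 1 -> 2 -> 3.
head = Node(1, Node(3))
insert_after(head, 2)
```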

Some basic programming background, be it C/C++, Fortran, MATLAB, or Mathematica (enough to understand the logic of programming, control statements, basic data structures, etc.), would be useful.

This is intended to be a 1-credit class. The primary method of evaluation is class participation.

To make the most of this class, you should have Python installed on a laptop that you can bring to the seminar. On Linux machines, you can get Python and the needed libraries through your package manager. For Mac and Windows, you might want to consider the free distributions provided by Enthought Canopy or Anaconda; both install everything you need.

All of the course slides (in LibreOffice flat XML format), scripts, and IPython notebooks are available on the course GitHub page: https://github.com/sbu-python-class/python-science

- *10 Simple Rules for the Care and Feeding of Scientific Data* by Goodman et al.
- *How to Scale a Code in the Human Dimension* by Matt Turk
- *Practices in source code sharing in astrophysics* by L. Shamir et al.
- *Best Practices for Scientific Computing* by G. Wilson et al.
- *Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research* by V. Stodden and Sheila Miguez
- *Reliability in the Face of Complexity; The Challenge of High-End Scientific Computing* by G. Ferland
- *The Nature of Scientific Proof in the Age of Simulations* by K. Heng

- Astronomy resources:
  - *Astropy: A community Python package for astronomy*, an article describing a community astronomy package for Python
  - AstroPython

- Atmospheric Sciences resources:
- PyAOS, a list of Python resources for the atmospheric sciences

- Biology resources:
  - *Python—All a Scientist Needs*, an article describing how Python is used in bioinformatics
  - Biopython, a set of tools for computational biology

- Cognitive Science resources:
- pycogsci, a blog providing information about how Python is used in cognitive science

- Ocean and marine sciences resources:
- OceanPython.org, a blog for the ocean and marine sciences communities

- Physics resources:
- QuTiP, the Quantum Toolbox in Python

- Social sciences resources:
- NetworkX. a library for exploring the structure and complexity of social networks

- Solar physics resources:
- SunPy. a library providing routines to analyze solar data

- Psychology resources:
- PsychoPy, psychology software to "allow the presentation of stimuli and collection of data for a wide range of neuroscience, psychology and psychophysics experiments."

Note: this information will be updated continuously throughout the semester, so it is best to look at the relevant topics just before the class meeting.

- Readings:
  - The official *SciPy Tutorial*
  - The *SciPy Cookbook*
  - *SciKits*, additional toolkits for SciPy which provide extra functionality
  - *SciPy Central*, user-submitted SciPy snippets
  - *NumPy for Matlab Users*
  - *Deterministic Nonperiodic Flow* by E. N. Lorenz, the system we integrated when discussing ODEs
  - *A simple example of an ill-conditioned matrix* by G. J. Tee

- Lecture slides: scipy.pdf
- Lecture IPython notebooks: scipy-basics.ipynb
- Other examples:
- Gaussian elimination with pivoting: gauss.py (main module), gauss-test.py (test routine), matmul.py (auxiliary routine)

- Readings:
  - *Interfacing with C* by Valentin Haenel, part of his SciPy lecture notes; a very nice comparison of the different methods
  - *Speeding up Python (NumPy, Cython, and Weave)* by T. Oliphant
- C-API:
  - *Extending Python with C or C++*: this is the "hard" way to do things.
- ctypes:
  - *Ctypes Cookbook*: ctypes makes it easy to call existing C code.
- f2py:
  - *f2py Users Guide*
  - *F2PY: a tool for connecting Fortran and Python programs*
- Cython:
  - *Cython, C-Extensions for Python*, the official project page
  - *Cython: The Best of Both Worlds* by S. Behnel et al. (alternate links: [here])

- Lecture slides: extensions.pdf
- Example codes:
- C-API: test-C-API.py numpy-ex.c setup.py
- ctypes: test-ctypes.py cfunc_multid.c Makefile
- f2py: test_f2py.py numpy_in_f.f90 Makefile
- Cython: test_cy.py square.pyx setup.py
- Timing comparison for Laplace smoothing (this extends the comparison from the blog entry by T. Oliphant listed above): Makefile laplace_CAPI.c laplace_C.c laplace_cython.pyx laplace_fortran.f90 laplace.py setup.py
- Calling an external command and capturing both stdout and stderr: githash.py
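The githash.py script itself is not reproduced here, but the stdout/stderr capture pattern it illustrates presumably looks something like this sketch (the command run below is arbitrary):

```python
import subprocess
import sys

# Run an external command and capture stdout and stderr separately.
# sys.executable is used so the example works without assuming git is installed.
result = subprocess.run(
    [sys.executable, "-c",
     "import sys; print('out'); print('err', file=sys.stderr)"],
    capture_output=True,   # collect both streams (Python 3.7+)
    text=True,             # decode bytes to str
)
# result.stdout, result.stderr, and result.returncode are now available
```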

The book serves as a first introduction to computer programming of scientific applications, using the high-level Python language. The exposition is example- and problem-oriented, with applications taken from mathematics, numerical calculus, statistics, physics, biology, and finance. The book teaches MATLAB-style and procedural programming as well as object-oriented programming. High school mathematics is a required background, and it is advantageous to study classical and numerical one-variable calculus in parallel with reading this book. Besides learning how to program computers, the reader will also learn how to solve mathematical problems arising in various branches of science and engineering with the aid of numerical methods and programming. By blending programming, mathematics, and scientific applications, the book lays a solid foundation for practicing computational science.

*"We are pleased to bring the AIMS model to Tanzania. We bring together top global scholars in math and science to teach and research with Africa's brightest students. Our graduates then use these skills to tackle African development issues ranging from disease prevention to environmental degradation, education and poverty. AIMS graduates have a broad-based training and are talented problem solvers and innovators, which is just what this continent needs."*

Course taught by: Dr. Emile Chimusa (emile(at)aims.ac.tz) from University of Cape Town (South Africa).

*SPECIFIC OUTCOMES ADDRESSED*

- Generally speaking: develop numerical/scientific computing and problem-solving skills and approaches through writing computer scripts.

- Understand the three types of control structures (sequence, repetition and selection), as building blocks for all scripts.

- Manipulate basic objects and data structures.

- Understand the concepts of variable assignment, different data types, the memory allocation model, and functions and function calls, including the mechanics of argument passing.

- Appreciate the importance of writing programs with I/O capabilities.

- Introduction to object-oriented programming.

- Effectively write computer programs. The question of which target programming language to choose here seems to be resolved by a growing consensus around Python.

*Introduction to Algorithms*, 3rd edition, by Thomas Cormen, Charles Leiserson, and colleagues (MIT Press, 2009).

Python: Built-in Data Types.

Sage: introduction to computational Mathematics.

String Manipulation and if Statements

Derived Data Types (Lists, tuples, sets, and dictionaries) and more Control statements.

Writing Functions, File Input/output and Exception Handling.

Modules and more about graphics.

SciPy/NumPy Arrays and Introduction to Python Classes and Objects.

Throughout the course, an interactive Python shell is used to demonstrate concepts, plus a simple text editor later on, once the students start writing functions.

This "practical component" section follows the same structure as the previous "Theory lectures" section: the practicals simply aim at having the students manipulate the concepts seen in the lectures, right after being introduced to them.

*BACKGROUND KNOWLEDGE REQUIRED*

Basic general-purpose scientific knowledge, linear algebra and basic arithmetic/calculus skills, and some familiarity with computers.

Notes

- Homework will be assigned every week. Homework problems will consist of a mix of general problems, programming assignments, and problems related to the class project.

- Grading

- *Final Project: 50%*
- *Weekly Homework: 50%*

- Collaboration Policy:

Students may discuss the homework problems with other students or use other resources such as textbooks or the Internet. However, students must not obtain answers directly from anyone else. All homework will be submitted individually.

- Final Project: working in groups

- *Identify how different variables work together to create the dynamics of the system.*
- *Reduce the dimensionality of the data.*
- *Decrease redundancy in the data.*
- *Filter some of the noise in the data.*
- *Compress the data.*

The dynamic relationship between predators and their prey has long been, and will continue to be, one of the dominant themes in both ecology and mathematical ecology due to its universal existence and importance. The aim of this project is to compare the computational approaches to stability analysis of a predator-prey model using Python and Sage.

Finding patterns of social interaction within a population has wide-ranging applications including disease modelling, cultural and information transmission, and behavioural ecology. Social interactions are often modelled with networks. A key characteristic of social interactions is their continual change. However, most past analyses of social networks are essentially static, in that all information about the times at which social interactions take place is discarded. The aim of this short project is to use the NetworkX package in Python, together with existing algorithms, to illustrate the mathematical and computational framework that enables analysis of dynamic social networks and that explicitly makes use of information about when social interactions occur.

Fractal geometry is a young branch of mathematics and art. Perhaps this is why most people recognize fractals only as pretty pictures, useful as backgrounds on the computer screen or as original postcard patterns. But what are they really? Most physical systems in nature, and many human artefacts, are not regular geometric shapes of the standard geometry derived from Euclid. Fractal geometry offers almost unlimited ways of describing, measuring and predicting these natural phenomena. But is it possible to define the whole world using mathematical equations? Fractal geometry has permeated many areas of science, such as astrophysics and the biological sciences, and has become one of the most important techniques in computer graphics. This project aims at discussing the mathematics and computation of the most famous fractals. The discussion may bring in the computational aspects using turtle or other Python packages, how those fractals are created, and the most important fractal properties that make fractals useful in different domains of science.
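As a small taste of the computational side, a membership test for one of the most famous fractals, the Mandelbrot set (this particular fractal is an illustrative choice, not prescribed by the project description): a point c belongs to the set if the iteration z → z² + c stays bounded.

```python
def in_mandelbrot(c, max_iter=100):
    """Return True if the orbit of z -> z*z + c (starting at 0) stays bounded."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:     # escaped: c is certainly outside the set
            return False
    return True

# c = 0 stays at 0 forever; c = -1 cycles 0, -1, 0, -1, ...;
# c = 1 escapes quickly (0, 1, 2, 5, ...)
```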

Tuberculosis (TB) remains a source of morbidity and mortality worldwide, particularly in developing countries. One-third of the world's individuals are infected with TB, but only 10% go on to develop active TB during their lifetime. In addition, twin studies in humans and animal models demonstrate a strong genetic influence on TB susceptibility. This suggests that genetic factors may play an important role in TB susceptibility, determining both the host response and the outcome of infection. The second highest incidence of TB in the world is in the Western, Eastern and Northern Cape in South Africa, particularly in the mixed South African Coloured population. This project aims at looking at ancestry-specific TB risk using the genetic data of the mixed South African Coloured population. It also aims at evaluating the genetic ancestry of samples of TB cases and controls from this population. Importantly, it will examine whether the genetic contribution can increase tuberculosis prevalence.

The forward-backward algorithm has very important applications to both hidden Markov models (HMMs) and conditional random fields (CRFs). It is a dynamic programming algorithm, and is closely related to the Viterbi algorithm for decoding with HMMs or CRFs. This project aims at describing the algorithm at a level of abstraction that applies to both HMMs and CRFs. It will also describe its specific applications and its computational aspects using Python.
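A minimal sketch of the forward pass for a two-state HMM (all probabilities below are made-up toy numbers): α_t(s) is the joint probability of the observations up to time t and being in state s at time t.

```python
states = [0, 1]
start = [0.6, 0.4]                       # initial state probabilities
trans = [[0.7, 0.3], [0.4, 0.6]]         # trans[i][j] = P(next state j | state i)
emit = [[0.9, 0.1], [0.2, 0.8]]          # emit[s][o] = P(observation o | state s)

def forward(observations):
    """Total probability of an observation sequence, by dynamic programming."""
    alpha = [start[s] * emit[s][observations[0]] for s in states]
    for o in observations[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in states) * emit[j][o]
                 for j in states]
    return sum(alpha)

p = forward([0, 1, 0])   # probability of observing the sequence 0, 1, 0
```

A sanity check on the design: summed over every possible observation sequence of a fixed length, these probabilities must total exactly 1.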

An ever-increasing number of scientific studies are generating larger, more complex, and multi-modal datasets, so data analysis tasks are becoming more demanding. To help tackle these new challenges, more disciplines now need to incorporate advanced visualization techniques into their standard data processing and analysis methods. While many systems have been developed to allow scientists to explore, analyse, and visualize their data, many of these solutions are domain specific, limiting their scope as general processing tools. This project aims at discussing a development environment suitable for both computational and visualization tasks. It will describe basic mathematics, computational signal processing, and visualization using Python, with applications in neuroscience.

Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. It is one of several statistical tools available for reducing the dimensionality of a data set. The major goal of principal component analysis is to reveal hidden structure in a data set. This project will discuss the mathematical and computational aspects of PCA, including
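The computational core of PCA fits in a few lines: center the data and take a singular value decomposition; the rows of Vt are the principal axes and the singular values measure the variance each component captures (the 2-D synthetic data below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic data: 200 points, strongly stretched along the first axis
data = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])

centered = data - data.mean(axis=0)          # PCA requires centered data
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

# dimensionality reduction: project onto the first principal component
reduced = centered @ Vt[0]
```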

Dependent data arise in many studies. Frequently adopted sampling designs, such as cluster, multilevel, spatial, and repeated-measures designs, may induce this dependence, which the analysis of the data needs to take into account. This project involves the exploration of the generalization of mixed models, their applications, and their computational aspects using Python. In addition, the project will discuss the computation of parameters in mixed models using the Monte Carlo Expectation-Maximization algorithm.

Understanding complex systems often requires a bottom-up analysis towards a systems biology approach. The need emerges to investigate a system not only as individual components but as a whole. This can be done by examining the elementary constituents individually and then how these are connected. The myriad components of a system and their interactions are best characterized as networks, and they are mainly represented as graphs where thousands of nodes are connected by thousands of edges. This project will use graph theory to model and visualize human protein-protein interactions, and will discuss ways in which graphs can be used to reveal hidden properties and features of a network, using networkx in Python and R.

NetworkX is a *Python* package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. An update (1.0 RC1) was released a few days ago and you can grab the package HERE.

I took advantage of this update to test NetworkX with GeeXLab. I created a simple graph (path_graph) and played with the Fruchterman-Reingold algorithm to position the graph’s nodes.
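The snippet below reproduces that experiment in plain NetworkX (without GeeXLab). `spring_layout` is NetworkX's implementation of Fruchterman-Reingold; note that the `seed` argument shown here follows the current NetworkX API, which may differ from the 1.0 RC1 version discussed in the text:

```python
import networkx as nx

# Build the same kind of simple graph as in the text: a path of 8 nodes
G = nx.path_graph(8)

# spring_layout positions the nodes with the Fruchterman-Reingold
# force-directed algorithm; `seed` fixes the random initial positions
pos = nx.spring_layout(G, seed=42)

for node, (x, y) in sorted(pos.items()):
    print(node, round(x, 3), round(y, 3))
```

The resulting `pos` dictionary (node to 2-D coordinates) is exactly what a renderer such as GeeXLab needs to draw the graph.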

*Fruchterman-Reingold* is an algorithm that attempts to produce aesthetically pleasing, two-dimensional pictures of graphs by doing simplified simulations of physical systems. A more detailed definition can be found HERE.

The Fruchterman-Reingold Algorithm is a force-directed layout algorithm. The idea of a force directed layout algorithm is to consider a force between any two nodes. In this algorithm, the nodes are represented by steel rings and the edges are springs between them. The attractive force is analogous to the spring force and the repulsive force is analogous to the electrical force. The basic idea is to minimize the energy of the system by moving the nodes and changing the forces between them. For more details refer to the Force Directed algorithm.
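The spring/electrical analogy above can be sketched as a toy NumPy implementation; the constants, cooling schedule, and force expressions below follow the classic formulation (attraction d²/k on edges, repulsion k²/d between all pairs) but are illustrative assumptions, not NetworkX's exact code:

```python
import numpy as np

def fruchterman_reingold(A, iterations=50, seed=0):
    """Toy force-directed layout for a 0/1 adjacency matrix A (n x n)."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1, 1, size=(n, 2))   # random initial positions
    k = np.sqrt(1.0 / n)                    # ideal pairwise distance
    t = 0.1                                 # "temperature": max step length
    dt = t / (iterations + 1)
    for _ in range(iterations):
        delta = pos[:, None, :] - pos[None, :, :]   # pairwise differences
        dist = np.linalg.norm(delta, axis=-1)
        dist = np.maximum(dist, 1e-9)               # avoid divide-by-zero
        # net force magnitude per pair: repulsion k^2/d, attraction d^2/k
        # (delta has magnitude d, so divide/multiply by one extra d below)
        force = k**2 / dist**2 - A * dist / k
        disp = (delta * force[..., None]).sum(axis=1)
        length = np.maximum(np.linalg.norm(disp, axis=1, keepdims=True), 1e-9)
        pos += disp / length * np.minimum(length, t)  # cap step at temperature
        t -= dt                                       # cool down
    return pos
```

Lowering the temperature each iteration is what lets the system settle into a low-energy, visually pleasing configuration.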

I’m absolutely not an expert in graphs, rather a newbie, but it’s cool to see how we can play with a scientific library in GeeXLab. Thanks a lot, Python! Just for fun, it would be nice to see such a NetworkX graph rendered with cool lighting and other eye-catching post-processing effects. I’ll do it shortly, as soon as the alpha version of GeeXLab is out…

NetworkX builds on NumPy (Numerical Python). NumPy is the fundamental package needed for scientific computing with Python. You can get the latest version (1.3.0) HERE.

This worked example fetches a data file from a web site, applies that file as input data for a differential equation modeling a vibrating system, solves the equation, and visualizes various properties of the solution and the input data. The following programming topics are illustrated: downloading files from a web site, working with `numpy` arrays, flexible storage of objects in lists, easy storage of objects in files (persistence), signal processing and FFT, and curve plotting of data.

The task is to make a simulation program that can predict how a (simple) mechanical system oscillates in response to environmental forces. Introducing u(t) as some displacement of the system at time t, application of Newton’s second law of motion to the system often results in the following type of equation for u:

m*u'' + b*u' + f(u) = F(t)

where m is the mass of the system, b is a damping parameter, f(u) is a restoring force, and F(t) models the environmental forces.

Vehicle on a bumpy road

Another example regards the vertical shaking of a building due to earthquake-induced movement of the ground. If the vertical displacement of the ground is recorded as a function h(t), this results in a vertical force F(t) = -m*h''(t). The soil foundation acts as a spring and damper on the building, modeled through the damping parameter b and normally a linear spring term k*u.

In both cases we drop the effect of gravity, which is just a constant compression of the spring.

The implementation of the computational algorithm can make use of an array `u` to represent the displacement, with the value at time `t[n]` stored as `u[n]`. The force F(t) is assumed to be available as an array element `F[n]`. The following Python function computes `u` given an array `t` with time points, the initial displacement `I`, mass `m`, damping parameter `b`, restoring force `f(u)`, and environmental forces `F` as an array (corresponding to `t`).
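The function itself is not reproduced in this excerpt. Here is a sketch assuming a standard second-order central difference discretization of m*u'' + b*u' + f(u) = F(t) and zero initial velocity; the scheme choice and the initial condition u'(0) = 0 are assumptions:

```python
import numpy as np

def forced_vibrations(t, I, m, b, f, F):
    """Solve m*u'' + b*u' + f(u) = F(t) by a central difference scheme.

    t: equally spaced time points, I: initial displacement, m: mass,
    b: damping parameter, f: restoring force function f(u),
    F: environmental force as an array corresponding to t.
    Assumes zero initial velocity, u'(0) = 0.
    """
    u = np.zeros(t.size)
    dt = t[1] - t[0]                       # constant time step
    u[0] = I
    # special first step from u'(0) = 0 (fictitious point u[-1] = u[1])
    u[1] = u[0] + dt**2/(2.0*m)*(F[0] - f(u[0]))
    for n in range(1, t.size - 1):
        u[n+1] = (dt**2*(F[n] - f(u[n])) + 2*m*u[n]
                  - (m - b*dt/2.0)*u[n-1]) / (m + b*dt/2.0)
    return u
```

With m = 1, b = 0, f(u) = u, F = 0, and u(0) = 1 this reproduces the exact solution cos(t) to second-order accuracy, which is a handy sanity check for the scheme.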

*Dissection of the Code.* Functions in Python start with `def`, followed by the function name and the list of input objects separated by commas. The function body is indented, and the first non-indented line signifies the end of the function body block. The string, enclosed in triple double-quotes, right after the function definition, is a *doc string* used for documenting the function. Various tools can extract function definitions and doc strings to automatically produce manuals.

Array functionality is offered by the `numpy` package, here imported under the nickname `np`. This package contains MATLAB-like functionality. It is quite common to prefix a MATLAB-like function such as `zeros` by `np` (`np.zeros`), but one can also perform `from numpy import *` and then write just `zeros` without any prefix. The advantage is that the code closely resembles similar MATLAB code.

The total number of elements in an array `t` is obtained by `t.size`. One could also use `len(t)`, but for multi-dimensional arrays `len` just gives the number of elements corresponding to the first index (number of rows in a matrix).

Arrays are indexed by square brackets, and indices always start at 0. For/do loops in Python are more general than those in Fortran, C, C++, and Java, as one can loop over any set of objects with the syntax `for element in some_set`. In numerical code, it is common to loop over array indices, i.e., a set of integers. Such a set is produced by `range(start, stop, increment)`, which returns the integers `start, start+increment, start+2*increment`, and so on, up to *but not including* `stop`. Writing just `range(stop)` means `range(0, stop, 1)`.
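A small illustration of these `range` semantics (note that in Python 3, `range` returns a lazy iterable rather than a list, so we wrap it in `list()` to show the values):

```python
# range(start, stop, increment): up to, but not including, stop
print(list(range(2, 10, 3)))   # [2, 5, 8]
print(list(range(4)))          # range(stop) == range(0, stop, 1): [0, 1, 2, 3]

# the common pattern in numerical code: looping over array indices
total = 0
for i in range(0, 10, 2):
    total += i                 # 0 + 2 + 4 + 6 + 8
print(total)                   # 20
```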

Every variable in Python is an object. In particular, the `f` function above is a function object, passed to the function like any other object and called like any other function.

Considering the application where the present mathematical model describes the vibrations of a vehicle driving along a bumpy road, we need to establish the force array `F` from the shape of the road h(x). Various shapes are available as a file with web address http://folk.uio.no/hpl/scripting/bumpy.dat.gz. The Python functionality for downloading this `gzip` compressed file as a local file `bumpy.dat.gz` and reading it into a `numpy` array goes as follows:
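The download snippet is missing from this excerpt; here is a sketch using the standard library (`urllib.request`, `gzip`) together with `np.loadtxt`. The helper name is illustrative, and the actual download call is commented out since the URL may no longer resolve:

```python
import gzip
import urllib.request

import numpy as np

def download_and_load(url, filename):
    """Fetch a gzip-compressed data file and load it as a numpy array."""
    urllib.request.urlretrieve(url, filename)   # save as a local file
    with gzip.open(filename, 'rt') as infile:   # transparent decompression
        return np.loadtxt(infile)

# h_data = download_and_load(
#     'http://folk.uio.no/hpl/scripting/bumpy.dat.gz', 'bumpy.dat.gz')
```

The same `gzip.open`/`np.loadtxt` combination works for any locally stored `.gz` data file, so the reading part can be tested without network access.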

In general, `a[s:t:i,2]` gives a view (not a copy) of the part of the array `a` where the first index goes from `s` to `t`, *but not including the `t` value*, in increments of `i`, and the second index is fixed at 2. Just writing `:` for an index means all possible index values.
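A small demonstration of this slicing, including the view-not-copy behavior (the array contents are made up for illustration):

```python
import numpy as np

a = np.arange(24).reshape(6, 4)

view = a[1:5:2, 2]      # first index 1, 3 (5 not included), second index fixed at 2
print(view)             # [ 6 14]

view[0] = -1            # a view shares memory with `a`...
print(a[1, 2])          # ...so this prints -1

print(a[:, 2])          # ':' alone selects all index values along that axis
```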

Here, `u` is a *local variable*, which lives just inside the function, while `k` is a *global variable*, which must be initialized outside the function prior to calling `f` with any `u` argument.
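A minimal standalone illustration of this local/global distinction (the linear spring form `k*u` matches the model described earlier; the numbers are made up):

```python
k = 2.0                  # global variable: must exist before f is called

def f(u):
    return k*u           # u is local; k is looked up in the global namespace

print(f(3.0))            # 6.0
```

If `k` were not defined before the call, `f(3.0)` would raise a `NameError`, which is exactly the "must be initialized outside the function" requirement above.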

Parameters can be set as
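The parameter-setting code is not reproduced in this excerpt. A sketch consistent with the stated bicycle conditions follows; the mass and velocity come from the text, while the damping parameter and spring constant are placeholder values:

```python
m = 60.0               # mass [kg]: "bicycle conditions"
b = 80.0               # damping parameter (placeholder value)
k = 60.0               # spring constant (placeholder value)

def f(u):
    return k*u         # linear spring term, as in the model

v = 10.0               # velocity [m/s]; 10 m/s = 36 km/h
```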

This choice corresponds to a velocity of 36 km/h and a mass of 60 kg, i.e. bicycle conditions.

For each shape we want to compute the corresponding vertical displacement u(t) using the mathematical model (1). This can be accomplished by looping over the columns of `h_data` and calling `forced_vibrations` for each column, i.e., each realization of the force F(t). The major arrays from the computations are collected in a list `data`, containing `x`, `t`, and, for each road shape, a 3-list `[h, a, u]`.

The code above is naturally implemented as a Python function:
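Since that function is not shown in this excerpt, here is a sketch. The `h_data` layout (first column x, remaining columns road shapes), the use of `np.gradient` to approximate the acceleration h''(t), and the force F = -m*h'' are assumptions made to keep the example self-contained; the embedded solver repeats the central difference scheme sketched earlier:

```python
import numpy as np

def forced_vibrations(t, I, m, b, f, F):
    """Central difference solver for m*u'' + b*u' + f(u) = F(t)."""
    u = np.zeros(t.size)
    dt = t[1] - t[0]
    u[0] = I
    u[1] = u[0] + dt**2/(2.0*m)*(F[0] - f(u[0]))
    for n in range(1, t.size - 1):
        u[n+1] = (dt**2*(F[n] - f(u[n])) + 2*m*u[n]
                  - (m - b*dt/2.0)*u[n-1]) / (m + b*dt/2.0)
    return u

def solve(h_data, v, m, b, f):
    """Loop over the road shapes in h_data and collect the results.

    Assumed layout: h_data[:, 0] holds x, remaining columns hold h(x).
    Returns data = [x, t, [h, a, u], [h, a, u], ...].
    """
    x = h_data[:, 0]                 # coordinate along the road
    t = x / v                        # time = distance / velocity
    data = [x, t]
    for i in range(1, h_data.shape[1]):
        h = h_data[:, i]
        a = np.gradient(np.gradient(h, t), t)   # approximate h''(t)
        u = forced_vibrations(t, I=0, m=m, b=b, f=f, F=-m*a)
        data.append([h, a, u])
    return data
```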

Since the roads have quite a noisy shape, the force F(t) is very noisy, and the response to this excitation is quite noisy as well; see Figure *First realization of a bumpy road, with corresponding acceleration of the wheel and resulting vibrations* for an example. It may be useful to compute the root mean square value of the various realizations of u and add this array to the `data` list of input and output data in the problem:
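One reasonable interpretation of this step, computing the pointwise-in-time RMS across all realizations of u (the `data` layout is assumed from the description above, and the helper name is illustrative):

```python
import numpy as np

def add_rms(data):
    """Append the root mean square of u across realizations to data.

    data = [x, t, [h, a, u], [h, a, u], ...] (assumed layout);
    the RMS is taken at each time point over all realizations of u.
    """
    u_all = np.array([triple[2] for triple in data[2:]])
    u_rms = np.sqrt(np.mean(u_all**2, axis=0))
    data.append(u_rms)
    return data
```

Because the RMS averages over realizations, it smooths out the noise of any single road and summarizes the typical vibration level over time.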